Jiaming Yuan
bd1f3a38f0
Rewrite sparse dmatrix using callbacks. ( #7092 )
...
- Reduce dependency on dmlc parsers and provide an interface for users to load data by themselves.
- Remove use of threaded iterator and IO queue.
- Remove `page_size`.
- Make sure the number of pages in memory is bounded.
- Make sure the cache can not be violated.
- Provide an interface for internal algorithms to process data asynchronously.
2021-07-16 12:33:31 +08:00
Jiaming Yuan
84d359efb8
Support host data in proxy DMatrix. ( #7087 )
2021-07-08 11:35:48 +08:00
Jiaming Yuan
c766f143ab
Refactor external memory formats. ( #7089 )
...
* Save base_rowid.
* Return write size.
* Remove unused function.
2021-07-08 04:04:51 +08:00
Jiaming Yuan
1c8fdf2218
Remove use of device_idx in dh::LaunchN. ( #7063 )
...
It's an unused parameter; removing it makes the CI log more readable.
2021-06-29 11:37:26 +08:00
Jiaming Yuan
7968c0d051
Test on s390x. ( #7038 )
...
* Fix and remove unused parameter.
2021-06-18 14:55:08 +08:00
Jiaming Yuan
86715e4cd4
Support categorical data for dask functional interface and DQM. ( #7043 )
...
* Support categorical data for dask functional interface and DQM.
* Implement categorical data support for GPU GK-merge.
* Add support for dask functional interface.
* Add support for DQM.
* Get newer cupy.
2021-06-18 13:06:52 +08:00
Jiaming Yuan
794fd6a46b
Support v3 cuda array interface. ( #6776 )
2021-03-25 09:58:09 +08:00
Jiaming Yuan
4ee8340e79
Support column major array. ( #6765 )
2021-03-20 05:19:46 +08:00
Jiaming Yuan
f20074e826
Check for invalid data. ( #6742 )
2021-03-04 14:37:20 +08:00
Jiaming Yuan
1e949110da
Use generic dispatching routine for array interface. ( #6672 )
2021-02-05 09:23:38 +08:00
Jiaming Yuan
f2f7dd87b8
Use view for SparsePage exclusively. ( #6590 )
2021-01-11 18:04:55 +08:00
Jiaming Yuan
80065d571e
[dask] Add DaskXGBRanker ( #6576 )
...
* Initial support for distributed LTR using dask.
* Support `qid` in libxgboost.
* Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]`
to avoid duplicated code.
* Define `DaskXGBRanker`.
The dask ranker doesn't support group structure; instead, it uses query id and
converts it to group ptr internally.
2021-01-08 18:35:09 +08:00
Jiaming Yuan
c120822a24
Fix flaky sparse page dmatrix test. ( #6417 )
2020-11-20 19:15:45 +08:00
Jiaming Yuan
43efadea2e
Deterministic data partitioning for external memory ( #6317 )
...
* Make external memory data partitioning deterministic.
* Change the meaning of `page_size` from bytes to number of rows.
* Design a data pool.
* Note for external memory.
* Enable unity build on Windows CI.
* Force garbage collect on test.
2020-11-11 06:11:06 +08:00
Jiaming Yuan
bed7ae4083
Loop over thrust::reduce. ( #6229 )
...
* Check input chunk size of dqdm.
* Add doc for current limitation.
2020-10-14 10:40:56 +13:00
Jiaming Yuan
14afdb4d92
Support categorical data in ellpack. ( #6140 )
2020-09-24 19:28:57 +08:00
Philip Hyunsu Cho
487ab0ce73
[BLOCKING] Handle empty rows in data iterators correctly ( #5929 )
...
* [jvm-packages] Handle empty rows in data iterators correctly
* Fix clang-tidy error
* last empty row
* Add comments [skip ci]
Co-authored-by: Nan Zhu <nanzhu@uber.com>
2020-07-25 13:46:19 -07:00
Jiaming Yuan
a3ec964346
Accept iterator in device dmatrix. ( #5783 )
...
* Remove Device DMatrix.
2020-07-07 21:44:48 +08:00
Jiaming Yuan
93c44a9a64
Move feature names and types of DMatrix from Python to C++. ( #5858 )
...
* Add thread local return entry for DMatrix.
* Save feature name and feature type in binary file.
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-07-07 09:40:13 +08:00
Jiaming Yuan
1a0801238e
Implement iterative DMatrix. ( #5837 )
2020-07-03 11:44:52 +08:00
Jiaming Yuan
90a9c68874
Implement a DMatrix Proxy. ( #5803 )
2020-06-29 15:03:10 +08:00
Jiaming Yuan
47c89775d6
Accept string for ArrayInterface constructor. ( #5799 )
2020-06-27 00:06:54 +08:00
Jiaming Yuan
c4d721200a
Implement extend method for meta info. ( #5800 )
...
* Implement extend for host device vector.
2020-06-20 03:32:03 +08:00
Jiaming Yuan
38ee514787
Implement fast number serialization routines. ( #5772 )
...
* Implement ryu algorithm.
* Implement integer printing.
* Full coverage roundtrip test.
2020-06-17 12:39:23 +08:00
fis
7c3a168ffd
Revert "Accept string for ArrayInterface constructor."
...
This reverts commit e8ecafb8dc628f45b75b4c2844a236d27e0a6d98.
2020-06-16 20:02:35 +08:00
fis
e8ecafb8dc
Accept string for ArrayInterface constructor.
2020-06-16 20:00:24 +08:00
Rory Mitchell
b47b5ac771
Use hypothesis ( #5759 )
...
* Use hypothesis
* Allow int64 array interface for groups
* Add packages to Windows CI
* Add to travis
* Make sure device index is set correctly
* Fix dask-cudf test
* appveyor
2020-06-16 12:45:59 +12:00
Jiaming Yuan
306e38ff31
Avoid including c_api.h in header files. ( #5782 )
2020-06-12 16:24:24 +08:00
Jiaming Yuan
cacff9232a
Remove column major specialization. ( #5755 )
...
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-06-05 16:19:14 +08:00
Jiaming Yuan
8438c7d0e4
Fix IsDense. ( #5702 )
2020-05-26 08:24:37 +08:00
Jiaming Yuan
eaf2a00b5c
Enhance nvtx support. ( #5636 )
2020-05-06 22:54:24 +08:00
Jiaming Yuan
e726dd9902
Set device in device dmatrix. ( #5596 )
2020-04-25 13:42:53 +08:00
Jiaming Yuan
29a4cfe400
Group aware GPU sketching. ( #5551 )
...
* Group aware GPU weighted sketching.
* Distribute group weights to each data point.
* Relax the test.
* Validate input meta info.
* Fix metainfo copy ctor.
2020-04-20 17:18:52 +08:00
Jiaming Yuan
e1f22baf8c
Fix slice and get info. ( #5552 )
2020-04-18 18:00:13 +08:00
Rory Mitchell
ca4e05660e
Purge device_helpers.cuh ( #5534 )
...
* Simplifications with caching_device_vector
* Purge device helpers
2020-04-15 21:51:56 +12:00
Jiaming Yuan
6671b42dd4
Use ellpack for prediction only when sparsepage doesn't exist. ( #5504 )
2020-04-10 12:15:46 +08:00
Jiaming Yuan
0012f2ef93
Upgrade clang-tidy on CI. ( #5469 )
...
* Correct all clang-tidy errors.
* Upgrade clang-tidy to 10 on CI.
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-04-05 04:42:29 +08:00
Jiaming Yuan
459b175dc6
Split up test helpers header. ( #5455 )
2020-04-03 10:36:53 +08:00
Jiaming Yuan
29c6ad943a
Prevent copying SimpleDMatrix. ( #5453 )
...
* Set default dtor for SimpleDMatrix to initialize default copy ctor, which is
deleted due to unique ptr.
* Remove commented code.
* Remove warning for calling host function (std::max).
* Remove warning for initialization order.
* Remove warning for unused variables.
2020-04-02 07:01:49 +08:00
Rory Mitchell
13b10a6370
Device dmatrix ( #5420 )
2020-03-28 14:42:21 +13:00
Jiaming Yuan
4942da64ae
Refactor tests with data generator. ( #5439 )
2020-03-27 06:44:44 +08:00
Rory Mitchell
b745b7acce
Fix memory usage of device sketching ( #5407 )
2020-03-14 13:43:24 +13:00
Rory Mitchell
3ad4333b0e
Partial rewrite EllpackPage ( #5352 )
2020-03-11 10:15:53 +13:00
Rory Mitchell
a38e7bd19c
Sketching from adapters ( #5365 )
...
* Sketching from adapters
* Add weights test
2020-03-07 21:07:58 +13:00
Jiaming Yuan
f2b8cd2922
Add number of columns to native data iterator. ( #5202 )
...
* Change native data iter into an adapter.
2020-02-25 23:42:01 +08:00
Rory Mitchell
b0ed3f0a66
Remove unnecessary DMatrix methods ( #5324 )
2020-02-25 12:40:39 +13:00
Jiaming Yuan
655cf17b60
Predict on Ellpack. ( #5327 )
...
* Unify GPU prediction node.
* Add `PageExists`.
* Dispatch prediction on input data for GPU Predictor.
2020-02-23 06:27:03 +08:00
Rory Mitchell
bc96ceb8b2
Refactor SparsePageSource, delete cache files after use ( #5321 )
...
* Refactor sparse page source
* Delete temporary cache files
* Log fatal if cache exists
* Log fatal if multiple threads used with prefetcher
2020-02-19 16:43:41 +13:00
Rory Mitchell
b2b2c4e231
Remove SimpleCSRSource ( #5315 )
2020-02-18 16:49:17 +13:00
Rong Ou
e4b74c4d22
Gradient based sampling for GPU Hist ( #5093 )
...
* Implement gradient based sampling for GPU Hist tree method.
* Add samplers and handle compacted page in GPU Hist.
2020-02-04 10:31:27 +08:00