263 Commits

Author SHA1 Message Date
Igor Moura
d1254808d5 Clean up C++ warnings (#6213) 2020-10-19 23:02:33 +08:00
Jiaming Yuan
ddf37cca30 Unify thread configuration. (#6186) 2020-10-19 16:05:42 +08:00
Jiaming Yuan
bed7ae4083 Loop over thrust::reduce. (#6229)
* Check input chunk size of dqdm.
* Add doc for current limitation.
2020-10-14 10:40:56 +13:00
Rory Mitchell
734a911a26 Loop over copy_if (#6201)
* Loop over copy_if

* Catch OOM.

Co-authored-by: fis <jm.yuan@outlook.com>
2020-10-14 10:23:16 +13:00
Jiaming Yuan
2241563f23 Handle duplicated values in sketching. (#6178)
* Accumulate weights in duplicated values.
* Fix device id in iterative dmatrix.
2020-10-10 19:32:44 +08:00
Jiaming Yuan
14afdb4d92 Support categorical data in ellpack. (#6140) 2020-09-24 19:28:57 +08:00
Jiaming Yuan
210c131ce7 Support categorical data in GPU sketching. (#6137) 2020-09-21 13:53:06 +08:00
Jiaming Yuan
b5f52f0b1b Validate weights are positive values. (#6115) 2020-09-15 09:03:55 +08:00
Jiaming Yuan
80c8547147 Make binary bin search reusable. (#6058)
* Move binary search row to hist util.
* Remove dead code.
2020-08-26 05:05:11 +08:00
Jiaming Yuan
20c95be625 Expand categorical node. (#6028)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-08-25 18:53:57 +08:00
ShvetsKS
24f2e6c97e Optimize DMatrix build time. (#5877)
Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>
2020-08-20 01:37:03 +08:00
Qi Zhang
989ddd036f Swap byte-order in binary serializer to support big-endian arch (#5813)
* fixed some endian issues

* Use dmlc::ByteSwap() to simplify code

* Fix lint check

* [CI] Add test for s390x

* Download latest CMake on s390x

* Fix a bug in my code

* Save magic number in dmatrix with byteswap on big-endian machine

* Save version in binary with byteswap on big-endian machine

* Load scalar with byteswap in MetaInfo

* Add a debugging message

* Handle arrays correctly when byteswapping

* EOF can also be 255

* Handle magic number in MetaInfo carefully

* Skip Tree.Load test for big-endian, since the test manually builds little-endian binary model

* Handle missing packages in Python tests

* Don't use boto3 in model compatibility tests

* Add s390 Docker file for local testing

* Add model compatibility tests

* Add R compatibility test

* Revert "Add R compatibility test"

This reverts commit c2d2bdcb7dbae133cbb927fcd20f7e83ee2b18a8.

Co-authored-by: Qi Zhang <q.zhang@ibm.com>
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-08-18 14:47:17 -07:00
Jiaming Yuan
4d99c58a5f Feature weights (#5962) 2020-08-18 19:55:41 +08:00
Philip Hyunsu Cho
487ab0ce73 [BLOCKING] Handle empty rows in data iterators correctly (#5929)
* [jvm-packages] Handle empty rows in data iterators correctly

* Fix clang-tidy error

* last empty row

* Add comments [skip ci]

Co-authored-by: Nan Zhu <nanzhu@uber.com>
2020-07-25 13:46:19 -07:00
Philip Hyunsu Cho
4af857f95d Add explicit template specialization for portability (#5921)
* Add explicit template specializations

* Adding Specialization for FileAdapterBatch
2020-07-22 12:31:17 -07:00
Jiaming Yuan
7c2686146e Dask device dmatrix (#5901)
* Fix softprob with empty dmatrix.
2020-07-17 13:17:43 +08:00
Jiaming Yuan
029a8b533f Simplify the data backends. (#5893) 2020-07-16 15:17:31 +08:00
Jiaming Yuan
dd445af56e Cleanup on device sketch. (#5874)
* Remove old functions.

* Merge weighted and un-weighted into a common interface.
2020-07-14 10:15:54 +08:00
Jiaming Yuan
a3ec964346 Accept iterator in device dmatrix. (#5783)
* Remove Device DMatrix.
2020-07-07 21:44:48 +08:00
Jiaming Yuan
048d969be4 Implement GK sketching on GPU. (#5846)
* Implement GK sketching on GPU.
* Strong tests on quantile building.
* Handle sparse dataset by binary searching the column index.
* Hypothesis test on dask.
2020-07-07 12:16:21 +08:00
Jiaming Yuan
93c44a9a64 Move feature names and types of DMatrix from Python to C++. (#5858)
* Add thread local return entry for DMatrix.
* Save feature name and feature type in binary file.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-07-07 09:40:13 +08:00
Jiaming Yuan
1a0801238e Implement iterative DMatrix. (#5837) 2020-07-03 11:44:52 +08:00
Jiaming Yuan
90a9c68874 Implement a DMatrix Proxy. (#5803) 2020-06-29 15:03:10 +08:00
Jiaming Yuan
47c89775d6 Accept string for ArrayInterface constructor. (#5799) 2020-06-27 00:06:54 +08:00
Jiaming Yuan
c4d721200a Implement extend method for meta info. (#5800)
* Implement extend for host device vector.
2020-06-20 03:32:03 +08:00
fis
7c3a168ffd Revert "Accept string for ArrayInterface constructor."
This reverts commit e8ecafb8dc.
2020-06-16 20:02:35 +08:00
fis
e8ecafb8dc Accept string for ArrayInterface constructor. 2020-06-16 20:00:24 +08:00
Rory Mitchell
b47b5ac771 Use hypothesis (#5759)
* Use hypothesis

* Allow int64 array interface for groups

* Add packages to Windows CI

* Add to travis

* Make sure device index is set correctly

* Fix dask-cudf test

* appveyor
2020-06-16 12:45:59 +12:00
Jiaming Yuan
306e38ff31 Avoid including c_api.h in header files. (#5782) 2020-06-12 16:24:24 +08:00
Jiaming Yuan
3028fa6b42 Implement weighted sketching for adapter. (#5760)
* Bounded memory tests.
* Fixed memory estimation.
2020-06-12 06:20:39 +08:00
Philip Hyunsu Cho
1d22a9be1c Revert "Reorder includes. (#5749)" (#5771)
This reverts commit d3a0efbf16.
2020-06-09 10:29:28 -07:00
Jiaming Yuan
cacff9232a Remove column major specialization. (#5755)
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-06-05 16:19:14 +08:00
Jiaming Yuan
d3a0efbf16 Reorder includes. (#5749)
* Reorder includes.

* R.
2020-06-03 17:30:47 +12:00
Jiaming Yuan
e533908922 Expose device sketching in header. (#5747) 2020-06-02 13:02:53 +08:00
Jiaming Yuan
d19cec70f1 Don't use mask in array interface. (#5730) 2020-06-01 12:17:24 +08:00
Jiaming Yuan
8438c7d0e4 Fix IsDense. (#5702) 2020-05-26 08:24:37 +08:00
Jiaming Yuan
1ba24a7597 Remove redundant sketching. (#5700) 2020-05-24 08:47:20 +08:00
Jiaming Yuan
eaf2a00b5c Enhance nvtx support. (#5636) 2020-05-06 22:54:24 +08:00
Jiaming Yuan
67d267f9da Move device dmatrix construction code into ellpack. (#5623) 2020-05-06 19:43:59 +08:00
Philip Hyunsu Cho
8de7f1928e Fix build on big endian CPUs (#5617)
* Fix build on big endian CPUs

* Clang-tidy
2020-04-29 21:56:34 -07:00
Jiaming Yuan
e726dd9902 Set device in device dmatrix. (#5596) 2020-04-25 13:42:53 +08:00
Jiaming Yuan
29a4cfe400 Group aware GPU sketching. (#5551)
* Group aware GPU weighted sketching.

* Distribute group weights to each data point.
* Relax the test.
* Validate input meta info.
* Fix metainfo copy ctor.
2020-04-20 17:18:52 +08:00
Jiaming Yuan
e1f22baf8c Fix slice and get info. (#5552) 2020-04-18 18:00:13 +08:00
Rory Mitchell
e268fb0093 Use thrust functions instead of custom functions (#5544) 2020-04-16 21:41:16 +12:00
Jiaming Yuan
6671b42dd4 Use ellpack for prediction only when sparsepage doesn't exist. (#5504) 2020-04-10 12:15:46 +08:00
Bobby Wang
ad826e913f [jvm-packages]add feature size for LabelPoint and DataBatch (#5303)
* fix type error

* Validate number of features.

* resolve comments

* add feature size for LabelPoint and DataBatch

* pass the feature size to native

* move feature size validating tests into a separate suite

* resolve comments

Co-authored-by: fis <jm.yuan@outlook.com>
2020-04-07 16:49:52 -07:00
Jiaming Yuan
0012f2ef93 Upgrade clang-tidy on CI. (#5469)
* Correct all clang-tidy errors.
* Upgrade clang-tidy to 10 on CI.

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-04-05 04:42:29 +08:00
Jiaming Yuan
29c6ad943a Prevent copying SimpleDMatrix. (#5453)
* Set default dtor for SimpleDMatrix to initialize default copy ctor, which is
deleted due to unique ptr.

* Remove commented code.
* Remove warning for calling host function (std::max).
* Remove warning for initialization order.
* Remove warning for unused variables.
2020-04-02 07:01:49 +08:00
Jiaming Yuan
6601a641d7 Thread safe, inplace prediction. (#5389)
Normal prediction with DMatrix is now thread safe with locks.  Added inplace prediction is lock free thread safe.

When data is on device (cupy, cudf), the returned data is also on device.

* Implementation for numpy, csr, cudf and cupy.

* Implementation for dask.

* Remove sync in simple dmatrix.
2020-03-30 15:35:28 +08:00
Rory Mitchell
13b10a6370 Device dmatrix (#5420) 2020-03-28 14:42:21 +13:00