Rory Mitchell
29745c6df2
Fix inclusive scan for large sizes (#6234)
2020-11-03 17:01:43 +13:00
Igor Moura
5e1e972aea
Clean up warnings (#6325)
2020-10-30 23:50:29 +08:00
Jiaming Yuan
b180223d18
Cleanup RABIT. (#6290)
...
* Remove recovery and MPI speed tests.
* Remove readme.
* Remove Python binding.
* Add checks in C API.
2020-10-27 08:48:22 +08:00
Igor Moura
d1254808d5
Clean up C++ warnings (#6213)
2020-10-19 23:02:33 +08:00
Jiaming Yuan
ddf37cca30
Unify thread configuration. (#6186)
2020-10-19 16:05:42 +08:00
Jiaming Yuan
bed7ae4083
Loop over thrust::reduce. (#6229)
...
* Check input chunk size of dqdm.
* Add doc for current limitation.
2020-10-14 10:40:56 +13:00
Rory Mitchell
734a911a26
Loop over copy_if (#6201)
...
* Loop over copy_if
* Catch OOM.
Co-authored-by: fis <jm.yuan@outlook.com>
2020-10-14 10:23:16 +13:00
Jiaming Yuan
2241563f23
Handle duplicated values in sketching. (#6178)
...
* Accumulate weights in duplicated values.
* Fix device id in iterative dmatrix.
2020-10-10 19:32:44 +08:00
Jiaming Yuan
14afdb4d92
Support categorical data in ellpack. (#6140)
2020-09-24 19:28:57 +08:00
Jiaming Yuan
210c131ce7
Support categorical data in GPU sketching. (#6137)
2020-09-21 13:53:06 +08:00
Jiaming Yuan
b5f52f0b1b
Validate weights are positive values. (#6115)
2020-09-15 09:03:55 +08:00
Jiaming Yuan
80c8547147
Make binary bin search reusable. (#6058)
...
* Move binary search row to hist util.
* Remove dead code.
2020-08-26 05:05:11 +08:00
Jiaming Yuan
20c95be625
Expand categorical node. (#6028)
...
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-08-25 18:53:57 +08:00
ShvetsKS
24f2e6c97e
Optimize DMatrix build time. (#5877)
...
Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>
2020-08-20 01:37:03 +08:00
Qi Zhang
989ddd036f
Swap byte-order in binary serializer to support big-endian arch (#5813)
...
* fixed some endian issues
* Use dmlc::ByteSwap() to simplify code
* Fix lint check
* [CI] Add test for s390x
* Download latest CMake on s390x
* Fix a bug in my code
* Save magic number in dmatrix with byteswap on big-endian machine
* Save version in binary with byteswap on big-endian machine
* Load scalar with byteswap in MetaInfo
* Add a debugging message
* Handle arrays correctly when byteswapping
* EOF can also be 255
* Handle magic number in MetaInfo carefully
* Skip Tree.Load test for big-endian, since the test manually builds little-endian binary model
* Handle missing packages in Python tests
* Don't use boto3 in model compatibility tests
* Add s390 Docker file for local testing
* Add model compatibility tests
* Add R compatibility test
* Revert "Add R compatibility test"
This reverts commit c2d2bdcb7dbae133cbb927fcd20f7e83ee2b18a8.
Co-authored-by: Qi Zhang <q.zhang@ibm.com>
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-08-18 14:47:17 -07:00
Jiaming Yuan
4d99c58a5f
Feature weights (#5962)
2020-08-18 19:55:41 +08:00
Philip Hyunsu Cho
487ab0ce73
[BLOCKING] Handle empty rows in data iterators correctly (#5929)
...
* [jvm-packages] Handle empty rows in data iterators correctly
* Fix clang-tidy error
* last empty row
* Add comments [skip ci]
Co-authored-by: Nan Zhu <nanzhu@uber.com>
2020-07-25 13:46:19 -07:00
Philip Hyunsu Cho
4af857f95d
Add explicit template specialization for portability (#5921)
...
* Add explicit template specializations
* Adding Specialization for FileAdapterBatch
2020-07-22 12:31:17 -07:00
Jiaming Yuan
7c2686146e
Dask device dmatrix (#5901)
...
* Fix softprob with empty dmatrix.
2020-07-17 13:17:43 +08:00
Jiaming Yuan
029a8b533f
Simplify the data backends. (#5893)
2020-07-16 15:17:31 +08:00
Jiaming Yuan
dd445af56e
Cleanup on device sketch. (#5874)
...
* Remove old functions.
* Merge weighted and un-weighted into a common interface.
2020-07-14 10:15:54 +08:00
Jiaming Yuan
a3ec964346
Accept iterator in device dmatrix. (#5783)
...
* Remove Device DMatrix.
2020-07-07 21:44:48 +08:00
Jiaming Yuan
048d969be4
Implement GK sketching on GPU. (#5846)
...
* Implement GK sketching on GPU.
* Strong tests on quantile building.
* Handle sparse dataset by binary searching the column index.
* Hypothesis test on dask.
2020-07-07 12:16:21 +08:00
Jiaming Yuan
93c44a9a64
Move feature names and types of DMatrix from Python to C++. (#5858)
...
* Add thread local return entry for DMatrix.
* Save feature name and feature type in binary file.
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-07-07 09:40:13 +08:00
Jiaming Yuan
1a0801238e
Implement iterative DMatrix. (#5837)
2020-07-03 11:44:52 +08:00
Jiaming Yuan
90a9c68874
Implement a DMatrix Proxy. (#5803)
2020-06-29 15:03:10 +08:00
Jiaming Yuan
47c89775d6
Accept string for ArrayInterface constructor. (#5799)
2020-06-27 00:06:54 +08:00
Jiaming Yuan
c4d721200a
Implement extend method for meta info. (#5800)
...
* Implement extend for host device vector.
2020-06-20 03:32:03 +08:00
fis
7c3a168ffd
Revert "Accept string for ArrayInterface constructor."
...
This reverts commit e8ecafb8dc628f45b75b4c2844a236d27e0a6d98.
2020-06-16 20:02:35 +08:00
fis
e8ecafb8dc
Accept string for ArrayInterface constructor.
2020-06-16 20:00:24 +08:00
Rory Mitchell
b47b5ac771
Use hypothesis (#5759)
...
* Use hypothesis
* Allow int64 array interface for groups
* Add packages to Windows CI
* Add to travis
* Make sure device index is set correctly
* Fix dask-cudf test
* appveyor
2020-06-16 12:45:59 +12:00
Jiaming Yuan
306e38ff31
Avoid including c_api.h in header files. (#5782)
2020-06-12 16:24:24 +08:00
Jiaming Yuan
3028fa6b42
Implement weighted sketching for adapter. (#5760)
...
* Bounded memory tests.
* Fixed memory estimation.
2020-06-12 06:20:39 +08:00
Philip Hyunsu Cho
1d22a9be1c
Revert "Reorder includes. (#5749)" (#5771)
...
This reverts commit d3a0efbf162f3dceaaf684109e1178c150b32de3.
2020-06-09 10:29:28 -07:00
Jiaming Yuan
cacff9232a
Remove column major specialization. (#5755)
...
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-06-05 16:19:14 +08:00
Jiaming Yuan
d3a0efbf16
Reorder includes. (#5749)
...
* Reorder includes.
* R.
2020-06-03 17:30:47 +12:00
Jiaming Yuan
e533908922
Expose device sketching in header. (#5747)
2020-06-02 13:02:53 +08:00
Jiaming Yuan
d19cec70f1
Don't use mask in array interface. (#5730)
2020-06-01 12:17:24 +08:00
Jiaming Yuan
8438c7d0e4
Fix IsDense. (#5702)
2020-05-26 08:24:37 +08:00
Jiaming Yuan
1ba24a7597
Remove redundant sketching. (#5700)
2020-05-24 08:47:20 +08:00
Jiaming Yuan
eaf2a00b5c
Enhance nvtx support. (#5636)
2020-05-06 22:54:24 +08:00
Jiaming Yuan
67d267f9da
Move device dmatrix construction code into ellpack. (#5623)
2020-05-06 19:43:59 +08:00
Philip Hyunsu Cho
8de7f1928e
Fix build on big endian CPUs (#5617)
...
* Fix build on big endian CPUs
* Clang-tidy
2020-04-29 21:56:34 -07:00
Jiaming Yuan
e726dd9902
Set device in device dmatrix. (#5596)
2020-04-25 13:42:53 +08:00
Jiaming Yuan
29a4cfe400
Group aware GPU sketching. (#5551)
...
* Group aware GPU weighted sketching.
* Distribute group weights to each data point.
* Relax the test.
* Validate input meta info.
* Fix metainfo copy ctor.
2020-04-20 17:18:52 +08:00
Jiaming Yuan
e1f22baf8c
Fix slice and get info. (#5552)
2020-04-18 18:00:13 +08:00
Rory Mitchell
e268fb0093
Use thrust functions instead of custom functions (#5544)
2020-04-16 21:41:16 +12:00
Jiaming Yuan
6671b42dd4
Use ellpack for prediction only when sparsepage doesn't exist. (#5504)
2020-04-10 12:15:46 +08:00
Bobby Wang
ad826e913f
[jvm-packages] add feature size for LabelPoint and DataBatch (#5303)
...
* fix type error
* Validate number of features.
* resolve comments
* add feature size for LabelPoint and DataBatch
* pass the feature size to native
* move feature size validating tests into a separate suite
* resolve comments
Co-authored-by: fis <jm.yuan@outlook.com>
2020-04-07 16:49:52 -07:00
Jiaming Yuan
0012f2ef93
Upgrade clang-tidy on CI. (#5469)
...
* Correct all clang-tidy errors.
* Upgrade clang-tidy to 10 on CI.
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-04-05 04:42:29 +08:00