479 Commits

Author SHA1 Message Date
Hyunsu Cho
10d3419fa6 Release 1.3.0 2020-12-03 21:35:09 -08:00
Philip Hyunsu Cho
3a83fcb0eb Enforce row-major order in cuPy array (#6459) 2020-12-03 21:29:24 -08:00
Jiaming Yuan
a2c778e2d1 Fix period in evaluation monitor. (#6441) 2020-12-03 21:28:45 -08:00
Hyunsu Cho
f3b060401a Release 1.3.0 RC1 2020-11-21 11:36:08 -08:00
Jiaming Yuan
2ce2a1a4d8
[SKL] Propagate parameters to booster during set_param. (#6416) 2020-11-20 20:37:35 +08:00
Jiaming Yuan
a7b42adb74
Fix dask predict (#6412) 2020-11-20 10:10:52 +08:00
Jiaming Yuan
3ac173fc8b
Fix typo. (#6399) 2020-11-16 16:59:12 -08:00
Nikhil Choudhary
ae1662028a
Fixed few grammatical mistakes in doc (#6393) 2020-11-15 13:48:08 +08:00
Jiaming Yuan
fcd6fad822
[dask] Small cleanup. (#6391) 2020-11-14 22:15:05 +08:00
Jiaming Yuan
4ccf92ea34
[dask] Fix union of workers. (#6375) 2020-11-13 16:55:05 +08:00
Jiaming Yuan
fcfeb4959c
Deprecate positional arguments. (#6365)
Deprecate positional arguments in following functions:

- `__init__` for all classes in sklearn module.
- `fit` method for all classes in sklearn module.
- dask interface.
- `set_info` for `DMatrix` class.

Refactor the evaluation matrices handling.
2020-11-13 11:10:30 +08:00
Jiaming Yuan
c90f968d92
Update Python documents. (#6376) 2020-11-12 17:51:32 +08:00
Jiaming Yuan
6e12c2a6f8
[dask] Supoort running on GKE. (#6343)
* Avoid accessing `scheduler_info()['workers']`.
* Avoid calling `client.gather` inside task.
* Avoid using `client.scheduler_address`.
2020-11-11 18:04:34 +08:00
Jiaming Yuan
e65e3cf36e
Support shared library in system path. (#6362) 2020-11-10 16:04:25 +08:00
Jiaming Yuan
184e2eac7d
Add period to evaluation monitor. (#6348) 2020-11-10 07:47:48 +08:00
Jiaming Yuan
2cc9662005
Support slicing tree model (#6302)
This PR is meant the end the confusion around best_ntree_limit and unify model slicing. We have multi-class and random forests, asking users to understand how to set ntree_limit is difficult and error prone.

* Implement the save_best option in early stopping.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-11-02 23:27:39 -08:00
Rory Mitchell
29745c6df2
Fix inclusive scan for large sizes (#6234) 2020-11-03 17:01:43 +13:00
Jiaming Yuan
7756192906
[dask] Fix prediction on DaskDMatrix with multiple meta data. (#6333)
* Unify the meta handling methods.
2020-11-02 19:18:44 -05:00
Jiaming Yuan
6ff331b705
Fix Python callback. (#6320) 2020-10-30 05:03:44 +08:00
Jiaming Yuan
74ea82209b
Lazy import dask libraries. (#6309)
* Lazy import dask libraries.

* Lint && fix.

* Use short name.
2020-10-28 15:50:11 -07:00
Jiaming Yuan
e8884c4637
Document tree method for feature weights. (#6312) 2020-10-28 13:42:13 -07:00
Jiaming Yuan
b180223d18
Cleanup RABIT. (#6290)
* Remove recovery and MPI speed tests.
* Remove readme.
* Remove Python binding.
* Add checks in C API.
2020-10-27 08:48:22 +08:00
Philip Hyunsu Cho
c8ec62103a
Deprecate LabelEncoder in XGBClassifier; Enable cuDF/cuPy inputs in XGBClassifier (#6269)
* Deprecate LabelEncoder in XGBClassifier; skip LabelEncoder for cuDF/cuPy inputs

* Add unit tests for cuDF and cuPy inputs with XGBClassifier

* Fix lint

* Clarify warning

* Move use_label_encoder option to XGBClassifier constructor

* Add a test for cudf.Series

* Add use_label_encoder to XGBRFClassifier doc

* Address reviewer feedback
2020-10-26 13:20:51 -07:00
Jiaming Yuan
d61b628bf5
Remove RABIT CMake targets. (#6275)
* Now it's built as part of libxgboost.
* Set correct C API error in RABIT initialization and finalization.
* Remove redundant message.
* Guard the tracker print C API.
2020-10-27 01:30:20 +08:00
Philip Hyunsu Cho
677f676172
Use UserWarning for old callback, as DeprecationWarning is not visible (#6270) 2020-10-22 01:10:52 -07:00
Jiaming Yuan
81c37c28d5
Time the CPU tests on Jenkins. (#6257)
* Time the CPU tests on Jenkins.
* Reduce thread contention.
* Add doc.
* Skip heavy tests on ARM.
2020-10-20 17:19:07 -07:00
Jiaming Yuan
52452bebb9
Fix cls typo. (#6247) 2020-10-16 16:40:44 +08:00
Jiaming Yuan
3da5a69dc9
Fix typo in dask interface. (#6240) 2020-10-15 15:26:29 +08:00
Jiaming Yuan
bed7ae4083
Loop over thrust::reduce. (#6229)
* Check input chunk size of dqdm.
* Add doc for current limitation.
2020-10-14 10:40:56 +13:00
Jiaming Yuan
b05073bda5
[dask] Test for data initializaton. (#6226) 2020-10-13 11:08:35 +08:00
Jiaming Yuan
2443275891
Cleanup Python code. (#6223)
* Remove pathlike as XGBoost 1.2 requires Python 3.6.
* Move conditional import of dask/distributed into dask module.
2020-10-12 15:44:41 +08:00
Jiaming Yuan
ab5b35134f
Rework Python callback functions. (#6199)
* Define a new callback interface for Python.
* Deprecate the old callbacks.
* Enable early stopping on dask.
2020-10-10 17:52:36 +08:00
Jiaming Yuan
70ce5216b5
Add high level tests for categorical data. (#6179)
* Fix unique.
2020-10-09 09:27:23 +08:00
Jiaming Yuan
7622b8cdb8
Enable categorical data support on Python DMatrix. (#6166)
* Only pandas is recognized.
2020-09-29 11:22:56 +08:00
Kyle Nicholson
e6a238c020
Update base margin dask (#6155)
* Add `base-margin`
* Add `output_margin` to regressor.

Co-authored-by: fis <jm.yuan@outlook.com>
2020-09-26 21:30:52 +08:00
Philip Hyunsu Cho
bd2b1eabd0
Add back support for scipy.sparse.coo_matrix (#6162) 2020-09-25 00:49:49 -07:00
Jiaming Yuan
78d72ef936
Add DaskDeviceQuantileDMatrix demo. (#6156) 2020-09-24 14:08:28 +08:00
Jiaming Yuan
33d80ffad0
[dask] Support more meta data on functional interface. (#6132)
* Add base_margin, label_(lower|upper)_bound.
* Test survival training with dask.
2020-09-21 16:56:37 +08:00
Jiaming Yuan
cc82ca167a
[dask] Refactor meta data handling. (#6130) 2020-09-18 13:26:40 +08:00
Rory Mitchell
47350f6acb
Allow kwargs in dask predict (#6117) 2020-09-15 13:04:03 +12:00
Boris Feld
24ca9348f7
Fix typo in xgboost.callback.early_stop docstring (#6071) 2020-09-06 13:37:07 +08:00
ShvetsKS
c1ca872d1e
Modin DF support (#6055)
* Modin DF support

* mode change

* tests were added, ci env was extended

* mode change

* Remove redundant installation of modin

* Add a pytest skip marker for modin

* Install Modin[ray] from PyPI

* fix interfering

* avoid extra conversion

* delete cv test for modin

* revert cv function

Co-authored-by: ShvetsKS <kirill.shvets@intel.com>
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-08-29 22:33:30 +03:00
Philip Hyunsu Cho
b3193052b3
Bump version to 1.3.0 snapshot in master (#6052) 2020-08-23 17:13:46 -07:00
Jiaming Yuan
a144daf034
Limit tree depth for GPU hist. (#6045) 2020-08-22 19:34:52 +08:00
Jiaming Yuan
7be2e04bd4
Fix scikit learn cls doc. (#6041) 2020-08-20 19:23:06 -07:00
ShvetsKS
24f2e6c97e
Optimize DMatrix build time. (#5877)
Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>
2020-08-20 01:37:03 +08:00
Qi Zhang
989ddd036f
Swap byte-order in binary serializer to support big-endian arch (#5813)
* fixed some endian issues

* Use dmlc::ByteSwap() to simplify code

* Fix lint check

* [CI] Add test for s390x

* Download latest CMake on s390x

* Fix a bug in my code

* Save magic number in dmatrix with byteswap on big-endian machine

* Save version in binary with byteswap on big-endian machine

* Load scalar with byteswap in MetaInfo

* Add a debugging message

* Handle arrays correctly when byteswapping

* EOF can also be 255

* Handle magic number in MetaInfo carefully

* Skip Tree.Load test for big-endian, since the test manually builds little-endian binary model

* Handle missing packages in Python tests

* Don't use boto3 in model compatibility tests

* Add s390 Docker file for local testing

* Add model compatibility tests

* Add R compatibility test

* Revert "Add R compatibility test"

This reverts commit c2d2bdcb7dbae133cbb927fcd20f7e83ee2b18a8.

Co-authored-by: Qi Zhang <q.zhang@ibm.com>
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-08-18 14:47:17 -07:00
Jiaming Yuan
4d99c58a5f
Feature weights (#5962) 2020-08-18 19:55:41 +08:00
Philip Hyunsu Cho
9adb812a0a
RMM integration plugin (#5873)
* [CI] Add RMM as an optional dependency

* Replace caching allocator with pool allocator from RMM

* Revert "Replace caching allocator with pool allocator from RMM"

This reverts commit e15845d4e72e890c2babe31a988b26503a7d9038.

* Use rmm::mr::get_default_resource()

* Try setting default resource (doesn't work yet)

* Allocate pool_mr in the heap

* Prevent leaking pool_mr handle

* Separate EXPECT_DEATH() in separate test suite suffixed DeathTest

* Turn off death tests for RMM

* Address reviewer's feedback

* Prevent leaking of cuda_mr

* Fix Jenkinsfile syntax

* Remove unnecessary function in Jenkinsfile

* [CI] Install NCCL into RMM container

* Run Python tests

* Try building with RMM, CUDA 10.0

* Do not use RMM for CUDA 10.0 target

* Actually test for test_rmm flag

* Fix TestPythonGPU

* Use CNMeM allocator, since pool allocator doesn't yet support multiGPU

* Use 10.0 container to build RMM-enabled XGBoost

* Revert "Use 10.0 container to build RMM-enabled XGBoost"

This reverts commit 789021fa31112e25b683aef39fff375403060141.

* Fix Jenkinsfile

* [CI] Assign larger /dev/shm to NCCL

* Use 10.2 artifact to run multi-GPU Python tests

* Add CUDA 10.0 -> 11.0 cross-version test; remove CUDA 10.0 target

* Rename Conda env rmm_test -> gpu_test

* Use env var to opt into CNMeM pool for C++ tests

* Use identical CUDA version for RMM builds and tests

* Use Pytest fixtures to enable RMM pool in Python tests

* Move RMM to plugin/CMakeLists.txt; use PLUGIN_RMM

* Use per-device MR; use command arg in gtest

* Set CMake prefix path to use Conda env

* Use 0.15 nightly version of RMM

* Remove unnecessary header

* Fix a unit test when cudf is missing

* Add RMM demos

* Remove print()

* Use HostDeviceVector in GPU predictor

* Simplify pytest setup; use LocalCUDACluster fixture

* Address reviewers' commments

Co-authored-by: Hyunsu Cho <chohyu01@cs.wasshington.edu>
2020-08-12 01:26:02 -07:00
jameskrach
bd6b7f4aa7
[Breaking] Fix .predict() method and add .predict_proba() in xgboost.dask.DaskXGBClassifier (#5986) 2020-08-11 16:11:28 +08:00