550 Commits

Author SHA1 Message Date
Jiaming Yuan
7622b8cdb8
Enable categorical data support on Python DMatrix. (#6166)
* Only pandas is recognized.
2020-09-29 11:22:56 +08:00
Rory Mitchell
dda9e1e487
Update GPUTreeshap (#6163)
* Reduce shap test duration

* Test interoperability with shap package

* Add feature interactions

* Update GPUTreeShap
2020-09-28 09:43:47 +13:00
Kyle Nicholson
e6a238c020
Update base margin dask (#6155)
* Add `base-margin`
* Add `output_margin` to regressor.

Co-authored-by: fis <jm.yuan@outlook.com>
2020-09-26 21:30:52 +08:00
Philip Hyunsu Cho
bd2b1eabd0
Add back support for scipy.sparse.coo_matrix (#6162) 2020-09-25 00:49:49 -07:00
Jiaming Yuan
33d80ffad0
[dask] Support more meta data on functional interface. (#6132)
* Add base_margin, label_(lower|upper)_bound.
* Test survival training with dask.
2020-09-21 16:56:37 +08:00
Rory Mitchell
47350f6acb
Allow kwargs in dask predict (#6117) 2020-09-15 13:04:03 +12:00
Jiaming Yuan
b5f52f0b1b
Validate weights are positive values. (#6115) 2020-09-15 09:03:55 +08:00
ShvetsKS
c1ca872d1e
Modin DF support (#6055)
* Modin DF support

* mode change

* tests were added, ci env was extended

* mode change

* Remove redundant installation of modin

* Add a pytest skip marker for modin

* Install Modin[ray] from PyPI

* fix interfering

* avoid extra conversion

* delete cv test for modin

* revert cv function

Co-authored-by: ShvetsKS <kirill.shvets@intel.com>
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-08-29 22:33:30 +03:00
Jiaming Yuan
2fcc4f2886
Unify evaluation functions. (#6037) 2020-08-26 14:23:27 +08:00
Rory Mitchell
9a4e8b1d81
GPUTreeShap (#6038) 2020-08-25 12:47:41 +12:00
Philip Hyunsu Cho
cfced58c1c
[CI] Port CI fixes from the 1.2.0 branch (#6050)
* Fix a unit test on CLI, to handle RC versions

* [CI] Use mgpu machine to run gpu hist unit tests

* [CI] Build GPU-enabled JAR artifact and deploy to xgboost-maven-repo
2020-08-22 23:24:46 -07:00
Jiaming Yuan
b9ebbffc57
Fix plotting test. (#6040)
Previously the test loads a model generated by `test_basic.py`, now we generate
the model explicitly.

* Cleanup saved files for basic tests.
2020-08-22 13:18:48 +08:00
Jiaming Yuan
29b7fea572
Optimize cpu sketch allreduce for sparse data. (#6009)
* Bypass RABIT serialization reducer and use custom allgather based merging.
2020-08-19 10:03:45 +08:00
Jiaming Yuan
90355b4f00
Make JSON the default full serialization format. (#6027) 2020-08-19 09:57:43 +08:00
Qi Zhang
989ddd036f
Swap byte-order in binary serializer to support big-endian arch (#5813)
* fixed some endian issues

* Use dmlc::ByteSwap() to simplify code

* Fix lint check

* [CI] Add test for s390x

* Download latest CMake on s390x

* Fix a bug in my code

* Save magic number in dmatrix with byteswap on big-endian machine

* Save version in binary with byteswap on big-endian machine

* Load scalar with byteswap in MetaInfo

* Add a debugging message

* Handle arrays correctly when byteswapping

* EOF can also be 255

* Handle magic number in MetaInfo carefully

* Skip Tree.Load test for big-endian, since the test manually builds little-endian binary model

* Handle missing packages in Python tests

* Don't use boto3 in model compatibility tests

* Add s390 Docker file for local testing

* Add model compatibility tests

* Add R compatibility test

* Revert "Add R compatibility test"

This reverts commit c2d2bdcb7dbae133cbb927fcd20f7e83ee2b18a8.

Co-authored-by: Qi Zhang <q.zhang@ibm.com>
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-08-18 14:47:17 -07:00
Jiaming Yuan
4d99c58a5f
Feature weights (#5962) 2020-08-18 19:55:41 +08:00
Jiaming Yuan
ee70a2380b
Unify CPU hist sketching (#5880) 2020-08-12 01:33:06 +08:00
jameskrach
bd6b7f4aa7
[Breaking] Fix .predict() method and add .predict_proba() in xgboost.dask.DaskXGBClassifier (#5986) 2020-08-11 16:11:28 +08:00
Jiaming Yuan
801e6b6800
Fix dask predict shape infer. (#5989) 2020-08-08 14:29:22 +08:00
Jiaming Yuan
dde9c5aaff
Fix missing data warning. (#5969)
* Fix data warning.

* Add numpy/scipy test.
2020-08-05 16:19:12 +08:00
Jiaming Yuan
8599f87597
Update JSON schema. (#5982)
* Update JSON schema for pseudo huber.
* Update JSON model schema.
2020-08-05 15:21:11 +08:00
Jiaming Yuan
9c93531709
Update Python custom objective demo. (#5981) 2020-08-05 12:27:19 +08:00
Jiaming Yuan
fa3715f584
[Dask] Asyncio support. (#5862) 2020-07-30 06:23:58 +08:00
Jiaming Yuan
f5fdcbe194
Disable feature validation on sklearn predict prob. (#5953)
* Fix issue when scikit learn interface receives transformed inputs.
2020-07-29 19:26:44 +08:00
Jiaming Yuan
18349a7ccf
[Breaking] Fix custom metric for multi output. (#5954)
* Set output margin to true for custom metric.  This fixes only R and Python.
2020-07-29 19:25:27 +08:00
Jiaming Yuan
75b8c22b0b
Fix prediction heuristic (#5955)
* Relax check for prediction.
* Relax test in spark test.
* Add tests in C++.
2020-07-29 19:24:07 +08:00
Philip Hyunsu Cho
12110c900e
[CI] Make Python model compatibility test runnable locally (#5941) 2020-07-25 16:58:02 -07:00
Jiaming Yuan
bc1d3ee230
Fix r early stop with custom objective. (#5923)
* Specify `ntreelimit`.
2020-07-23 03:28:17 +08:00
Jiaming Yuan
66cc1e02aa
Setup github action. (#5917) 2020-07-22 15:05:25 +08:00
Philip Hyunsu Cho
ac9136ee49
Further improvements and savings in Jenkins pipeline (#5904)
* Publish artifacts only on the master and release branches

* Build CUDA only for Compute Capability 7.5 when building PRs

* Run all Windows jobs in a single worker image

* Build nightly XGBoost4J SNAPSHOT JARs with Scala 2.12 only

* Show skipped Python tests on Windows

* Make Graphviz optional for Python tests

* Add back C++ tests

* Unstash xgboost_cpp_tests

* Fix label to CUDA 10.1

* Install cuPy for CUDA 10.1

* Install jsonschema

* Address reviewer's feedback
2020-07-18 03:30:40 -07:00
Jiaming Yuan
7c2686146e
Dask device dmatrix (#5901)
* Fix softprob with empty dmatrix.
2020-07-17 13:17:43 +08:00
Jiaming Yuan
029a8b533f
Simplify the data backends. (#5893) 2020-07-16 15:17:31 +08:00
Jiaming Yuan
048d969be4
Implement GK sketching on GPU. (#5846)
* Implement GK sketching on GPU.
* Strong tests on quantile building.
* Handle sparse dataset by binary searching the column index.
* Hypothesis test on dask.
2020-07-07 12:16:21 +08:00
Jiaming Yuan
93c44a9a64
Move feature names and types of DMatrix from Python to C++. (#5858)
* Add thread local return entry for DMatrix.
* Save feature name and feature type in binary file.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-07-07 09:40:13 +08:00
Jiaming Yuan
4d277d750d
Relax linear test. (#5849)
* Increased error in coordinate is mostly due to floating point error.
* Shotgun uses Hogwild!, which is non-deterministic and can have even greater
floating point error.
2020-07-03 07:49:53 +08:00
Jiaming Yuan
eb067c1c34
Relax test for shotgun. (#5835) 2020-07-01 19:20:29 +08:00
Rory Mitchell
b47b5ac771
Use hypothesis (#5759)
* Use hypothesis

* Allow int64 array interface for groups

* Add packages to Windows CI

* Add to travis

* Make sure device index is set correctly

* Fix dask-cudf test

* appveyor
2020-06-16 12:45:59 +12:00
Alex
ae18a094b0
Add new skl model attribute for number of features (#5780) 2020-06-15 18:01:59 +08:00
Rory Mitchell
359023c0fa
Speed up python test (#5752)
* Speed up tests

* Prevent DeviceQuantileDMatrix initialisation with numpy

* Use joblib.memory

* Use RandomState
2020-06-05 11:39:24 +12:00
ShvetsKS
cd3d14ad0e
Add float32 histogram (#5624)
* new single_precision_histogram param was added.

Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>
Co-authored-by: fis <jm.yuan@outlook.com>
2020-06-03 11:24:53 +08:00
Jiaming Yuan
e49607af19
Add Python binding for rabit ops. (#5743) 2020-06-02 19:47:23 +08:00
Jiaming Yuan
9e1b29944e
Fix loading old model. (#5724)
* Add test.
2020-05-31 14:55:32 +08:00
Jiaming Yuan
35e2205256
[dask] Return GPU Series when input is from cuDF. (#5710)
* Refactor predict function.
2020-05-28 17:51:20 +08:00
Jiaming Yuan
5af8161a1a
Implement Python data handler. (#5689)
* Define data handlers for DMatrix.
* Throw ValueError in scikit learn interface.
2020-05-22 11:53:55 +08:00
Jiaming Yuan
535479e69f
Add JSON schema to model dump. (#5660) 2020-05-15 10:18:43 +08:00
Jiaming Yuan
2c1a439869
Update Python demos with tests. (#5651)
* Remove GPU memory usage demo.
* Add tests for demos.
* Remove `silent`.
* Remove shebang as it's not portable.
2020-05-12 12:04:42 +08:00
Jiaming Yuan
c90457f489
Refactor the CLI. (#5574)
* Enable parameter validation.
* Enable JSON.
* Catch `dmlc::Error`.
* Show help message.
2020-04-26 10:56:33 +08:00
Jiaming Yuan
9c1103e06c
[Breaking] Set output margin to True for custom objective. (#5564)
* Set output margin to True for custom objective in Python and R.

* Add a demo for writing multi-class custom objective function.

* Run tests on selected demos.
2020-04-20 20:44:12 +08:00
Jiaming Yuan
b809f5d8b8
Don't set seed on CLI interface. (#5563) 2020-04-20 12:17:03 +08:00
Jiaming Yuan
e1f22baf8c
Fix slice and get info. (#5552) 2020-04-18 18:00:13 +08:00