5224 Commits

Author SHA1 Message Date
Jiaming Yuan
e8c5c53e2f
Use Predictor for dart. (#6693)
* Use normal predictor for dart booster.
* Implement `inplace_predict` for dart.
* Enable `dart` for dask interface now that it's thread-safe.
* categorical data should be working out of box for dart now.

The implementation is not very efficient as it has to pull back the data and
apply weight for each tree, but still a significant improvement over previous
implementation as now we no longer binary search for each sample.

* Fix output prediction shape on dataframe.
2021-02-09 23:30:19 +08:00
Jiaming Yuan
dbf7e9d3cb
Remove R cache in github action. (#6695)
The cache stores outdated packages with wrong linkage.  Right now there's no way to clear the cache.
2021-02-09 18:53:20 +08:00
Jiaming Yuan
1335db6113
[dask] Improve documents. (#6687)
* Add tag for versions.
* use autoclass in sphinx build.
Made some class methods to be private to avoid exporting documents.
2021-02-09 09:20:58 +08:00
Jiaming Yuan
5d48d40d9a
Fix DMatrix slice with feature types. (#6689) 2021-02-09 08:13:51 +08:00
Jiaming Yuan
218a5fb6dd
Simplify Span checks. (#6685)
* Stop printing out message.
* Remove R specialization.

The printed message is not really useful anyway, without a reproducible example
there's no way to fix it.  But if there's a reproducible example, we can always
obtain these information by a debugger.  Removing the `printf` function avoids
creating the context in kernel.
2021-02-09 08:12:58 +08:00
Jiaming Yuan
4656b09d5d
[breaking] Add prediction fucntion for DMatrix and use inplace predict for dask. (#6668)
* Add a new API function for predicting on `DMatrix`.  This function aligns
with rest of the `XGBoosterPredictFrom*` functions on semantic of function
arguments.
* Purge `ntree_limit` from libxgboost, use iteration instead.
* [dask] Use `inplace_predict` by default for dask sklearn models.
* [dask] Run prediction shape inference on worker instead of client.

The breaking change is in the Python sklearn `apply` function, I made it to be
consistent with other prediction functions where `best_iteration` is used by
default.
2021-02-08 18:26:32 +08:00
Jiaming Yuan
dbb5208a0a
Use __array_interface__ for creating DMatrix from CSR. (#6675)
* Use __array_interface__ for creating DMatrix from CSR.
* Add configuration.
2021-02-05 21:09:47 +08:00
Jiaming Yuan
1e949110da
Use generic dispatching routine for array interface. (#6672) 2021-02-05 09:23:38 +08:00
Jiaming Yuan
a4101de678
Fix divide by 0 in feature importance when no split is found. (#6676) 2021-02-05 03:39:30 +08:00
Jiaming Yuan
72892cc80d
[dask] Disable gblinear and dart. (#6665) 2021-02-04 09:13:09 +08:00
Jiaming Yuan
9d62b14591
Fix document. [skip ci] (#6669) 2021-02-02 20:43:31 +08:00
Jiaming Yuan
411592a347
Enhance inplace prediction. (#6653)
* Accept array interface for csr and array.
* Accept an optional proxy dmatrix for metainfo.

This constructs an explicit `_ProxyDMatrix` type in Python.

* Remove unused doc.
* Add strict output.
2021-02-02 11:41:46 +08:00
Jiaming Yuan
87ab1ad607
[dask] Accept Future of model for prediction. (#6650)
This PR changes predict and inplace_predict to accept a Future of model, to avoid sending models to workers repeatably.

* Document is updated to reflect functionality additions in recent changes.
2021-02-02 08:45:52 +08:00
Jiaming Yuan
a9ec0ea6da
Align device id in predict transform with predictor. (#6662) 2021-02-02 08:33:29 +08:00
Jiaming Yuan
d8ec7aad5a
[dask] Add a 1 line sample to infer output shape. (#6645)
* [dask] Use a 1 line sample to infer output shape.

This is for inferring shape with direct prediction (without DaskDMatrix).
There are a few things that requires known output shape before carrying out
actual prediction, including dask meta data, output dataframe columns.

* Infer output shape based on local prediction.
* Remove set param in predict function as it's not thread safe nor necessary as
we now let dask to decide the parallelism.
* Simplify prediction on `DaskDMatrix`.
2021-01-30 18:55:50 +08:00
Jiaming Yuan
c3c8e66fc9
Make prediction functions thread safe. (#6648) 2021-01-28 23:29:43 +08:00
Philip Hyunsu Cho
0f2ed21a9d
[Breaking] Change default evaluation metric for binary:logitraw objective to logloss (#6647) 2021-01-29 00:12:12 +09:00
Jiaming Yuan
d167892c7e
[dask] Ensure model can be pickled. (#6651) 2021-01-28 21:47:57 +08:00
Philip Hyunsu Cho
0ad6e18a2a
[CI] Do not mix up stashed executable built for ARM and x86_64 platforms (#6646) 2021-01-27 23:57:26 +09:00
Philip Hyunsu Cho
55ee2bd77f
[CI] Add ARM64 test to Jenkins pipeline (#6643)
* Add ARM64 test to Jenkins pipeline

* Check for bundled libgomp

* Use a separate test suite for ARM64

* Ensure that x86 jobs don't run on ARM workers
2021-01-27 21:51:17 +09:00
Jiaming Yuan
1b70a323a7
Improve string view to reduce string allocation. (#6644) 2021-01-27 19:08:52 +08:00
Jiaming Yuan
bc08e0c9d1
Remove experimental_json_serialization from tests. (#6640) 2021-01-27 17:44:49 +08:00
Jiaming Yuan
8968ca7c0a
Disable s390x and arm64 tests on travis for now. (#6641) 2021-01-27 16:21:40 +08:00
Jiaming Yuan
d19a0ddacf
Move sdist test to action. (#6635)
* Move x86 linux and osx sdist test to action.

* Add Windows.
2021-01-26 08:25:59 +08:00
Jiaming Yuan
740d042255
Add base_margin for evaluation dataset. (#6591)
* Add base margin to evaluation datasets.
* Unify the code base for evaluation matrices.
2021-01-26 02:11:02 +08:00
Jiaming Yuan
4bf23c2391
Specify shape in prediction contrib and interaction. (#6614) 2021-01-26 02:08:22 +08:00
Jiaming Yuan
8942c98054
Define metainfo and other parameters for all DMatrix interfaces. (#6601)
This PR ensures all DMatrix types have a common interface.

* Fix logic in avoiding duplicated DMatrix in sklearn.
* Check for consistency between DMatrix types.
* Add doc for bounds.
2021-01-25 16:06:06 +08:00
Jiaming Yuan
561809200a
Fix document for tree methods. (#6633) 2021-01-25 15:52:08 +08:00
Adam Pocock
fec66d033a
[jvm-packages] JVM library loader extensions (#6630)
* [java] extending the library loader to use both OS and CPU architecture.

* Simplifying create_jni.py's architecture detection.

* Tidying up the architecture detection in create_jni.py
2021-01-25 15:51:39 +08:00
Jiaming Yuan
a275f40267
[dask] Rework base margin test. (#6627) 2021-01-22 17:49:13 +08:00
Jiaming Yuan
7bc56fa0ed
Use simple print in tracker print function. (#6609) 2021-01-21 21:15:43 +08:00
Jiaming Yuan
26982f9fce
Skip unused CMake argument in setup.py (#6611) 2021-01-21 17:25:33 +08:00
Jiaming Yuan
f0fd7629ae
Add helper script and doc for releasing pip package. (#6613)
* Fix `long_description_content_type`.
2021-01-21 14:46:52 +08:00
Bobby Wang
9d2832a3a3
fix potential TaskFailedListener's callback won't be called (#6612)
there is possibility that onJobStart of TaskFailedListener won't be called, if
the job is submitted before the other thread adds addSparkListener.

detail can be found at https://github.com/dmlc/xgboost/pull/6019#issuecomment-760937628
2021-01-21 14:20:32 +08:00
Jiaming Yuan
f8bb678c67
Exclude dmlc test on github action. (#6625) 2021-01-20 18:50:20 +08:00
Jiaming Yuan
d6d72de339
Revert ntree limit fix (#6616)
The old (before fix) best_ntree_limit ignores the num_class parameters, which is incorrect. In before we workarounded it in c++ layer to avoid possible breaking changes on other language bindings. But the Python interpretation stayed incorrect. The PR fixed that in Python to consider num_class, but didn't remove the old workaround, so tree calculation in predictor is incorrect, see PredictBatch in CPUPredictor.
2021-01-19 23:51:16 +08:00
Jiaming Yuan
d132933550
Remove type check for solaris. (#6610) 2021-01-16 02:58:19 +08:00
Jiaming Yuan
d356b7a071
Restore unknown data support. (#6595) 2021-01-14 04:51:16 +08:00
Jiaming Yuan
89a00a5866
[dask] Random forest estimators (#6602) 2021-01-13 20:59:20 +08:00
Jiaming Yuan
0027220aa0
[breaking] Remove duplicated predict functions, Fix attributes IO. (#6593)
* Fix attributes not being restored.
* Rename all `data` to `X`. [breaking]
2021-01-13 16:56:49 +08:00
ShvetsKS
7f4d3a91b9
Multiclass prediction caching for CPU Hist (#6550)
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-01-13 04:42:07 +08:00
Jiaming Yuan
03cd087da1
Remove duplicated DMatrix. (#6592) 2021-01-12 09:36:56 +08:00
Jiaming Yuan
c709f2aaaf
Fix evaluation result for XGBRanker. (#6594)
* Remove duplicated code, which fixes typo `evals_result` -> `evals_result_`.
2021-01-12 09:36:41 +08:00
Jiaming Yuan
f2f7dd87b8
Use view for SparsePage exclusively. (#6590) 2021-01-11 18:04:55 +08:00
Jiaming Yuan
78f2cd83d7
Suppress hypothesis health check for dask client. (#6589) 2021-01-11 14:11:57 +08:00
Jiaming Yuan
80065d571e
[dask] Add DaskXGBRanker (#6576)
* Initial support for distributed LTR using dask.

* Support `qid` in libxgboost.
* Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]`
  to avoid duplicated code.
* Define `DaskXGBRanker`.

The dask ranker doesn't support group structure, instead it uses query id and
convert to group ptr internally.
2021-01-08 18:35:09 +08:00
Jiaming Yuan
96d3d32265
[dask] Add shap tests. (#6575) 2021-01-08 14:59:27 +08:00
Jiaming Yuan
7c9dcbedbc
Fix best_ntree_limit for dart and gblinear. (#6579) 2021-01-08 10:05:39 +08:00
Jiaming Yuan
f5ff90cd87
Support _estimator_type. (#6582)
* Use `_estimator_type`.

For more info, see: https://scikit-learn.org/stable/developers/develop.html#estimator-types

* Model trained from dask can be loaded by single node skl interface.
2021-01-08 10:01:16 +08:00
Jiaming Yuan
8747885a8b
Support Solaris. (#6578)
* Add system header.

* Remove use of TR1 on Solaris

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2021-01-07 09:05:05 +08:00