5563 Commits

Author SHA1 Message Date
Philip Hyunsu Cho
0c67685e43
[CI] Add a helper script to aid Maven release (#7470)
* [CI] Add a helper script to aid Maven release

* Move script to dev/ [skip ci]

* Update command [skip ci]
2021-11-23 00:11:07 -08:00
Harvey
0552ca8021
Fix typo (#7469) 2021-11-23 08:58:45 +08:00
Jiaming Yuan
176110a22d
Support external memory in CPU histogram building. (#7372) 2021-11-23 01:13:33 +08:00
Jiaming Yuan
d33854af1b
[Breaking] Accept multi-dim meta info. (#7405)
This PR changes base_margin into a 3-dim array, with one of them being reserved for multi-target classification. Also, a breaking change is made for binary serialization due to extra dimension along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on array interface.
2021-11-18 23:02:54 +08:00
Jiaming Yuan
9fb4338964
Add test for eta and mitigate float error. (#7446)
* Add eta test.
* Don't skip test.
2021-11-18 20:42:48 +08:00
Bobby Wang
7cfb310eb4
Rework transform (#7440)
extract the common part of transform code from XGBoostClassifier
and XGBoostRegressor
2021-11-18 15:48:57 +08:00
Philip Hyunsu Cho
2adf222fb2
[CI] CI cost saving (#7407)
* [CI] Drop CUDA 10.1; Require 11.0

* Change NCCL version

* Use CUDA 10.1 for clang-tidy, for now

* Remove JDK 11 and 12

* Fix NCCL version

* Don't require 11.0 just yet, until clang-tidy is fixed

* Skip MultiClassesSerializationTest.GpuHist
2021-11-17 21:02:20 -08:00
Jiaming Yuan
b0015fda96
Fix R CRAN failures. (#7404)
* Remove hist builder dtor.

* Initialize values.

* Tolerance.

* Remove the use of nthread in col maker.
2021-11-16 10:51:12 +08:00
Jiaming Yuan
55ee272ea8
Extend array interface to handle ndarray. (#7434)
* Extend array interface to handle ndarray.

The `ArrayInterface` class is extended to support multi-dim array inputs. Previously this
class handles only 2-dim (vector is also matrix).  This PR specifies the expected
dimension at compile-time and the array interface can perform various checks automatically
for input data. Also, adapters like CSR are more rigorous about their input.  Lastly, row
vector and column vector are handled without intervention from the caller.
2021-11-16 09:52:15 +08:00
Jiaming Yuan
e27f543deb
Set use_logger in tracker to false. (#7438) 2021-11-16 05:12:42 +08:00
Jiaming Yuan
d4274bc556
Fix typo. (#7433) 2021-11-15 01:28:11 +08:00
Jiaming Yuan
a7057fa64c
Implement typed storage for tensor. (#7429)
* Add `Tensor` class.
* Add elementwise kernel for CPU and GPU.
* Add unravel index.
* Move some computation to compile time.
2021-11-14 18:53:13 +08:00
Kian Meng Ang
d27a11ff87
Fix typos in python package (#7432) 2021-11-14 17:20:19 +08:00
Jiaming Yuan
8cc75f1576
Cleanup Python tests. (#7426) 2021-11-14 15:47:05 +08:00
Jiaming Yuan
38ca96c9fc
[CI] Install igraph as binary. (#7417) 2021-11-12 19:04:28 +08:00
Jiaming Yuan
46726ec176
Expose build info (#7399) 2021-11-12 18:22:46 +08:00
Jiaming Yuan
937fa282b5
Extract string view. (#7416)
* Add equality operators.
* Return a view in substr.
* Add proper iterator types.
2021-11-12 18:22:30 +08:00
Jiaming Yuan
ca6f980932
Check number of trees in inplace predict. (#7409) 2021-11-12 18:20:23 +08:00
Jiaming Yuan
97d7582457
Delay breaking changes to 1.6. (#7420)
The patch is too big to be backported.
2021-11-12 16:46:03 +08:00
Bobby Wang
cb685607b2
[jvm-packages] Rework the train pipeline (#7401)
1. Add PreXGBoost to build RDD[Watches] from Dataset
2. Feed RDD[Watches] built from PreXGBoost to XGBoost to train
2021-11-10 17:51:38 +08:00
Jiaming Yuan
8df0a252b7
[doc] Update document for GPU. [skip ci] (#7403)
* Remove outdated workaround and description.
2021-11-09 02:05:55 +08:00
Jiaming Yuan
d7d1b6e3a6
CPU evaluation for cat data. (#7393)
* Implementation for one hot based.
* Implementation for partition based. (LightGBM)
2021-11-06 14:41:35 +08:00
Jiaming Yuan
6ede12412c
Update dmlc-core and use data iter for GPU sampling tests. (#7398)
* Update dmlc-core.
* New parquet parser in dmlc-core.
* Use data iter for GPU sampling tests.
2021-11-06 05:12:49 +08:00
Jiaming Yuan
c968217ca8
[R] Fix global feature importance and predict with 1 sample. (#7394)
* [R] Fix global feature importance.

* Add implementation for tree index.  The parameter is not documented in C API since we
should work on porting the model slicing to R instead of supporting more use of tree
index.

* Fix the difference between "gain" and "total_gain".

* debug.

* Fix prediction.
2021-11-05 10:07:00 +08:00
Jiaming Yuan
48aff0eabd
[doc][jvm-packages] Update information about Python tracker. [skip ci] (#7396) 2021-11-05 05:55:13 +08:00
Jiaming Yuan
b06040b6d0
Implement a general array view. (#7365)
* Replace existing matrix and vector view.

This is to prepare for handling higher dimension data and prediction when we support multi-target models.
2021-11-05 04:16:11 +08:00
Jiaming Yuan
232144ca09
Add note about CRAN release [skip ci] (#7395) 2021-11-05 00:34:14 +08:00
Jiaming Yuan
4100827971
Pass infomation about objective to tree methods. (#7385)
* Define the `ObjInfo` and pass it down to every tree updater.
2021-11-04 01:52:44 +08:00
Jiaming Yuan
ccdabe4512
Support building gradient index with cat data. (#7371) 2021-11-03 22:37:37 +08:00
Jiaming Yuan
57a4b4ff64
Handle OMP_THREAD_LIMIT. (#7390) 2021-11-03 15:44:38 +08:00
Jiaming Yuan
e6ab594e14
Change shebang used in CLI demo. (#7389)
Change from system Python to environment python3.  For Ubuntu 20.04, only `python3` is
available and there's no `python`.  So at least `python3` is consistent with Python
virtual env, Ubuntu and anaconda.
2021-11-02 22:11:19 +08:00
Jiaming Yuan
a55d43ccfd
Add test for invalid categorical data values. (#7380)
* Add test for invalid categorical data values.

* Add check during sketching.
2021-11-02 18:00:52 +08:00
Jiaming Yuan
c74df31bf9
Cleanup the train function. (#7377)
* Move attribute setter to callback.
* Remove the internal train function.
* Remove unnecessary initialization.
2021-11-02 18:00:26 +08:00
Jiaming Yuan
154b15060e
Move callbacks from fit to __init__. (#7375) 2021-11-02 17:51:42 +08:00
Jiaming Yuan
32e673d8c4
Support building with CTK11.5. (#7379)
* Support building with CTK11.5.

* Require system cub installation for CTK11.4+.
* Check thrust version for segmented sort.
2021-11-02 16:22:26 +08:00
Jiaming Yuan
a13321148a
Support multi-class with base margin. (#7381)
This is already partially supported but never properly tested. So the only possible way to use it is calling `numpy.ndarray.flatten` with `base_margin` before passing it into XGBoost. This PR adds proper support
for most of the data types along with tests.
2021-11-02 13:38:00 +08:00
Jiaming Yuan
6295dc3b67
Fix span reverse iterator. (#7387)
* Fix span reverse iterator.

* Disable `rbegin` on device code to avoid calling host function.
* Add `trbegin` and friends.
2021-11-02 13:35:59 +08:00
Jiaming Yuan
8211e5f341
Add clang-format config. (#7383)
Generated using `clang-format -style=google -dump-config > .clang-format`, with column
width changed from 80 to 100 to be consistent with existing cpplint check.
2021-11-02 13:34:38 +08:00
Jiaming Yuan
0f7a9b42f1
Use double precision in metric calculation. (#7364) 2021-11-02 12:00:32 +08:00
Jiaming Yuan
239dbb3c0a
Move macos test to github action. (#7382)
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2021-10-30 14:40:32 +08:00
Bobby Wang
b81ebbef62
[jvm-packages] Fix json4s binary compatibility issue (#7376)
Spark 3.2 depends on 3.7.0-M11 which has changed some implicited functions'
signatures. And it will result the xgboost4j built against spark 3.0/3.1
failed when saving the model.
2021-10-30 03:20:57 +08:00
Jiaming Yuan
c6769488b3
Typehint for subset of core API. (#7348) 2021-10-28 20:47:04 +08:00
Jiaming Yuan
45aef75cca
Move skl eval_metric and early_stopping rounds to model params. (#6751)
A new parameter `custom_metric` is added to `train` and `cv` to distinguish the behaviour from the old `feval`.  And `feval` is deprecated.  The new `custom_metric` receives transformed prediction when the built-in objective is used.  This enables XGBoost to use cost functions from other libraries like scikit-learn directly without going through the definition of the link function.

`eval_metric` and `early_stopping_rounds` in sklearn interface are moved from `fit` to `__init__` and is now saved as part of the scikit-learn model.  The old ones in `fit` function are now deprecated. The new `eval_metric` in `__init__` has the same new behaviour as `custom_metric`.

Added more detailed documents for the behaviour of custom objective and metric.
2021-10-28 17:20:20 +08:00
Jiaming Yuan
6b074add66
Update setup.py. (#7360)
* Add new classifiers.
* Typehint.
2021-10-28 14:58:31 +08:00
Jiaming Yuan
3c4aa9b2ea
[breaking] Remove label encoder deprecated in 1.3. (#7357) 2021-10-28 13:24:29 +08:00
Jiaming Yuan
d05754f558
Avoid OMP reduction in AUC. (#7362) 2021-10-28 05:03:52 +08:00
Jiaming Yuan
ac9bfaa4f2
Handle missing values in dataframe with category dtype. (#7331)
* Replace -1 in pandas initializer.
* Unify `IsValid` functor.
* Mimic pandas data handling in cuDF glue code.
* Check invalid categories.
* Fix DDM sketching.
2021-10-28 03:33:54 +08:00
Jiaming Yuan
2eee87423c
Remove old custom objective demo. (#7369)
We have 2 new custom objective demos covering both regression and classification with
accompanying tutorials in documents.
2021-10-27 16:31:48 +08:00
Jiaming Yuan
b9414b6477
Update GPU doc for PR-AUC. [skip ci] (#7368) 2021-10-27 16:31:07 +08:00
Jiaming Yuan
d4349426d8
Re-implement PR-AUC. (#7297)
* Support binary/multi-class classification, ranking.
* Add documents.
* Handle missing data.
2021-10-26 13:07:50 +08:00