5567 Commits

Author SHA1 Message Date
Jiaming Yuan
c024c42dce Modernize XGBoost Python document. (#7468)
* Use sphinx gallery to integrate examples.
* Remove mock objects.
* Add dask doc inventory.
2021-11-23 23:24:52 +08:00
Philip Hyunsu Cho
96a9848c9e [CI] Fix continuous delivery pipeline for MacOS (#7472) 2021-11-23 22:22:08 +08:00
Jiaming Yuan
b124a27f57 Support scipy sparse in dask. (#7457) 2021-11-23 16:45:36 +08:00
Jiaming Yuan
5262e933f7 Remove unnecessary constexpr. (#7466) 2021-11-23 16:42:08 +08:00
Philip Hyunsu Cho
0c67685e43 [CI] Add a helper script to aid Maven release (#7470)
* [CI] Add a helper script to aid Maven release

* Move script to dev/ [skip ci]

* Update command [skip ci]
2021-11-23 00:11:07 -08:00
Harvey
0552ca8021 Fix typo (#7469) 2021-11-23 08:58:45 +08:00
Jiaming Yuan
176110a22d Support external memory in CPU histogram building. (#7372) 2021-11-23 01:13:33 +08:00
Jiaming Yuan
d33854af1b [Breaking] Accept multi-dim meta info. (#7405)
This PR changes base_margin into a 3-dim array, with one of them being reserved for multi-target classification. Also, a breaking change is made for binary serialization due to extra dimension along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on array interface.
2021-11-18 23:02:54 +08:00
Jiaming Yuan
9fb4338964 Add test for eta and mitigate float error. (#7446)
* Add eta test.
* Don't skip test.
2021-11-18 20:42:48 +08:00
Bobby Wang
7cfb310eb4 Rework transform (#7440)
extract the common part of transform code from XGBoostClassifier
and XGBoostRegressor
2021-11-18 15:48:57 +08:00
Philip Hyunsu Cho
2adf222fb2 [CI] CI cost saving (#7407)
* [CI] Drop CUDA 10.1; Require 11.0

* Change NCCL version

* Use CUDA 10.1 for clang-tidy, for now

* Remove JDK 11 and 12

* Fix NCCL version

* Don't require 11.0 just yet, until clang-tidy is fixed

* Skip MultiClassesSerializationTest.GpuHist
2021-11-17 21:02:20 -08:00
Jiaming Yuan
b0015fda96 Fix R CRAN failures. (#7404)
* Remove hist builder dtor.

* Initialize values.

* Tolerance.

* Remove the use of nthread in col maker.
2021-11-16 10:51:12 +08:00
Jiaming Yuan
55ee272ea8 Extend array interface to handle ndarray. (#7434)
* Extend array interface to handle ndarray.

The `ArrayInterface` class is extended to support multi-dim array inputs. Previously this
class handled only 2-dim inputs (a vector is also a matrix). This PR specifies the expected
dimension at compile-time and the array interface can perform various checks automatically
for input data. Also, adapters like CSR are more rigorous about their input. Lastly, row
vectors and column vectors are handled without intervention from the caller.
2021-11-16 09:52:15 +08:00
Jiaming Yuan
e27f543deb Set use_logger in tracker to false. (#7438) 2021-11-16 05:12:42 +08:00
Jiaming Yuan
d4274bc556 Fix typo. (#7433) 2021-11-15 01:28:11 +08:00
Jiaming Yuan
a7057fa64c Implement typed storage for tensor. (#7429)
* Add `Tensor` class.
* Add elementwise kernel for CPU and GPU.
* Add unravel index.
* Move some computation to compile time.
2021-11-14 18:53:13 +08:00
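The "unravel index" item above refers to converting a flat offset into per-dimension coordinates. A minimal Python sketch of the idea, assuming a row-major (C-order) layout; the function name `unravel_index` is illustrative here and is not taken from XGBoost's C++ code:

```python
def unravel_index(flat_index, shape):
    """Convert a flat offset into a tuple of per-dimension indices (row-major)."""
    size = 1
    for dim in shape:
        size *= dim
    if not 0 <= flat_index < size:
        raise IndexError("flat_index out of range for shape")
    coords = []
    # Walk the dimensions from the last (fastest varying) to the first.
    for dim in reversed(shape):
        flat_index, remainder = divmod(flat_index, dim)
        coords.append(remainder)
    return tuple(reversed(coords))
```

For a tensor of shape `(2, 3)`, offset 5 maps to the last element, `(1, 2)`.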
Kian Meng Ang
d27a11ff87 Fix typos in python package (#7432) 2021-11-14 17:20:19 +08:00
Jiaming Yuan
8cc75f1576 Cleanup Python tests. (#7426) 2021-11-14 15:47:05 +08:00
Jiaming Yuan
38ca96c9fc [CI] Install igraph as binary. (#7417) 2021-11-12 19:04:28 +08:00
Jiaming Yuan
46726ec176 Expose build info (#7399) 2021-11-12 18:22:46 +08:00
Jiaming Yuan
937fa282b5 Extract string view. (#7416)
* Add equality operators.
* Return a view in substr.
* Add proper iterator types.
2021-11-12 18:22:30 +08:00
Jiaming Yuan
ca6f980932 Check number of trees in inplace predict. (#7409) 2021-11-12 18:20:23 +08:00
Jiaming Yuan
97d7582457 Delay breaking changes to 1.6. (#7420)
The patch is too big to be backported.
2021-11-12 16:46:03 +08:00
Bobby Wang
cb685607b2 [jvm-packages] Rework the train pipeline (#7401)
1. Add PreXGBoost to build RDD[Watches] from Dataset
2. Feed RDD[Watches] built from PreXGBoost to XGBoost to train
2021-11-10 17:51:38 +08:00
Jiaming Yuan
8df0a252b7 [doc] Update document for GPU. [skip ci] (#7403)
* Remove outdated workaround and description.
2021-11-09 02:05:55 +08:00
Jiaming Yuan
d7d1b6e3a6 CPU evaluation for cat data. (#7393)
* Implementation for one hot based.
* Implementation for partition based. (LightGBM)
2021-11-06 14:41:35 +08:00
Jiaming Yuan
6ede12412c Update dmlc-core and use data iter for GPU sampling tests. (#7398)
* Update dmlc-core.
* New parquet parser in dmlc-core.
* Use data iter for GPU sampling tests.
2021-11-06 05:12:49 +08:00
Jiaming Yuan
c968217ca8 [R] Fix global feature importance and predict with 1 sample. (#7394)
* [R] Fix global feature importance.

* Add implementation for tree index.  The parameter is not documented in C API since we
should work on porting the model slicing to R instead of supporting more use of tree
index.

* Fix the difference between "gain" and "total_gain".

* debug.

* Fix prediction.
2021-11-05 10:07:00 +08:00
Jiaming Yuan
48aff0eabd [doc][jvm-packages] Update information about Python tracker. [skip ci] (#7396) 2021-11-05 05:55:13 +08:00
Jiaming Yuan
b06040b6d0 Implement a general array view. (#7365)
* Replace existing matrix and vector view.

This is to prepare for handling higher dimension data and prediction when we support multi-target models.
2021-11-05 04:16:11 +08:00
Jiaming Yuan
232144ca09 Add note about CRAN release [skip ci] (#7395) 2021-11-05 00:34:14 +08:00
Jiaming Yuan
4100827971 Pass information about objective to tree methods. (#7385)
* Define the `ObjInfo` and pass it down to every tree updater.
2021-11-04 01:52:44 +08:00
Jiaming Yuan
ccdabe4512 Support building gradient index with cat data. (#7371) 2021-11-03 22:37:37 +08:00
Jiaming Yuan
57a4b4ff64 Handle OMP_THREAD_LIMIT. (#7390) 2021-11-03 15:44:38 +08:00
Jiaming Yuan
e6ab594e14 Change shebang used in CLI demo. (#7389)
Change from system Python to environment python3.  For Ubuntu 20.04, only `python3` is
available and there's no `python`.  So at least `python3` is consistent with Python
virtual env, Ubuntu and anaconda.
2021-11-02 22:11:19 +08:00
Jiaming Yuan
a55d43ccfd Add test for invalid categorical data values. (#7380)
* Add test for invalid categorical data values.

* Add check during sketching.
2021-11-02 18:00:52 +08:00
Jiaming Yuan
c74df31bf9 Cleanup the train function. (#7377)
* Move attribute setter to callback.
* Remove the internal train function.
* Remove unnecessary initialization.
2021-11-02 18:00:26 +08:00
Jiaming Yuan
154b15060e Move callbacks from fit to __init__. (#7375) 2021-11-02 17:51:42 +08:00
Jiaming Yuan
32e673d8c4 Support building with CTK11.5. (#7379)
* Support building with CTK11.5.

* Require system cub installation for CTK11.4+.
* Check thrust version for segmented sort.
2021-11-02 16:22:26 +08:00
Jiaming Yuan
a13321148a Support multi-class with base margin. (#7381)
This was already partially supported but never properly tested, so the only way to use it was to call `numpy.ndarray.flatten` on `base_margin` before passing it into XGBoost. This PR adds proper support for most of the data types along with tests.
2021-11-02 13:38:00 +08:00
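The flattening workaround mentioned in the message above can be pictured as follows. This is a plain-Python sketch of what a row-major flatten does to a per-sample, per-class margin matrix. The helper `flatten_row_major` is hypothetical and stands in for `numpy.ndarray.flatten` with its default ordering; nothing here calls XGBoost.

```python
def flatten_row_major(matrix):
    """Flatten a 2-D list row by row, like ndarray.flatten() in its default C order."""
    return [value for row in matrix for value in row]


# Hypothetical margins for 2 samples and 3 classes: shape (n_samples, n_classes).
base_margin = [
    [0.1, 0.2, 0.7],  # sample 0
    [0.5, 0.3, 0.2],  # sample 1
]

flat_margin = flatten_row_major(base_margin)
# All class margins of sample 0 come first, followed by those of sample 1.
```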
Jiaming Yuan
6295dc3b67 Fix span reverse iterator. (#7387)
* Fix span reverse iterator.

* Disable `rbegin` on device code to avoid calling host function.
* Add `trbegin` and friends.
2021-11-02 13:35:59 +08:00
Jiaming Yuan
8211e5f341 Add clang-format config. (#7383)
Generated using `clang-format -style=google -dump-config > .clang-format`, with column
width changed from 80 to 100 to be consistent with existing cpplint check.
2021-11-02 13:34:38 +08:00
Jiaming Yuan
0f7a9b42f1 Use double precision in metric calculation. (#7364) 2021-11-02 12:00:32 +08:00
Jiaming Yuan
239dbb3c0a Move macos test to github action. (#7382)
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2021-10-30 14:40:32 +08:00
Bobby Wang
b81ebbef62 [jvm-packages] Fix json4s binary compatibility issue (#7376)
Spark 3.2 depends on json4s 3.7.0-M11, which changed the signatures of some implicit
functions. As a result, xgboost4j built against Spark 3.0/3.1 fails when saving the
model.
2021-10-30 03:20:57 +08:00
Jiaming Yuan
c6769488b3 Typehint for subset of core API. (#7348) 2021-10-28 20:47:04 +08:00
Jiaming Yuan
45aef75cca Move skl eval_metric and early_stopping rounds to model params. (#6751)
A new parameter `custom_metric` is added to `train` and `cv` to distinguish the behaviour from the old `feval`.  And `feval` is deprecated.  The new `custom_metric` receives transformed prediction when the built-in objective is used.  This enables XGBoost to use cost functions from other libraries like scikit-learn directly without going through the definition of the link function.

`eval_metric` and `early_stopping_rounds` in the sklearn interface are moved from `fit` to `__init__` and are now saved as part of the scikit-learn model.  The old ones in the `fit` function are now deprecated. The new `eval_metric` in `__init__` has the same new behaviour as `custom_metric`.

Added more detailed documents for the behaviour of custom objective and metric.
2021-10-28 17:20:20 +08:00
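The difference between a metric that receives transformed predictions and one that has to apply the link function itself can be sketched in plain Python. The example assumes a logistic objective (sigmoid link) and assumes that the older `feval` path hands the metric untransformed margins, which is how the message above contrasts the two. The function names are illustrative and nothing here calls XGBoost.

```python
import math


def sigmoid(margin):
    """Logistic link: map a raw margin to a probability."""
    return 1.0 / (1.0 + math.exp(-margin))


def log_loss(labels, probabilities):
    """Mean binary log loss, the kind of metric other libraries expose."""
    eps = 1e-15
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def metric_on_margin(labels, margins):
    """Assumed `feval`-style metric: must apply the link function itself."""
    return log_loss(labels, [sigmoid(m) for m in margins])


def metric_on_probability(labels, probabilities):
    """`custom_metric`-style metric: predictions arrive already transformed."""
    return log_loss(labels, probabilities)


labels = [1, 0, 1]
raw_margins = [2.0, -1.0, 0.5]
transformed = [sigmoid(m) for m in raw_margins]
```

Both functions return the same value for matching inputs. The second one needs no knowledge of the link function, so an off-the-shelf metric can be passed in unchanged.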
Jiaming Yuan
6b074add66 Update setup.py. (#7360)
* Add new classifiers.
* Typehint.
2021-10-28 14:58:31 +08:00
Jiaming Yuan
3c4aa9b2ea [breaking] Remove label encoder deprecated in 1.3. (#7357) 2021-10-28 13:24:29 +08:00
Jiaming Yuan
d05754f558 Avoid OMP reduction in AUC. (#7362) 2021-10-28 05:03:52 +08:00