5547 Commits

Author SHA1 Message Date
Jiaming Yuan
937fa282b5
Extract string view. (#7416)
* Add equality operators.
* Return a view in substr.
* Add proper iterator types.
2021-11-12 18:22:30 +08:00
Jiaming Yuan
ca6f980932
Check number of trees in inplace predict. (#7409) 2021-11-12 18:20:23 +08:00
Jiaming Yuan
97d7582457
Delay breaking changes to 1.6. (#7420)
The patch is too big to be backported.
2021-11-12 16:46:03 +08:00
Bobby Wang
cb685607b2
[jvm-packages] Rework the train pipeline (#7401)
1. Add PreXGBoost to build RDD[Watches] from Dataset
2. Feed RDD[Watches] built from PreXGBoost to XGBoost to train
2021-11-10 17:51:38 +08:00
Jiaming Yuan
8df0a252b7
[doc] Update document for GPU. [skip ci] (#7403)
* Remove outdated workaround and description.
2021-11-09 02:05:55 +08:00
Jiaming Yuan
d7d1b6e3a6
CPU evaluation for cat data. (#7393)
* Implementation for one hot based.
* Implementation for partition based. (LightGBM)
2021-11-06 14:41:35 +08:00
Jiaming Yuan
6ede12412c
Update dmlc-core and use data iter for GPU sampling tests. (#7398)
* Update dmlc-core.
* New parquet parser in dmlc-core.
* Use data iter for GPU sampling tests.
2021-11-06 05:12:49 +08:00
Jiaming Yuan
c968217ca8
[R] Fix global feature importance and predict with 1 sample. (#7394)
* [R] Fix global feature importance.

* Add implementation for tree index.  The parameter is not documented in C API since we
should work on porting the model slicing to R instead of supporting more use of tree
index.

* Fix the difference between "gain" and "total_gain".

* debug.

* Fix prediction.
2021-11-05 10:07:00 +08:00
Jiaming Yuan
48aff0eabd
[doc][jvm-packages] Update information about Python tracker. [skip ci] (#7396) 2021-11-05 05:55:13 +08:00
Jiaming Yuan
b06040b6d0
Implement a general array view. (#7365)
* Replace existing matrix and vector view.

This is to prepare for handling higher dimension data and prediction when we support multi-target models.
2021-11-05 04:16:11 +08:00
Jiaming Yuan
232144ca09
Add note about CRAN release [skip ci] (#7395) 2021-11-05 00:34:14 +08:00
Jiaming Yuan
4100827971
Pass infomation about objective to tree methods. (#7385)
* Define the `ObjInfo` and pass it down to every tree updater.
2021-11-04 01:52:44 +08:00
Jiaming Yuan
ccdabe4512
Support building gradient index with cat data. (#7371) 2021-11-03 22:37:37 +08:00
Jiaming Yuan
57a4b4ff64
Handle OMP_THREAD_LIMIT. (#7390) 2021-11-03 15:44:38 +08:00
Jiaming Yuan
e6ab594e14
Change shebang used in CLI demo. (#7389)
Change from system Python to environment python3.  For Ubuntu 20.04, only `python3` is
available and there's no `python`.  So at least `python3` is consistent with Python
virtual env, Ubuntu and anaconda.
2021-11-02 22:11:19 +08:00
Jiaming Yuan
a55d43ccfd
Add test for invalid categorical data values. (#7380)
* Add test for invalid categorical data values.

* Add check during sketching.
2021-11-02 18:00:52 +08:00
Jiaming Yuan
c74df31bf9
Cleanup the train function. (#7377)
* Move attribute setter to callback.
* Remove the internal train function.
* Remove unnecessary initialization.
2021-11-02 18:00:26 +08:00
Jiaming Yuan
154b15060e
Move callbacks from fit to __init__. (#7375) 2021-11-02 17:51:42 +08:00
Jiaming Yuan
32e673d8c4
Support building with CTK11.5. (#7379)
* Support building with CTK11.5.

* Require system cub installation for CTK11.4+.
* Check thrust version for segmented sort.
2021-11-02 16:22:26 +08:00
Jiaming Yuan
a13321148a
Support multi-class with base margin. (#7381)
This is already partially supported but never properly tested. So the only possible way to use it is calling `numpy.ndarray.flatten` with `base_margin` before passing it into XGBoost. This PR adds proper support
for most of the data types along with tests.
2021-11-02 13:38:00 +08:00
Jiaming Yuan
6295dc3b67
Fix span reverse iterator. (#7387)
* Fix span reverse iterator.

* Disable `rbegin` on device code to avoid calling host function.
* Add `trbegin` and friends.
2021-11-02 13:35:59 +08:00
Jiaming Yuan
8211e5f341
Add clang-format config. (#7383)
Generated using `clang-format -style=google -dump-config > .clang-format`, with column
width changed from 80 to 100 to be consistent with existing cpplint check.
2021-11-02 13:34:38 +08:00
Jiaming Yuan
0f7a9b42f1
Use double precision in metric calculation. (#7364) 2021-11-02 12:00:32 +08:00
Jiaming Yuan
239dbb3c0a
Move macos test to github action. (#7382)
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2021-10-30 14:40:32 +08:00
Bobby Wang
b81ebbef62
[jvm-packages] Fix json4s binary compatibility issue (#7376)
Spark 3.2 depends on 3.7.0-M11 which has changed some implicited functions'
signatures. And it will result the xgboost4j built against spark 3.0/3.1
failed when saving the model.
2021-10-30 03:20:57 +08:00
Jiaming Yuan
c6769488b3
Typehint for subset of core API. (#7348) 2021-10-28 20:47:04 +08:00
Jiaming Yuan
45aef75cca
Move skl eval_metric and early_stopping rounds to model params. (#6751)
A new parameter `custom_metric` is added to `train` and `cv` to distinguish the behaviour from the old `feval`.  And `feval` is deprecated.  The new `custom_metric` receives transformed prediction when the built-in objective is used.  This enables XGBoost to use cost functions from other libraries like scikit-learn directly without going through the definition of the link function.

`eval_metric` and `early_stopping_rounds` in sklearn interface are moved from `fit` to `__init__` and is now saved as part of the scikit-learn model.  The old ones in `fit` function are now deprecated. The new `eval_metric` in `__init__` has the same new behaviour as `custom_metric`.

Added more detailed documents for the behaviour of custom objective and metric.
2021-10-28 17:20:20 +08:00
Jiaming Yuan
6b074add66
Update setup.py. (#7360)
* Add new classifiers.
* Typehint.
2021-10-28 14:58:31 +08:00
Jiaming Yuan
3c4aa9b2ea
[breaking] Remove label encoder deprecated in 1.3. (#7357) 2021-10-28 13:24:29 +08:00
Jiaming Yuan
d05754f558
Avoid OMP reduction in AUC. (#7362) 2021-10-28 05:03:52 +08:00
Jiaming Yuan
ac9bfaa4f2
Handle missing values in dataframe with category dtype. (#7331)
* Replace -1 in pandas initializer.
* Unify `IsValid` functor.
* Mimic pandas data handling in cuDF glue code.
* Check invalid categories.
* Fix DDM sketching.
2021-10-28 03:33:54 +08:00
Jiaming Yuan
2eee87423c
Remove old custom objective demo. (#7369)
We have 2 new custom objective demos covering both regression and classification with
accompanying tutorials in documents.
2021-10-27 16:31:48 +08:00
Jiaming Yuan
b9414b6477
Update GPU doc for PR-AUC. [skip ci] (#7368) 2021-10-27 16:31:07 +08:00
Jiaming Yuan
d4349426d8
Re-implement PR-AUC. (#7297)
* Support binary/multi-class classification, ranking.
* Add documents.
* Handle missing data.
2021-10-26 13:07:50 +08:00
nicovdijk
a6bcd54b47
[jvm-packages] Fix for space in sys.executable path in create_jni.py (#7358) 2021-10-25 13:45:11 +08:00
Jiaming Yuan
fd61c61071
Avoid omp reduction in rank metric. (#7349) 2021-10-22 14:13:34 +08:00
Jiaming Yuan
e36b066344
[doc] Document the status of RTD hosting. [skip ci] (#7353) 2021-10-22 14:12:55 +08:00
Jiaming Yuan
864d236a82
[doc] Remove num_pbuffer. [skip ci] (#7356) 2021-10-22 14:12:32 +08:00
nicovdijk
31a307cf6b
[XGBoost4J-Spark] Serialization for custom objective and eval (#7274)
* added type hints to custom_obj and custom_eval for Spark persistence


Co-authored-by: Bobby Wang <wbo4958@gmail.com>
2021-10-21 16:22:23 +08:00
Jiaming Yuan
7593fa9982
1.5 release note. [skip ci] (#7271)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-10-21 13:43:31 +08:00
Jiaming Yuan
d1f00fb0b7
Stricter validation for group. (#7345) 2021-10-21 12:13:33 +08:00
nicovdijk
74bab6e504
Control logging for early stopping using shouldPrint() (#7326) 2021-10-21 12:12:06 +08:00
Jiaming Yuan
8d7c6366d7
Accept histogram cut instead gradient index in evaluation. (#7336) 2021-10-20 18:04:46 +08:00
Jiaming Yuan
15685996fc
[doc] Small improvements for categorical data document. (#7330) 2021-10-20 18:04:32 +08:00
Jiaming Yuan
f999897615
[dask] Use nthread in DMatrix construction. (#7337)
This is consistent with the thread overriding behavior.
2021-10-20 15:16:40 +08:00
Philip Hyunsu Cho
b8e8f0fcd9
[doc] Use latest Sphinx RTD theme (#7347) 2021-10-20 00:04:43 -07:00
Jiaming Yuan
3b0b74fa94
[doc] Use RTD theme. (#7346) 2021-10-19 23:49:19 -07:00
Jiaming Yuan
376b448015
[doc] Fix broken links. (#7341)
* Fix most of the link checks from sphinx.
* Remove duplicate explicit target name.
2021-10-20 14:45:30 +08:00
Jiaming Yuan
f53da412aa
Add typehint to tracker. (#7338) 2021-10-20 12:49:36 +08:00
Jiaming Yuan
5ff210ed75
Small fix for the release doc and script. [skip ci] (#7332)
Add Philip as co-maintainer of maven packages.
2021-10-20 12:49:12 +08:00