5497 Commits

Author SHA1 Message Date
Jiaming Yuan
35dac8af1d
[BP] Fix index type for bitfield. (#7541) (#7560) 2022-01-14 00:21:34 +08:00
Jiaming Yuan
1311a20f49
[BP] Fix num_boosted_rounds for linear model. (#7538) (#7559)
* Add note.

* Fix n boosted rounds.
2022-01-14 00:20:57 +08:00
Jiaming Yuan
328d1e18db
[backport] [R] Fix single sample prediction. (#7524) (#7558) 2022-01-14 00:20:17 +08:00
Jiaming Yuan
3e2d7519a6
[dask] Fix asyncio. (#7508) (#7561) 2022-01-13 21:49:11 +08:00
Jiaming Yuan
afb9dfd421
[backport] CI fixes for macos (#7482)
* [CI] Fix continuous delivery pipeline for MacOS (#7472)

* Fix github macos package upload. (#7474)

* Fix macos package upload. (#7475)


* Split up the tests.

* [CI] Add missing step extract_branch (#7479)

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-11-25 01:57:55 +08:00
Hyunsu Cho
eb69c6110a
Bump version to 1.5.1 v1.5.1 2021-11-22 14:29:59 -08:00
Jiaming Yuan
0f9ffcdc16
[backport] Fix R CRAN failures. (#7404) (#7451)
* Remove hist builder dtor.

* Initialize values.

* Tolerance.

* Remove the use of nthread in col maker.
2021-11-19 21:40:04 +08:00
Jiaming Yuan
9bbd00a49f
[backport] Set use_logger in tracker to false. (#7438) (#7439) 2021-11-16 09:51:37 +08:00
Jiaming Yuan
7e239f229c
[CI] Install igraph as binary. (#7417) (#7430) 2021-11-13 01:53:41 +08:00
Jiaming Yuan
a013942649
Check number of trees in inplace predict. (#7409) (#7424) 2021-11-12 19:31:31 +08:00
Jiaming Yuan
4d2ea0d4ef
[backport] [doc] Fix broken links. (#7341) (#7418)
* Fix most of the link checks from sphinx.
* Remove duplicate explicit target name.
2021-11-11 19:33:02 +08:00
Jiaming Yuan
d1052b5cfe
[jvm-packages] Fix json4s binary compatibility issue (#7376) (#7414)
Spark 3.2 depends on json4s 3.7.0-M11, which changed the signatures of some
implicit functions. As a result, xgboost4j built against Spark 3.0/3.1
fails when saving the model.

Co-authored-by: Bobby Wang <wbo4958@gmail.com>
2021-11-10 21:25:11 +08:00
Jiaming Yuan
14c56f05da
[backport] Handle missing values in dataframe with category dtype. (#7331) (#7413)
* Handle missing values in dataframe with category dtype. (#7331)

* Replace -1 in pandas initializer.
* Unify `IsValid` functor.
* Mimic pandas data handling in cuDF glue code.
* Check invalid categories.
* Fix DDM sketching.

* Fix pick error.
2021-11-10 21:24:46 +08:00
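The `-1` mentioned in the entry above is the code pandas assigns to a missing value in a categorical column (`Series.cat.codes`). The following is a minimal plain-Python sketch of that idea, not the XGBoost code itself; `encode_categories` and `codes_to_floats` are hypothetical helpers introduced only for illustration.

```python
import math


def encode_categories(values):
    """Map each category to an integer code, using -1 for missing values
    (the same sentinel pandas uses in `Series.cat.codes`)."""
    categories = sorted({v for v in values if v is not None})
    index = {cat: code for code, cat in enumerate(categories)}
    return [index[v] if v is not None else -1 for v in values]


def codes_to_floats(codes):
    """Replace the -1 sentinel with NaN so that a missing entry is not
    mistaken for a real category."""
    return [math.nan if c == -1 else float(c) for c in codes]


codes = encode_categories(["b", None, "a", "b"])
floats = codes_to_floats(codes)
```

Leaving the sentinel in place would make a missing entry look like an (invalid) category index, which is presumably why the commit replaces it before the data reaches the learner.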
Jiaming Yuan
11f8b5cfcd
[backport] Support building with CTK11.5. (#7379) (#7411)
* Support building with CTK11.5.

* Require system cub installation for CTK11.4+.
* Check thrust version for segmented sort.
2021-11-10 19:23:29 +08:00
Jiaming Yuan
e7ac2486eb
[backport] [R] Fix global feature importance and predict with 1 sample. (#7394) (#7397)
* [R] Fix global feature importance.

* Add implementation for tree index. The parameter is not documented in the C API since we
should work on porting model slicing to R instead of supporting more uses of the tree
index.

* Fix the difference between "gain" and "total_gain".

* debug.

* Fix prediction.
2021-11-06 00:07:36 +08:00
Jiaming Yuan
a3d195e73e
Handle OMP_THREAD_LIMIT. (#7390) (#7391) 2021-11-03 20:25:51 +08:00
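`OMP_THREAD_LIMIT` is the standard OpenMP environment variable that caps the number of threads a program may use. Below is a minimal sketch of honouring it when choosing a thread count. `effective_threads` is a hypothetical helper, and the logic in the commit itself may differ.

```python
import os


def effective_threads(requested, environ=os.environ):
    """Clamp a requested thread count by OMP_THREAD_LIMIT when it is set."""
    raw = environ.get("OMP_THREAD_LIMIT")
    if raw is None:
        return requested
    try:
        limit = int(raw)
    except ValueError:
        return requested  # assumption: ignore a malformed value
    if limit <= 0:
        return requested  # assumption: ignore a non-positive limit
    return min(requested, limit)
```

For example, `effective_threads(8, {"OMP_THREAD_LIMIT": "2"})` yields 2, while an empty environment leaves the request unchanged.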
Jiaming Yuan
fab3c05ced
Move macos test to github action. (#7382) (#7392)
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2021-11-03 18:39:47 +08:00
Jiaming Yuan
584b45a9cc
Release 1.5.0. (#7317) v1.5.0 2021-10-15 12:21:04 +08:00
Jiaming Yuan
30c1b5c54c
[backport] Fix prediction with cat data in sklearn interface. (#7306) (#7312)
* Specify DMatrix parameter for pre-processing dataframe.
* Add document about the behaviour of prediction.
2021-10-12 18:49:57 +08:00
Jiaming Yuan
36e247aca4
Fix weighted samples in multi-class AUC. (#7300) (#7305) 2021-10-11 18:00:36 +08:00
Jiaming Yuan
c4aff733bb
[backport] Fix cv verbose_eval (#7291) (#7296) 2021-10-08 14:24:27 +08:00
Jiaming Yuan
cdbfd21d31
[backport] Fix gamma neg log likelihood. (#7275) (#7285) 2021-10-05 23:01:11 +08:00
Jiaming Yuan
508a0b0dbd
[backport] [R] Fix document for nthread. (#7263) (#7269) 2021-09-28 14:41:32 +08:00
Jiaming Yuan
e04e773f9f
Add RC1 tag for building packages. (#7261) 2021-09-28 11:50:18 +08:00
Jiaming Yuan
1debabb321
Change version to 1.5.0. (#7258) v1.5.0rc1 2021-09-26 13:27:54 +08:00
Jiaming Yuan
d8a549e6ac
Avoid thread block with sparse data. (#7255) 2021-09-25 13:11:34 +08:00
Jiaming Yuan
ca17f8a5fc
Dispatch thrust versions and upgrade rmm. (#7254)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-09-25 03:43:23 +08:00
Jiaming Yuan
fbd58bf190
[jvm-packages] Create demo and test for xgboost4j early stopping. (#7252) 2021-09-25 03:29:27 +08:00
Bobby Wang
0ee11dac77
[jvm-packages][xgboost4j-gpu] Support GPU dataframe and DeviceQuantileDMatrix (#7195)
The following classes are added to support dataframes in the Java binding:

- `Column` is an abstract type for a single column in tabular data.
- `ColumnBatch` is an abstract type for dataframe.

- `CuDFColumn` is an implementation of `Column` that consumes a cuDF column.
- `CudfColumnBatch` is an implementation of `ColumnBatch` that consumes cuDF dataframe.

- `DeviceQuantileDMatrix` is the interface for quantized data.

The Java implementation mimics the Python interface and uses the `__cuda_array_interface__` protocol for memory indexing. One difference is that in the JVM package the data batches are staged on the host, because Java iterators cannot be reset.

Co-authored-by: jiamingy <jm.yuan@outlook.com>
2021-09-24 14:25:00 +08:00
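The staging described in the entry above can be illustrated with a short Python sketch: an iterator that cannot be reset is consumed once and its batches are kept in memory so that later passes can reuse them. This is only an analogy under that assumption, not the JVM code. `stage_batches` and `one_shot` are hypothetical names.

```python
def stage_batches(batches):
    """Consume a one-shot iterator once and keep its batches in memory."""
    return list(batches)


def one_shot():
    # A generator can be traversed only once, like an iterator without reset.
    yield [1.0, 2.0]
    yield [3.0]


# Without staging, a second traversal sees nothing.
gen = one_shot()
first_read = list(gen)
second_read = list(gen)

# With staging, any number of passes can run over the same batches.
staged = stage_batches(one_shot())
n_values = sum(len(batch) for batch in staged)         # a first pass
flat = [x for batch in staged for x in batch]          # a later pass
```

The trade-off is memory: every batch stays resident until all passes are done.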
Philip Hyunsu Cho
d27a427dc5
[CI] Rotate access keys for uploading MacOS artifacts from Travis CI (#7253) 2021-09-24 10:44:00 +08:00
ShvetsKS
475fd1abec
Reduced span overheads in objective function calculate (#7206)
Co-authored-by: fis <jm.yuan@outlook.com>
2021-09-23 04:43:59 +08:00
Jiaming Yuan
9472be7d77
Fix initialization from pandas series. (#7243) 2021-09-23 04:43:25 +08:00
david-cortes
4f93e5586a
Improve wording for warning (#7248)
This warning sounds a bit ungrammatical. Additionally, the second part of the warning is not clear. This PR changes the wording to make it clearer.
2021-09-21 10:48:11 +08:00
Jiaming Yuan
18bd16341a
Update Python intro. [skip ci] (#7235)
* Fix the link to demo.
* Stop recommending text file inputs.
* Briefly mention the scikit-learn interface.
* Fix indent warning in tree method doc.
2021-09-21 02:47:09 +00:00
david-cortes
61a619b5c3
[R] Avoid symbol naming conflicts with other packages (#7245)
* don't register all R symbols

* typo
2021-09-19 11:17:08 -07:00
Jiaming Yuan
e48e05e6e2
Add typehint to rabit module. (#7240) 2021-09-17 18:31:02 +08:00
Jiaming Yuan
c735c17f33
Disable callback and ES on random forest. (#7236) 2021-09-17 18:21:17 +08:00
Jiaming Yuan
c311a8c1d8
Enable compiling with system cub. (#7232)
- Tested with all CUDA 11.x.
- Workaround cub scan by using discard iterator in AUC.
- Limit the size of Argsort when compiled with CUDA cub.
2021-09-17 14:28:18 +08:00
Jiaming Yuan
b18f5f61b0
Fix pylint (#7241) 2021-09-17 11:50:36 +08:00
Jiaming Yuan
38a23f66a8
Fix typo in release script. [skip ci] (#7238) 2021-09-17 11:14:05 +08:00
Jiaming Yuan
8ad7e8eeb0
[doc] Fix typo. [skip ci] (#7226) 2021-09-17 11:13:49 +08:00
Jiaming Yuan
22d56cebf1
Encode pandas categorical data automatically. (#7231) 2021-09-17 11:09:55 +08:00
Jiaming Yuan
32e0858501
Fix travis. (#7237) 2021-09-17 10:06:23 +08:00
Jiaming Yuan
31c1e13f90
Categorical data support in CPU sketching. (#7221) 2021-09-17 04:37:09 +08:00
Jiaming Yuan
9f63d6fead
[jvm-packages] Deprecate constructors with implicit missing value. (#7225) 2021-09-17 04:35:04 +08:00
Jiaming Yuan
0ed979b096
Support more input types for categorical data. (#7220)
* Support more input types for categorical data.

* Shorten the type name from "categorical" to "c".
* Tests for np/cp array and scipy csr/csc/coo.
* Specify the type for feature info.
2021-09-16 20:39:30 +08:00
Jiaming Yuan
2942dc68e4
Fix mixed types in GPU sketching. (#7228) 2021-09-16 00:10:25 +08:00
Jiaming Yuan
037dd0820d
Implement __sklearn_is_fitted__. (#7230) 2021-09-15 19:09:04 +08:00
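`__sklearn_is_fitted__` is the hook that scikit-learn's `check_is_fitted` consults when an estimator defines it, instead of looking for attributes with a trailing underscore. A minimal sketch of the protocol follows. `TinyModel` and its `_booster` attribute are hypothetical and do not reflect how the XGBoost estimators are written.

```python
class TinyModel:
    """Hypothetical estimator used only to illustrate the hook."""

    def __init__(self):
        self._booster = None  # nothing trained yet

    def fit(self, X, y):
        # Stand-in for real training: just record that a model now exists.
        self._booster = {"n_samples": len(X)}
        return self

    def __sklearn_is_fitted__(self):
        # Report fitted state from internal state rather than from
        # attributes that end with an underscore.
        return self._booster is not None


model = TinyModel()
before = model.__sklearn_is_fitted__()
after = model.fit([[0.0], [1.0]], [0, 1]).__sklearn_is_fitted__()
```

Here `before` is `False` and `after` is `True`.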
Jiaming Yuan
d997c967d5
Demo for experimental categorical data support. (#7213) 2021-09-15 08:20:12 +08:00
Jiaming Yuan
3515931305
Initial support for external memory in gradient index. (#7183)
* Add hessian to batch param in preparation of new approx impl.
* Extract a push method for gradient index matrix.
* Use span instead of vector ref for hessian in sketching.
* Create a binary format for gradient index.
2021-09-13 12:40:56 +08:00