Commit Graph

930 Commits

Author SHA1 Message Date
Jiaming Yuan
55ee272ea8 Extend array interface to handle ndarray. (#7434)
* Extend array interface to handle ndarray.

The `ArrayInterface` class is extended to support multi-dim array inputs. Previously this
class handles only 2-dim (vector is also matrix).  This PR specifies the expected
dimension at compile-time and the array interface can perform various checks automatically
for input data. Also, adapters like CSR are more rigorous about their input.  Lastly, row
vector and column vector are handled without intervention from the caller.
2021-11-16 09:52:15 +08:00
Jiaming Yuan
a7057fa64c Implement typed storage for tensor. (#7429)
* Add `Tensor` class.
* Add elementwise kernel for CPU and GPU.
* Add unravel index.
* Move some computation to compile time.
2021-11-14 18:53:13 +08:00
Jiaming Yuan
8cc75f1576 Cleanup Python tests. (#7426) 2021-11-14 15:47:05 +08:00
Jiaming Yuan
46726ec176 Expose build info (#7399) 2021-11-12 18:22:46 +08:00
Jiaming Yuan
937fa282b5 Extract string view. (#7416)
* Add equality operators.
* Return a view in substr.
* Add proper iterator types.
2021-11-12 18:22:30 +08:00
Jiaming Yuan
ca6f980932 Check number of trees in inplace predict. (#7409) 2021-11-12 18:20:23 +08:00
Jiaming Yuan
d7d1b6e3a6 CPU evaluation for cat data. (#7393)
* Implementation for one hot based.
* Implementation for partition based. (LightGBM)
2021-11-06 14:41:35 +08:00
Jiaming Yuan
6ede12412c Update dmlc-core and use data iter for GPU sampling tests. (#7398)
* Update dmlc-core.
* New parquet parser in dmlc-core.
* Use data iter for GPU sampling tests.
2021-11-06 05:12:49 +08:00
Jiaming Yuan
c968217ca8 [R] Fix global feature importance and predict with 1 sample. (#7394)
* [R] Fix global feature importance.

* Add implementation for tree index.  The parameter is not documented in C API since we
should work on porting the model slicing to R instead of supporting more use of tree
index.

* Fix the difference between "gain" and "total_gain".

* debug.

* Fix prediction.
2021-11-05 10:07:00 +08:00
Jiaming Yuan
b06040b6d0 Implement a general array view. (#7365)
* Replace existing matrix and vector view.

This is to prepare for handling higher dimension data and prediction when we support multi-target models.
2021-11-05 04:16:11 +08:00
Jiaming Yuan
4100827971 Pass infomation about objective to tree methods. (#7385)
* Define the `ObjInfo` and pass it down to every tree updater.
2021-11-04 01:52:44 +08:00
Jiaming Yuan
ccdabe4512 Support building gradient index with cat data. (#7371) 2021-11-03 22:37:37 +08:00
Jiaming Yuan
57a4b4ff64 Handle OMP_THREAD_LIMIT. (#7390) 2021-11-03 15:44:38 +08:00
Jiaming Yuan
a55d43ccfd Add test for invalid categorical data values. (#7380)
* Add test for invalid categorical data values.

* Add check during sketching.
2021-11-02 18:00:52 +08:00
Jiaming Yuan
154b15060e Move callbacks from fit to __init__. (#7375) 2021-11-02 17:51:42 +08:00
Jiaming Yuan
a13321148a Support multi-class with base margin. (#7381)
This is already partially supported but never properly tested. So the only possible way to use it is calling `numpy.ndarray.flatten` with `base_margin` before passing it into XGBoost. This PR adds proper support
for most of the data types along with tests.
2021-11-02 13:38:00 +08:00
Jiaming Yuan
6295dc3b67 Fix span reverse iterator. (#7387)
* Fix span reverse iterator.

* Disable `rbegin` on device code to avoid calling host function.
* Add `trbegin` and friends.
2021-11-02 13:35:59 +08:00
Jiaming Yuan
0f7a9b42f1 Use double precision in metric calculation. (#7364) 2021-11-02 12:00:32 +08:00
Jiaming Yuan
239dbb3c0a Move macos test to github action. (#7382)
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2021-10-30 14:40:32 +08:00
Jiaming Yuan
c6769488b3 Typehint for subset of core API. (#7348) 2021-10-28 20:47:04 +08:00
Jiaming Yuan
45aef75cca Move skl eval_metric and early_stopping rounds to model params. (#6751)
A new parameter `custom_metric` is added to `train` and `cv` to distinguish the behaviour from the old `feval`.  And `feval` is deprecated.  The new `custom_metric` receives transformed prediction when the built-in objective is used.  This enables XGBoost to use cost functions from other libraries like scikit-learn directly without going through the definition of the link function.

`eval_metric` and `early_stopping_rounds` in sklearn interface are moved from `fit` to `__init__` and is now saved as part of the scikit-learn model.  The old ones in `fit` function are now deprecated. The new `eval_metric` in `__init__` has the same new behaviour as `custom_metric`.

Added more detailed documents for the behaviour of custom objective and metric.
2021-10-28 17:20:20 +08:00
Jiaming Yuan
3c4aa9b2ea [breaking] Remove label encoder deprecated in 1.3. (#7357) 2021-10-28 13:24:29 +08:00
Jiaming Yuan
ac9bfaa4f2 Handle missing values in dataframe with category dtype. (#7331)
* Replace -1 in pandas initializer.
* Unify `IsValid` functor.
* Mimic pandas data handling in cuDF glue code.
* Check invalid categories.
* Fix DDM sketching.
2021-10-28 03:33:54 +08:00
Jiaming Yuan
2eee87423c Remove old custom objective demo. (#7369)
We have 2 new custom objective demos covering both regression and classification with
accompanying tutorials in documents.
2021-10-27 16:31:48 +08:00
Jiaming Yuan
d4349426d8 Re-implement PR-AUC. (#7297)
* Support binary/multi-class classification, ranking.
* Add documents.
* Handle missing data.
2021-10-26 13:07:50 +08:00
Jiaming Yuan
d1f00fb0b7 Stricter validation for group. (#7345) 2021-10-21 12:13:33 +08:00
Jiaming Yuan
8d7c6366d7 Accept histogram cut instead gradient index in evaluation. (#7336) 2021-10-20 18:04:46 +08:00
Jiaming Yuan
f999897615 [dask] Use nthread in DMatrix construction. (#7337)
This is consistent with the thread overriding behavior.
2021-10-20 15:16:40 +08:00
Jiaming Yuan
3b0b74fa94 [doc] Use RTD theme. (#7346) 2021-10-19 23:49:19 -07:00
Jiaming Yuan
f53da412aa Add typehint to tracker. (#7338) 2021-10-20 12:49:36 +08:00
Jiaming Yuan
fb1a9e6bc5 Avoid omp reduction in coordinate descent and aft metrics. (#7316)
Aside from the omp issue, parameter configuration for aft metric is simplified.
2021-10-17 15:55:49 +08:00
Jiaming Yuan
f56e2e9a66 Support categorical data with pandas Dataframe in inplace prediction (#7322) 2021-10-17 14:32:06 +08:00
Jiaming Yuan
8e619010d0 Extract CPUExpandEntry and HistParam. (#7321)
* Remove kRootNid.
* Check for empty hessian.
2021-10-17 14:22:25 +08:00
Jiaming Yuan
4ddf8d001c Deterministic result for element-wise/mclass metrics. (#7303)
Remove openmp reduction.
2021-10-13 14:22:40 +08:00
Jiaming Yuan
130df8cdda Add tests for tree grow policy. (#7302) 2021-10-12 15:04:06 +08:00
Jiaming Yuan
5b17bb0031 Fix prediction with cat data in sklearn interface. (#7306)
* Specify DMatrix parameter for pre-processing dataframe.
* Add document about the behaviour of prediction.
2021-10-12 14:31:12 +08:00
Jiaming Yuan
298af6f409 Fix weighted samples in multi-class AUC. (#7300) 2021-10-11 15:12:29 +08:00
Jiaming Yuan
69d3b1b8b4 Remove old callback deprecated in 1.3. (#7280) 2021-10-08 17:24:59 +08:00
Jiaming Yuan
578de9f762 Fix cv verbose_eval (#7291) 2021-10-08 12:28:38 +08:00
Jiaming Yuan
d8cb395380 Fix gamma neg log likelihood. (#7275) 2021-10-05 16:57:08 +08:00
Jiaming Yuan
d8a549e6ac Avoid thread block with sparse data. (#7255) 2021-09-25 13:11:34 +08:00
Jiaming Yuan
ca17f8a5fc Dispatch thrust versions and upgrade rmm. (#7254)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-09-25 03:43:23 +08:00
Bobby Wang
0ee11dac77 [jvm-packages][xgboost4j-gpu] Support GPU dataframe and DeviceQuantileDMatrix (#7195)
Following classes are added to support dataframe in java binding:

- `Column` is an abstract type for a single column in tabular data.
- `ColumnBatch` is an abstract type for dataframe.

- `CuDFColumn` is an implementaiton of `Column` that consume cuDF column
- `CudfColumnBatch` is an implementation of `ColumnBatch` that consumes cuDF dataframe.

- `DeviceQuantileDMatrix` is the interface for quantized data.

The Java implementation mimics the Python interface and uses `__cuda_array_interface__` protocol for memory indexing.  One difference is on JVM package, the data batch is staged on the host as java iterators cannot be reset.

Co-authored-by: jiamingy <jm.yuan@outlook.com>
2021-09-24 14:25:00 +08:00
Jiaming Yuan
c735c17f33 Disable callback and ES on random forest. (#7236) 2021-09-17 18:21:17 +08:00
Jiaming Yuan
22d56cebf1 Encode pandas categorical data automatically. (#7231) 2021-09-17 11:09:55 +08:00
Jiaming Yuan
32e0858501 Fix travis. (#7237) 2021-09-17 10:06:23 +08:00
Jiaming Yuan
31c1e13f90 Categorical data support in CPU sketching. (#7221) 2021-09-17 04:37:09 +08:00
Jiaming Yuan
0ed979b096 Support more input types for categorical data. (#7220)
* Support more input types for categorical data.

* Shorten the type name from "categorical" to "c".
* Tests for np/cp array and scipy csr/csc/coo.
* Specify the type for feature info.
2021-09-16 20:39:30 +08:00
Jiaming Yuan
2942dc68e4 Fix mixed types in GPU sketching. (#7228) 2021-09-16 00:10:25 +08:00
Jiaming Yuan
037dd0820d Implement __sklearn_is_fitted__. (#7230) 2021-09-15 19:09:04 +08:00