xgboost

Author	SHA1	Message	Date
Jiaming Yuan	8cc75f1576	Cleanup Python tests. (#7426)	2021-11-14 15:47:05 +08:00
Jiaming Yuan	46726ec176	Expose build info (#7399)	2021-11-12 18:22:46 +08:00
Jiaming Yuan	937fa282b5	Extract string view. (#7416) * Add equality operators. * Return a view in substr. * Add proper iterator types.	2021-11-12 18:22:30 +08:00
Jiaming Yuan	ca6f980932	Check number of trees in inplace predict. (#7409)	2021-11-12 18:20:23 +08:00
Jiaming Yuan	d7d1b6e3a6	CPU evaluation for cat data. (#7393) * Implementation for one hot based. * Implementation for partition based. (LightGBM)	2021-11-06 14:41:35 +08:00
Jiaming Yuan	6ede12412c	Update dmlc-core and use data iter for GPU sampling tests. (#7398) * Update dmlc-core. * New parquet parser in dmlc-core. * Use data iter for GPU sampling tests.	2021-11-06 05:12:49 +08:00
Jiaming Yuan	c968217ca8	[R] Fix global feature importance and predict with 1 sample. (#7394) * [R] Fix global feature importance. * Add implementation for tree index. The parameter is not documented in C API since we should work on porting the model slicing to R instead of supporting more use of tree index. * Fix the difference between "gain" and "total_gain". * debug. * Fix prediction.	2021-11-05 10:07:00 +08:00
Jiaming Yuan	b06040b6d0	Implement a general array view. (#7365) * Replace existing matrix and vector view. This is to prepare for handling higher dimension data and prediction when we support multi-target models.	2021-11-05 04:16:11 +08:00
Jiaming Yuan	4100827971	Pass information about objective to tree methods. (#7385) * Define the `ObjInfo` and pass it down to every tree updater.	2021-11-04 01:52:44 +08:00
Jiaming Yuan	ccdabe4512	Support building gradient index with cat data. (#7371)	2021-11-03 22:37:37 +08:00
Jiaming Yuan	57a4b4ff64	Handle `OMP_THREAD_LIMIT`. (#7390)	2021-11-03 15:44:38 +08:00
Jiaming Yuan	a55d43ccfd	Add test for invalid categorical data values. (#7380) * Add test for invalid categorical data values. * Add check during sketching.	2021-11-02 18:00:52 +08:00
Jiaming Yuan	154b15060e	Move callbacks from `fit` to `__init__`. (#7375)	2021-11-02 17:51:42 +08:00
Jiaming Yuan	a13321148a	Support multi-class with base margin. (#7381) This was already partially supported but never properly tested, so the only way to use it was to call `numpy.ndarray.flatten` on `base_margin` before passing it into XGBoost. This PR adds proper support for most of the data types along with tests.	2021-11-02 13:38:00 +08:00
Jiaming Yuan	6295dc3b67	Fix span reverse iterator. (#7387) * Fix span reverse iterator. * Disable `rbegin` on device code to avoid calling host function. * Add `trbegin` and friends.	2021-11-02 13:35:59 +08:00
Jiaming Yuan	0f7a9b42f1	Use double precision in metric calculation. (#7364)	2021-11-02 12:00:32 +08:00
Jiaming Yuan	239dbb3c0a	Move macos test to github action. (#7382) Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2021-10-30 14:40:32 +08:00
Jiaming Yuan	c6769488b3	Typehint for subset of core API. (#7348)	2021-10-28 20:47:04 +08:00
Jiaming Yuan	45aef75cca	Move skl `eval_metric` and `early_stopping_rounds` to model params. (#6751) A new parameter `custom_metric` is added to `train` and `cv` to distinguish the behaviour from the old `feval`, and `feval` is deprecated. The new `custom_metric` receives the transformed prediction when a built-in objective is used. This enables XGBoost to use cost functions from other libraries like scikit-learn directly without going through the definition of the link function. `eval_metric` and `early_stopping_rounds` in the sklearn interface are moved from `fit` to `__init__` and are now saved as part of the scikit-learn model. The old ones in the `fit` function are now deprecated. The new `eval_metric` in `__init__` has the same new behaviour as `custom_metric`. More detailed documentation is added for the behaviour of custom objectives and metrics.	2021-10-28 17:20:20 +08:00
Jiaming Yuan	3c4aa9b2ea	[breaking] Remove label encoder deprecated in 1.3. (#7357)	2021-10-28 13:24:29 +08:00
Jiaming Yuan	ac9bfaa4f2	Handle missing values in dataframe with category dtype. (#7331) * Replace -1 in pandas initializer. * Unify `IsValid` functor. * Mimic pandas data handling in cuDF glue code. * Check invalid categories. * Fix DDM sketching.	2021-10-28 03:33:54 +08:00
Jiaming Yuan	2eee87423c	Remove old custom objective demo. (#7369) We have 2 new custom objective demos covering both regression and classification, with accompanying tutorials in the documentation.	2021-10-27 16:31:48 +08:00
Jiaming Yuan	d4349426d8	Re-implement PR-AUC. (#7297) * Support binary/multi-class classification, ranking. * Add documents. * Handle missing data.	2021-10-26 13:07:50 +08:00
Jiaming Yuan	d1f00fb0b7	Stricter validation for group. (#7345)	2021-10-21 12:13:33 +08:00
Jiaming Yuan	8d7c6366d7	Accept histogram cut instead of gradient index in evaluation. (#7336)	2021-10-20 18:04:46 +08:00
Jiaming Yuan	f999897615	[dask] Use nthread in DMatrix construction. (#7337) This is consistent with the thread overriding behavior.	2021-10-20 15:16:40 +08:00
Jiaming Yuan	3b0b74fa94	[doc] Use RTD theme. (#7346)	2021-10-19 23:49:19 -07:00
Jiaming Yuan	f53da412aa	Add typehint to tracker. (#7338)	2021-10-20 12:49:36 +08:00
Jiaming Yuan	fb1a9e6bc5	Avoid omp reduction in coordinate descent and aft metrics. (#7316) Aside from the omp issue, parameter configuration for aft metric is simplified.	2021-10-17 15:55:49 +08:00
Jiaming Yuan	f56e2e9a66	Support categorical data with pandas Dataframe in inplace prediction (#7322)	2021-10-17 14:32:06 +08:00
Jiaming Yuan	8e619010d0	Extract CPUExpandEntry and HistParam. (#7321) * Remove kRootNid. * Check for empty hessian.	2021-10-17 14:22:25 +08:00
Jiaming Yuan	4ddf8d001c	Deterministic result for element-wise/mclass metrics. (#7303) Remove openmp reduction.	2021-10-13 14:22:40 +08:00
Jiaming Yuan	130df8cdda	Add tests for tree grow policy. (#7302)	2021-10-12 15:04:06 +08:00
Jiaming Yuan	5b17bb0031	Fix prediction with cat data in sklearn interface. (#7306) * Specify DMatrix parameter for pre-processing dataframe. * Add documentation about the behaviour of prediction.	2021-10-12 14:31:12 +08:00
Jiaming Yuan	298af6f409	Fix weighted samples in multi-class AUC. (#7300)	2021-10-11 15:12:29 +08:00
Jiaming Yuan	69d3b1b8b4	Remove old callback deprecated in 1.3. (#7280)	2021-10-08 17:24:59 +08:00
Jiaming Yuan	578de9f762	Fix cv `verbose_eval` (#7291)	2021-10-08 12:28:38 +08:00
Jiaming Yuan	d8cb395380	Fix gamma neg log likelihood. (#7275)	2021-10-05 16:57:08 +08:00
Jiaming Yuan	d8a549e6ac	Avoid thread block with sparse data. (#7255)	2021-09-25 13:11:34 +08:00
Jiaming Yuan	ca17f8a5fc	Dispatch thrust versions and upgrade rmm. (#7254) Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2021-09-25 03:43:23 +08:00
Bobby Wang	0ee11dac77	[jvm-packages][xgboost4j-gpu] Support GPU dataframe and `DeviceQuantileDMatrix` (#7195) The following classes are added to support dataframes in the Java binding: - `Column` is an abstract type for a single column in tabular data. - `ColumnBatch` is an abstract type for a dataframe. - `CuDFColumn` is an implementation of `Column` that consumes a cuDF column. - `CudfColumnBatch` is an implementation of `ColumnBatch` that consumes a cuDF dataframe. - `DeviceQuantileDMatrix` is the interface for quantized data. The Java implementation mimics the Python interface and uses the `__cuda_array_interface__` protocol for memory indexing. One difference is that in the JVM package the data batch is staged on the host, as Java iterators cannot be reset. Co-authored-by: jiamingy <jm.yuan@outlook.com>	2021-09-24 14:25:00 +08:00
Jiaming Yuan	c735c17f33	Disable callback and ES on random forest. (#7236)	2021-09-17 18:21:17 +08:00
Jiaming Yuan	22d56cebf1	Encode pandas categorical data automatically. (#7231)	2021-09-17 11:09:55 +08:00
Jiaming Yuan	32e0858501	Fix travis. (#7237)	2021-09-17 10:06:23 +08:00
Jiaming Yuan	31c1e13f90	Categorical data support in CPU sketching. (#7221)	2021-09-17 04:37:09 +08:00
Jiaming Yuan	0ed979b096	Support more input types for categorical data. (#7220) * Support more input types for categorical data. * Shorten the type name from "categorical" to "c". * Tests for np/cp array and scipy csr/csc/coo. * Specify the type for feature info.	2021-09-16 20:39:30 +08:00
Jiaming Yuan	2942dc68e4	Fix mixed types in GPU sketching. (#7228)	2021-09-16 00:10:25 +08:00
Jiaming Yuan	037dd0820d	Implement `__sklearn_is_fitted__`. (#7230)	2021-09-15 19:09:04 +08:00
Jiaming Yuan	d997c967d5	Demo for experimental categorical data support. (#7213)	2021-09-15 08:20:12 +08:00
Jiaming Yuan	3515931305	Initial support for external memory in gradient index. (#7183) * Add hessian to batch param in preparation of new approx impl. * Extract a push method for gradient index matrix. * Use span instead of vector ref for hessian in sketching. * Create a binary format for gradient index.	2021-09-13 12:40:56 +08:00

1228 Commits