This PR rewrites the approx tree method on top of the hist codebase for better performance and code sharing.
The rewrite brings many benefits (a usage sketch follows the list):
- Support for both `max_leaves` and `max_depth`.
- Support for `grow_policy`.
- Support for monotonic constraints.
- Support for feature weights.
- Support for easier bin configuration (`max_bin`).
- Support for categorical data.
- Faster performance on most datasets, often many times faster.
- Support for prediction cache.
- Significantly better performance for external memory.
- Unites the code base between approx and hist.
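A minimal sketch of the newly shared options; all parameter names below are standard XGBoost parameters, and the data is synthetic:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X, y = rng.random((1024, 4)), rng.random(1024)
# feature_weights influences column sampling, now honored by approx.
Xy = xgb.DMatrix(X, label=y, feature_weights=np.ones(4))

booster = xgb.train(
    {
        "tree_method": "approx",
        "grow_policy": "lossguide",           # newly supported by approx
        "max_leaves": 64,                     # alongside max_depth
        "max_bin": 128,                       # simpler bin configuration
        "monotone_constraints": "(1,0,0,0)",  # monotonic constraints
    },
    Xy,
    num_boost_round=8,
)
```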
Instead of accessing data through `original_page_`, access it from the first page of the available batch.
Fixes #7476.
Co-authored-by: jiamingy <jm.yuan@outlook.com>
* Add a number-of-targets model parameter, configured automatically from the input labels (see the sketch after this list).
* Change elementwise metric and indexing for weights.
* Add demo.
* Add tests.
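A minimal sketch of multi-output training, assuming a build with multi-target support; the number of targets is inferred from the 2-D label matrix:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((256, 8))
y = rng.random((256, 3))  # three targets per sample

Xy = xgb.DMatrix(X, label=y)
booster = xgb.train({"tree_method": "hist"}, Xy, num_boost_round=8)
assert booster.predict(Xy).shape == (256, 3)  # one column per target
```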
Multi-dimensional `base_margin` is already partially supported but was never properly tested, so previously the only way to use it was to call `numpy.ndarray.flatten` on `base_margin` before passing it to XGBoost. This PR adds proper support for most data types, along with tests; a sketch follows.
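A sketch of passing a 2-D `base_margin` directly, without flattening it first:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((128, 4))
y = rng.integers(0, 3, size=128)
margin = np.zeros((128, 3))  # one column per class; previously required .flatten()

Xy = xgb.DMatrix(X, label=y, base_margin=margin)
booster = xgb.train({"objective": "multi:softprob", "num_class": 3}, Xy, num_boost_round=4)
```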
* Support more input types for categorical data.
* Shorten the type name from "categorical" to "c".
* Tests for np/cp array and scipy csr/csc/coo.
* Specify the type for feature info, as shown below.
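A sketch of the shortened type name, where "c" marks a categorical column and "q" a quantitative one:

```python
import numpy as np
import xgboost as xgb

X = np.array([[0.0, 1.5], [1.0, 2.5], [2.0, 3.5], [1.0, 0.5]])
y = np.array([0.0, 1.0, 0.0, 1.0])
# The first column is treated as categorical, the second as numeric.
Xy = xgb.DMatrix(X, label=y, feature_types=["c", "q"], enable_categorical=True)
```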
On GPU we use a rounding factor to truncate gradients for deterministic results. This PR changes the gradient representation to a fixed-point number whose exponent is aligned with the rounding factor.
[breaking] Drop non-deterministic histogram.
Use fixed point for shared memory.
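A conceptual sketch of the rounding-factor truncation, not the actual CUDA implementation:

```python
def truncate(grad: float, rounding_factor: float) -> float:
    # Adding then subtracting a large power-of-two constant discards the
    # low-order bits of grad, so floating-point summation becomes
    # order-independent and therefore deterministic.
    return (grad + rounding_factor) - rounding_factor
```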
This PR improves the performance of GPU Hist.
Co-authored-by: Andy Adinets <aadinets@nvidia.com>
* Support categorical data for the dask functional interface and DQM (sketch after this list).
* Implement categorical data support for GPU GK-merge.
* Add support for dask functional interface.
* Add support for DQM.
* Get newer cupy.
* Change C API name.
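A sketch of categorical data through the dask interface and the device quantile DMatrix, assuming a GPU cluster with `dask_cuda` and `dask_cudf` available:

```python
import cudf
import dask_cudf
import xgboost as xgb
from dask_cuda import LocalCUDACluster
from distributed import Client

with LocalCUDACluster(n_workers=1) as cluster, Client(cluster) as client:
    df = cudf.DataFrame(
        {"f0": cudf.Series([0, 1, 2, 1], dtype="category"),
         "y": [0.0, 1.0, 0.0, 1.0]}
    )
    ddf = dask_cudf.from_cudf(df, npartitions=1)
    # DQM: quantiles are computed directly on device data.
    Xy = xgb.dask.DaskDeviceQuantileDMatrix(
        client, ddf[["f0"]], ddf["y"], enable_categorical=True
    )
    xgb.dask.train(client, {"tree_method": "gpu_hist"}, Xy, num_boost_round=4)
```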
* Test for all primitive types from array.
* Add native support for 128-bit floats on the CPU.
* Convert boolean and float16 inputs in Python (see the sketch after this list).
* Pin the dask version for now.
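A sketch of the Python-side conversion: boolean and float16 arrays now work directly:

```python
import numpy as np
import xgboost as xgb

# Both inputs are converted in Python before reaching the C API.
X = np.random.default_rng(0).random((32, 3)).astype(np.float16)
y = np.zeros(32, dtype=np.bool_)
Xy = xgb.DMatrix(X, label=y)
```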
* Re-implement ROC-AUC.
* Binary classification.
* Multi-class classification.
* Learning to rank (LTR).
* Add documentation.
This PR resolves a few issues:
- Define a value when the dataset is invalid, which can happen when the dataset is empty or contains only positive or only negative samples.
- Define ROC-AUC for multi-class classification (see the sketch after this list).
- Define a weighted average value for the distributed setting.
- A correct implementation for the learning-to-rank task. The previous implementation was just binary classification AUC averaged across groups, which doesn't measure ranking quality.
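A sketch of the now-defined multi-class ROC-AUC, using the built-in metric on synthetic data:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = rng.integers(0, 3, size=200)

Xy = xgb.DMatrix(X, label=y)
xgb.train(
    {"objective": "multi:softprob", "num_class": 3, "eval_metric": "auc"},
    Xy,
    num_boost_round=4,
    evals=[(Xy, "train")],  # prints the multi-class AUC each round
)
```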
* [dask] Use `distributed.MultiLock`
This enables training multiple models in parallel (sketch after this list).
* Conditionally import `MultiLock`.
* Use async train directly in scikit learn interface.
* Use `worker_client` when available.
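A sketch of parallel training made safe by `distributed.MultiLock`, using the asynchronous dask client; `X` and `y` are assumed to be dask collections:

```python
import asyncio
import xgboost as xgb
from distributed import Client

async def fit(client, X, y, params):
    Xy = await xgb.dask.DaskDMatrix(client, X, y)
    return await xgb.dask.train(client, params, Xy, num_boost_round=4)

async def main(address, X, y):
    async with Client(address, asynchronous=True) as client:
        # Two models trained concurrently on the same cluster.
        return await asyncio.gather(
            fit(client, X, y, {"tree_method": "hist"}),
            fit(client, X, y, {"tree_method": "approx"}),
        )
```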
* Add a new API function for predicting on `DMatrix`. This function aligns with the rest of the `XGBoosterPredictFrom*` functions in the semantics of its arguments.
* Purge `ntree_limit` from libxgboost; use iteration ranges instead (see the sketch after this list).
* [dask] Use `inplace_predict` by default for dask sklearn models.
* [dask] Run prediction shape inference on worker instead of client.
The breaking change is in the Python sklearn `apply` method, which I made consistent with the other prediction functions, where `best_iteration` is used by default.
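A sketch of the replacement: `iteration_range` specifies which boosting rounds to use instead of `ntree_limit`:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X, y = rng.random((128, 4)), rng.random(128)
Xy = xgb.DMatrix(X, label=y)
booster = xgb.train({"tree_method": "hist"}, Xy, num_boost_round=20)

# Predict with only the first 10 rounds.
pred = booster.predict(Xy, iteration_range=(0, 10))
```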
* Accept the array interface for CSR and dense array inputs.
* Accept an optional proxy dmatrix for metainfo.
This constructs an explicit `_ProxyDMatrix` type in Python (sketch after this list).
* Remove unused doc.
* Add strict output.
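A sketch of how these pieces surface in `inplace_predict`: meta info such as `base_margin` is routed through the internal proxy DMatrix, and `strict_shape` (assumed here to be the "strict output" option above) forces a fully specified output shape:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X, y = rng.random((128, 4)), rng.random(128)
booster = xgb.train({"tree_method": "hist"}, xgb.DMatrix(X, label=y), num_boost_round=4)

pred = booster.inplace_predict(X, base_margin=np.zeros(128), strict_shape=True)
assert pred.shape == (128, 1)  # shape is explicit even for a single output
```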