xgboost

Author	SHA1	Message	Date
Philip Hyunsu Cho	0c67685e43	[CI] Add a helper script to aid Maven release (#7470) * [CI] Add a helper script to aid Maven release * Move script to dev/ [skip ci] * Update command [skip ci]	2021-11-23 00:11:07 -08:00
Harvey	0552ca8021	Fix typo (#7469)	2021-11-23 08:58:45 +08:00
Jiaming Yuan	176110a22d	Support external memory in CPU histogram building. (#7372)	2021-11-23 01:13:33 +08:00
Jiaming Yuan	d33854af1b	[Breaking] Accept multi-dim meta info. (#7405) This PR changes base_margin into a 3-dim array, with one of them being reserved for multi-target classification. Also, a breaking change is made for binary serialization due to extra dimension along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on array interface.	2021-11-18 23:02:54 +08:00
Jiaming Yuan	9fb4338964	Add test for eta and mitigate float error. (#7446) * Add eta test. * Don't skip test.	2021-11-18 20:42:48 +08:00
Bobby Wang	7cfb310eb4	Rework transform (#7440) extract the common part of transform code from XGBoostClassifier and XGBoostRegressor	2021-11-18 15:48:57 +08:00
Philip Hyunsu Cho	2adf222fb2	[CI] CI cost saving (#7407) * [CI] Drop CUDA 10.1; Require 11.0 * Change NCCL version * Use CUDA 10.1 for clang-tidy, for now * Remove JDK 11 and 12 * Fix NCCL version * Don't require 11.0 just yet, until clang-tidy is fixed * Skip MultiClassesSerializationTest.GpuHist	2021-11-17 21:02:20 -08:00
Jiaming Yuan	b0015fda96	Fix R CRAN failures. (#7404) * Remove hist builder dtor. * Initialize values. * Tolerance. * Remove the use of nthread in col maker.	2021-11-16 10:51:12 +08:00
Jiaming Yuan	55ee272ea8	Extend array interface to handle ndarray. (#7434) The `ArrayInterface` class is extended to support multi-dim array inputs. Previously this class handled only 2-dim inputs (a vector is also treated as a matrix). This PR specifies the expected dimension at compile time, so the array interface can perform various checks on input data automatically. Also, adapters like CSR are more rigorous about their input. Lastly, row vectors and column vectors are handled without intervention from the caller.	2021-11-16 09:52:15 +08:00
Jiaming Yuan	e27f543deb	Set use_logger in tracker to false. (#7438)	2021-11-16 05:12:42 +08:00
Jiaming Yuan	d4274bc556	Fix typo. (#7433)	2021-11-15 01:28:11 +08:00
Jiaming Yuan	a7057fa64c	Implement typed storage for tensor. (#7429) * Add `Tensor` class. * Add elementwise kernel for CPU and GPU. * Add unravel index. * Move some computation to compile time.	2021-11-14 18:53:13 +08:00
Kian Meng Ang	d27a11ff87	Fix typos in python package (#7432)	2021-11-14 17:20:19 +08:00
Jiaming Yuan	8cc75f1576	Cleanup Python tests. (#7426)	2021-11-14 15:47:05 +08:00
Jiaming Yuan	38ca96c9fc	[CI] Install igraph as binary. (#7417)	2021-11-12 19:04:28 +08:00
Jiaming Yuan	46726ec176	Expose build info (#7399)	2021-11-12 18:22:46 +08:00
Jiaming Yuan	937fa282b5	Extract string view. (#7416) * Add equality operators. * Return a view in substr. * Add proper iterator types.	2021-11-12 18:22:30 +08:00
Jiaming Yuan	ca6f980932	Check number of trees in inplace predict. (#7409)	2021-11-12 18:20:23 +08:00
Jiaming Yuan	97d7582457	Delay breaking changes to 1.6. (#7420) The patch is too big to be backported.	2021-11-12 16:46:03 +08:00
Bobby Wang	cb685607b2	[jvm-packages] Rework the train pipeline (#7401) 1. Add PreXGBoost to build RDD[Watches] from Dataset 2. Feed RDD[Watches] built from PreXGBoost to XGBoost to train	2021-11-10 17:51:38 +08:00
Jiaming Yuan	8df0a252b7	[doc] Update document for GPU. [skip ci] (#7403) * Remove outdated workaround and description.	2021-11-09 02:05:55 +08:00
Jiaming Yuan	d7d1b6e3a6	CPU evaluation for cat data. (#7393) * Implementation for one hot based. * Implementation for partition based. (LightGBM)	2021-11-06 14:41:35 +08:00
Jiaming Yuan	6ede12412c	Update dmlc-core and use data iter for GPU sampling tests. (#7398) * Update dmlc-core. * New parquet parser in dmlc-core. * Use data iter for GPU sampling tests.	2021-11-06 05:12:49 +08:00
Jiaming Yuan	c968217ca8	[R] Fix global feature importance and predict with 1 sample. (#7394) * [R] Fix global feature importance. * Add implementation for tree index. The parameter is not documented in C API since we should work on porting the model slicing to R instead of supporting more use of tree index. * Fix the difference between "gain" and "total_gain". * debug. * Fix prediction.	2021-11-05 10:07:00 +08:00
Jiaming Yuan	48aff0eabd	[doc][jvm-packages] Update information about Python tracker. [skip ci] (#7396)	2021-11-05 05:55:13 +08:00
Jiaming Yuan	b06040b6d0	Implement a general array view. (#7365) * Replace existing matrix and vector view. This is to prepare for handling higher dimension data and prediction when we support multi-target models.	2021-11-05 04:16:11 +08:00
Jiaming Yuan	232144ca09	Add note about CRAN release [skip ci] (#7395)	2021-11-05 00:34:14 +08:00
Jiaming Yuan	4100827971	Pass information about objective to tree methods. (#7385) * Define the `ObjInfo` and pass it down to every tree updater.	2021-11-04 01:52:44 +08:00
Jiaming Yuan	ccdabe4512	Support building gradient index with cat data. (#7371)	2021-11-03 22:37:37 +08:00
Jiaming Yuan	57a4b4ff64	Handle `OMP_THREAD_LIMIT`. (#7390)	2021-11-03 15:44:38 +08:00
Jiaming Yuan	e6ab594e14	Change shebang used in CLI demo. (#7389) Change from system Python to environment python3. For Ubuntu 20.04, only `python3` is available and there's no `python`. So at least `python3` is consistent with Python virtual env, Ubuntu and anaconda.	2021-11-02 22:11:19 +08:00
Jiaming Yuan	a55d43ccfd	Add test for invalid categorical data values. (#7380) * Add test for invalid categorical data values. * Add check during sketching.	2021-11-02 18:00:52 +08:00
Jiaming Yuan	c74df31bf9	Cleanup the `train` function. (#7377) * Move attribute setter to callback. * Remove the internal train function. * Remove unnecessary initialization.	2021-11-02 18:00:26 +08:00
Jiaming Yuan	154b15060e	Move callbacks from `fit` to `__init__`. (#7375)	2021-11-02 17:51:42 +08:00
Jiaming Yuan	32e673d8c4	Support building with CTK11.5. (#7379) * Support building with CTK11.5. * Require system cub installation for CTK11.4+. * Check thrust version for segmented sort.	2021-11-02 16:22:26 +08:00
Jiaming Yuan	a13321148a	Support multi-class with base margin. (#7381) This is already partially supported but never properly tested. So the only possible way to use it is calling `numpy.ndarray.flatten` with `base_margin` before passing it into XGBoost. This PR adds proper support for most of the data types along with tests.	2021-11-02 13:38:00 +08:00
Jiaming Yuan	6295dc3b67	Fix span reverse iterator. (#7387) * Fix span reverse iterator. * Disable `rbegin` on device code to avoid calling host function. * Add `trbegin` and friends.	2021-11-02 13:35:59 +08:00
Jiaming Yuan	8211e5f341	Add clang-format config. (#7383) Generated using `clang-format -style=google -dump-config > .clang-format`, with column width changed from 80 to 100 to be consistent with existing cpplint check.	2021-11-02 13:34:38 +08:00
Jiaming Yuan	0f7a9b42f1	Use double precision in metric calculation. (#7364)	2021-11-02 12:00:32 +08:00
Jiaming Yuan	239dbb3c0a	Move macos test to github action. (#7382) Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2021-10-30 14:40:32 +08:00
Bobby Wang	b81ebbef62	[jvm-packages] Fix json4s binary compatibility issue (#7376) Spark 3.2 depends on json4s 3.7.0-M11, which changed the signatures of some implicit functions. As a result, xgboost4j built against Spark 3.0/3.1 fails when saving the model.	2021-10-30 03:20:57 +08:00
Jiaming Yuan	c6769488b3	Typehint for subset of core API. (#7348)	2021-10-28 20:47:04 +08:00
Jiaming Yuan	45aef75cca	Move skl `eval_metric` and `early_stopping_rounds` to model params. (#6751) A new parameter `custom_metric` is added to `train` and `cv` to distinguish the behaviour from the old `feval`, and `feval` is deprecated. The new `custom_metric` receives the transformed prediction when a built-in objective is used. This enables XGBoost to use cost functions from other libraries like scikit-learn directly, without going through the definition of the link function. `eval_metric` and `early_stopping_rounds` in the sklearn interface are moved from `fit` to `__init__` and are now saved as part of the scikit-learn model. The old ones in the `fit` function are now deprecated. The new `eval_metric` in `__init__` has the same new behaviour as `custom_metric`. Added more detailed documents for the behaviour of custom objective and metric.	2021-10-28 17:20:20 +08:00
Jiaming Yuan	6b074add66	Update setup.py. (#7360) * Add new classifiers. * Typehint.	2021-10-28 14:58:31 +08:00
Jiaming Yuan	3c4aa9b2ea	[breaking] Remove label encoder deprecated in 1.3. (#7357)	2021-10-28 13:24:29 +08:00
Jiaming Yuan	d05754f558	Avoid OMP reduction in AUC. (#7362)	2021-10-28 05:03:52 +08:00
Jiaming Yuan	ac9bfaa4f2	Handle missing values in dataframe with category dtype. (#7331) * Replace -1 in pandas initializer. * Unify `IsValid` functor. * Mimic pandas data handling in cuDF glue code. * Check invalid categories. * Fix DDM sketching.	2021-10-28 03:33:54 +08:00
Jiaming Yuan	2eee87423c	Remove old custom objective demo. (#7369) We have 2 new custom objective demos covering both regression and classification with accompanying tutorials in documents.	2021-10-27 16:31:48 +08:00
Jiaming Yuan	b9414b6477	Update GPU doc for PR-AUC. [skip ci] (#7368)	2021-10-27 16:31:07 +08:00
Jiaming Yuan	d4349426d8	Re-implement PR-AUC. (#7297) * Support binary/multi-class classification, ranking. * Add documents. * Handle missing data.	2021-10-26 13:07:50 +08:00

5563 Commits