xgboost

Author	SHA1	Message	Date
Jiaming Yuan	663136aa08	Implement feature score for linear model. (#7048 ) * Add feature score support for linear model. * Port R interface to the new implementation. * Add linear model support in Python. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2021-06-25 14:34:02 +08:00
Jiaming Yuan	7dd29ffd47	Implement feature score in GBTree. (#7041 ) * Categorical data support. * Eliminate text parsing during feature score computation.	2021-06-18 11:53:16 +08:00
Jiaming Yuan	b1fdb220f4	Remove deprecated `n_gpus` parameter. (#6821 )	2021-04-02 03:02:32 +08:00
Jiaming Yuan	f6fe15d11f	Improve parameter validation (#6769 ) * Add quotes to unused parameters. * Check for whitespace.	2021-03-20 01:56:55 +08:00
Jiaming Yuan	9da2287ab8	[breaking] Save booster feature info in JSON, remove feature name generation. (#6605 ) * Save feature info in booster in JSON model. * [breaking] Remove automatic feature name generation in `DMatrix`. This PR is to enable reliable feature validation in Python package.	2021-02-25 18:54:16 +08:00
Jiaming Yuan	4656b09d5d	[breaking] Add prediction function for DMatrix and use inplace predict for dask. (#6668 ) * Add a new API function for predicting on `DMatrix`. This function aligns with rest of the `XGBoosterPredictFrom` functions on semantic of function arguments. Purge `ntree_limit` from libxgboost, use iteration instead. * [dask] Use `inplace_predict` by default for dask sklearn models. * [dask] Run prediction shape inference on worker instead of client. The breaking change is in the Python sklearn `apply` function, I made it to be consistent with other prediction functions where `best_iteration` is used by default.	2021-02-08 18:26:32 +08:00
Jiaming Yuan	411592a347	Enhance inplace prediction. (#6653 ) * Accept array interface for csr and array. * Accept an optional proxy dmatrix for metainfo. This constructs an explicit `_ProxyDMatrix` type in Python. * Remove unused doc. * Add strict output.	2021-02-02 11:41:46 +08:00
Philip Hyunsu Cho	0f2ed21a9d	[Breaking] Change default evaluation metric for binary:logitraw objective to logloss (#6647 )	2021-01-29 00:12:12 +09:00
Jiaming Yuan	bc08e0c9d1	Remove `experimental_json_serialization` from tests. (#6640 )	2021-01-27 17:44:49 +08:00
Jiaming Yuan	80065d571e	[dask] Add DaskXGBRanker (#6576 ) * Initial support for distributed LTR using dask. * Support `qid` in libxgboost. * Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]` to avoid duplicated code. * Define `DaskXGBRanker`. The dask ranker doesn't support group structure, instead it uses query id and converts to group ptr internally.	2021-01-08 18:35:09 +08:00
Jiaming Yuan	ca3da55de4	Support early stopping with training continuation, correct num boosted rounds. (#6506 ) * Implement early stopping with training continuation. * Add new C API for obtaining boosted rounds. * Fix off by 1 in `save_best`. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-12-17 19:59:19 +08:00
Jiaming Yuan	c5876277a8	Drop saving binary format for memory snapshot. (#6513 )	2020-12-17 00:14:57 +08:00
Philip Hyunsu Cho	fb56da5e8b	Add global configuration (#6414 ) * Add management functions for global configuration: XGBSetGlobalConfig(), XGBGetGlobalConfig(). * Add Python interface: set_config(), get_config(), and config_context(). * Add unit tests for Python * Add R interface: xgb.set.config(), xgb.get.config() * Add unit tests for R Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2020-12-03 00:05:18 -08:00
Honza Sterba	b0036b339b	Optionally fail when gpu_id is set to invalid value (#6342 )	2020-11-28 15:14:12 +08:00
Jiaming Yuan	8a17610666	Implement GPU predict leaf. (#6187 )	2020-11-11 17:33:47 +08:00
Jiaming Yuan	519cee115a	Avoid resetting seed for every configuration. (#6349 )	2020-11-06 10:28:35 +08:00
Jiaming Yuan	2cc9662005	Support slicing tree model (#6302 ) This PR is meant to end the confusion around best_ntree_limit and unify model slicing. We have multi-class and random forests, asking users to understand how to set ntree_limit is difficult and error prone. * Implement the save_best option in early stopping. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-11-02 23:27:39 -08:00
Igor Moura	5e1e972aea	Clean up warnings (#6325 )	2020-10-30 23:50:29 +08:00
Jiaming Yuan	3310e208fd	Fix inplace prediction interval. (#6259 ) * Add back the interval in call. * Make the interval non-optional.	2020-10-28 13:13:59 +08:00
Jiaming Yuan	b180223d18	Cleanup RABIT. (#6290 ) * Remove recovery and MPI speed tests. * Remove readme. * Remove Python binding. * Add checks in C API.	2020-10-27 08:48:22 +08:00
Jiaming Yuan	b5c2a47b20	Drop single point model recovery (#6262 ) * Pass rabit params in JVM package. * Implement timeout using poll timeout parameter. * Remove OOB data check.	2020-10-21 15:27:03 +08:00
Jiaming Yuan	ddf37cca30	Unify thread configuration. (#6186 )	2020-10-19 16:05:42 +08:00
Christian Lorentzen	cf4f019ed6	[Breaking] Change default evaluation metric for classification to logloss / mlogloss (#6183 ) * Change DefaultEvalMetric of classification from error to logloss * Change default binary metric in plugin/example/custom_obj.cc * Set old error metric in python tests * Set old error metric in R tests * Fix missed eval metrics and typos in R tests * Fix setting eval_metric twice in R tests * Add warning for empty eval_metric for classification * Fix Dask tests Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-10-02 12:06:47 -07:00
Rory Mitchell	dda9e1e487	Update GPUTreeshap (#6163 ) * Reduce shap test duration * Test interoperability with shap package * Add feature interactions * Update GPUTreeShap	2020-09-28 09:43:47 +13:00
Qi Zhang	989ddd036f	Swap byte-order in binary serializer to support big-endian arch (#5813 ) * fixed some endian issues * Use dmlc::ByteSwap() to simplify code * Fix lint check * [CI] Add test for s390x * Download latest CMake on s390x * Fix a bug in my code * Save magic number in dmatrix with byteswap on big-endian machine * Save version in binary with byteswap on big-endian machine * Load scalar with byteswap in MetaInfo * Add a debugging message * Handle arrays correctly when byteswapping * EOF can also be 255 * Handle magic number in MetaInfo carefully * Skip Tree.Load test for big-endian, since the test manually builds little-endian binary model * Handle missing packages in Python tests * Don't use boto3 in model compatibility tests * Add s390 Docker file for local testing * Add model compatibility tests * Add R compatibility test * Revert "Add R compatibility test" This reverts commit c2d2bdcb7dbae133cbb927fcd20f7e83ee2b18a8. Co-authored-by: Qi Zhang <q.zhang@ibm.com> Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-08-18 14:47:17 -07:00
Jiaming Yuan	6f7112a848	Move warning about empty dataset. (#5998 )	2020-08-11 14:10:51 +08:00
boxdot	d268a2a463	Thread-safe prediction by making the prediction cache thread-local. (#5853 ) Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2020-07-30 12:33:50 +08:00
Jiaming Yuan	18349a7ccf	[Breaking] Fix custom metric for multi output. (#5954 ) * Set output margin to true for custom metric. This fixes only R and Python.	2020-07-29 19:25:27 +08:00
Jiaming Yuan	75b8c22b0b	Fix prediction heuristic (#5955 ) * Relax check for prediction. * Relax test in spark test. * Add tests in C++.	2020-07-29 19:24:07 +08:00
Jiaming Yuan	40361043ae	[BLOCKING] Remove to_string. (#5934 )	2020-07-26 10:21:26 +08:00
Alexander Gugel	970b4b3fa2	Add XGBoosterGetNumFeature (#5856 ) - add GetNumFeature to Learner - add XGBoosterGetNumFeature to C API - update c-api-demo accordingly	2020-07-13 23:25:17 -07:00
Jiaming Yuan	93c44a9a64	Move feature names and types of DMatrix from Python to C++. (#5858 ) * Add thread local return entry for DMatrix. * Save feature name and feature type in binary file. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-07-07 09:40:13 +08:00
Philip Hyunsu Cho	1d22a9be1c	Revert "Reorder includes. (#5749 )" (#5771 ) This reverts commit d3a0efbf162f3dceaaf684109e1178c150b32de3.	2020-06-09 10:29:28 -07:00
Jiaming Yuan	d3a0efbf16	Reorder includes. (#5749 ) * Reorder includes. * R.	2020-06-03 17:30:47 +12:00
Jiaming Yuan	9e1b29944e	Fix loading old model. (#5724 ) * Add test.	2020-05-31 14:55:32 +08:00
Jiaming Yuan	7d93932423	Better message when no GPU is found. (#5594 )	2020-04-26 10:00:57 +08:00
Jiaming Yuan	e726dd9902	Set device in device dmatrix. (#5596 )	2020-04-25 13:42:53 +08:00
Rory Mitchell	660be66207	Avoid rabit calls in learner configuration (#5581 )	2020-04-24 14:59:29 +12:00
Jiaming Yuan	fcbedcedf8	Fix configuration I load model. (#5562 )	2020-04-20 17:25:11 +08:00
Jiaming Yuan	29a4cfe400	Group aware GPU sketching. (#5551 ) * Group aware GPU weighted sketching. * Distribute group weights to each data point. * Relax the test. * Validate input meta info. * Fix metainfo copy ctor.	2020-04-20 17:18:52 +08:00
Jiaming Yuan	a2f54963b6	Write binary header. (#5532 )	2020-04-15 17:47:57 +08:00
Jiaming Yuan	bd653fad4c	Remove distcol updater. (#5507 ) Closes #5498.	2020-04-10 12:52:56 +08:00
Bobby Wang	ad826e913f	[jvm-packages]add feature size for LabelPoint and DataBatch (#5303 ) * fix type error * Validate number of features. * resolve comments * add feature size for LabelPoint and DataBatch * pass the feature size to native * move feature size validating tests into a separate suite * resolve comments Co-authored-by: fis <jm.yuan@outlook.com>	2020-04-07 16:49:52 -07:00
Jiaming Yuan	0012f2ef93	Upgrade clang-tidy on CI. (#5469 ) * Correct all clang-tidy errors. * Upgrade clang-tidy to 10 on CI. Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-04-05 04:42:29 +08:00
Jiaming Yuan	a9313802ea	Fix dump model. (#5485 )	2020-04-05 03:52:54 +08:00
Jiaming Yuan	6601a641d7	Thread safe, inplace prediction. (#5389 ) Normal prediction with DMatrix is now thread safe with locks. Added inplace prediction is lock free thread safe. When data is on device (cupy, cudf), the returned data is also on device. * Implementation for numpy, csr, cudf and cupy. * Implementation for dask. * Remove sync in simple dmatrix.	2020-03-30 15:35:28 +08:00
Jiaming Yuan	45a97ddf32	Split up `LearnerImpl`. (#5350 )	2020-03-12 16:30:23 +08:00
Jiaming Yuan	0dd97c206b	Move thread local entry into Learner. (#5396 ) * Move thread local entry into Learner. This is an attempt to workaround CUDA context issue in static variable, where the CUDA context can be released before device vector. * Add PredictionEntry to thread local entry. This eliminates one copy of prediction vector. * Don't define CUDA C API in a namespace.	2020-03-07 15:37:39 +08:00
Jiaming Yuan	8d06878bf9	Deterministic GPU histogram. (#5361 ) * Use pre-rounding based method to obtain reproducible floating point summation. * GPU Hist for regression and classification are bit-by-bit reproducible. * Add doc. * Switch to thrust reduce for `node_sum_gradient`.	2020-03-04 15:13:28 +08:00
Jiaming Yuan	c35cdecddd	Move prediction cache to Learner. (#5220 ) * Move prediction cache into Learner. * Clean-ups - Remove duplicated cache in Learner and GBM. - Remove ad-hoc fix of invalid cache. - Remove `PredictFromCache` in predictors. - Remove prediction cache for linear altogether, as it's only moving the prediction into training process but doesn't provide any actual overall speed gain. - The cache is now unique to Learner, which means the ownership is no longer shared by any other components. * Changes - Add version to prediction cache. - Use weak ptr to check expired DMatrix. - Pass shared pointer instead of raw pointer.	2020-02-14 13:04:23 +08:00

148 Commits