xgboost

Author	SHA1	Message	Date
Jiaming Yuan	d9799b09d0	Categorical data support for cuDF. (#7042 ) * Add support in DMatrix. * Add support in DQM, except for iterator.	2021-06-17 13:54:33 +08:00
Jiaming Yuan	b56614e9b8	[R] Use new predict function. (#6819 ) * Call new C prediction API. * Add `strict_shape`. * Add `iterationrange`. * Update document.	2021-06-11 13:03:29 +08:00
Jiaming Yuan	ee4f51a631	Support for all primitive types from array. (#7003 ) * Change C API name. * Test for all primitive types from array. * Add native support for CPU 128 float. * Convert boolean and float16 in Python. * Fix dask version for now.	2021-06-01 08:34:48 +08:00
vslaykovsky	2a9979e256	Fixed incorrect feature mismatch error message (#6949 ) data.shape[0] denotes the number of samples, data.shape[1] is the number of features	2021-05-11 13:52:11 +08:00
Jiaming Yuan	37ad60fe25	Enforce input data is not `object`. (#6927 ) * Check for object data type. * Allow strided arrays with greater underlying buffer size.	2021-05-02 00:09:01 +08:00
Jiaming Yuan	ef473b1f09	Disable pylint error. (#6911 )	2021-04-29 01:01:37 +08:00
Jiaming Yuan	a2ecbdaa31	Add an API guard to prevent global variables being changed. (#6891 )	2021-04-23 10:27:57 +08:00
Jiaming Yuan	a5d7094a45	Update documents. (#6856 ) * Add early stopping section to prediction doc. * Remove best_ntree_limit. * Better doxygen output.	2021-04-16 12:41:03 +08:00
Jiaming Yuan	dee5ef2dfd	Typehint for Sklearn. (#6799 )	2021-04-14 06:55:21 +08:00
giladmaya	aa0d8f20c1	Support configuring constraints by feature names (#6783 ) Co-authored-by: fis <jm.yuan@outlook.com>	2021-04-04 06:53:33 +08:00
Jiaming Yuan	7e06c81894	Fix approximated predict contribution. (#6811 )	2021-04-03 02:15:03 +08:00
Jiaming Yuan	0cced530ea	[doc] Clarify prediction function. (#6813 )	2021-04-03 02:12:04 +08:00
Jiaming Yuan	a5c852660b	Update document for sklearn model IO. (#6809 ) * Update the use of JSON. * Remove unnecessary type cast.	2021-04-01 15:52:36 +08:00
Jiaming Yuan	9da2287ab8	[breaking] Save booster feature info in JSON, remove feature name generation. (#6605 ) * Save feature info in booster in JSON model. * [breaking] Remove automatic feature name generation in `DMatrix`. This PR is to enable reliable feature validation in Python package.	2021-02-25 18:54:16 +08:00
Jiaming Yuan	c375173dca	Support pylint 2.7.0 (#6726 )	2021-02-25 12:49:58 +08:00
Jiaming Yuan	bdedaab8d1	Fix pylint. (#6714 )	2021-02-19 11:53:27 +08:00
Roffild	4c5d2608e0	[python-package] Fix class Booster: feature_types = None (#6705 )	2021-02-13 17:50:23 +08:00
Jiaming Yuan	4656b09d5d	[breaking] Add prediction function for DMatrix and use inplace predict for dask. (#6668 ) * Add a new API function for predicting on `DMatrix`. This function aligns with the rest of the `XGBoosterPredictFrom` functions on the semantics of function arguments. Purge `ntree_limit` from libxgboost, use iteration instead. * [dask] Use `inplace_predict` by default for dask sklearn models. * [dask] Run prediction shape inference on worker instead of client. The breaking change is in the Python sklearn `apply` function, which is now consistent with the other prediction functions, where `best_iteration` is used by default.	2021-02-08 18:26:32 +08:00
Jiaming Yuan	411592a347	Enhance inplace prediction. (#6653 ) * Accept array interface for csr and array. * Accept an optional proxy dmatrix for metainfo. This constructs an explicit `_ProxyDMatrix` type in Python. * Remove unused doc. * Add strict output.	2021-02-02 11:41:46 +08:00
Jiaming Yuan	4bf23c2391	Specify shape in prediction contrib and interaction. (#6614 )	2021-01-26 02:08:22 +08:00
Jiaming Yuan	8942c98054	Define metainfo and other parameters for all DMatrix interfaces. (#6601 ) This PR ensures all DMatrix types have a common interface. * Fix logic in avoiding duplicated DMatrix in sklearn. * Check for consistency between DMatrix types. * Add doc for bounds.	2021-01-25 16:06:06 +08:00
Jiaming Yuan	d356b7a071	Restore unknown data support. (#6595 )	2021-01-14 04:51:16 +08:00
Jiaming Yuan	0027220aa0	[breaking] Remove duplicated predict functions, Fix attributes IO. (#6593 ) * Fix attributes not being restored. * Rename all `data` to `X`. [breaking]	2021-01-13 16:56:49 +08:00
Jiaming Yuan	80065d571e	[dask] Add DaskXGBRanker (#6576 ) * Initial support for distributed LTR using dask. * Support `qid` in libxgboost. * Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]` to avoid duplicated code. * Define `DaskXGBRanker`. The dask ranker doesn't support group structure; instead it uses query id and converts it to group ptr internally.	2021-01-08 18:35:09 +08:00
Jiaming Yuan	de8fd852a5	[dask] Add type hints. (#6519 ) * Add validate_features. * Show type hints in doc. Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-12-29 19:41:02 +08:00
Philip Hyunsu Cho	fbb980d9d3	Expand `~` into the home directory on Linux and MacOS (#6531 )	2020-12-19 23:35:13 -08:00
Jiaming Yuan	ca3da55de4	Support early stopping with training continuation, correct num boosted rounds. (#6506 ) * Implement early stopping with training continuation. * Add new C API for obtaining boosted rounds. * Fix off by 1 in `save_best`. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-12-17 19:59:19 +08:00
Jiaming Yuan	0e97d97d50	Fix merge conflict. (#6512 )	2020-12-16 18:02:25 +08:00
Jiaming Yuan	347f593169	Accept numpy array for DMatrix slice index. (#6368 )	2020-12-16 14:42:52 +08:00
Jiaming Yuan	ef4a0e0aac	Fix DMatrix feature names/types IO. (#6507 ) * Fix feature names/types IO Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-12-16 14:24:27 +08:00
Jiaming Yuan	3c3f026ec1	Move metric configuration into booster. (#6504 )	2020-12-16 05:35:04 +08:00
Jiaming Yuan	47b86180f6	Don't validate feature when number of rows is 0. (#6472 )	2020-12-07 18:08:51 +08:00
Nikhil Choudhary	ae1662028a	Fixed few grammatical mistakes in doc (#6393 )	2020-11-15 13:48:08 +08:00
Jiaming Yuan	fcfeb4959c	Deprecate positional arguments. (#6365 ) Deprecate positional arguments in following functions: - `__init__` for all classes in sklearn module. - `fit` method for all classes in sklearn module. - dask interface. - `set_info` for `DMatrix` class. Refactor the evaluation matrices handling.	2020-11-13 11:10:30 +08:00
Jiaming Yuan	2cc9662005	Support slicing tree model (#6302 ) This PR is meant to end the confusion around best_ntree_limit and unify model slicing. We have multi-class and random forests; asking users to understand how to set ntree_limit is difficult and error prone. * Implement the save_best option in early stopping. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-11-02 23:27:39 -08:00
Rory Mitchell	29745c6df2	Fix inclusive scan for large sizes (#6234 )	2020-11-03 17:01:43 +13:00
Jiaming Yuan	bed7ae4083	Loop over `thrust::reduce`. (#6229 ) * Check input chunk size of dqdm. * Add doc for current limitation.	2020-10-14 10:40:56 +13:00
Jiaming Yuan	2443275891	Cleanup Python code. (#6223 ) * Remove pathlike as XGBoost 1.2 requires Python 3.6. * Move conditional import of dask/distributed into dask module.	2020-10-12 15:44:41 +08:00
Jiaming Yuan	7622b8cdb8	Enable categorical data support on Python DMatrix. (#6166 ) * Only pandas is recognized.	2020-09-29 11:22:56 +08:00
Jiaming Yuan	33d80ffad0	[dask] Support more meta data on functional interface. (#6132 ) * Add base_margin, label_(lower\|upper)_bound. * Test survival training with dask.	2020-09-21 16:56:37 +08:00
Jiaming Yuan	a144daf034	Limit tree depth for GPU hist. (#6045 )	2020-08-22 19:34:52 +08:00
ShvetsKS	24f2e6c97e	Optimize DMatrix build time. (#5877 ) Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>	2020-08-20 01:37:03 +08:00
Qi Zhang	989ddd036f	Swap byte-order in binary serializer to support big-endian arch (#5813 ) * fixed some endian issues * Use dmlc::ByteSwap() to simplify code * Fix lint check * [CI] Add test for s390x * Download latest CMake on s390x * Fix a bug in my code * Save magic number in dmatrix with byteswap on big-endian machine * Save version in binary with byteswap on big-endian machine * Load scalar with byteswap in MetaInfo * Add a debugging message * Handle arrays correctly when byteswapping * EOF can also be 255 * Handle magic number in MetaInfo carefully * Skip Tree.Load test for big-endian, since the test manually builds little-endian binary model * Handle missing packages in Python tests * Don't use boto3 in model compatibility tests * Add s390 Docker file for local testing * Add model compatibility tests * Add R compatibility test * Revert "Add R compatibility test" This reverts commit c2d2bdcb7dbae133cbb927fcd20f7e83ee2b18a8. Co-authored-by: Qi Zhang <q.zhang@ibm.com> Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-08-18 14:47:17 -07:00
Jiaming Yuan	4d99c58a5f	Feature weights (#5962 )	2020-08-18 19:55:41 +08:00
Jiaming Yuan	18349a7ccf	[Breaking] Fix custom metric for multi output. (#5954 ) * Set output margin to true for custom metric. This fixes only R and Python.	2020-07-29 19:25:27 +08:00
Jiaming Yuan	029a8b533f	Simplify the data backends. (#5893 )	2020-07-16 15:17:31 +08:00
Jiaming Yuan	d0a29c3135	Remove print. (#5867 )	2020-07-08 04:12:14 +08:00
Jiaming Yuan	a3ec964346	Accept iterator in device dmatrix. (#5783 ) * Remove Device DMatrix.	2020-07-07 21:44:48 +08:00
Jiaming Yuan	93c44a9a64	Move feature names and types of DMatrix from Python to C++. (#5858 ) * Add thread local return entry for DMatrix. * Save feature name and feature type in binary file. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-07-07 09:40:13 +08:00
Jiaming Yuan	8104f10328	Update document for model dump. (#5818 ) * Clarify the relationship between dump and save. * Mention the schema.	2020-06-22 14:33:54 +08:00

290 Commits