xgboost

Author	SHA1	Message	Date
Jiaming Yuan	35dac8af1d	[BP] Fix index type for bitfield. (#7541 ) (#7560 )	2022-01-14 00:21:34 +08:00
Jiaming Yuan	1311a20f49	[BP] Fix num_boosted_rounds for linear model. (#7538 ) (#7559 ) * Add note. * Fix n boosted rounds.	2022-01-14 00:20:57 +08:00
Jiaming Yuan	3e2d7519a6	[dask] Fix asyncio. (#7508 ) (#7561 )	2022-01-13 21:49:11 +08:00
Jiaming Yuan	afb9dfd421	[backport] CI fixes for macos (#7482 ) * [CI] Fix continuous delivery pipeline for MacOS (#7472) * Fix github macos package upload. (#7474) * Fix macos package upload. (#7475) * Split up the tests. * [CI] Add missing step extract_branch (#7479) Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2021-11-25 01:57:55 +08:00
Jiaming Yuan	a013942649	Check number of trees in inplace predict. (#7409 ) (#7424 )	2021-11-12 19:31:31 +08:00
Jiaming Yuan	14c56f05da	[backport] Handle missing values in dataframe with category dtype. (#7331 ) (#7413 ) * Handle missing values in dataframe with category dtype. (#7331) * Replace -1 in pandas initializer. * Unify `IsValid` functor. * Mimic pandas data handling in cuDF glue code. * Check invalid categories. * Fix DDM sketching. * Fix pick error.	2021-11-10 21:24:46 +08:00
Jiaming Yuan	e7ac2486eb	[backport] [R] Fix global feature importance and predict with 1 sample. (#7394 ) (#7397 ) * [R] Fix global feature importance. * Add implementation for tree index. The parameter is not documented in C API since we should work on porting the model slicing to R instead of supporting more use of tree index. * Fix the difference between "gain" and "total_gain". * debug. * Fix prediction.	2021-11-06 00:07:36 +08:00
Jiaming Yuan	a3d195e73e	Handle `OMP_THREAD_LIMIT`. (#7390 ) (#7391 )	2021-11-03 20:25:51 +08:00
Jiaming Yuan	fab3c05ced	Move macos test to github action. (#7382 ) (#7392 ) Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2021-11-03 18:39:47 +08:00
Jiaming Yuan	30c1b5c54c	[backport] Fix prediction with cat data in sklearn interface. (#7306 ) (#7312 ) * Specify DMatrix parameter for pre-processing dataframe. * Add document about the behaviour of prediction.	2021-10-12 18:49:57 +08:00
Jiaming Yuan	36e247aca4	Fix weighted samples in multi-class AUC. (#7300 ) (#7305 )	2021-10-11 18:00:36 +08:00
Jiaming Yuan	c4aff733bb	[backport] Fix cv `verbose_eval` (#7291 ) (#7296 )	2021-10-08 14:24:27 +08:00
Jiaming Yuan	cdbfd21d31	[backport] Fix gamma neg log likelihood. (#7275 ) (#7285 )	2021-10-05 23:01:11 +08:00
Jiaming Yuan	d8a549e6ac	Avoid thread block with sparse data. (#7255 )	2021-09-25 13:11:34 +08:00
Jiaming Yuan	ca17f8a5fc	Dispatch thrust versions and upgrade rmm. (#7254 ) Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2021-09-25 03:43:23 +08:00
Bobby Wang	0ee11dac77	[jvm-packages][xgboost4j-gpu] Support GPU dataframe and `DeviceQuantileDMatrix` (#7195 ) The following classes are added to support dataframes in the Java binding: - `Column` is an abstract type for a single column in tabular data. - `ColumnBatch` is an abstract type for a dataframe. - `CuDFColumn` is an implementation of `Column` that consumes a cuDF column. - `CudfColumnBatch` is an implementation of `ColumnBatch` that consumes a cuDF dataframe. - `DeviceQuantileDMatrix` is the interface for quantized data. The Java implementation mimics the Python interface and uses the `__cuda_array_interface__` protocol for memory indexing. One difference is that in the JVM package, the data batch is staged on the host because Java iterators cannot be reset. Co-authored-by: jiamingy <jm.yuan@outlook.com>	2021-09-24 14:25:00 +08:00
Jiaming Yuan	c735c17f33	Disable callback and ES on random forest. (#7236 )	2021-09-17 18:21:17 +08:00
Jiaming Yuan	22d56cebf1	Encode pandas categorical data automatically. (#7231 )	2021-09-17 11:09:55 +08:00
Jiaming Yuan	32e0858501	Fix travis. (#7237 )	2021-09-17 10:06:23 +08:00
Jiaming Yuan	31c1e13f90	Categorical data support in CPU sketching. (#7221 )	2021-09-17 04:37:09 +08:00
Jiaming Yuan	0ed979b096	Support more input types for categorical data. (#7220 ) * Support more input types for categorical data. * Shorten the type name from "categorical" to "c". * Tests for np/cp array and scipy csr/csc/coo. * Specify the type for feature info.	2021-09-16 20:39:30 +08:00
Jiaming Yuan	2942dc68e4	Fix mixed types in GPU sketching. (#7228 )	2021-09-16 00:10:25 +08:00
Jiaming Yuan	037dd0820d	Implement `__sklearn_is_fitted__`. (#7230 )	2021-09-15 19:09:04 +08:00
Jiaming Yuan	d997c967d5	Demo for experimental categorical data support. (#7213 )	2021-09-15 08:20:12 +08:00
Jiaming Yuan	3515931305	Initial support for external memory in gradient index. (#7183 ) * Add hessian to batch param in preparation of new approx impl. * Extract a push method for gradient index matrix. * Use span instead of vector ref for hessian in sketching. * Create a binary format for gradient index.	2021-09-13 12:40:56 +08:00
Jiaming Yuan	b12e7f7edd	Add noexcept to JSON objects. (#7205 )	2021-09-07 13:56:48 +08:00
Jiaming Yuan	3a4f51f39f	Avoid calling CUDA code on CPU for linear model. (#7154 )	2021-09-01 10:45:31 +08:00
Jiaming Yuan	7a1d67f9cb	[breaking] Use integer atomic for GPU histogram. (#7180 ) On GPU we use a rounding factor to truncate the gradient for deterministic results. This PR changes the gradient representation to a fixed-point number with its exponent aligned with the rounding factor. [breaking] Drop non-deterministic histogram. Use fixed point for shared memory. This PR is to improve the performance of GPU Hist. Co-authored-by: Andy Adinets <aadinets@nvidia.com>	2021-08-28 05:17:05 +08:00
Philip Hyunsu Cho	3060f0b562	[CI] Automatically build GPU-enabled R package for Windows (#7185 ) * [CI] Automatically build GPU-enabled R package for Windows * Update Jenkinsfile-win64 * Build R package for the release branch only * Update install doc	2021-08-25 02:11:01 -07:00
Philip Hyunsu Cho	d04312b9c0	[CI] Fix hanging Python setup in Windows CI (#7186 )	2021-08-24 22:03:51 -07:00
Jiaming Yuan	3f38d983a6	Fix prediction configuration. (#7159 ) After the predictor parameter was added to the constructor, this configuration was broken.	2021-08-11 16:34:36 +08:00
Jiaming Yuan	149f209af6	Extract histogram builder from CPU Hist. (#7152 ) * Extract the CPU histogram builder. * Fix tests. * Reduce number of histograms being built.	2021-08-09 21:15:21 +08:00
Jiaming Yuan	8a84be37b8	Pass scikit learn estimator checks for regressor. (#7130 ) * Check data shape. * Check labels.	2021-08-03 18:58:20 +08:00
Jiaming Yuan	e2c406f5c8	Support `min_delta` in early stopping. (#7137 ) * Support `min_delta` in early stopping. * Remove abs_tol.	2021-08-03 14:29:17 +08:00
Jiaming Yuan	7bdedacb54	Document for `process_type`. (#7135 ) * Update document for prune and refresh. * Add demo.	2021-08-03 13:11:52 +08:00
Jiaming Yuan	d080b5a953	Fix model slicing. (#7149 ) * Use correct pointer. * Remove best_iteration/best_score.	2021-08-03 11:51:56 +08:00
Philip Hyunsu Cho	f1a4a1ac95	[CI] Upgrade build image to CentOS 7 + GCC 8; require CUDA 10.1 and later (#7141 )	2021-07-29 10:54:33 -07:00
Jiaming Yuan	7ee7a95b84	Use upstream URI in distributed quantile tests. (#7129 ) * Use upstream URI in distributed quantile tests. * Fix test cv `PytestAssertRewriteWarning`.	2021-07-27 14:09:49 +08:00
Jiaming Yuan	e88ac9cc54	[dask] Extend tree stats tests. (#7128 ) * Add tests to GPU. * Assert cover in children sums up to the parent.	2021-07-27 12:22:13 +08:00
Jiaming Yuan	778135f657	Fix parameter loading with training continuation. (#7121 ) * Add a demo for training continuation.	2021-07-23 10:51:47 +08:00
ShvetsKS	caa9e527dd	Remove extra sync for dense data (#7120 ) Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>	2021-07-22 19:02:31 +08:00
Jiaming Yuan	e6088366df	Export Python Interface for external memory. (#7070 ) * Add Python iterator interface. * Add tests. * Add demo. * Add documents. * Handle empty dataset.	2021-07-22 15:15:53 +08:00
Jiaming Yuan	bd1f3a38f0	Rewrite sparse dmatrix using callbacks. (#7092 ) - Reduce dependency on dmlc parsers and provide an interface for users to load data by themselves. - Remove use of threaded iterator and IO queue. - Remove `page_size`. - Make sure the number of pages in memory is bounded. - Make sure the cache can not be violated. - Provide an interface for internal algorithms to process data asynchronously.	2021-07-16 12:33:31 +08:00
Philip Hyunsu Cho	2801d69fb7	[CI] Pin libomp to 11.1.0 (#7107 )	2021-07-15 11:16:51 +08:00
Jiaming Yuan	345796825f	Optional find dependency in installed cmake config. (#7099 ) * Find dependency only when xgboost is built as static library. * Resolve msvc warning. * Add test for linking shared library.	2021-07-11 17:20:55 +08:00
Jiaming Yuan	77f6cf2d13	Support hessian in host sketch container. (#7081 ) Prepare for migrating approx onto hist's codebase.	2021-07-08 16:33:58 +08:00
Jiaming Yuan	84d359efb8	Support host data in proxy DMatrix. (#7087 )	2021-07-08 11:35:48 +08:00
Jiaming Yuan	5d7cdf2e36	[Breaking] Rename Quantile DMatrix C API. (#7082 ) The role of ProxyDMatrix is going beyond what it was designed for. Now it's used by both QuantileDeviceDMatrix and inplace prediction. After the refactoring of sparse DMatrix it will also be used for external memory. Renaming the C API to extract it from QuantileDeviceDMatrix.	2021-07-08 11:34:14 +08:00
Jiaming Yuan	c766f143ab	Refactor external memory formats. (#7089 ) * Save base_rowid. * Return write size. * Remove unused function.	2021-07-08 04:04:51 +08:00
Jiaming Yuan	615ab2b03e	Extract evaluate splits from CPU hist. (#7079 ) Other than modularizing the split evaluation function, this PR also removes some more functions including `InitNewNodes` and `BuildNodeStats` among some other unused variables. Also, scattered code like setting leaf weights is grouped into the split evaluator and `NodeEntry` is simplified and made private. Another subtle difference from the original implementation is that the modified code doesn't call `tree[nidx].Parent()` to traverse upward.	2021-07-07 15:16:25 +08:00

903 Commits