xgboost

Author	SHA1	Message	Date
Jiaming Yuan	35dac8af1d	[BP] Fix index type for bitfield. (#7541 ) (#7560 )	2022-01-14 00:21:34 +08:00
Jiaming Yuan	14c56f05da	[backport] Handle missing values in dataframe with category dtype. (#7331 ) (#7413 ) * Handle missing values in dataframe with category dtype. (#7331) * Replace -1 in pandas initializer. * Unify `IsValid` functor. * Mimic pandas data handling in cuDF glue code. * Check invalid categories. * Fix DDM sketching. * Fix pick error.	2021-11-10 21:24:46 +08:00
Jiaming Yuan	d8a549e6ac	Avoid thread block with sparse data. (#7255 )	2021-09-25 13:11:34 +08:00
Jiaming Yuan	31c1e13f90	Categorical data support in CPU sketching. (#7221 )	2021-09-17 04:37:09 +08:00
Jiaming Yuan	2942dc68e4	Fix mixed types in GPU sketching. (#7228 )	2021-09-16 00:10:25 +08:00
Jiaming Yuan	b12e7f7edd	Add noexcept to JSON objects. (#7205 )	2021-09-07 13:56:48 +08:00
Jiaming Yuan	7a1d67f9cb	[breaking] Use integer atomic for GPU histogram. (#7180 ) On GPU we use a rounding factor to truncate the gradient for deterministic results. This PR changes the gradient representation to a fixed point number with exponent aligned with the rounding factor. [breaking] Drop non-deterministic histogram. Use fixed point for shared memory. This PR is to improve the performance of GPU Hist. Co-authored-by: Andy Adinets <aadinets@nvidia.com>	2021-08-28 05:17:05 +08:00
Jiaming Yuan	7ee7a95b84	Use upstream URI in distributed quantile tests. (#7129 ) * Use upstream URI in distributed quantile tests. * Fix test cv `PytestAssertRewriteWarning`.	2021-07-27 14:09:49 +08:00
Jiaming Yuan	bd1f3a38f0	Rewrite sparse dmatrix using callbacks. (#7092 ) - Reduce dependency on dmlc parsers and provide an interface for users to load data by themselves. - Remove use of threaded iterator and IO queue. - Remove `page_size`. - Make sure the number of pages in memory is bounded. - Make sure the cache can not be violated. - Provide an interface for internal algorithms to process data asynchronously.	2021-07-16 12:33:31 +08:00
Jiaming Yuan	345796825f	Optional find dependency in installed cmake config. (#7099 ) * Find dependency only when xgboost is built as static library. * Resolve msvc warning. * Add test for linking shared library.	2021-07-11 17:20:55 +08:00
Jiaming Yuan	77f6cf2d13	Support hessian in host sketch container. (#7081 ) Prepare for migrating approx onto hist's codebase.	2021-07-08 16:33:58 +08:00
Jiaming Yuan	1cd20efe68	Move `GHistIndex` into `DMatrix`. (#7064 )	2021-07-01 00:44:49 +08:00
Jiaming Yuan	1c8fdf2218	Remove use of `device_idx` in `dh::LaunchN`. (#7063 ) It's an unused parameter, removing it can make the CI log more readable.	2021-06-29 11:37:26 +08:00
ShvetsKS	2567404ab6	Simplify sparse and dense CPU hist kernels (#7029 ) * Simplify sparse and dense kernels * Extract row partitioner. Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>	2021-06-11 18:26:30 +08:00
Jiaming Yuan	556a83022d	Implement unified update prediction cache for (gpu_)hist. (#6860 ) * Implement utilities for linalg. * Unify the update prediction cache functions. * Implement update prediction cache for multi-class gpu hist.	2021-04-17 00:29:34 +08:00
Jiaming Yuan	3039dd194b	Don't estimate sketch batch size when rmm is used. (#6807 )	2021-03-31 15:29:56 +08:00
Jiaming Yuan	a59c7323b4	Fix inplace predict missing value. (#6787 )	2021-03-27 05:36:10 +08:00
Jiaming Yuan	bcc0277338	Re-implement ROC-AUC. (#6747 ) * Re-implement ROC-AUC. * Binary * MultiClass * LTR * Add documents. This PR resolves a few issues: - Define a value when the dataset is invalid, which can happen if there's an empty dataset, or when the dataset contains only positive or negative values. - Define ROC-AUC for multi-class classification. - Define weighted average value for distributed setting. - A correct implementation for learning to rank task. Previous implementation is just binary classification with averaging across groups, which doesn't measure ordered learning to rank.	2021-03-20 16:52:40 +08:00
Jiaming Yuan	1a73a28511	Add device argsort. (#6749 ) This is part of https://github.com/dmlc/xgboost/pull/6747 .	2021-03-16 16:05:22 +08:00
Jiaming Yuan	218a5fb6dd	Simplify Span checks. (#6685 ) * Stop printing out message. * Remove R specialization. The printed message is not really useful anyway, without a reproducible example there's no way to fix it. But if there's a reproducible example, we can always obtain this information with a debugger. Removing the `printf` function avoids creating the context in kernel.	2021-02-09 08:12:58 +08:00
Jiaming Yuan	1b70a323a7	Improve string view to reduce string allocation. (#6644 )	2021-01-27 19:08:52 +08:00
Jiaming Yuan	f2f7dd87b8	Use view for `SparsePage` exclusively. (#6590 )	2021-01-11 18:04:55 +08:00
Jiaming Yuan	886486a519	Support categorical data in GPU weighted sketching. (#6508 )	2020-12-16 14:23:28 +08:00
ShvetsKS	956beead70	Thread local memory allocation for BuildHist (#6358 ) * thread mem locality * fix apply * cleanup * fix lint * fix tests * simple try * fix * fix * apply comments * fix comments * fix * apply simple comment Co-authored-by: ShvetsKS <kirill.shvets@intel.com>	2020-11-25 17:50:12 +03:00
Jiaming Yuan	43efadea2e	Deterministic data partitioning for external memory (#6317 ) * Make external memory data partitioning deterministic. * Change the meaning of `page_size` from bytes to number of rows. * Design a data pool. * Note for external memory. * Enable unity build on Windows CI. * Force garbage collect on test.	2020-11-11 06:11:06 +08:00
Jiaming Yuan	c4da967b5c	Support unity build. (#6295 ) * Support unity build. * Setup on Windows Jenkins. * Revert "Setup on Windows Jenkins." This reverts commit 8345cb8d2b009eec8ae9fa6f16412a7c9b6ec12c.	2020-10-28 11:49:28 -07:00
Jiaming Yuan	ddf37cca30	Unify thread configuration. (#6186 )	2020-10-19 16:05:42 +08:00
Jiaming Yuan	bed7ae4083	Loop over `thrust::reduce`. (#6229 ) * Check input chunk size of dqdm. * Add doc for current limitation.	2020-10-14 10:40:56 +13:00
Rory Mitchell	734a911a26	Loop over copy_if (#6201 ) * Loop over copy_if * Catch OOM. Co-authored-by: fis <jm.yuan@outlook.com>	2020-10-14 10:23:16 +13:00
Jiaming Yuan	2241563f23	Handle duplicated values in sketching. (#6178 ) * Accumulate weights in duplicated values. * Fix device id in iterative dmatrix.	2020-10-10 19:32:44 +08:00
Jiaming Yuan	444131a2e6	Add categorical data support to GPU Hist. (#6164 )	2020-09-29 11:27:25 +08:00
Jiaming Yuan	14afdb4d92	Support categorical data in ellpack. (#6140 )	2020-09-24 19:28:57 +08:00
Jiaming Yuan	210c131ce7	Support categorical data in GPU sketching. (#6137 )	2020-09-21 13:53:06 +08:00
Jiaming Yuan	a069a21e03	Implement intrusive ptr (#6129 ) * Use intrusive ptr for JSON.	2020-09-20 20:07:16 +08:00
Jiaming Yuan	e319b63f9e	Merge extract cuts into QuantileContainer. (#6125 ) * Use pruning for initial summary construction.	2020-09-18 16:36:39 +08:00
Jiaming Yuan	29b7fea572	Optimize cpu sketch allreduce for sparse data. (#6009 ) * Bypass RABIT serialization reducer and use custom allgather based merging.	2020-08-19 10:03:45 +08:00
Qi Zhang	989ddd036f	Swap byte-order in binary serializer to support big-endian arch (#5813 ) * fixed some endian issues * Use dmlc::ByteSwap() to simplify code * Fix lint check * [CI] Add test for s390x * Download latest CMake on s390x * Fix a bug in my code * Save magic number in dmatrix with byteswap on big-endian machine * Save version in binary with byteswap on big-endian machine * Load scalar with byteswap in MetaInfo * Add a debugging message * Handle arrays correctly when byteswapping * EOF can also be 255 * Handle magic number in MetaInfo carefully * Skip Tree.Load test for big-endian, since the test manually builds little-endian binary model * Handle missing packages in Python tests * Don't use boto3 in model compatibility tests * Add s390 Docker file for local testing * Add model compatibility tests * Add R compatibility test * Revert "Add R compatibility test" This reverts commit c2d2bdcb7dbae133cbb927fcd20f7e83ee2b18a8. Co-authored-by: Qi Zhang <q.zhang@ibm.com> Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-08-18 14:47:17 -07:00
Jiaming Yuan	4d99c58a5f	Feature weights (#5962 )	2020-08-18 19:55:41 +08:00
Jiaming Yuan	674c409e9d	Remove rabit dependency on public headers. (#6005 )	2020-08-13 08:26:20 +08:00
Philip Hyunsu Cho	9adb812a0a	RMM integration plugin (#5873 ) * [CI] Add RMM as an optional dependency * Replace caching allocator with pool allocator from RMM * Revert "Replace caching allocator with pool allocator from RMM" This reverts commit e15845d4e72e890c2babe31a988b26503a7d9038. * Use rmm::mr::get_default_resource() * Try setting default resource (doesn't work yet) * Allocate pool_mr in the heap * Prevent leaking pool_mr handle * Separate EXPECT_DEATH() in separate test suite suffixed DeathTest * Turn off death tests for RMM * Address reviewer's feedback * Prevent leaking of cuda_mr * Fix Jenkinsfile syntax * Remove unnecessary function in Jenkinsfile * [CI] Install NCCL into RMM container * Run Python tests * Try building with RMM, CUDA 10.0 * Do not use RMM for CUDA 10.0 target * Actually test for test_rmm flag * Fix TestPythonGPU * Use CNMeM allocator, since pool allocator doesn't yet support multiGPU * Use 10.0 container to build RMM-enabled XGBoost * Revert "Use 10.0 container to build RMM-enabled XGBoost" This reverts commit 789021fa31112e25b683aef39fff375403060141. * Fix Jenkinsfile * [CI] Assign larger /dev/shm to NCCL * Use 10.2 artifact to run multi-GPU Python tests * Add CUDA 10.0 -> 11.0 cross-version test; remove CUDA 10.0 target * Rename Conda env rmm_test -> gpu_test * Use env var to opt into CNMeM pool for C++ tests * Use identical CUDA version for RMM builds and tests * Use Pytest fixtures to enable RMM pool in Python tests * Move RMM to plugin/CMakeLists.txt; use PLUGIN_RMM * Use per-device MR; use command arg in gtest * Set CMake prefix path to use Conda env * Use 0.15 nightly version of RMM * Remove unnecessary header * Fix a unit test when cudf is missing * Add RMM demos * Remove print() * Use HostDeviceVector in GPU predictor * Simplify pytest setup; use LocalCUDACluster fixture * Address reviewers' comments Co-authored-by: Hyunsu Cho <chohyu01@cs.wasshington.edu>	2020-08-12 01:26:02 -07:00
Jiaming Yuan	ee70a2380b	Unify CPU hist sketching (#5880 )	2020-08-12 01:33:06 +08:00
Philip Hyunsu Cho	71b0528a2f	GPU implementation of AFT survival objective and metric (#5714 ) * Add interval accuracy * De-virtualize AFT functions * Lint * Refactor AFT metric using GPU-CPU reducer * Fix R build * Fix build on Windows * Fix copyright header * Clang-tidy * Fix crashing demo * Fix typos in comment; explain GPU ID * Remove unnecessary #include * Add C++ test for interval accuracy * Fix a bug in accuracy metric: use log pred * Refactor AFT objective using GPU-CPU Transform * Lint * Fix lint * Use Ninja to speed up build * Use time, not /usr/bin/time * Add cpu_build worker class, with concurrency = 1 * Use concurrency = 1 only for CUDA build * concurrency = 1 for clang-tidy * Address reviewer's feedback * Update link to AFT paper	2020-07-17 01:18:13 -07:00
Jiaming Yuan	e471056ec4	Fix sketch size calculation. (#5898 )	2020-07-17 08:33:16 +08:00
Jiaming Yuan	dd445af56e	Cleanup on device sketch. (#5874 ) * Remove old functions. * Merge weighted and un-weighted into a common interface.	2020-07-14 10:15:54 +08:00
Rong Ou	06320729d4	fix device sketch with weights in external memory mode (#5870 )	2020-07-08 08:44:07 +08:00
Jiaming Yuan	048d969be4	Implement GK sketching on GPU. (#5846 ) * Implement GK sketching on GPU. * Strong tests on quantile building. * Handle sparse dataset by binary searching the column index. * Hypothesis test on dask.	2020-07-07 12:16:21 +08:00
Jiaming Yuan	4b0852ee41	Use dmlc stream when URI protocol is not local file. (#5857 )	2020-07-07 03:07:12 +08:00
Philip Hyunsu Cho	a67bc64819	Add an option to run brute-force test for JSON round-trip (#5804 ) * Add an option to run brute-force test for JSON round-trip * Apply reviewer's feedback * Remove unneeded objects * Parallel run. * Max. * Use signed 64-bit loop var, to support MSVC * Add exhaustive test to CI * Run JSON test in Win build worker * Revert "Run JSON test in Win build worker" This reverts commit c97b2c7dda37b3585b445d36961605b79552ca89. * Revert "Add exhaustive test to CI" This reverts commit c149c2ce9971a07a7289f9b9bc247818afd5a667. Co-authored-by: fis <jm.yuan@outlook.com>	2020-06-17 23:46:02 -07:00
Jiaming Yuan	38ee514787	Implement fast number serialization routines. (#5772 ) * Implement ryu algorithm. * Implement integer printing. * Full coverage roundtrip test.	2020-06-17 12:39:23 +08:00
Jiaming Yuan	1fa84b61c1	Implement `Empty` method for host device vector. (#5781 ) * Fix accessing nullptr.	2020-06-13 19:02:26 +08:00


157 Commits