Jiaming Yuan
0f7a9b42f1
Use double precision in metric calculation. ( #7364 )
2021-11-02 12:00:32 +08:00
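The sketch below shows why double precision matters when a metric accumulates many terms. It sums the same values twice: once with simulated single-precision (float32) additions and once with Python's native double precision. The helpers `to_f32`, `accumulate_f32` and `accumulate_f64` are illustrative and are not XGBoost code. `struct` round-tripping is used only to emulate float32 rounding.

```python
import struct


def to_f32(x):
    # Round a Python float (binary64) to the nearest binary32 value.
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate_f32(values):
    # Running total kept in float32: every addition is rounded to 24 bits.
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total


def accumulate_f64(values):
    # Running total kept in float64 (Python's native float).
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1] * 100_000
print(accumulate_f32(values), accumulate_f64(values))
```

As the float32 total grows, its spacing between representable values becomes coarse compared with 0.1, so the rounding error from each addition accumulates. The float64 total stays close to 10,000.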
Jiaming Yuan
d05754f558
Avoid OMP reduction in AUC. ( #7362 )
2021-10-28 05:03:52 +08:00
Jiaming Yuan
ac9bfaa4f2
Handle missing values in dataframe with category dtype. ( #7331 )
...
* Replace -1 in pandas initializer.
* Unify `IsValid` functor.
* Mimic pandas data handling in cuDF glue code.
* Check invalid categories.
* Fix DDM sketching.
2021-10-28 03:33:54 +08:00
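pandas stores categorical values as integer codes (`Series.cat.codes`) and uses the code -1 for a missing value. The sketch below shows the kind of replacement the first bullet refers to. It is written in plain Python so that it runs without pandas. `codes_to_features` is a hypothetical helper. Mapping -1 to NaN assumes NaN is the missing-value marker, which is XGBoost's default `missing`.

```python
import math


def codes_to_features(codes):
    """Convert pandas-style category codes to float features.

    pandas encodes a missing categorical value as -1. Left as-is, it would
    look like a real category, so it is mapped to NaN instead.
    """
    return [math.nan if c == -1 else float(c) for c in codes]


print(codes_to_features([0, 2, -1, 1]))
```

With pandas, the input would come from something like `df["col"].cat.codes`.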
Jiaming Yuan
d4349426d8
Re-implement PR-AUC. ( #7297 )
...
* Support binary/multi-class classification, ranking.
* Add documents.
* Handle missing data.
2021-10-26 13:07:50 +08:00
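The commit message does not say which PR-AUC estimator is used. For orientation only, the sketch below implements one common definition for weighted binary labels: step-wise average precision, where tied scores form a single threshold. `average_precision` is a hypothetical helper. Its interpolation is not necessarily the one behind XGBoost's `aucpr` metric.

```python
def average_precision(labels, scores, weights=None):
    """Step-wise PR-AUC for binary labels (assumes positive weights)."""
    if weights is None:
        weights = [1.0] * len(labels)
    total_pos = sum(w for y, w in zip(labels, weights) if y == 1)
    if total_pos == 0:
        raise ValueError("PR-AUC is undefined without positive samples")
    # Visit samples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0.0
    prev_recall = 0.0
    area = 0.0
    i = 0
    while i < len(order):
        j = i
        # Consume every sample sharing this score, so ties are order-independent.
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            k = order[j]
            if labels[k] == 1:
                tp += weights[k]
            else:
                fp += weights[k]
            j += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        # Rectangle: recall gained at this threshold times its precision.
        area += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return area


print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6]))
```

Take labels `[1, 0, 1, 0]` ranked by descending score. Recall first reaches 0.5 at precision 1, then reaches 1.0 at precision 2/3. The area is therefore 0.5 + 0.5 * 2/3 = 5/6.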
Jiaming Yuan
fd61c61071
Avoid omp reduction in rank metric. ( #7349 )
2021-10-22 14:13:34 +08:00
Jiaming Yuan
d1f00fb0b7
Stricter validation for group. ( #7345 )
2021-10-21 12:13:33 +08:00
Jiaming Yuan
8d7c6366d7
Accept histogram cut instead of gradient index in evaluation. ( #7336 )
2021-10-20 18:04:46 +08:00
Jiaming Yuan
fbb0dc4275
Remove auto configuration of seed_per_iteration. ( #7009 )
...
* Remove auto configuration of seed_per_iteration.
This is probably related to model recovery from rabit, which has been removed.
* Document.
2021-10-17 15:58:57 +08:00
Jiaming Yuan
fb1a9e6bc5
Avoid omp reduction in coordinate descent and aft metrics. ( #7316 )
...
Aside from the OpenMP issue, parameter configuration for the AFT metric is simplified.
2021-10-17 15:55:49 +08:00
Jiaming Yuan
8e619010d0
Extract CPUExpandEntry and HistParam. ( #7321 )
...
* Remove kRootNid.
* Check for empty hessian.
2021-10-17 14:22:25 +08:00
Jiaming Yuan
4ddf8d001c
Deterministic result for element-wise/mclass metrics. ( #7303 )
...
Remove openmp reduction.
2021-10-13 14:22:40 +08:00
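Floating-point addition is not associative. An OpenMP `reduction` clause also leaves the order in which per-thread partial results are combined unspecified. Together, these mean totals can vary between runs or between thread counts. The sketch below shows the usual alternative, assuming a fixed block partition. Each fixed-size block is summed independently, and the partials are then combined serially in block order. Python threads stand in for OpenMP threads here, and `serial_sum` and `blocked_sum` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_sum(values):
    # Plain left-to-right accumulation, so the rounding order is explicit.
    total = 0.0
    for v in values:
        total += v
    return total


def blocked_sum(values, n_blocks=8, n_threads=4):
    """Parallel sum whose result does not depend on the thread count."""
    if not values:
        return 0.0
    # Block boundaries depend only on len(values) and n_blocks.
    size = -(-len(values) // n_blocks)  # ceiling division
    blocks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # map() yields results in block order, whichever thread finishes first.
        partials = list(pool.map(serial_sum, blocks))
    # Combine the per-block partials serially, always in the same order.
    return serial_sum(partials)
```

The partition depends on `n_blocks` rather than `n_threads`. Changing the thread count therefore changes only which thread computes each block, not the sequence of roundings.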
Jiaming Yuan
a7d0c66457
Remove unused code. ( #7293 )
2021-10-12 15:04:41 +08:00
Jiaming Yuan
298af6f409
Fix weighted samples in multi-class AUC. ( #7300 )
2021-10-11 15:12:29 +08:00
Jiaming Yuan
d8cb395380
Fix gamma neg log likelihood. ( #7275 )
2021-10-05 16:57:08 +08:00
Jiaming Yuan
b3b03200e2
Remove old warning in 1.3 ( #7279 )
2021-10-01 08:05:50 +08:00
Jiaming Yuan
d8a549e6ac
Avoid thread block with sparse data. ( #7255 )
2021-09-25 13:11:34 +08:00
Jiaming Yuan
ca17f8a5fc
Dispatch thrust versions and upgrade rmm. ( #7254 )
...
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-09-25 03:43:23 +08:00
ShvetsKS
475fd1abec
Reduced span overheads in objective function calculation ( #7206 )
...
Co-authored-by: fis <jm.yuan@outlook.com>
2021-09-23 04:43:59 +08:00
david-cortes
4f93e5586a
Improve wording for warning ( #7248 )
...
This warning sounds a bit ungrammatical. Additionally, the second part of the warning is not clear. This PR changes the wording to make it clearer.
2021-09-21 10:48:11 +08:00
Jiaming Yuan
c311a8c1d8
Enable compiling with system cub. ( #7232 )
...
- Tested with all CUDA 11.x.
- Work around cub scan by using a discard iterator in AUC.
- Limit the size of Argsort when compiled with CUDA cub.
2021-09-17 14:28:18 +08:00
Jiaming Yuan
22d56cebf1
Encode pandas categorical data automatically. ( #7231 )
2021-09-17 11:09:55 +08:00
Jiaming Yuan
31c1e13f90
Categorical data support in CPU sketching. ( #7221 )
2021-09-17 04:37:09 +08:00
Jiaming Yuan
0ed979b096
Support more input types for categorical data. ( #7220 )
...
* Support more input types for categorical data.
* Shorten the type name from "categorical" to "c".
* Tests for np/cp array and scipy csr/csc/coo.
* Specify the type for feature info.
2021-09-16 20:39:30 +08:00
Jiaming Yuan
2942dc68e4
Fix mixed types in GPU sketching. ( #7228 )
2021-09-16 00:10:25 +08:00
Jiaming Yuan
3515931305
Initial support for external memory in gradient index. ( #7183 )
...
* Add hessian to batch param in preparation for the new approx implementation.
* Extract a push method for gradient index matrix.
* Use span instead of vector ref for hessian in sketching.
* Create a binary format for gradient index.
2021-09-13 12:40:56 +08:00
Jiaming Yuan
804b2ac60f
Expose DMatrix API for CUDA columnar and array. ( #7217 )
...
* Use JSON encoded configurations.
* Expose them into header file.
2021-09-09 17:55:25 +08:00
Jiaming Yuan
b12e7f7edd
Add noexcept to JSON objects. ( #7205 )
2021-09-07 13:56:48 +08:00
Jiaming Yuan
3a4f51f39f
Avoid calling CUDA code on CPU for linear model. ( #7154 )
2021-09-01 10:45:31 +08:00
Jiaming Yuan
ba69244a94
Restore the custom double atomic add. ( #7198 )
2021-08-28 18:30:42 +08:00
Jiaming Yuan
7a1d67f9cb
[breaking] Use integer atomic for GPU histogram. ( #7180 )
...
On GPU we use a rounding factor to truncate the gradient for deterministic results. This PR changes the gradient representation to fixed-point numbers whose exponent is aligned with the rounding factor.
[breaking] Drop non-deterministic histogram.
Use fixed point for shared memory.
This PR improves the performance of GPU Hist.
Co-authored-by: Andy Adinets <aadinets@nvidia.com>
2021-08-28 05:17:05 +08:00
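The fixed-point scheme can be sketched in Python under two assumptions. First, the rounding factor is a power of two that bounds the sum of absolute gradients. Second, 62 fractional bits are used so that totals would fit in a signed 64-bit integer. `PRECISION_BITS`, `rounding_factor`, `to_fixed`, `to_float` and `histogram_sum` are hypothetical names, not XGBoost's. Integer addition is associative, so the accumulated bin no longer depends on the order in which (atomic) additions happen.

```python
import math

PRECISION_BITS = 62  # hypothetical: leaves headroom inside a signed 64-bit integer


def rounding_factor(grads):
    # Power of two bounding the largest possible |sum| of the gradients.
    bound = max((abs(g) for g in grads), default=0.0) * len(grads)
    return 2.0 ** math.ceil(math.log2(bound)) if bound > 0 else 1.0


def to_fixed(g, rounding):
    # Scale so that a total bounded by `rounding` maps into [-2**62, 2**62].
    return int(round(g / rounding * 2 ** PRECISION_BITS))


def to_float(q, rounding):
    # Undo the scaling to recover an approximate floating-point total.
    return q / 2 ** PRECISION_BITS * rounding


def histogram_sum(grads, rounding):
    # Integer additions are exact and associative: any order gives the same total.
    return sum(to_fixed(g, rounding) for g in grads)
```

Each gradient loses at most half a unit in the last fixed-point place when it is converted. That conversion is the only source of error, so the recovered total stays very close to the exact floating-point sum while being bit-for-bit reproducible.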
Jiaming Yuan
e7d7ab6bc3
Better error message for ncclUnhandledCudaError. ( #7190 )
2021-08-27 10:29:22 +08:00
Jiaming Yuan
ee8d1f5ed8
Fix histogram truncation. ( #7181 )
...
* Fix truncation.
* Lint.
* lint.
2021-08-24 18:34:32 -07:00
Jiaming Yuan
bf562bd33c
Remove unused code. ( #7175 )
2021-08-18 14:02:19 +08:00
Jiaming Yuan
9600ca83f3
Remove synchronization in monitor. ( #7164 )
...
* Remove synchronization in monitor.
Calling rabit functions during destruction is flaky.
* Add xgboost prefix to nvtx marker.
2021-08-11 16:33:53 +08:00
Jiaming Yuan
149f209af6
Extract histogram builder from CPU Hist. ( #7152 )
...
* Extract the CPU histogram builder.
* Fix tests.
* Reduce number of histograms being built.
2021-08-09 21:15:21 +08:00
Philip Hyunsu Cho
336af4f974
Work around a segfault observed in SparsePage::Push() ( #7161 )
...
* Work around a segfault observed in SparsePage::Push()
* Revert "Work around a segfault observed in SparsePage::Push()"
This reverts commit 30934844d00908750a5442082eb4769b1489f6a9.
* Don't call vector::resize() inside OpenMP block
* Set GITHUB_PAT env var to fix R tests
* Use built-in GITHUB_TOKEN
2021-08-08 02:12:30 -07:00
Jiaming Yuan
8a84be37b8
Pass scikit-learn estimator checks for regressor. ( #7130 )
...
* Check data shape.
* Check labels.
2021-08-03 18:58:20 +08:00
Jiaming Yuan
d080b5a953
Fix model slicing. ( #7149 )
...
* Use correct pointer.
* Remove best_iteration/best_score.
2021-08-03 11:51:56 +08:00
Robert Maynard
1a75f43304
Allow compilation with nvcc 11.4 ( #7131 )
...
* Use type aliases for discard iterators
* Update to include host_vector, as thrust 1.12 doesn't bring it in as a side effect
* cub::DispatchRadixSort requires signed offset types
2021-07-27 20:05:33 +08:00
Taewoo Kim
41e882f80b
Check whether the input value is duplicated when the quantile queue is full ( #7091 )
...
Co-authored-by: Taewoo Kim <taewoo@layer6.com>
2021-07-23 03:07:01 +08:00
ShvetsKS
caa9e527dd
Remove extra sync for dense data ( #7120 )
...
Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>
2021-07-22 19:02:31 +08:00
Jiaming Yuan
e6088366df
Export Python Interface for external memory. ( #7070 )
...
* Add Python iterator interface.
* Add tests.
* Add demo.
* Add documents.
* Handle empty dataset.
2021-07-22 15:15:53 +08:00
farfarawayzyt
e64ee6592f
fix typo in src/common/hist.cc BuildHistKernel ( #7116 )
2021-07-21 19:53:05 +08:00
farfarawayzyt
d7c14496d2
fix typo in arguments of PartitionBuilder::Init ( #7113 )
...
Co-authored-by: Yuntian Zhang <zhangyt@lamda.nju.edu.cn>
2021-07-16 15:46:22 +08:00
Jiaming Yuan
bd1f3a38f0
Rewrite sparse dmatrix using callbacks. ( #7092 )
...
- Reduce dependency on dmlc parsers and provide an interface for users to load data by themselves.
- Remove use of threaded iterator and IO queue.
- Remove `page_size`.
- Make sure the number of pages in memory is bounded.
- Make sure the cache cannot be violated.
- Provide an interface for internal algorithms to process data asynchronously.
2021-07-16 12:33:31 +08:00
Jiaming Yuan
abec3dbf6d
Fix thread safety of softmax prediction. ( #7104 )
2021-07-16 02:06:55 +08:00
Jiaming Yuan
345796825f
Make dependency lookup optional in the installed CMake config. ( #7099 )
...
* Find dependencies only when xgboost is built as a static library.
* Resolve msvc warning.
* Add test for linking shared library.
2021-07-11 17:20:55 +08:00
Jiaming Yuan
77f6cf2d13
Support hessian in host sketch container. ( #7081 )
...
Prepare for migrating approx onto hist's codebase.
2021-07-08 16:33:58 +08:00
Jiaming Yuan
84d359efb8
Support host data in proxy DMatrix. ( #7087 )
2021-07-08 11:35:48 +08:00
Jiaming Yuan
5d7cdf2e36
[Breaking] Rename Quantile DMatrix C API. ( #7082 )
...
The role of ProxyDMatrix has grown beyond what it was designed for. It is now used by both QuantileDeviceDMatrix and inplace prediction, and after the refactoring of sparse DMatrix it will also be used for external memory. This PR renames the C API to decouple it from QuantileDeviceDMatrix.
2021-07-08 11:34:14 +08:00