* [coll] Pass context to various functions.
In the future, the `Context` object would be required for collective operations, this PR
passes the context object to some required functions to prepare for swapping out the
implementation.
- Rework the precision metric for both CPU and GPU.
- Mention it in the document.
- Cleanup old support code for GPU ranking metric.
- Deterministic GPU implementation.
* Drop support for classification.
* type.
* use batch shape.
* lint.
* cpu build.
* cpu build.
* lint.
* Tests.
* Fix.
* Cleanup error message.
- Pass context from booster to DMatrix.
- Use context instead of integer for `n_threads`.
- Check the consistency configuration for `max_bin`.
- Test for all combinations of initialization options.
* Implement multi-target for hist.
- Add new hist tree builder.
- Move data fetchers for tests.
- Dispatch function calls in gbm base on the tree type.
- Use `bst_bin_t` in batch param constructor.
- Use `StringView` to avoid `std::string` when appropriate.
- Avoid using `MetaInfo` in quantile constructor to limit the scope of parameter.
- Remove unused parameters. There are still many warnings that are not yet
addressed. Currently, the warnings in dmlc-core dominate the error log.
- Remove `distributed` parameter from metric.
- Fixes some warnings about signed comparison.
* Add hessian to batch param in preparation of new approx impl.
* Extract a push method for gradient index matrix.
* Use span instead of vector ref for hessian in sketching.
* Create a binary format for gradient index.
* Implement GK sketching on GPU.
* Strong tests on quantile building.
* Handle sparse dataset by binary searching the column index.
* Hypothesis test on dask.
* Set default dtor for SimpleDMatrix to initialize default copy ctor, which is
deleted due to unique ptr.
* Remove commented code.
* Remove warning for calling host function (std::max).
* Remove warning for initialization order.
* Remove warning for unused variables.
* Added finding quantiles on GPU.
- this includes datasets where weights are assigned to data rows
- as the quantiles found by the new algorithm are not the same
as those found by the old one, test thresholds in
tests/python-gpu/test_gpu_updaters.py have been adjusted.
* Adjustments and improved testing for finding quantiles on the GPU.
- added C++ tests for the DeviceSketch() function
- reduced one of the thresholds in test_gpu_updaters.py
- adjusted the cuts found by the find_cuts_k kernel
* Fatal error if GPU algorithm selected without GPU support compiled
* Resolve type conversion warnings
* Fix gpu unit test failure
* Fix compressed iterator edge case
* Fix python unit test failures due to flake8 update on pip
* Support histogram-based algorithm + multiple tree growing strategy
* Add a brand new updater to support histogram-based algorithm, which buckets
continuous features into discrete bins to speed up training. To use it, set
`tree_method = fast_hist` to configuration.
* Support multiple tree growing strategies. For now, two policies are supported:
* `grow_policy=depthwise` (default): favor splitting at nodes closest to the
root, i.e. grow depth-wise.
* `grow_policy=lossguide`: favor splitting at nodes with highest loss change
* Improve single-threaded performance
* Unroll critical loops
* Introduce specialized code for dense data (i.e. no missing values)
* Additional training parameters: `max_leaves`, `max_bin`, `grow_policy`, `verbose`
* Adding a small test for hist method
* Fix memory error in row_set.h
When std::vector is resized, a reference to one of its element may become
stale. Any such reference must be updated as well.
* Resolve cross-platform compilation issues
* Versions of g++ older than 4.8 lacks support for a few C++11 features, e.g.
alignas(*) and new initializer syntax. To support g++ 4.6, use pre-C++11
initializer and remove alignas(*).
* Versions of MSVC older than 2015 does not support alignas(*). To support
MSVC 2012, remove alignas(*).
* For g++ 4.8 and newer, alignas(*) is enabled for performance benefits.
* Some old compilers (MSVC 2012, g++ 4.6) do not support template aliases
(which uses `using` to declate type aliases). So always use `typedef`.
* Fix a host of CI issues
* Remove dependency for libz on osx
* Fix heading for hist_util
* Fix minor style issues
* Add missing #include
* Remove extraneous logging
* Enable tree_method=hist in R
* Renaming HistMaker to GHistBuilder to avoid confusion
* Fix R integration
* Respond to style comments
* Consistent tie-breaking for priority queue using timestamps
* Last-minute style fixes
* Fix issuecomment-271977647
The way we quantize data is broken. The agaricus data consists of all
categorical values. When NAs are converted into 0's,
`HistCutMatrix::Init` assign both 0's and 1's to the same single bin.
Why? gmat only the smallest value (0) and an upper bound (2), which is twice
the maximum value (1). Add the maximum value itself to gmat to fix the issue.
* Fix issuecomment-272266358
* Remove padding from cut values for the continuous case
* For categorical/ordinal values, use midpoints as bin boundaries to be safe
* Fix CI issue -- do not use xrange(*)
* Fix corner case in quantile sketch
Signed-off-by: Philip Cho <chohyu01@cs.washington.edu>
* Adding a test for an edge case in quantile sketcher
max_bin=2 used to cause an exception.
* Fix fast_hist test
The test used to require a strictly increasing Test AUC for all examples.
One of them exhibits a small blip in Test AUC before achieving a Test AUC
of 1. (See bottom.)
Solution: do not require monotonic increase for this particular example.
[0] train-auc:0.99989 test-auc:0.999497
[1] train-auc:1 test-auc:0.999749
[2] train-auc:1 test-auc:0.999749
[3] train-auc:1 test-auc:0.999749
[4] train-auc:1 test-auc:0.999749
[5] train-auc:1 test-auc:0.999497
[6] train-auc:1 test-auc:1
[7] train-auc:1 test-auc:1
[8] train-auc:1 test-auc:1
[9] train-auc:1 test-auc:1