669 Commits

Author SHA1 Message Date
Jiaming Yuan
e0509b3307
Fix pruner. (#5335)
* Honor the tree depth.
* Prevent pruning pruned node.
2020-02-25 08:32:46 +08:00
Rory Mitchell
b0ed3f0a66
Remove unnecessary DMatrix methods (#5324) 2020-02-25 12:40:39 +13:00
Jiaming Yuan
655cf17b60
Predict on Ellpack. (#5327)
* Unify GPU prediction node.
* Add `PageExists`.
* Dispatch prediction on input data for GPU Predictor.
2020-02-23 06:27:03 +08:00
Philip Hyunsu Cho
cfae247231
Fix a small typo in sklearn.py that broke multiple eval metrics (#5341) 2020-02-22 19:02:37 +08:00
Philip Hyunsu Cho
7ac7e8778f
Port patches from 1.0.0 branch (#5336)
* Remove f-string, since it's not supported by Python 3.5 (#5330)

* Remove f-string, since it's not supported by Python 3.5

* Add Python 3.5 to CI, to ensure compatibility

* Remove duplicated matplotlib

* Show deprecation notice for Python 3.5

* Fix lint

* Fix lint

* Fix a unit test that mistook MINOR ver for PATCH ver

* Enforce only major version in JSON model schema

* Bump version to 1.1.0-SNAPSHOT
2020-02-21 13:13:21 -08:00
Rory Mitchell
bc96ceb8b2
Refactor SparsePageSource, delete cache files after use (#5321)
* Refactor sparse page source

* Delete temporary cache files

* Log fatal if cache exists

* Log fatal if multiple threads used with prefetcher
2020-02-19 16:43:41 +13:00
Rory Mitchell
b2b2c4e231
Remove SimpleCSRSource (#5315) 2020-02-18 16:49:17 +13:00
Jiaming Yuan
0110754a76
Remove update prediction cache from predictors. (#5312)
Move this function into gbtree, and use only the updater to do so. Since the predictor now knows exactly how many trees to predict, it no longer needs to update the prediction cache.
2020-02-17 11:35:47 +08:00
Jiaming Yuan
e433a379e4
Fix changing locale. (#5314)
* Fix changing locale.

* Don't use locale guard.

As number parsing is implemented in-house, we don't need locale.

* Update doc.
2020-02-17 11:31:13 +08:00
Jiaming Yuan
c35cdecddd
Move prediction cache to Learner. (#5220)
* Move prediction cache into Learner.

* Clean-ups

- Remove duplicated cache in Learner and GBM.
- Remove ad-hoc fix of invalid cache.
- Remove `PredictFromCache` in predictors.
- Remove prediction cache for linear altogether, as it only moves the
  prediction into the training process and doesn't provide any actual overall
  speed gain.
- The cache is now unique to Learner, which means the ownership is no longer
  shared by any other components.

* Changes

- Add version to prediction cache.
- Use weak ptr to check expired DMatrix.
- Pass shared pointer instead of raw pointer.
2020-02-14 13:04:23 +08:00
Rory Mitchell
24ad9dec0b
Testing hist_util (#5251)
* Rank tests

* Remove categorical split specialisation

* Extend tests to multiple features, switch to WQSketch

* Add tests for SparseCuts

* Add external memory quantile tests, fix some existing tests
2020-02-14 14:36:43 +13:00
Jiaming Yuan
911a902835
Merge model compatibility fixes from 1.0rc branch. (#5305)
* Port test model compatibility.
* Port logit model fix.

https://github.com/dmlc/xgboost/pull/5248
https://github.com/dmlc/xgboost/pull/5281
2020-02-13 20:41:58 +08:00
Jiaming Yuan
29eeea709a
Pass shared pointer instead of raw pointer to Learner. (#5302)
Extracted from https://github.com/dmlc/xgboost/pull/5220 .
2020-02-11 14:16:38 +08:00
Jiaming Yuan
595a00466d
Rewrite setup.py. (#5271)
The setup.py is rewritten.  The new script uses only Python code and provides customized
implementations of setuptools commands.  This way users can run most setuptools commands
just as they would for any other Python library.

* Remove setup_pip.py
* Remove soft links.
* Define customized commands.
* Remove shell script.
* Remove makefile script.
* Update the doc for building from source.
2020-02-04 13:35:42 +08:00
Rong Ou
e4b74c4d22
Gradient based sampling for GPU Hist (#5093)
* Implement gradient based sampling for GPU Hist tree method.
* Add samplers and handle compacted page in GPU Hist.
2020-02-04 10:31:27 +08:00
Jiaming Yuan
a5cc112eea
Export JSON config in get_params. (#5256) 2020-02-03 12:46:51 +08:00
Jiaming Yuan
ed0216642f
Avoid dask test fixtures. (#5270)
* Fix Travis OSX timeout.

* Fix classifier.
2020-02-03 12:39:20 +08:00
Jiaming Yuan
fe8d72b50b
Cleanup warnings. (#5247)
From clang-tidy-9 and gcc-7: Invalid case style, narrowing definition, wrong
initialization order, unused variables.
2020-01-31 14:52:15 +08:00
Jiaming Yuan
472ded549d
Save Scikit-Learn attributes into learner attributes. (#5245)
* Remove the recommendation for pickle.

* Save skl attributes in booster.attr

* Test loading scikit-learn model with native booster.
2020-01-30 16:00:18 +08:00
Egor Smirnov
c67163250e
Optimized BuildHist function (#5156) 2020-01-29 23:32:57 -08:00
Philip Hyunsu Cho
4240daed4e
Make pip install xgboost*.tar.gz work by fixing build-python.sh (#5241)
* Make pip install xgboost*.tar.gz work by fixing build-python.sh

* Simplify install doc

* Add test

* Install Miniconda for Linux target too

* Build XGBoost only once in sdist

* Try importing xgboost after installation

* Don't set PYTHONPATH env var for sdist test
2020-01-28 23:18:23 -08:00
Jiaming Yuan
ef19480eda
Add dart to JSON schema. (#5218)
* Add dart to JSON schema.

* Use spaces instead of tab.
2020-01-28 13:29:09 +08:00
Rory Mitchell
1b3947d929
Make some GPU tests deterministic (#5229) 2020-01-26 11:53:07 +13:00
Jiaming Yuan
3eb1279bbf
Config for linear updaters. (#5222) 2020-01-25 11:26:46 +08:00
Jiaming Yuan
40680368cf
Add constraint parameters to Scikit-Learn interface. (#5227)
* Add document for constraints.

* Fix a format error in doc for objective function.
2020-01-25 11:12:02 +08:00
Philip Hyunsu Cho
44469a0ca9
Extensible binary serialization format for DMatrix::MetaInfo (#5187)
* Turn xgboost::DataType into C++11 enum class

* New binary serialization format for DMatrix::MetaInfo

* Fix clang-tidy

* Fix c++ test

* Implement new format proposal

* Move helper functions to anonymous namespace; remove unneeded field

* Fix lint

* Add shape.

* Keep only roundtrip test.

* Fix test.

* various fixes

* Update data.cc

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2020-01-23 11:33:17 -08:00
OrdoAbChao
b4f952bd22
[Breaking] Remove Scikit-Learn default parameters (#5130)
* Simplify Scikit-Learn parameter management.

* Copy base class for removing duplicated parameter signatures.
* Set all parameters to None.
* Handle None in set_param.
* Extract the doc.

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2020-01-23 20:25:20 +08:00
Rory Mitchell
aa9a68010b
uint not supported in cudf (#5225) 2020-01-23 16:59:18 +13:00
Jiaming Yuan
1891cc766d
Fix metainfo from DataFrame. (#5216)
* Fix metainfo from DataFrame.

* Unify helper functions for data and meta.
2020-01-22 16:29:44 +08:00
Rory Mitchell
9c56480c61
Support dmatrix construction from cupy array (#5206) 2020-01-22 13:15:27 +13:00
Philip Hyunsu Cho
0184f2e9f7
Explicitly use UTF-8 codepage when using MSVC (#5197)
* Explicitly use UTF-8 codepage when using MSVC

* Fix build with CUDA enabled
2020-01-14 13:30:34 -08:00
Rory Mitchell
a73e25e15f
Implement slice via adapters (#5198) 2020-01-14 12:55:41 +13:00
Kodi Arfer
f100b8d878
[Breaking] Don't drop trees during DART prediction by default (#5115)
* Simplify DropTrees calling logic

* Add `training` parameter for prediction method.

* [Breaking]: Add `training` to C API.

* Change for R and Python custom objective.

* Correct comment.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2020-01-13 21:48:30 +08:00
Jiaming Yuan
7b65698187
Enforce correct data shape. (#5191)
* Fix syncing DMatrix columns.
* notes for tree method.
* Enable feature validation for all interfaces except for jvm.
* Better tests for boosting from predictions.
* Disable validation on JVM.
2020-01-13 15:48:17 +08:00
Rory Mitchell
8cbcc53ccb
Remove old cudf constructor code (#5194) 2020-01-10 16:35:23 +13:00
Rory Mitchell
87ebfc1315
Implement cudf construction with adapters. (#5189) 2020-01-09 20:23:06 +13:00
Jiaming Yuan
ee287808fb
Lazy initialization of device vector. (#5173)
* Lazy initialization of device vector.

* Fix #5162.

* Disable copy constructor of HostDeviceVector.  Prevents implicit copying.

* Fix CPU build.

* Bring back move assignment operator.
2020-01-07 11:23:05 +08:00
Jiaming Yuan
ebc86a3afa
Disable parameter validation for Scikit-Learn interface. (#5167)
* Disable parameter validation for now.

Scikit-Learn passes all parameters down to XGBoost, whether they are used or
not.

* Add option `validate_parameters`.
2020-01-07 11:17:31 +08:00
Egor Smirnov
7b17e76c5b
Optimized EvaluateSplit function (#5138)
* Add block based threading utilities.
2019-12-31 18:18:42 +08:00
Jiaming Yuan
04db125699
Quick fix for memory leak in CPU Hist. (#5153)
Closes https://github.com/dmlc/xgboost/issues/3579 .

* Don't use map.
2019-12-31 14:05:53 +08:00
K.O
018df6004e
Fix feature_name created from int64index dataframe. (#5081) 2019-12-30 12:26:22 +08:00
Jiaming Yuan
6848d0426f
Clean up Python 2 compatibility code. (#5161) 2019-12-27 18:34:53 +08:00
Jiaming Yuan
61286c6e8f
Fix wrapping GPU ID and prevent data copying. (#5160)
* Removed some data copying.

* Make sure gpu_id is valid before any configuration is carried out.
2019-12-27 16:51:08 +08:00
sriramch
ee81ba8e1f
Implementation of MAP ranking algorithm on GPU (#5129)
* Implement the MAP ranking algorithm.
* Address the suggestions made in the earlier ranking PRs.
* Make some performance improvements to the NDCG algorithm as well.
2019-12-27 12:05:37 +13:00
Philip Hyunsu Cho
9b0af6e882
Enable OpenMP with Apple Clang (Mac default compiler) (#5146)
* Add OpenMP as CMake target

* Require CMake 3.12, to allow linking OpenMP target to objxgboost

* Specify OpenMP compiler flag for CUDA host compiler

* Require CMake 3.16+ if the OS is Mac OSX

* Use AppleClang in Mac tests.

* Update dmlc-core
2019-12-26 16:53:12 +08:00
Jiaming Yuan
f3d7877802
Parameter validation (#5157)
* Unused code.

* Split up old colmaker parameters from train param.

* Fix dart.

* Better name.
2019-12-26 11:59:05 +08:00
Jiaming Yuan
ced3660f60
Tests for empty dmatrix. (#5159) 2019-12-26 11:51:54 +08:00
Jiaming Yuan
298ebe68ac
[Breaking] Remove learning_rates in Python. (#5155)
* Remove `learning_rates`.

It has been deprecated since callbacks became available.

* Set `before_iteration` of `reset_learning_rate` to False to preserve
  the initial learning rate, and comply with the term "reset".

Closes #4709.

* Tests for various `tree_method`.
2019-12-24 14:25:48 +08:00
Jiaming Yuan
0202e04a8e
Add base margin to sklearn interface. (#5151) 2019-12-24 09:43:41 +08:00
Jiaming Yuan
1d0ca49761
Example JSON model parser and Schema. (#5137) 2019-12-23 19:47:35 +08:00