4267 Commits

Author SHA1 Message Date
Jiaming Yuan
e0509b3307
Fix pruner. (#5335)
* Honor the tree depth.
* Prevent pruning pruned node.
2020-02-25 08:32:46 +08:00
Rory Mitchell
b0ed3f0a66
Remove unnecessary DMatrix methods (#5324) 2020-02-25 12:40:39 +13:00
Jiaming Yuan
655cf17b60
Predict on Ellpack. (#5327)
* Unify GPU prediction node.
* Add `PageExists`.
* Dispatch prediction on input data for GPU Predictor.
2020-02-23 06:27:03 +08:00
daiki katsuragawa
70a91ec3ba
Update README.md (#5346) 2020-02-23 02:52:37 +08:00
Philip Hyunsu Cho
cfae247231
Fix a small typo in sklearn.py that broke multiple eval metrics (#5341) 2020-02-22 19:02:37 +08:00
Rong Ou
d6b31df449
update docs for gpu external memory (#5332)
* update docs for gpu external memory

* add hist limitation
2020-02-22 14:57:40 +08:00
Philip Hyunsu Cho
7ac7e8778f
Port patches from 1.0.0 branch (#5336)
* Remove f-string, since it's not supported by Python 3.5 (#5330)

* Remove f-string, since it's not supported by Python 3.5

* Add Python 3.5 to CI, to ensure compatibility

* Remove duplicated matplotlib

* Show deprecation notice for Python 3.5

* Fix lint

* Fix lint

* Fix a unit test that mistook MINOR ver for PATCH ver

* Enforce only major version in JSON model schema

* Bump version to 1.1.0-SNAPSHOT
2020-02-21 13:13:21 -08:00
Philip Hyunsu Cho
8aa8ef1031
Display Sponsor button, link to OpenCollective (#5325) 2020-02-19 01:58:21 -08:00
Rory Mitchell
bc96ceb8b2
Refactor SparsePageSource, delete cache files after use (#5321)
* Refactor sparse page source

* Delete temporary cache files

* Log fatal if cache exists

* Log fatal if multiple threads used with prefetcher
2020-02-19 16:43:41 +13:00
Rory Mitchell
b2b2c4e231
Remove SimpleCSRSource (#5315) 2020-02-18 16:49:17 +13:00
Jiaming Yuan
9f77c18b0d
Add JVM_CHECK_CALL. (#5199)
* Added a check call macro in jvm package, prevents executing other functions
from jvm when error occurred in XGBoost. For example, when prediction fails jvm
should not try to allocate memory based on the output prediction size.
2020-02-18 11:10:55 +08:00
Jiaming Yuan
0110754a76
Remove update prediction cache from predictors. (#5312)
Move this function into gbtree, and uses only updater for doing so. As now the predictor knows exactly how many trees to predict, there's no need for it to update the prediction cache.
2020-02-17 11:35:47 +08:00
Jiaming Yuan
e433a379e4
Fix changing locale. (#5314)
* Fix changing locale.

* Don't use locale guard.

As number parsing is implemented in house, we don't need locale.

* Update doc.
2020-02-17 11:31:13 +08:00
Rory Mitchell
7e32af5c21
Wide dataset quantile performance improvement (#5306) 2020-02-16 10:24:42 +13:00
Jiaming Yuan
ed2465cce4
Add configuration to R interface. (#5217)
* Save and load internal parameter configuration as JSON.
2020-02-16 03:01:58 +08:00
Jiaming Yuan
8ca9744b07
Use scikit-learn in extra dependencies. (#5310) 2020-02-15 07:12:51 +08:00
Jiaming Yuan
c35cdecddd
Move prediction cache to Learner. (#5220)
* Move prediction cache into Learner.

* Clean-ups

- Remove duplicated cache in Learner and GBM.
- Remove ad-hoc fix of invalid cache.
- Remove `PredictFromCache` in predictors.
- Remove prediction cache for linear altogether, as it's only moving the
  prediction into training process but doesn't provide any actual overall speed
  gain.
- The cache is now unique to Learner, which means the ownership is no longer
  shared by any other components.

* Changes

- Add version to prediction cache.
- Use weak ptr to check expired DMatrix.
- Pass shared pointer instead of raw pointer.
2020-02-14 13:04:23 +08:00
Rory Mitchell
24ad9dec0b
Testing hist_util (#5251)
* Rank tests

* Remove categorical split specialisation

* Extend tests to multiple features, switch to WQSketch

* Add tests for SparseCuts

* Add external memory quantile tests, fix some existing tests
2020-02-14 14:36:43 +13:00
Jiaming Yuan
911a902835
Merge model compatibility fixes from 1.0rc branch. (#5305)
* Port test model compatibility.
* Port logit model fix.

https://github.com/dmlc/xgboost/pull/5248
https://github.com/dmlc/xgboost/pull/5281
2020-02-13 20:41:58 +08:00
Jiaming Yuan
29eeea709a
Pass shared pointer instead of raw pointer to Learner. (#5302)
Extracted from https://github.com/dmlc/xgboost/pull/5220 .
2020-02-11 14:16:38 +08:00
Philip Hyunsu Cho
2e0067e790
Update affiliation of @hcho3 (#5292) 2020-02-06 20:58:39 -08:00
Andrew Kane
94828a7c0c
Updated Windows build docs (#5283) 2020-02-05 12:19:54 +08:00
Jiaming Yuan
84e395d91e
Fix CMake build on Windows with setuptools. (#5280) 2020-02-05 10:47:39 +08:00
Jiaming Yuan
595a00466d
Rewrite setup.py. (#5271)
The setup.py is rewritten.  This new script uses only Python code and provide customized
implementation of setuptools commands.  This way users can run most of setuptools commands
just like any other Python libraries.

* Remove setup_pip.py
* Remove soft links.
* Define customized commands.
* Remove shell script.
* Remove makefile script.
* Update the doc for building from source.
2020-02-04 13:35:42 +08:00
Rong Ou
e4b74c4d22
Gradient based sampling for GPU Hist (#5093)
* Implement gradient based sampling for GPU Hist tree method.
* Add samplers and handle compacted page in GPU Hist.
2020-02-04 10:31:27 +08:00
Philip Hyunsu Cho
c74216f22c
Declare Python 3.8 support in setup.py (#5274) 2020-02-03 10:38:52 -08:00
David Díaz Vico
71e7e3b96f
Improved sklearn compatibility (#5255) 2020-02-03 13:30:45 +08:00
Jiaming Yuan
a5cc112eea
Export JSON config in get_params. (#5256) 2020-02-03 12:46:51 +08:00
Jiaming Yuan
ed0216642f
Avoid dask test fixtures. (#5270)
* Fix Travis OSX timeout.

* Fix classifier.
2020-02-03 12:39:20 +08:00
Jiaming Yuan
856b81c727
Ignore gdb_history. [skip ci] (#5257) 2020-02-02 20:40:09 +08:00
Nan Zhu
d7b45fbcaf
[jvm-packages] do not use multiple jobs to make checkpoints (#5082)
* temp

* temp

* tep

* address the comments

* fix stylistic issues

* fix

* external checkpoint
2020-02-01 19:36:39 -08:00
Philip Hyunsu Cho
fa26313feb
Remove use of std::cout from R package (#5261) 2020-02-01 05:52:19 -08:00
Jiaming Yuan
fe8d72b50b
Cleanup warnings. (#5247)
From clang-tidy-9 and gcc-7: Invalid case style, narrowing definition, wrong
initialization order, unused variables.
2020-01-31 14:52:15 +08:00
Philip Hyunsu Cho
adc795929a
Fix compilation error due to 64-bit integer narrowing to size_t (#5250) 2020-01-30 21:32:15 -08:00
Jiaming Yuan
472ded549d
Save Scikit-Learn attributes into learner attributes. (#5245)
* Remove the recommendation for pickle.

* Save skl attributes in booster.attr

* Test loading scikit-learn model with native booster.
2020-01-30 16:00:18 +08:00
Egor Smirnov
c67163250e
Optimized BuildHist function (#5156) 2020-01-29 23:32:57 -08:00
Philip Hyunsu Cho
4240daed4e
Make pip install xgboost*.tar.gz work by fixing build-python.sh (#5241)
* Make pip install xgboost*.tar.gz work by fixing build-python.sh

* Simplify install doc

* Add test

* Install Miniconda for Linux target too

* Build XGBoost only once in sdist

* Try importing xgboost after installation

* Don't set PYTHONPATH env var for sdist test
2020-01-28 23:18:23 -08:00
Philip Hyunsu Cho
cb3ed404cf
[R] Enable OpenMP with AppleClang in XGBoost R package (#5240)
* [R] Enable OpenMP with AppleClang in XGBoost R package

* Dramatically simplify install doc
2020-01-28 12:37:22 -08:00
Philip Hyunsu Cho
f7105fa44f
Update Rabit (#5237) 2020-01-28 02:05:01 -08:00
Jiaming Yuan
43974939f4
Better error message about loading pickled model. (#5236)
Print the link to new tutorial.
2020-01-28 15:29:14 +08:00
Philip Hyunsu Cho
b513dcd352
Support FreeBSD (#5233)
* Fix build on FreeBSD

* Use __linux__ macro
2020-01-27 22:19:16 -08:00
Jiaming Yuan
ef19480eda
Add dart to JSON schema. (#5218)
* Add dart to JSON schema.

* Use spaces instead of tab.
2020-01-28 13:29:09 +08:00
Philip Hyunsu Cho
0c7455276d
[R] Robust endian detection in CRAN xgboost build (#5232)
* [R] Robust endian detection in CRAN xgboost build

* Check for external backtrace() lib

* Update Makevars.win
2020-01-27 02:55:13 -08:00
Rory Mitchell
1b3947d929
Make some GPU tests deterministic (#5229) 2020-01-26 11:53:07 +13:00
Jiaming Yuan
3eb1279bbf
Config for linear updaters. (#5222) 2020-01-25 11:26:46 +08:00
Jiaming Yuan
40680368cf
Add constraint parameters to Scikit-Learn interface. (#5227)
* Add document for constraints.

* Fix a format error in doc for objective function.
2020-01-25 11:12:02 +08:00
Philip Hyunsu Cho
44469a0ca9
Extensible binary serialization format for DMatrix::MetaInfo (#5187)
* Turn xgboost::DataType into C++11 enum class

* New binary serialization format for DMatrix::MetaInfo

* Fix clang-tidy

* Fix c++ test

* Implement new format proposal

* Move helper functions to anonymous namespace; remove unneeded field

* Fix lint

* Add shape.

* Keep only roundtrip test.

* Fix test.

* various fixes

* Update data.cc

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2020-01-23 11:33:17 -08:00
OrdoAbChao
b4f952bd22 [Breaking] Remove Scikit-Learn default parameters (#5130)
* Simplify Scikit-Learn parameter management.

* Copy base class for removing duplicated parameter signatures.
* Set all parameters to None.
* Handle None in set_param.
* Extract the doc.

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2020-01-23 20:25:20 +08:00
Rory Mitchell
aa9a68010b
uint not supported in cudf (#5225) 2020-01-23 16:59:18 +13:00
Jiaming Yuan
1891cc766d
Fix metainfo from DataFrame. (#5216)
* Fix metainfo from DataFrame.

* Unify helper functions for data and meta.
2020-01-22 16:29:44 +08:00