274 Commits

Author SHA1 Message Date
Jonas
00ea7b83c9 Fix docs for num_parallel_tree (#4221)
Minor formatting correction for `num_parallel_tree`.
2019-03-06 23:47:51 +08:00
Philip Hyunsu Cho
67c38805a1
Update build doc: PyPI wheel now support multi-GPU (#4219) 2019-03-05 13:25:31 -08:00
Adam November
0c1d5f1120 Fix snapshot artifact name in docs. (#4196) 2019-03-03 13:27:50 -08:00
Matthew Jones
92b7577c62 [REVIEW] Enable Multi-Node Multi-GPU functionality (#4095)
* Initial commit to support multi-node multi-gpu xgboost using dask

* Fixed NCCL initialization by not ignoring the opg parameter.

- it now crashes on NCCL initialization, but at least we're attempting it properly

* At the root node, perform a rabit::Allreduce to get initial sum_gradient across workers

* Synchronizing in a couple of more places.

- now the workers don't go down, but just hang
- no more "wild" values of gradients
- probably needs syncing in more places

* Added another missing max-allreduce operation inside BuildHistLeftRight

* Removed unnecessary collective operations.

* Simplified rabit::Allreduce() sync of gradient sums.

* Removed unnecessary rabit syncs around ncclAllReduce.

- this improves performance _significantly_ (7x faster for overall training,
  20x faster for xgboost proper)

* pulling in latest xgboost

* removing changes to updater_quantile_hist.cc

* changing use_nccl_opg initialization, removing unnecessary if statements

* added definition for opaque ncclUniqueId struct to properly encapsulate GetUniqueId

* placing struct defintion in guard to avoid duplicate code errors

* addressing linting errors

* removing

* removing additional arguments to AllReduer initialization

* removing distributed flag

* making comm init symmetric

* removing distributed flag

* changing ncclCommInit to support multiple modalities

* fix indenting

* updating ncclCommInitRank block with necessary group calls

* fix indenting

* adding print statement, and updating accessor in vector

* improving print statement to end-line

* generalizing nccl_rank construction using rabit

* assume device_ordinals is the same for every node

* test, assume device_ordinals is identical for all nodes

* test, assume device_ordinals is unique for all nodes

* changing names of offset variable to be more descriptive, editing indenting

* wrapping ncclUniqueId GetUniqueId() and aesthetic changes

* adding synchronization, and tests for distributed

* adding  to tests

* fixing broken #endif

* fixing initialization of gpu histograms, correcting errors in tests

* adding to contributors list

* adding distributed tests to jenkins

* fixing bad path in distributed test

* debugging

* adding kubernetes for distributed tests

* adding proper import for OrderedDict

* adding urllib3==1.22 to address ordered_dict import error

* added sleep to allow workers to save their models for comparison

* adding name to GPU contributors under docs
2019-03-02 10:03:22 +13:00
Yanbo Liang
9fefa2128d [jvm-packages] Fix early stop with xgboost4j-spark (#4176)
* Fix early stop with xgboost4j-spark

* Update XGBoost.java

* Update XGBoost.java

* Update XGBoost.java

To use -Float.MAX_VALUE as the lower bound, in case there is positive metric.

* Only update best score if the current score is better (no update when equal)

* Update xgboost-spark tutorial to fix early stopping docs.
2019-03-01 13:02:57 -08:00
Jiaming Yuan
754fe8142b
Make `HistCutMatrix::Init' be aware of groups. (#4115)
* Add checks for group size.
* Simple docs.
* Search group index during hist cut matrix initialization.

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2019-02-16 04:39:41 +08:00
Nan Zhu
ae3bb9c2d5
Distributed Fast Histogram Algorithm (#4011)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* fix scalastyle error

* fix scalastyle error

* init

* allow hist algo

* more changes

* temp

* update

* remove hist sync

* udpate rabit

* change hist size

* change the histogram

* update kfactor

* sync per node stats

* temp

* update

* final

* code clean

* update rabit

* more cleanup

* fix errors

* fix failed tests

* enforce c++11

* fix lint issue

* broadcast subsampled feature correctly

* revert some changes

* fix lint issue

* enable monotone and interaction constraints

* don't specify default for monotone and interactions

* update docs
2019-02-05 05:12:53 -08:00
Jiaming Yuan
8905df4a18
Perform clang-tidy on both cpp and cuda source. (#4034)
* Basic script for using compilation database.

* Add `GENERATE_COMPILATION_DATABASE' to CMake.
* Rearrange CMakeLists.txt.
* Add basic python clang-tidy script.
* Remove modernize-use-auto.
* Add clang-tidy to Jenkins
* Refine logic for correct path detection

In Jenkins, the project root is of form /home/ubuntu/workspace/xgboost_PR-XXXX

* Run clang-tidy in CUDA 9.2 container
* Use clang_tidy container
2019-02-05 16:07:43 +08:00
Jiaming Yuan
1088dff42c Prevent training without setting up caches. (#4066)
* Prevent training without setting up caches.

* Add warning for internal functions.
* Check number of features.

* Address reviewer's comment.
2019-02-03 01:03:29 -08:00
Nan Zhu
e0094d996e fix doc about max_depth (#4078)
* fix doc

* Update doc/parameter.rst

Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>
2019-01-30 12:53:44 -08:00
Tatsuhito KATO
15fe2f1e7c fix typos (#4027) 2018-12-28 00:36:47 +08:00
Jiaming Yuan
7735252925
Document num_parallel_tree. (#4022) 2018-12-25 22:00:58 +08:00
Nan Zhu
c055a32609
[jvm-packages]support multiple validation datasets in Spark (#3910)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* fix scalastyle error

* fix scalastyle error

* wrap iterators

* enable copartition training and validationset

* add parameters

* converge code path and have init unit test

* enable multi evals for ranking

* unit test and doc

* update example

* fix early stopping

* address the offline comments

* udpate doc

* test eval metrics

* fix compilation issue

* fix example
2018-12-17 21:03:57 -08:00
Andy Adinets
42bf90eb8f Column sampling at individual nodes (splits). (#3971)
* Column sampling at individual nodes (splits).

* Documented colsample_bynode parameter.

- also updated documentation for colsample_by* parameters

* Updated documentation.

* GetFeatureSet() returns shared pointer to std::vector.

* Sync sampled columns across multiple processes.
2018-12-14 22:37:35 +08:00
Jiaming Yuan
e0a279114e
Unify logging facilities. (#3982)
* Unify logging facilities.

* Enhance `ConsoleLogger` to handle different verbosity.
* Override macros from `dmlc`.
* Don't use specialized gamma when building with GPU.
* Remove verbosity cache in monitor.
* Test monitor.
* Deprecate `silent`.
* Fix doc and messages.
* Fix python test.
* Fix silent tests.
2018-12-14 19:29:58 +08:00
Rory Mitchell
93f9ce9ef9
Single precision histograms on GPU (#3965)
* Allow single precision histogram summation in gpu_hist

* Add python test, reduce run-time of gpu_hist tests

* Update documentation
2018-12-10 10:55:30 +13:00
Philip Hyunsu Cho
4f26053b09
Fix typo in Feature Interaction Constraints tutorial (#3975) 2018-12-06 19:38:40 -08:00
Philip Hyunsu Cho
e9ab4a1c6c
Address #3933: document limitation of DMLC CSV parser + recommend Pandas (#3934) 2018-11-23 04:13:36 -08:00
Jiaming Yuan
daf77ca7b7
Enable running objectives with 0 GPU. (#3878)
* Enable running objectives with 0 GPU.

* Enable 0 GPU for objectives.
* Add doc for GPU objectives.
* Fix some objectives defaulted to running on all GPUs.
2018-11-13 20:19:59 +13:00
Jiacheng Xu
d810e6dec9 Fix a typo in the R-package documentation: max.deph -> max.depth (#3890)
Signed-off-by: Jiacheng Xu <xjcmaxwellcjx@gmail.com>
2018-11-12 01:43:23 -08:00
Philip Hyunsu Cho
828d75714d
Fix #3857: take down AWS YARN tutorial, as it is outdated (#3885) 2018-11-08 23:08:32 -08:00
Jiaming Yuan
f1275f52c1
Fix specifying gpu_id, add tests. (#3851)
* Rewrite gpu_id related code.

* Remove normalised/unnormalised operatios.
* Address difference between `Index' and `Device ID'.
* Modify doc for `gpu_id'.
* Better LOG for GPUSet.
* Check specified n_gpus.
* Remove inappropriate `device_idx' term.
* Clarify GpuIdType and size_t.
2018-11-06 18:17:53 +13:00
Philip Hyunsu Cho
c22e90d5d2
Correct typo 2018-11-04 05:22:53 -08:00
Philip Hyunsu Cho
6da462234e
Move MinGW-w64 + Python section to the end, since it's 'advanced' (#3863) 2018-11-04 05:12:27 -08:00
Philip Hyunsu Cho
a650131fc3
Update doc: colsample_bylevel now works for tree_method=hist (#3862)
This feature was introduced by #3635
2018-11-04 02:25:25 -08:00
Philip Hyunsu Cho
583c88bce7 [jvm-packages] Require vanilla Apache Spark (#3854) 2018-11-01 19:15:40 -07:00
Jonathan Friedman
45d321da28 Fix typo in docs (#3852)
Fix typo in docs
2018-11-01 13:03:59 -07:00
Zhao Hang
e3c1afac6b Update parameter.rst (#3843) 2018-10-31 00:19:45 +13:00
Nan Zhu
5fbe230636
[jvm-packages] documenting tracker (#3831)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* fix scalastyle error

* fix scalastyle error

* documenting tracker

* Make it a separate note
2018-10-25 18:53:46 -07:00
Nan Zhu
4ae225a08d
[Blocking][jvm-packages] fix the early stopping feature (#3808)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* fix scalastyle error

* fix scalastyle error

* temp

* add method for classifier and regressor

* update tutorial

* address the comments

* update
2018-10-23 14:53:13 -07:00
KOLANICH
5480e05173 Added some instructions on using MinGW-built XGBoost with python. (#3774)
* Added some instructions on using MinGW-built XGBoost with python.

* Changes according to the discussion and some additions

* Fixed wording and removed redundancy.

* Even more fixes

* Fixed links. Removed redundancy.

* Some fixes according to the discussion

* fixes

* Some fixes

* fixes
2018-10-09 09:07:00 -07:00
Philip Hyunsu Cho
ca33bf6476
Document gblinear parameters: feature_selector and top_k (#3780) 2018-10-08 22:41:54 -07:00
Philip Hyunsu Cho
813d2436d3
Produce xgboost.so for XGBoost-R on Mac OSX, so that make install works (#3767)
* Produce xgboost.so for XGBoost-R on Mac OSX, so that `make install` works

* Modernize R build instructions

* Fix crossref
2018-10-07 14:09:54 -07:00
Philip Hyunsu Cho
91903ac5d4
Fix broken doc build due to Matplotlib 3.0 release (#3764) 2018-10-07 13:34:37 -07:00
Dmitriy Rybalko
7bbb44182a update eval_metric doc (#3687) 2018-09-14 08:47:05 -07:00
mrgutkun
4b43810f51 Fix #3663: Allow sklearn API to use callbacks (#3682)
* Fix #3663: Allow sklearn API to use callbacks

* Fix lint

* Add Callback API to Python API doc
2018-09-07 13:51:26 -07:00
Philip Hyunsu Cho
5a8bbb39a1
Revert #3677 and #3674 (#3678)
* Revert "Add scikit-learn as dependency for doc build (#3677)"

This reverts commit 308f664ade0547242608e21f6198c895415f03da.

* Revert "Add scikit-learn tests (#3674)"

This reverts commit d176a0fbc8165e3afe3e42ff464ab7b253211555.
2018-09-06 20:43:17 -07:00
Philip Hyunsu Cho
308f664ade
Add scikit-learn as dependency for doc build (#3677) 2018-09-06 14:56:05 -07:00
Philip Hyunsu Cho
190d888695
Document LambdaMART objectives: pairwise, listwise (#3672)
* Document LambdaMART objectives

* Distinguish between pairwise and listwise objectives
2018-09-06 09:54:37 -07:00
Philip Hyunsu Cho
9344f081a4
Add numpy and matplotlib as requirements for doc build (#3669) 2018-09-04 20:56:18 -07:00
Andrew Thia
9254c58e4d [TREE] add interaction constraints (#3466)
* add interaction constraints

* enable both interaction and monotonic constraints at the same time

* fix lint

* add R test, fix lint, update demo

* Use dmlc::JSONReader to express interaction constraints as nested lists; Use sparse arrays for bookkeeping

* Add Python test for interaction constraints

* make R interaction constraints parameter based on feature index instead of column names, fix R coding style

* Fix lint

* Add BlueTea88 to CONTRIBUTORS.md

* Short circuit when no constraint is specified; address review comments

* Add tutorial for feature interaction constraints

* allow interaction constraints to be passed as string, remove redundant column_names argument

* Fix typo

* Address review comments

* Add comments to Python test
2018-09-04 09:35:39 -07:00
gorogm
7ef2b599c7 Link fixed. (#3640) 2018-08-27 20:25:50 -07:00
Philip Hyunsu Cho
cb4de521c1
Document CUDA requirement, lack of external memory on GPU (#3624)
* Document fact that GPU doesn't support external memory

* Document CUDA requirement
2018-08-22 22:47:10 -07:00
Philip Hyunsu Cho
4ed8a88240
Update Python API doc (#3619)
* Add XGBRanker to Python API doc

* Show inherited members of XGBRegressor in API doc, since XGBRegressor uses default methods from XGBModel

* Add table of contents to Python API doc

* Skip JVM doc download if not available

* Show inherited members for XGBRegressor and XGBRanker

* Expose XGBRanker to Python XGBoost module directory

* Add docstring to XGBRegressor.predict() and XGBRanker.predict()

* Fix rendering errors in Python docstrings

* Fix lint
2018-08-22 18:59:30 -07:00
Grant W Schneider
57f3c2f252 Remove errant $ (#3618) 2018-08-21 12:32:38 -07:00
Philip Hyunsu Cho
b13c3a8bcc
Fix #3609: Removed unused parameter 'use_buffer' (#3610) 2018-08-21 07:54:15 -07:00
trivialfis
cf2d86a4f6 Add travis sanitizers tests. (#3557)
* Add travis sanitizers tests.

* Add gcc-7 in Travis.
* Add SANITIZER_PATH for CMake.
* Enable sanitizer tests in Travis.

* Fix memory leaks in tests.

* Fix all memory leaks reported by Address Sanitizer.
* tests/cpp/helpers.h/CreateDMatrix now returns raw pointer.
2018-08-19 16:40:30 +12:00
Philip Hyunsu Cho
983cb0b374
Add option to disable default metric (#3606) 2018-08-18 11:39:20 -07:00
Grace Lam
caf4a756bf Add JSON dump functionality documentation (#3600) 2018-08-16 16:32:04 -07:00
Jakob Richter
725f4c36f2 replace nround with nrounds to match actual parameter (#3592) 2018-08-15 11:13:53 -07:00