xgboost

Author	SHA1	Message	Date
Mingjie Tang	beb7b295a8	Add tutorial for distributed training and batch prediction with Kubernetes (#4621 ) * provide the readme * update for format * reformat * reformat -2 * update again * update format * update w.r.t yinlou's comments * Add kubernetes tutorial to Table of Contents * Style edit	2019-07-14 23:27:27 -07:00
Nan Zhu	3e339d9557	contribute to community doc (#4646 ) * add community doc * update * update	2019-07-14 21:29:57 -07:00
Oleksandr Pryimak	923e6c86ba	Add to documentation how to run tests locally (#4610 ) * Add to documentation how to build native unit tests * Add instructions to run Python tests and to use Docker container [skip ci] * Fix link to pytest chapter * Add link to Google Test [skip ci] * Set PYTHONPATH [skip ci] * Revise test_python.sh for running tests locally * Update test_python.sh * Place Docker recommendation notice in a prominent place [skip ci]	2019-06-27 19:02:04 -07:00
Philip Hyunsu Cho	dd01f7c4f5	Use Sphinx 2.1+ to compile documentation [skip ci] (#4609 )	2019-06-26 16:26:22 -07:00
Philip Hyunsu Cho	cd3a3f99da	Fix doc for customized objective/metric [skip ci] (#4608 )	2019-06-26 13:40:34 -07:00
Jiaming Yuan	5b2f805e74	Doc and demo for customized metric and obj. (#4598 ) Co-Authored-By: Theodore Vasiloudis <theodoros.vasiloudis@gmail.com>	2019-06-26 16:13:12 +08:00
Philip Hyunsu Cho	1f98f18cb8	Add instruction to run formatting checks locally [skip ci] (#4591 )	2019-06-24 00:09:09 -07:00
Jiaming Yuan	2cff735126	Update doc for feature constraints and `n_gpus`. (#4596 ) * Update doc for feature constraints. * Fix some warnings. * Clean up doc for `n_gpus`.	2019-06-23 14:37:22 +08:00
Jiaming Yuan	9494950ee7	Address some sphinx warnings and errors, add doc for building doc. (#4589 )	2019-06-20 15:07:36 -07:00
Rong Ou	e94f85f0e4	Deprecate single node multi-gpu mode (#4579 ) * deprecate multi-gpu training * add single node * add warning	2019-06-19 15:51:38 +12:00
Rong Ou	ba1d848767	Remove doc about not supporting cuda 10.1 (#4578 )	2019-06-19 10:44:59 +12:00
Jiaming Yuan	2f1319f273	Add `rmsle` metric and `reg:squaredlogerror` objective (#4541 )	2019-06-11 05:48:27 +08:00
Rory Mitchell	399fabed49	Deprecate gpu_exact, bump required cuda version in docs (#4527 )	2019-06-03 09:49:05 +12:00
Rory Mitchell	972f693eaf	Fix dask API sphinx docstrings (#4507 ) * Fix dask API sphinx docstrings * Update GPU docs page	2019-05-28 16:39:26 +12:00
Rory Mitchell	09b90d9329	Add native support for Dask (#4473 ) * Add native support for Dask * Add multi-GPU demo * Add sklearn example	2019-05-27 13:29:28 +12:00
Rory Mitchell	8ddd2715ee	Add python RF documentation (#4500 )	2019-05-24 23:30:24 -07:00
Philip Hyunsu Cho	515f5f5c47	[RFC] Version 0.90 release candidate (#4475 ) * Release 0.90 * Add script to automatically generate acknowledgment * Update NEWS.md	2019-05-20 01:02:44 -07:00
Nan Zhu	adcd8ea7c6	Update xgboost4j_spark_tutorial.rst (#4476 )	2019-05-17 04:17:57 +00:00
Shaochen Shi	18e4fc3690	[jvm-packages] Automatically set maximize_evaluation_metrics if not explicitly given in XGBoost4J-Spark (#4446 ) * Automatically set maximize_evaluation_metrics if not explicitly given. * When custom_eval is set, require maximize_evaluation_metrics. * Update documents on early stop in XGBoost4J-Spark. * Fix code error.	2019-05-09 12:49:44 -07:00
Philip Hyunsu Cho	ade3f30237	Fix list formatting in missing value tutorial in XGBoost4J-Spark	2019-05-06 14:24:02 -07:00
Philip Hyunsu Cho	b511638ca1	Fix list formatting in missing value tutorial in XGBoost4J-Spark	2019-05-06 14:21:49 -07:00
Daniel Hen	eabcc0e210	[jvm-packages] Tutorial on handling missing values (#4425 ) Add tutorial on missing values and how to handle those within XGBoost.	2019-05-06 13:57:18 -07:00
tqchen	91c513a0c1	fix doc	2019-04-29 17:50:46 -07:00
Ravi Kalia	146e83f3b3	Fix typo in model.rst (#4393 )	2019-04-27 14:22:07 -07:00
Philip Hyunsu Cho	ea850ecd20	[CI] Refactor Jenkins CI pipeline + migrate all Linux tests to Jenkins (#4401 ) * All Linux tests are now in Jenkins CI * Tests are now de-coupled from builds. We can now build XGBoost with one version of CUDA/JDK and test it with another version of CUDA/JDK * Builds (compilation) are significantly faster because 1) They use C5 instances with faster CPU cores; and 2) build environment setup is cached using Docker containers	2019-04-26 18:39:12 -07:00
Nan Zhu	65db8d0626	[jvm-packages] support spark 2.4 and compatibility test with previous xgboost version (#4377 ) * bump spark version * keep float.nan * handle brokenly changed name/value * add test * add model files * add model files * update doc	2019-04-17 11:33:13 -07:00
Jiaming Yuan	207f058711	Refactor CMake scripts. (#4323 ) * Refactor CMake scripts. * Remove CMake CUDA wrapper. * Bump CMake version for CUDA. * Use CMake to handle Doxygen. * Split up CMakeList. * Export install target. * Use modern CMake. * Remove build.sh * Workaround for gpu_hist test. * Use cmake 3.12. * Revert machine.conf. * Move CLI test to gpu. * Small cleanup. * Support using XGBoost as submodule. * Fix windows * Fix cpp tests on Windows * Remove duplicated find_package.	2019-04-15 10:08:12 -07:00
Yang Yang	c7bc739ed2	Fix document about colsample_by* parameter (#4340 ) Correct the calculation mistake in colsample_by* example.	2019-04-08 11:10:04 -07:00
sriramch	2f7087eba1	Improve HostDeviceVector exception safety (#4301 ) * make the assignments of HostDeviceVector exception safe. * storing a dummy GPUDistribution instance in HDV for CPU based code. * change testxgboost binary location to build directory.	2019-03-31 22:48:58 +08:00
Jiaming Yuan	29a1356669	Deprecate `reg:linear' in favor of `reg:squarederror'. (#4267 ) * Deprecate `reg:linear' in favor of `reg:squarederror'. * Replace the use of `reg:linear'. * Replace the use of `silent`.	2019-03-17 17:55:04 +08:00
Jiaming Yuan	cf8d5b9b76	Mark CUDA 10.1 as unsupported. (#4265 )	2019-03-17 16:59:15 +08:00
Jiaming Yuan	7b1b11390a	Mark Scikit-Learn RF interface as experimental in doc. (#4258 ) * Mark Scikit-Learn RF interface as experimental in doc.	2019-03-16 00:45:32 +08:00
Andy Adinets	a36c3ed4f4	Added SKLearn-like random forest Python API. (#4148 ) * Added SKLearn-like random forest Python API. - added XGBRFClassifier and XGBRFRegressor classes to SKL-like xgboost API - also added n_gpus and gpu_id parameters to SKL classes - added documentation describing how to use xgboost for random forests, as well as existing caveats	2019-03-12 22:28:19 +08:00
Rory Mitchell	4eeeded7d1	Remove various synchronisations from cuda API calls, instrument monitor (#4205 ) * Remove various synchronisations from cuda API calls, instrument monitor with nvtx profiler ranges.	2019-03-10 15:01:23 +13:00
Philip Hyunsu Cho	331cd3e4f7	Document limitation of one-split-at-a-time Greedy tree learning heuristic (#4233 )	2019-03-08 10:05:39 -08:00
Jonas	00ea7b83c9	Fix docs for `num_parallel_tree` (#4221 ) Minor formatting correction for `num_parallel_tree`.	2019-03-06 23:47:51 +08:00
Philip Hyunsu Cho	67c38805a1	Update build doc: PyPI wheel now support multi-GPU (#4219 )	2019-03-05 13:25:31 -08:00
Adam November	0c1d5f1120	Fix snapshot artifact name in docs. (#4196 )	2019-03-03 13:27:50 -08:00
Matthew Jones	92b7577c62	[REVIEW] Enable Multi-Node Multi-GPU functionality (#4095 ) * Initial commit to support multi-node multi-gpu xgboost using dask * Fixed NCCL initialization by not ignoring the opg parameter. - it now crashes on NCCL initialization, but at least we're attempting it properly * At the root node, perform a rabit::Allreduce to get initial sum_gradient across workers * Synchronizing in a couple of more places. - now the workers don't go down, but just hang - no more "wild" values of gradients - probably needs syncing in more places * Added another missing max-allreduce operation inside BuildHistLeftRight * Removed unnecessary collective operations. * Simplified rabit::Allreduce() sync of gradient sums. * Removed unnecessary rabit syncs around ncclAllReduce. - this improves performance _significantly_ (7x faster for overall training, 20x faster for xgboost proper) * pulling in latest xgboost * removing changes to updater_quantile_hist.cc * changing use_nccl_opg initialization, removing unnecessary if statements * added definition for opaque ncclUniqueId struct to properly encapsulate GetUniqueId * placing struct defintion in guard to avoid duplicate code errors * addressing linting errors * removing * removing additional arguments to AllReduer initialization * removing distributed flag * making comm init symmetric * removing distributed flag * changing ncclCommInit to support multiple modalities * fix indenting * updating ncclCommInitRank block with necessary group calls * fix indenting * adding print statement, and updating accessor in vector * improving print statement to end-line * generalizing nccl_rank construction using rabit * assume device_ordinals is the same for every node * test, assume device_ordinals is identical for all nodes * test, assume device_ordinals is unique for all nodes * changing names of offset variable to be more descriptive, editing indenting * wrapping ncclUniqueId GetUniqueId() and aesthetic changes * adding synchronization, and tests for distributed * adding to tests * fixing broken #endif * fixing initialization of gpu histograms, correcting errors in tests * adding to contributors list * adding distributed tests to jenkins * fixing bad path in distributed test * debugging * adding kubernetes for distributed tests * adding proper import for OrderedDict * adding urllib3==1.22 to address ordered_dict import error * added sleep to allow workers to save their models for comparison * adding name to GPU contributors under docs	2019-03-02 10:03:22 +13:00
Yanbo Liang	9fefa2128d	[jvm-packages] Fix early stop with xgboost4j-spark (#4176 ) * Fix early stop with xgboost4j-spark * Update XGBoost.java * Update XGBoost.java * Update XGBoost.java To use -Float.MAX_VALUE as the lower bound, in case there is positive metric. * Only update best score if the current score is better (no update when equal) * Update xgboost-spark tutorial to fix early stopping docs.	2019-03-01 13:02:57 -08:00
Jiaming Yuan	754fe8142b	Make `HistCutMatrix::Init' be aware of groups. (#4115 ) * Add checks for group size. * Simple docs. * Search group index during hist cut matrix initialization. Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2019-02-16 04:39:41 +08:00
Nan Zhu	ae3bb9c2d5	Distributed Fast Histogram Algorithm (#4011 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * init * allow hist algo * more changes * temp * update * remove hist sync * udpate rabit * change hist size * change the histogram * update kfactor * sync per node stats * temp * update * final * code clean * update rabit * more cleanup * fix errors * fix failed tests * enforce c++11 * fix lint issue * broadcast subsampled feature correctly * revert some changes * fix lint issue * enable monotone and interaction constraints * don't specify default for monotone and interactions * update docs	2019-02-05 05:12:53 -08:00
Jiaming Yuan	8905df4a18	Perform clang-tidy on both cpp and cuda source. (#4034 ) * Basic script for using compilation database. * Add `GENERATE_COMPILATION_DATABASE' to CMake. * Rearrange CMakeLists.txt. * Add basic python clang-tidy script. * Remove modernize-use-auto. * Add clang-tidy to Jenkins * Refine logic for correct path detection In Jenkins, the project root is of form /home/ubuntu/workspace/xgboost_PR-XXXX * Run clang-tidy in CUDA 9.2 container * Use clang_tidy container	2019-02-05 16:07:43 +08:00
Jiaming Yuan	1088dff42c	Prevent training without setting up caches. (#4066 ) * Prevent training without setting up caches. * Add warning for internal functions. * Check number of features. * Address reviewer's comment.	2019-02-03 01:03:29 -08:00
Nan Zhu	e0094d996e	fix doc about max_depth (#4078 ) * fix doc * Update doc/parameter.rst Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>	2019-01-30 12:53:44 -08:00
Tatsuhito KATO	15fe2f1e7c	fix typos (#4027 )	2018-12-28 00:36:47 +08:00
Jiaming Yuan	7735252925	Document num_parallel_tree. (#4022 )	2018-12-25 22:00:58 +08:00
Nan Zhu	c055a32609	[jvm-packages]support multiple validation datasets in Spark (#3910 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * wrap iterators * enable copartition training and validationset * add parameters * converge code path and have init unit test * enable multi evals for ranking * unit test and doc * update example * fix early stopping * address the offline comments * udpate doc * test eval metrics * fix compilation issue * fix example	2018-12-17 21:03:57 -08:00
Andy Adinets	42bf90eb8f	Column sampling at individual nodes (splits). (#3971 ) * Column sampling at individual nodes (splits). * Documented colsample_bynode parameter. - also updated documentation for colsample_by* parameters * Updated documentation. * GetFeatureSet() returns shared pointer to std::vector. * Sync sampled columns across multiple processes.	2018-12-14 22:37:35 +08:00
Jiaming Yuan	e0a279114e	Unify logging facilities. (#3982 ) * Unify logging facilities. * Enhance `ConsoleLogger` to handle different verbosity. * Override macros from `dmlc`. * Don't use specialized gamma when building with GPU. * Remove verbosity cache in monitor. * Test monitor. * Deprecate `silent`. * Fix doc and messages. * Fix python test. * Fix silent tests.	2018-12-14 19:29:58 +08:00

309 Commits