xgboost

Author	SHA1	Message	Date
Philip Hyunsu Cho	ade3f30237	Fix list formatting in missing value tutorial in XGBoost4J-Spark	2019-05-06 14:24:02 -07:00
Philip Hyunsu Cho	b511638ca1	Fix list formatting in missing value tutorial in XGBoost4J-Spark	2019-05-06 14:21:49 -07:00
Daniel Hen	eabcc0e210	[jvm-packages] Tutorial on handling missing values (#4425 ) Add tutorial on missing values and how to handle those within XGBoost.	2019-05-06 13:57:18 -07:00
tqchen	91c513a0c1	fix doc	2019-04-29 17:50:46 -07:00
Ravi Kalia	146e83f3b3	Fix typo in model.rst (#4393 )	2019-04-27 14:22:07 -07:00
Philip Hyunsu Cho	ea850ecd20	[CI] Refactor Jenkins CI pipeline + migrate all Linux tests to Jenkins (#4401 ) * All Linux tests are now in Jenkins CI * Tests are now de-coupled from builds. We can now build XGBoost with one version of CUDA/JDK and test it with another version of CUDA/JDK * Builds (compilation) are significantly faster because 1) They use C5 instances with faster CPU cores; and 2) build environment setup is cached using Docker containers	2019-04-26 18:39:12 -07:00
Nan Zhu	65db8d0626	[jvm-packages] support spark 2.4 and compatibility test with previous xgboost version (#4377 ) * bump spark version * keep float.nan * handle brokenly changed name/value * add test * add model files * add model files * update doc	2019-04-17 11:33:13 -07:00
Jiaming Yuan	207f058711	Refactor CMake scripts. (#4323 ) * Refactor CMake scripts. * Remove CMake CUDA wrapper. * Bump CMake version for CUDA. * Use CMake to handle Doxygen. * Split up CMakeList. * Export install target. * Use modern CMake. * Remove build.sh * Workaround for gpu_hist test. * Use cmake 3.12. * Revert machine.conf. * Move CLI test to gpu. * Small cleanup. * Support using XGBoost as submodule. * Fix windows * Fix cpp tests on Windows * Remove duplicated find_package.	2019-04-15 10:08:12 -07:00
Yang Yang	c7bc739ed2	Fix document about colsample_by* parameter (#4340 ) Correct the calculation mistake in colsample_by* example.	2019-04-08 11:10:04 -07:00
sriramch	2f7087eba1	Improve HostDeviceVector exception safety (#4301 ) * make the assignments of HostDeviceVector exception safe. * storing a dummy GPUDistribution instance in HDV for CPU based code. * change testxgboost binary location to build directory.	2019-03-31 22:48:58 +08:00
Jiaming Yuan	29a1356669	Deprecate `reg:linear' in favor of `reg:squarederror'. (#4267 ) * Deprecate `reg:linear' in favor of `reg:squarederror'. * Replace the use of `reg:linear'. * Replace the use of `silent`.	2019-03-17 17:55:04 +08:00
Jiaming Yuan	cf8d5b9b76	Mark CUDA 10.1 as unsupported. (#4265 )	2019-03-17 16:59:15 +08:00
Jiaming Yuan	7b1b11390a	Mark Scikit-Learn RF interface as experimental in doc. (#4258 ) * Mark Scikit-Learn RF interface as experimental in doc.	2019-03-16 00:45:32 +08:00
Andy Adinets	a36c3ed4f4	Added SKLearn-like random forest Python API. (#4148 ) * Added SKLearn-like random forest Python API. - added XGBRFClassifier and XGBRFRegressor classes to SKL-like xgboost API - also added n_gpus and gpu_id parameters to SKL classes - added documentation describing how to use xgboost for random forests, as well as existing caveats	2019-03-12 22:28:19 +08:00
Rory Mitchell	4eeeded7d1	Remove various synchronisations from cuda API calls, instrument monitor (#4205 ) * Remove various synchronisations from cuda API calls, instrument monitor with nvtx profiler ranges.	2019-03-10 15:01:23 +13:00
Philip Hyunsu Cho	331cd3e4f7	Document limitation of one-split-at-a-time Greedy tree learning heuristic (#4233 )	2019-03-08 10:05:39 -08:00
Jonas	00ea7b83c9	Fix docs for `num_parallel_tree` (#4221 ) Minor formatting correction for `num_parallel_tree`.	2019-03-06 23:47:51 +08:00
Philip Hyunsu Cho	67c38805a1	Update build doc: PyPI wheel now support multi-GPU (#4219 )	2019-03-05 13:25:31 -08:00
Adam November	0c1d5f1120	Fix snapshot artifact name in docs. (#4196 )	2019-03-03 13:27:50 -08:00
Matthew Jones	92b7577c62	[REVIEW] Enable Multi-Node Multi-GPU functionality (#4095 ) * Initial commit to support multi-node multi-gpu xgboost using dask * Fixed NCCL initialization by not ignoring the opg parameter. - it now crashes on NCCL initialization, but at least we're attempting it properly * At the root node, perform a rabit::Allreduce to get initial sum_gradient across workers * Synchronizing in a couple of more places. - now the workers don't go down, but just hang - no more "wild" values of gradients - probably needs syncing in more places * Added another missing max-allreduce operation inside BuildHistLeftRight * Removed unnecessary collective operations. * Simplified rabit::Allreduce() sync of gradient sums. * Removed unnecessary rabit syncs around ncclAllReduce. - this improves performance _significantly_ (7x faster for overall training, 20x faster for xgboost proper) * pulling in latest xgboost * removing changes to updater_quantile_hist.cc * changing use_nccl_opg initialization, removing unnecessary if statements * added definition for opaque ncclUniqueId struct to properly encapsulate GetUniqueId * placing struct defintion in guard to avoid duplicate code errors * addressing linting errors * removing * removing additional arguments to AllReduer initialization * removing distributed flag * making comm init symmetric * removing distributed flag * changing ncclCommInit to support multiple modalities * fix indenting * updating ncclCommInitRank block with necessary group calls * fix indenting * adding print statement, and updating accessor in vector * improving print statement to end-line * generalizing nccl_rank construction using rabit * assume device_ordinals is the same for every node * test, assume device_ordinals is identical for all nodes * test, assume device_ordinals is unique for all nodes * changing names of offset variable to be more descriptive, editing indenting * wrapping ncclUniqueId GetUniqueId() and aesthetic changes * adding synchronization, and tests for distributed * adding to tests * fixing broken #endif * fixing initialization of gpu histograms, correcting errors in tests * adding to contributors list * adding distributed tests to jenkins * fixing bad path in distributed test * debugging * adding kubernetes for distributed tests * adding proper import for OrderedDict * adding urllib3==1.22 to address ordered_dict import error * added sleep to allow workers to save their models for comparison * adding name to GPU contributors under docs	2019-03-02 10:03:22 +13:00
Yanbo Liang	9fefa2128d	[jvm-packages] Fix early stop with xgboost4j-spark (#4176 ) * Fix early stop with xgboost4j-spark * Update XGBoost.java * Update XGBoost.java * Update XGBoost.java To use -Float.MAX_VALUE as the lower bound, in case there is positive metric. * Only update best score if the current score is better (no update when equal) * Update xgboost-spark tutorial to fix early stopping docs.	2019-03-01 13:02:57 -08:00
Jiaming Yuan	754fe8142b	Make `HistCutMatrix::Init' be aware of groups. (#4115 ) * Add checks for group size. * Simple docs. * Search group index during hist cut matrix initialization. Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2019-02-16 04:39:41 +08:00
Nan Zhu	ae3bb9c2d5	Distributed Fast Histogram Algorithm (#4011 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * init * allow hist algo * more changes * temp * update * remove hist sync * udpate rabit * change hist size * change the histogram * update kfactor * sync per node stats * temp * update * final * code clean * update rabit * more cleanup * fix errors * fix failed tests * enforce c++11 * fix lint issue * broadcast subsampled feature correctly * revert some changes * fix lint issue * enable monotone and interaction constraints * don't specify default for monotone and interactions * update docs	2019-02-05 05:12:53 -08:00
Jiaming Yuan	8905df4a18	Perform clang-tidy on both cpp and cuda source. (#4034 ) * Basic script for using compilation database. * Add `GENERATE_COMPILATION_DATABASE' to CMake. * Rearrange CMakeLists.txt. * Add basic python clang-tidy script. * Remove modernize-use-auto. * Add clang-tidy to Jenkins * Refine logic for correct path detection In Jenkins, the project root is of form /home/ubuntu/workspace/xgboost_PR-XXXX * Run clang-tidy in CUDA 9.2 container * Use clang_tidy container	2019-02-05 16:07:43 +08:00
Jiaming Yuan	1088dff42c	Prevent training without setting up caches. (#4066 ) * Prevent training without setting up caches. * Add warning for internal functions. * Check number of features. * Address reviewer's comment.	2019-02-03 01:03:29 -08:00
Nan Zhu	e0094d996e	fix doc about max_depth (#4078 ) * fix doc * Update doc/parameter.rst Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>	2019-01-30 12:53:44 -08:00
Tatsuhito KATO	15fe2f1e7c	fix typos (#4027 )	2018-12-28 00:36:47 +08:00
Jiaming Yuan	7735252925	Document num_parallel_tree. (#4022 )	2018-12-25 22:00:58 +08:00
Nan Zhu	c055a32609	[jvm-packages]support multiple validation datasets in Spark (#3910 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * wrap iterators * enable copartition training and validationset * add parameters * converge code path and have init unit test * enable multi evals for ranking * unit test and doc * update example * fix early stopping * address the offline comments * udpate doc * test eval metrics * fix compilation issue * fix example	2018-12-17 21:03:57 -08:00
Andy Adinets	42bf90eb8f	Column sampling at individual nodes (splits). (#3971 ) * Column sampling at individual nodes (splits). * Documented colsample_bynode parameter. - also updated documentation for colsample_by* parameters * Updated documentation. * GetFeatureSet() returns shared pointer to std::vector. * Sync sampled columns across multiple processes.	2018-12-14 22:37:35 +08:00
Jiaming Yuan	e0a279114e	Unify logging facilities. (#3982 ) * Unify logging facilities. * Enhance `ConsoleLogger` to handle different verbosity. * Override macros from `dmlc`. * Don't use specialized gamma when building with GPU. * Remove verbosity cache in monitor. * Test monitor. * Deprecate `silent`. * Fix doc and messages. * Fix python test. * Fix silent tests.	2018-12-14 19:29:58 +08:00
Rory Mitchell	93f9ce9ef9	Single precision histograms on GPU (#3965 ) * Allow single precision histogram summation in gpu_hist * Add python test, reduce run-time of gpu_hist tests * Update documentation	2018-12-10 10:55:30 +13:00
Philip Hyunsu Cho	4f26053b09	Fix typo in Feature Interaction Constraints tutorial (#3975 )	2018-12-06 19:38:40 -08:00
Philip Hyunsu Cho	e9ab4a1c6c	Address #3933 : document limitation of DMLC CSV parser + recommend Pandas (#3934 )	2018-11-23 04:13:36 -08:00
Jiaming Yuan	daf77ca7b7	Enable running objectives with 0 GPU. (#3878 ) * Enable running objectives with 0 GPU. * Enable 0 GPU for objectives. * Add doc for GPU objectives. * Fix some objectives defaulted to running on all GPUs.	2018-11-13 20:19:59 +13:00
Jiacheng Xu	d810e6dec9	Fix a typo in the R-package documentation: max.deph -> max.depth (#3890 ) Signed-off-by: Jiacheng Xu <xjcmaxwellcjx@gmail.com>	2018-11-12 01:43:23 -08:00
Philip Hyunsu Cho	828d75714d	Fix #3857 : take down AWS YARN tutorial, as it is outdated (#3885 )	2018-11-08 23:08:32 -08:00
Jiaming Yuan	f1275f52c1	Fix specifying gpu_id, add tests. (#3851 ) * Rewrite gpu_id related code. * Remove normalised/unnormalised operatios. * Address difference between `Index' and `Device ID'. * Modify doc for `gpu_id'. * Better LOG for GPUSet. * Check specified n_gpus. * Remove inappropriate `device_idx' term. * Clarify GpuIdType and size_t.	2018-11-06 18:17:53 +13:00
Philip Hyunsu Cho	c22e90d5d2	Correct typo	2018-11-04 05:22:53 -08:00
Philip Hyunsu Cho	6da462234e	Move MinGW-w64 + Python section to the end, since it's 'advanced' (#3863 )	2018-11-04 05:12:27 -08:00
Philip Hyunsu Cho	a650131fc3	Update doc: colsample_bylevel now works for tree_method=hist (#3862 ) This feature was introduced by #3635	2018-11-04 02:25:25 -08:00
Philip Hyunsu Cho	583c88bce7	[jvm-packages] Require vanilla Apache Spark (#3854 )	2018-11-01 19:15:40 -07:00
Jonathan Friedman	45d321da28	Fix typo in docs (#3852 ) Fix typo in docs	2018-11-01 13:03:59 -07:00
Zhao Hang	e3c1afac6b	Update parameter.rst (#3843 )	2018-10-31 00:19:45 +13:00
Nan Zhu	5fbe230636	[jvm-packages] documenting tracker (#3831 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * documenting tracker * Make it a separate note	2018-10-25 18:53:46 -07:00
Nan Zhu	4ae225a08d	[Blocking][jvm-packages] fix the early stopping feature (#3808 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * temp * add method for classifier and regressor * update tutorial * address the comments * update	2018-10-23 14:53:13 -07:00
KOLANICH	5480e05173	Added some instructions on using MinGW-built XGBoost with python. (#3774 ) * Added some instructions on using MinGW-built XGBoost with python. * Changes according to the discussion and some additions * Fixed wording and removed redundancy. * Even more fixes * Fixed links. Removed redundancy. * Some fixes according to the discussion * fixes * Some fixes * fixes	2018-10-09 09:07:00 -07:00
Philip Hyunsu Cho	ca33bf6476	Document gblinear parameters: feature_selector and top_k (#3780 )	2018-10-08 22:41:54 -07:00
Philip Hyunsu Cho	813d2436d3	Produce xgboost.so for XGBoost-R on Mac OSX, so that `make install` works (#3767 ) * Produce xgboost.so for XGBoost-R on Mac OSX, so that `make install` works * Modernize R build instructions * Fix crossref	2018-10-07 14:09:54 -07:00
Philip Hyunsu Cho	91903ac5d4	Fix broken doc build due to Matplotlib 3.0 release (#3764 )	2018-10-07 13:34:37 -07:00

290 Commits