* provide the readme
* update for format
* reformat
* reformat -2
* update again
* update format
* update w.r.t. yinlou's comments
* Add kubernetes tutorial to Table of Contents
* Style edit
* Fix #4630, #4421: Preserve correct ordering between metrics, and always use the last metric for early stopping
* Clarify the semantics of early stopping in the presence of multiple validation sets and metrics (see the sketch below)
* Add a test
* Fix lint
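To make the clarified semantics concrete, here is a minimal Python sketch (toy data and parameter values are made up for illustration): when several metrics and several entries in `evals` are given, early stopping is driven by the last metric evaluated on the last dataset.

```python
import numpy as np
import xgboost as xgb

rng = np.random.RandomState(0)
dtrain = xgb.DMatrix(rng.randn(500, 10), label=rng.randint(2, size=500))
dvalid = xgb.DMatrix(rng.randn(200, 10), label=rng.randint(2, size=200))

params = {
    "objective": "binary:logistic",
    # Two metrics: with early stopping, only the LAST one listed is monitored.
    "eval_metric": ["auc", "logloss"],
}

# With multiple entries in evals, only the LAST dataset ('valid') drives early
# stopping, evaluated with the last metric ('logloss').
bst = xgb.train(
    params,
    dtrain,
    num_boost_round=500,
    evals=[(dtrain, "train"), (dvalid, "valid")],
    early_stopping_rounds=10,
    verbose_eval=False,
)
print(bst.best_iteration, bst.best_score)
```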
* _maybe_pandas_xxx helpers should return their arguments unchanged if pandas is not installed
* Tests should not assume pandas is installed
* Mark tests which require pandas as such (see the sketch below)
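A minimal sketch of what marking pandas-dependent tests looks like with pytest (the test name and data are illustrative, not the actual test changed here): the skip guard replaces a hard failure when pandas is missing.

```python
import pytest

try:
    import pandas as pd
    PANDAS_INSTALLED = True
except ImportError:
    PANDAS_INSTALLED = False


@pytest.mark.skipif(not PANDAS_INSTALLED, reason="pandas is not installed")
def test_dataframe_input():
    # Only runs when pandas is available; otherwise reported as skipped.
    import xgboost as xgb
    df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
    dmat = xgb.DMatrix(df, label=[0, 1])
    assert dmat.num_row() == 2
```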
* Fix external memory support for fetching column batches.
This fixes two bugs:
* Use PushCSC when fetching column batches.
* Don't remove the created temporary directory before the test finishes (see the sketch below).
* Check all pages.
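A rough illustration of the temporary-directory issue from the test side (a hypothetical test, not the one added in this change): the cache files created for external memory must outlive the training call, so the directory is only cleaned up when the `with` block exits.

```python
import os
import tempfile

import numpy as np
import xgboost as xgb


def test_external_memory_smoke():
    # Keep the temporary directory alive for the whole test; it is deleted
    # only after training has finished.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "data.libsvm")
        X = np.random.randn(1000, 8)
        y = np.random.randint(2, size=1000)
        with open(path, "w") as f:
            for xi, yi in zip(X, y):
                feats = " ".join(f"{j}:{v:.4f}" for j, v in enumerate(xi))
                f.write(f"{yi} {feats}\n")

        # The '#<prefix>' suffix enables the external-memory code path.
        dtrain = xgb.DMatrix(path + "#" + os.path.join(tmpdir, "cache"))
        xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=2)
```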
* Add to documentation how to build native unit tests
* Add instructions to run Python tests and to use the Docker container [skip ci]
* Fix link to pytest chapter
* Add link to Google Test [skip ci]
* Set PYTHONPATH [skip ci]
* Revise test_python.sh for running tests locally
* Update test_python.sh
* Place Docker recommendation notice in a prominent place [skip ci]
* Initial performance optimizations for xgboost
* remove includes
* revert float->double
* fix for CI
* Check existence of _mm_prefetch and __builtin_prefetch
* Fix lint
* optimizations for CPU
* applying review comments
* add some comments, code refactoring
* fixing issues in CI
* adding runtime checks
* remove 1 extra check
* remove extra checks in BuildHist
* remove checks
* add debug info
* added debug info
* revert changes
* added comments
* Apply suggestions from code review
Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
* apply review comments
* Remove unused function CreateNewNodes()
* Add descriptive comment on node_idx variable in QuantileHistMaker::Builder::BuildHistsBatch()
* Implement tree model dump with a code generator.
* Split up generators.
* Implement graphviz generator.
* Use pattern matching.
* [Breaking] Return a `Source` from `to_graphviz` instead of a `Digraph` in the Python package (see the sketch below).
Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
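A minimal sketch of consuming the new return type (toy data, arbitrary output file name): a `graphviz.Source` exposes the raw DOT text and can be rendered to disk directly.

```python
import numpy as np
import xgboost as xgb

# Tiny throwaway model so the example is self-contained.
X = np.random.randn(100, 4)
y = (X[:, 0] > 0).astype(int)
bst = xgb.train({"objective": "binary:logistic", "max_depth": 2},
                xgb.DMatrix(X, label=y), num_boost_round=3)

graph = xgb.to_graphviz(bst, num_trees=0)  # now a graphviz.Source, not a Digraph
print(graph.source[:80])                   # raw DOT text is available as .source
graph.render("tree_0")                     # render the first tree to disk
```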
* Do not create device vectors for the entire sparse page while computing histograms
- While creating the compressed histogram indices, the row vector was created for the entire
sparse page batch. This is needless, as we only process chunks at a time based on a slice
of the total GPU memory.
- This PR allocates only as much as is required to store the appropriate row indices and entries.
* Do not dereference row_ptrs once the device_vector has been created, to elide host copies of those counts
- Instead, grab the entry counts directly from the SparsePage.
* Set the appropriate device before freeing device memory
- PR #4532 added a global memory tracker/logger to keep track of the number of (de)allocations
and peak memory usage on a per-device basis.
- This PR adds the appropriate check to make sure that the (de)allocation counts and memory
usages make sense for the device, since verbosity is typically increased on debug/non-retail builds.
* Pre-create cub allocators and reuse them
- Create them once and do not resize them dynamically. We need to ensure that these allocators
are created and destroyed exactly once so that the appropriate device IDs are set.
This is part 1 of refactoring configuration.
* Move tree heuristic configurations.
* Split up declarations and definitions for GBTree.
* Implement UseGPU in gbm.
* Training with external memory - part 2 of 2 (see the sketch below)
- When external memory support is enabled, the histogram indices are built incrementally
for every sparse page.
- The entire set of input data is divided across multiple GPUs, and the relative row positions
within each device are tracked when building the compressed histogram buffer.
- This was tested using a mortgage dataset containing ~670M rows before 4 x T4 GPUs could be saturated.
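For orientation, the user-visible way to combine external memory with multi-GPU training looked roughly like the sketch below (the file name, cache prefix, and GPU count are placeholders, and `n_gpus` is assumed to be the multi-GPU parameter of this era of the codebase).

```python
import xgboost as xgb

# '#dtrain.cache' turns on external memory; 'mortgage.libsvm' is a placeholder.
dtrain = xgb.DMatrix("mortgage.libsvm#dtrain.cache")

params = {
    "objective": "binary:logistic",
    "tree_method": "gpu_hist",  # GPU histogram updater
    "n_gpus": 4,                # spread the data across several GPUs
}
bst = xgb.train(params, dtrain, num_boost_round=100)
```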