xgboost

Author	SHA1	Message	Date
Henry Gouk	69454d9487	Implementation of hinge loss for binary classification (#3477 )	2018-08-07 10:06:42 +12:00
Philip Hyunsu Cho	44811f2330	Fix #3485 , #3540 : Don't use dropout for predicting test sets (#3556 ) * Fix #3485, #3540: Don't use dropout for predicting test sets Dropout (for DART) should only be used at training time. * Add regression test	2018-08-05 10:17:21 -07:00
Philip Hyunsu Cho	109473dae2	Fix #3545 : XGDMatrixCreateFromCSCEx silently discards empty trailing rows (#3553 ) * Fix #3545: XGDMatrixCreateFromCSCEx silently discards empty trailing rows Description: The bug is triggered when 1. The data matrix has empty rows at the bottom. More precisely, the rows `n-k+1`, `n-k+2`, ..., `n` of the matrix have missing values in all dimensions (`n` number of instances, `k` number of trailing rows) 2. The data matrix is given in Compressed Sparse Column (CSC) format. Diagnosis: When the CSC matrix is converted to Compressed Sparse Row (CSR) format (the common format used for DMatrix), the trailing empty rows are silently ignored. More specifically, the row pointer (`offset`) of the newly created CSR matrix does not account for these rows. Fix: Modify the row pointer. * Add regression test	2018-08-05 10:15:42 -07:00
Philip Hyunsu Cho	8c633d1ca3	Fix #3505 : Prevent undefined behavior due to incorrectly sized base_margin (#3555 ) The base margin will need to have length `[num_class] * [number of data points]`. Otherwise, the array holding prediction results will be only partially initialized, causing undefined behavior. Fix: check the length of the base margin. If the length is not correct, use the global bias (`base_score`) instead. Warn the user about the substitution.	2018-08-05 10:14:07 -07:00
Philip Hyunsu Cho	4a429a7c4f	Add reg:tweedie to supported objectives in XGBoost4J-Spark (#3552 )	2018-08-05 07:42:59 -07:00
Philip Hyunsu Cho	7fefd6865d	Fix #3402 : wrong fid crashes distributed algorithm (#3535 ) * Fix #3402: wrong fid crashes distributed algorithm The bug was introduced by the recent DMatrix refactor (#3301). It was partially fixed by #3408 but the example in #3402 was still failing. The example in #3402 will succeed after this fix is applied. * Explicitly specify "this" to prevent compile error * Add regression test * Add distributed test to Travis matrix * Install kubernetes Python package as dependency of dmlc tracker * Add Python dependencies * Add compile step * Reduce size of regression test case * Further reduce size of test	2018-08-04 19:20:04 -07:00
Nan Zhu	31d1baba3d	[jvm-packages] Tutorial of XGBoost4J-Spark (#3534 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * add new * update doc * finish Gang Scheduling * more * intro * Add sections: Prediction, Model persistence and ML pipeline. * Add XGBoost4j-Spark MLlib pipeline example * partial finished version * finish the doc * adjust code * fix the doc * use rst * Convert XGBoost4J-Spark tutorial to reST * Bring XGBoost4J up to date * add note about using hdfs * remove duplicate file * fix descriptions * update doc * Wrap HDFS/S3 export support as a note * update * wrap indexing_mode example in code block	2018-08-03 21:17:50 -07:00
trivialfis	34dc9155ab	Use __CUDA__ macro with __NVCC__. (#3539 ) * __CUDA__ is defined in clang. Making the change won't make clang compile xgboost, but syntax checking from clang is at least partially working.	2018-08-02 22:04:23 +12:00
Philip Hyunsu Cho	70026655b0	Clarify supported OSes for XGBoost4J published JARs (#3547 )	2018-08-01 19:51:44 -07:00
Philip Hyunsu Cho	437b368b1f	Update dmlc-core submodule (#3546 ) This brings many goodies, including: * Ability to specify delimiter and weight_column for CSV files: ```python dtrain = xgboost.DMatrix('train.csv?format=csv&label_column=0&weight_column=1&delimiter= ') ``` * Ability to choose between 0-based and 1-based indexing for LIBSVM/LIBFM files: ```python dtrain = xgboost.DMatrix('train.libsvm?indexing_mode=1') # use 1-based indexing dtest = xgboost.DMatrix('test.libsvm') # use 0-based indexing (default) dtest2 = xgboost.DMatrix('test2.libsvm?indexing_mode=-1') # use heuristic to detect 0-based / 1-based ``` * Fix a bug in float parsing (issue dmlc/dmlc-core#440)	2018-08-01 15:15:40 -07:00
Nan Zhu	6cf97b4eae	[jvm-packages] consider spark.task.cpus when controlling parallelism (#3530 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * consider spark.task.cpus when controlling parallelism * fix bug * fix conf setup * calculate requestedCores within ParallelismController * enforce spark.task.cpus = 1 * unify unit test case framework * enable spark ui	2018-07-31 06:19:45 -07:00
trivialfis	860263f814	Enable building with sanitizers. (#3525 )	2018-07-31 17:25:47 +12:00
Nan Zhu	b546321c83	[jvm-packages] the current version of xgboost does not consider missing value in prediction (#3529 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * consider missing value in prediction * handle single prediction instance * fix type conversion	2018-07-30 14:16:24 -07:00
wenduowang	3b62e75f2e	Fix bug of using list(x) function when x is string (#3432 ) * Fix bug of using list(x) function when x is string list('abcdcba') = ['a', 'b', 'c', 'd', 'c', 'b', 'a'] * Allow feature_names/feature_types to be of any type If feature_names/feature_types is iterable, e.g. tuple, list, then convert the value to list, except for string; otherwise construct a list with a single value * Delete excess whitespace * Fix whitespace to pass lint	2018-07-30 07:36:34 -07:00
jqmp	dd07c25d12	Fix typo in ElasticNet threshold function (#3527 )	2018-07-30 14:08:14 +12:00
Philip Hyunsu Cho	2bb9b9d3db	Fix typo in parameter.rst, gblinear section (#3518 )	2018-07-28 18:58:15 -07:00
Nan Zhu	b5178d3d99	[jvm-packages] a better explanation about the inconsistent issue (#3524 )	2018-07-28 17:34:39 -07:00
hlsc	5850a2558a	fix DMatrix load_row_split bug (#3431 )	2018-07-28 17:21:30 -07:00
trivialfis	8973f2cb0e	Fix building dmlc-core from xgboost. (#3522 ) Move building dmlc-core before adding DMLC_LOG_CUSTOMIZE. Fix #3520.	2018-07-28 10:35:11 -07:00
Uddeshya Singh	3363b9142e	Update faq.rst (#3521 ) Just fixing a minor typo	2018-07-28 10:34:14 -07:00
Rory Mitchell	07ff52d54c	Dynamically allocate GPU histogram memory (#3519 ) * Expand histogram memory dynamically to prevent large allocations for large tree depths (e.g. > 15) * Remove GPU memory allocation messages. These are misleading as a large number of allocations are now dynamic. * Fix appveyor R test	2018-07-28 21:22:41 +12:00
Brandon Greenwell	b5fad42da2	Issue warning when requesting bivariate plotting (#3516 )	2018-07-27 16:15:37 -07:00
Philip Hyunsu Cho	8a5209c55e	Fix model saving for 'count:poisson': max_delta_step as Booster attribute (#3515 ) * Save max_delta_step as an extra attribute of Booster Fixes #3509 and #3026, where `max_delta_step` parameter gets lost during serialization. * fix lint * Use camel case for global constant * disable local variable case in clang-tidy	2018-07-27 09:55:54 -07:00
Andy Adinets	cc6a5a3666	Added finding quantiles on GPU. (#3393 ) * Added finding quantiles on GPU. - this includes datasets where weights are assigned to data rows - as the quantiles found by the new algorithm are not the same as those found by the old one, test thresholds in tests/python-gpu/test_gpu_updaters.py have been adjusted. * Adjustments and improved testing for finding quantiles on the GPU. - added C++ tests for the DeviceSketch() function - reduced one of the thresholds in test_gpu_updaters.py - adjusted the cuts found by the find_cuts_k kernel	2018-07-27 14:03:16 +12:00
Nan Zhu	e2f09db77a	[jvm-packages] minor fix for parameter name in example (#3507 )	2018-07-25 19:57:40 -07:00
Rory Mitchell	a725272e19	Correct mistake from dmatrix refactor (#3408 )	2018-07-24 15:03:36 +12:00
jqmp	e9a97e0d88	Add total_gain and total_cover importance measures (#3498 ) Add `'total_gain'` and `'total_cover'` as possible `importance_type` arguments to `Booster.get_score` in the Python package. `get_score` already accepts a `'gain'` argument, which returns each feature's average gain over all of its splits. `'total_gain'` does the same, but returns a total rather than an average. This seems more intuitively meaningful, and also matches the behavior of the R package's `xgb.importance` function. I also added an analogous `'total_cover'` command for consistency. This should resolve #3484.	2018-07-23 00:30:55 -07:00
KOLANICH	a1505de631	Added configuration for python into .editorconfig (#3494 ) * Added configuration for python into .editorconfig * Fixed forgotten change in the number of spaces	2018-07-23 00:24:10 -07:00
KOLANICH	a393d44c5d	Improved library loading a bit (#3481 ) * Improved library loading a bit * Fixed indentation. * Fixes according to the discussion * Moved the comment to a separate line. * specified exception type	2018-07-20 16:03:44 -07:00
Philip Hyunsu Cho	8e90b60c4d	Fix relpath in setup.py on Windows (#3493 ) * Fix relpath in setup.py on Windows Fixes #3480. * Use only one lib file; use 4 space indent	2018-07-20 12:28:08 -07:00
Philip Hyunsu Cho	05b089405d	Doc modernization (#3474 ) * Change doc build to reST exclusively * Rewrite Intro doc in reST; create toctree * Update parameter and contribute * Convert tutorials to reST * Convert Python tutorials to reST * Convert CLI and Julia docs to reST * Enable markdown for R vignettes * Done migrating to reST * Add guzzle_sphinx_theme to requirements * Add breathe to requirements * Fix search bar * Add link to user forum	2018-07-19 14:22:16 -07:00
Yanbo Liang	c004cea788	Expose setCustomObj & setCustomEval for XGBoostClassifier & XGBoostRegressor. (#3486 )	2018-07-17 21:16:51 -07:00
KOLANICH	b6dcbf0e07	Added .editorconfig (#3478 )	2018-07-17 20:05:55 -07:00
Rory Mitchell	0f145a0365	Resolve GPU bug on large files (#3472 ) Remove calls to thrust copy, fix indexing bug	2018-07-16 20:43:45 +12:00
Rory Mitchell	1b59316444	Updates for GPU CI tests (#3467 ) * Fail GPU CI after test failure * Fix GPU linear tests * Reduced number of GPU tests to speed up CI * Remove static allocations of device memory * Resolve illegal memory access for updater_fast_hist.cc * Fix broken r tests dependency * Update python install documentation for GPU	2018-07-16 18:05:53 +12:00
Henry Gouk	a13e29ece1	Add LASSO (#3429 ) * Allow multiple split constraints * Replace RidgePenalty with ElasticNet * Add test for checking Ridge, LASSO, and Elastic Net are implemented	2018-07-15 16:38:26 +12:00
Yanbo Liang	2f8764955c	[JVM-packages] Support single instance prediction. (#3464 ) * Support single instance prediction. * Address comments.	2018-07-12 14:17:53 -07:00
Thejaswi	2200939416	Upgrading to NCCL2 (#3404 ) * Upgrading to NCCL2 * Part - II of NCCL2 upgradation - Doc updates to build with nccl2 - Dockerfile.gpu update for a correct CI build with nccl2 - Updated FindNccl package to have env-var NCCL_ROOT to take precedence * Upgrading to v9.2 for CI workflow, since it has the nccl2 binaries available * Added NCCL2 license + copy the nccl binaries into /usr location for the FindNccl module to find * Set LD_LIBRARY_PATH variable to pick nccl2 binary at runtime * Need the nccl2 library download instructions inside Dockerfile.release as well * Use NCCL2 as a static library	2018-07-10 00:42:15 -07:00
Thejaswi	a6331925d2	Upgrade cuda version to 9.2 for CI workflows (#3460 ) - Needed by the issue #3404 - as v9.1 doesn't have a nccl2 release	2018-07-08 23:04:51 -07:00
Philip Hyunsu Cho	b40959042c	Document 0.72.1 version (#3458 )	2018-07-08 15:42:09 -07:00
kodonnell	6bed54ac39	python sklearn api: defaulting to best_ntree_limit if defined, otherwise current behaviour (#3445 ) * python sklearn api: defaulting to best_ntree_limit if defined, otherwise current behaviour * Fix whitespace	2018-07-08 14:35:52 -07:00
ngoyal2707	cb017d0c9a	[jvm-packages] removed old group_data from spark api (#3451 )	2018-07-07 22:21:01 -07:00
Nan Zhu	aa90e5c6ce	[jvm-packages] disable booster setup for xgboost4j-spark (#3456 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * disable booster setup in spark * check in parameter conversion * fix compilation issue * update exception type	2018-07-07 21:57:24 -07:00
Philip Hyunsu Cho	66e74d2223	Fix get_uint_info() (#3442 ) * Add regression test	2018-07-05 20:06:59 -07:00
Philip Hyunsu Cho	48d6e68690	Add callback interface to re-direct console output (#3438 ) * Add callback interface to re-direct console output * Exempt TrackerLogger from custom logging * Fix lint	2018-07-05 11:32:30 -07:00
Philip Hyunsu Cho	45bf4fbffb	Add a notice for binary PyPI wheel (#3443 )	2018-07-05 08:28:43 -07:00
Tianqi Chen	01aff45f26	Update README.md	2018-07-04 13:09:32 -07:00
Tianqi Chen	e62639c59b	[DOCS] Update link to readme (#3437 )	2018-07-04 12:24:33 -07:00
Yanbo Liang	aec6299c49	[jvm-packages] Expose nativeBooster for XGBoostClassificationModel and XGBoostRegressionModel. (#3428 )	2018-07-01 15:06:16 -07:00
Nikita Titov	295252249e	fixed MinGW missed dll (#3430 )	2018-07-01 16:43:33 +00:00
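
The `feature_names`/`feature_types` rule described in commit 3b62e75f2e (#3432) can be illustrated with a minimal sketch. The helper name `to_name_list` is hypothetical and is not taken from the xgboost source; it only mirrors the rule stated in the commit message: iterables other than strings become a list, and anything else (including a plain string) is wrapped in a single-element list.

```python
def to_name_list(value):
    """Hypothetical helper, not the xgboost implementation.

    Mirrors the rule in the commit message: convert iterables to a list,
    but never split a string into characters.
    """
    if value is None:
        return None
    if isinstance(value, str):
        # list('abcdcba') would yield one entry per character,
        # which is the bug the commit message describes.
        return [value]
    try:
        return list(value)   # tuple, list, generator, ...
    except TypeError:
        return [value]       # non-iterable scalar: wrap it


print(to_name_list('abcdcba'))     # a string stays one name
print(to_name_list(('f0', 'f1')))  # a tuple becomes a list
```

Checking for `str` before calling `list()` is the key step, because a string is itself iterable and would otherwise be split character by character.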

3366 Commits