xgboost

Author	SHA1	Message	Date
Yun Ni	b678e1711d	[jvm-packages] Add SparkParallelismTracker to prevent job from hanging (#2697 ) * Add SparkParallelismTracker to prevent job from hanging * Code review comments * Code Review Comments * Fix unit tests * Changes and unit test to catch the corner case. * Update documentations * Small improvements * cancelAllJobs is problematic with scalatest. Remove it * Code Review Comments * Check number of executor cores beforehand, and throw exception if any core is lost. * Address CR Comments * Add missing class * Fix flaky unit test * Address CR comments * Remove redundant param for TaskFailedListener	2017-10-16 20:18:47 -07:00
Scott Lundberg	78c4188cec	SHAP values for feature contributions (#2438 ) * SHAP values for feature contributions * Fix commenting error * New polynomial time SHAP value estimation algorithm * Update API to support SHAP values * Fix merge conflicts with updates in master * Correct submodule hashes * Fix variable sized stack allocation * Make lint happy * Add docs * Fix typo * Adjust tolerances * Remove unneeded def * Fixed cpp test setup * Updated R API and cleaned up * Fixed test typo	2017-10-12 12:35:51 -07:00
Guang Wei Yu	ff9180cd73	Add a new winning solution to demo/README.md (#2778 )	2017-10-09 18:07:07 -04:00
Julian Niedermeier	9a81c74a7b	Add xgb_model parameter to sklearn fit (#2623 ) Adding xgb_model parameter allows the continuation of model training. Model has to be saved by calling `model.get_booster().save_model(path)`	2017-10-01 08:47:17 -04:00
Icyblade Dai	6e378452f2	coding style update (#2752 ) * coding style update Current coding style varies (for example: the mixed use of single quote and double quote), and it will be confusing, especially for new users. This PR will try to follow the proposal of PEP8 and make the documents more readable. * minor fix	2017-10-01 08:42:15 -04:00
Rory Mitchell	4cb2f7598b	-Add experimental GPU algorithm for lossguided mode (#2755 ) -Improved GPU algorithm unit tests -Removed some thrust code to improve compile times	2017-10-01 00:18:35 +13:00
Sergei Lebedev	69c3b78a29	[jvm-packages] Implemented early stopping (#2710 ) * Allowed subsampling test from the training data frame/RDD The implementation requires storing 1 - trainTestRatio points in memory to make the sampling work. An alternative approach would be to construct the full DMatrix and then slice it deterministically into train/test. The peak memory consumption of such scenario, however, is twice the dataset size. * Removed duplication from 'XGBoost.train' Scala callers can (and should) use names to supply a subset of parameters. Method overloading is not required. * Reuse XGBoost seed parameter to stabilize train/test splitting * Added early stopping support to non-distributed XGBoost Closes #1544 * Added early-stopping to distributed XGBoost * Moved construction of 'watches' into a separate method This commit also fixes the handling of 'baseMargin' which previously was not added to the validation matrix. * Addressed review comments	2017-09-29 12:06:22 -07:00
Vadim Khotilovich	74db9757b3	[R package] GPU support (#2732 ) * [R] MSVC compatibility * [GPU] allow seed in BernoulliRng up to size_t and scale to uint32_t * R package build with cmake and CUDA * R package CUDA build fixes and cleanups * always export the R package native initialization routine on windows * update the install instructions doc * fix lint * use static_cast directly to set BernoulliRng seed * [R] demo for GPU accelerated algorithm * tidy up the R package cmake stuff * R pack cmake: installs main dependency packages if needed * [R] version bump in DESCRIPTION * update NEWS * added short missing/sparse values explanations to FAQ	2017-09-28 18:15:28 -05:00
Icyblade Dai	5c9f01d0a9	minor typo (#2751 ) * minor typo * typo * Update discoverYourData.md	2017-09-28 07:45:10 +02:00
Andrew Hannigan	5c9f0ff9d9	Check existence of seed/nthread keys before checking their value. (#2669 )	2017-09-27 03:05:59 -04:00
Philip Cho	0eaf43a5e1	A hack to fix broken search bar in doc (#2583 ) Current version of xgboost.readthedocs.io has a broken search box. Enabling themes on ReadTheDocs is known to break the search function, as reported in [this document](https://github.com/rtfd/readthedocs.org/issues/1487). To get around the bug, we replace the `searchtools.js` file with our custom version.	2017-09-27 03:05:10 -04:00
Philip Cho	31ad40b963	Make __del__ method idempotent (#2627 ) Addresses Issue #2533.	2017-09-27 03:03:55 -04:00
Tsukasa OMOTO	8d15024ac7	python: follow the default warning filters of Python (#2666 ) * python: follow the default warning filters of Python https://docs.python.org/3/library/warnings.html#default-warning-filters * update tests * update tests	2017-09-27 03:03:01 -04:00
zhxfl	178517524f	fix bug for demo/multiclass_classification/train.py (#2747 )	2017-09-25 22:37:21 -05:00
Sergei Lebedev	d570337262	[jvm-packages] (xgboost-spark) preserving num_class across save & load (#2742 ) * [bugfix] (xgboost-spark) preserving num_class across save & load * add testcase for save & load of multiclass model	2017-09-24 16:03:30 +02:00
Dmitry Mottl	c09204fa70	Update faq.md (#2727 ) Changed dead link to actual one	2017-09-20 08:17:42 +02:00
Icyblade Dai	0e85b30fdd	Fix issue 2670 (#2671 ) * fix issue 2670 * add python<3.6 compatibility * fix Index * fix Index/MultiIndex * fix lint * fix W0622 really nonsense * fix lambda * Trigger Travis * add test for MultiIndex * remove trailing whitespace	2017-09-19 15:49:41 -04:00
Dmitry Mottl	ee80f348de	Fixed links in faq.md (#2726 )	2017-09-19 09:23:24 -07:00
Nan Zhu	1190dc62a7	Update CONTRIBUTORS.md (#2719 )	2017-09-17 15:07:57 -07:00
Rory Mitchell	55ba362154	Fix cuda 9.0 compilation (#2718 )	2017-09-17 17:13:11 +12:00
Mahmoud Rawas	a7ce4d2462	Returning LabeledPoint back into public, in reference to the discussion in: https://github.com/dmlc/xgboost/pull/2532#discussion_r137172759 (#2677 )	2017-09-10 20:45:43 -07:00
Rory Mitchell	9c85903f0b	Add GPU documentation (#2695 ) * Add GPU documentation * Update Python GPU tests	2017-09-10 19:42:46 +12:00
Rory Mitchell	e6a9063344	Integer gradient summation for GPU histogram algorithm. (#2681 )	2017-09-08 15:07:29 +12:00
Rory Mitchell	15267eedf2	[GPU-Plugin] Major refactor 2 (#2664 ) * Change cmake option * Move source files * Move google tests * Move python tests * Move benchmarks * Move documentation * Remove makefile support * Fix test run * Move GPU tests	2017-09-08 09:57:16 +12:00
Yun Ni	8244f6f120	Use Sudo-enabled VM which has 7.5GB memory (#2680 )	2017-09-07 08:36:37 -07:00
Yun Ni	f04bde05fd	Add Coverage Report for Java and Python (#2667 ) * Add coverage report for java * Add coverage report for python * Increase memory for JVM unit tests * Increase memory for JVM unit tests	2017-09-05 14:46:51 -07:00
SimonAB	2e9d06443e	Add show_values option to feature importances plot (#2351 ) Adding an option to remove the values from the features importances plot in Python.	2017-08-31 12:26:54 -05:00
PSEUDOTENSOR / Jonathan McKinney	0664298bb2	Update sklearn API to pass along n_jobs to DMatrix creation (#2658 )	2017-08-31 15:24:59 +12:00
Rory Mitchell	19a53814ce	[GPU-Plugin] Major refactor (#2644 ) * Removal of redundant code/files. * Removal of exact namespace in GPU plugin * Revert double precision histograms to single precision for performance on Maxwell/Kepler	2017-08-30 10:53:52 +12:00
Sergei Lebedev	39adba51c5	Fixed compilation on Scala 2.10 (#2629 )	2017-08-28 10:59:39 -07:00
Yun Ni	a00157543d	Support instance weights for xgboost4j-spark (#2642 ) * Support instance weights for xgboost4j-spark * Use 0.001 instead of 0 for weights * Address CR comments	2017-08-28 09:03:20 -07:00
Evan Culver	ba16475c3a	Fix past participle tense in docs (#2637 )	2017-08-25 14:16:57 +02:00
Rory Mitchell	70071fc38c	Fix demo typo (#2632 )	2017-08-23 17:21:51 +02:00
Boris Kostenko	cd366ecb4b	fix build in case of spaces in path to make (#2619 )	2017-08-23 02:29:33 -03:00
Rory Mitchell	332b26df95	Update GPU acceleration demo (#2617 ) * Update GPU acceleration demo * Fix parameter formatting	2017-08-19 21:27:48 +12:00
Rory Mitchell	5661a67d20	Add parallel sort for MSVC (#2609 )	2017-08-17 17:14:39 +12:00
Rory Mitchell	ef23e424f1	[GPU-Plugin] Add GPU accelerated prediction (#2593 ) * [GPU-Plugin] Add GPU accelerated prediction * Improve allocation message * Update documentation * Resolve linker error for predictor * Add unit tests	2017-08-16 12:31:59 +12:00
Rory Mitchell	71e5e622b1	Update cub submodule again (fixes GPU build) (#2599 )	2017-08-13 22:14:40 +12:00
Rory Mitchell	ac2d0d0ac5	Updated cub submodule reference (#2597 )	2017-08-12 23:00:56 -07:00
Vadim Khotilovich	e04e2fbe2c	revert shallow submodule for cub (#2591 )	2017-08-11 20:19:04 -07:00
Sergei Lebedev	771a95aec6	[jvm-packages] Added baseMargin to ml.dmlc.xgboost4j.LabeledPoint (#2532 ) * Converted ml.dmlc.xgboost4j.LabeledPoint to Scala This allows to easily integrate LabeledPoint with Spark DataFrame APIs, which support encoding/decoding case classes out of the box. Alternative solution would be to keep LabeledPoint in Java and make it a Bean by generating boilerplate getters/setters. I have decided against that, even though the conversion in this PR implies a public API change. I also had to remove the factory methods fromSparseVector and fromDenseVector because a) they would need to be duplicated to support overloaded calls with extra data (e.g. weight); and b) Scala would expose them via mangled $.MODULE$ which looks ugly in Java. Additionally, this commit makes it possible to switch to LabeledPoint in all public APIs and effectively to pass initial margin/group as part of the point. This seems to be the only reliable way of implementing distributed learning with these data. Note that group size format used by single-node XGBoost is not compatible with that scenario, since the partition split could divide a group into two chunks. * Switched to ml.dmlc.xgboost4j.LabeledPoint in RDD-based public APIs Note that DataFrame-based and Flink APIs are not affected by this change. * Removed baseMargin argument in favour of the LabeledPoint field * Do a single pass over the partition in buildDistributedBoosters Note that there is no formal guarantee that val repartitioned = rdd.repartition(42) repartitioned.zipPartitions(repartitioned.map(_ + 1)) { it1, it2, => ... } would do a single shuffle, but in practice it seems to be always the case. * Exposed baseMargin in DataFrame-based API * Addressed review comments * Pass baseMargin to XGBoost.trainWithDataFrame via params * Reverted MLLabeledPoint in Spark APIs As discussed, baseMargin would only be supported for DataFrame-based APIs. * Cleaned up baseMargin tests - Removed RDD-based test, since the option is no longer exposed via public APIs - Changed DataFrame-based one to check that adding a margin actually affects the prediction * Pleased Scalastyle * Addressed more review comments * Pleased scalastyle again * Fixed XGBoost.fromBaseMarginsToArray which always returned an array of NaNs even if base margin was not specified. Surprisingly this only failed a few tests.	2017-08-10 14:29:26 -07:00
PSEUDOTENSOR / Jonathan McKinney	c1104f7d0a	[GPU-Plugin] Add throw of asserts and added compute compatibility error check. (#2565 ) * [GPU-Plugin] Added compute compatibility error check, added verbose timing	2017-08-10 16:07:07 +12:00
René Scheibe	75ea07b847	Fix parameter documentation inconsistencies (#2584 ) * fix indentation - otherwise list items are rendered incorrectly * consistency: no spaces inside square brackets	2017-08-07 19:07:10 +02:00
René Scheibe	a0c5bde024	Fix typo in sklearn documentation (#2580 )	2017-08-07 19:06:11 +02:00
Vadim Khotilovich	2b3a4318c5	Several fixes (#2572 ) * repaired serialization after update process; fixes #2545 * non-stratified folds in python could omit some data instances * Makefile: fixes for older makes on windows; clean R-package too * make cub to be a shallow submodule * improve $(MAKE) recovery	2017-08-06 13:03:50 -05:00
Philip Cho	70b65a282c	Use jQuery 2.2.4 (#2581 )	2017-08-05 15:37:38 -07:00
Rory Mitchell	eda9e180f0	[GPU-Plugin] Various fixes (#2579 ) * Fix test large * Add check for max_depth 0 * Update readme * Add LBS specialisation for dense data * Add bst_gpair_precise * Temporarily disable accuracy tests on test_large.py * Solve unused variable compiler warning * Fix max_bin > 1024 error	2017-08-05 22:16:23 +12:00
Philip Cho	03e213c7cd	Fix documentation for a misspelled parameter (#2569 )	2017-08-02 21:50:09 +12:00
Rory Mitchell	0e06d1805d	[WIP] Extract prediction into separate interface (#2531 ) * [WIP] Extract prediction into separate interface * Add copyright, fix linter errors * Add predictor to amalgamation * Fix documentation * Move prediction cache into predictor, add GBTreeModel * Updated predictor doc comments	2017-07-28 17:01:03 -07:00
Vadim Khotilovich	00eda28b3c	MinGW: shared library prefix and appveyor CI (#2539 ) * for MinGW, drop the 'lib' prefix from shared library name * fix defines for 'g++ 4.8 or higher' to include g++ >= 5 * fix compile warnings * [Appveyor] add MinGW with python; remove redundant jobs * [Appveyor] also do python build for one of msvc jobs	2017-07-25 01:06:47 -05:00


3252 Commits