3592 Commits

Author SHA1 Message Date
Zhao Hang
e3c1afac6b Update parameter.rst (#3843) 2018-10-31 00:19:45 +13:00
Matthew Tovbin
d81fedb955 [jvm-packages] RabitTracker for Scala: allow specifying host ip from the xgboost-tracker.properties file (#3833) 2018-10-26 22:01:36 -07:00
Nan Zhu
5fbe230636
[jvm-packages] documenting tracker (#3831)
* add back train method but mark as deprecated

* fix scalastyle error

* documenting tracker

* Make it a separate note
2018-10-25 18:53:46 -07:00
Philip Hyunsu Cho
d83c818000
Recommend pickling as the way to save XGBClassifier / XGBRegressor / XGBRanker (#3829)
The `save_model()` and `load_model()` methods only save the part of the model
that's common to all language interfaces and do not preserve Python-specific
attributes, such as `feature_names`. More crucially, the label encoder is not
preserved either; this is needed for the scikit-learn wrapper, since you may
have string labels.

Fix: Explicitly recommend pickling as the way to save scikit-learn model
objects.
2018-10-25 11:12:41 -07:00
Andy Adinets
2a59ff2f9b Multi-GPU support in GPUPredictor. (#3738)
* Multi-GPU support in GPUPredictor.

- GPUPredictor is multi-GPU
- removed DeviceMatrix, as it has been made obsolete by using HostDeviceVector in DMatrix

* Replaced pointers with spans in GPUPredictor.

* Added a multi-GPU predictor test.

* Fix multi-gpu test.

* Fix n_rows < n_gpus.

* Reinitialize shards when GPUSet is changed.
* Tests range of data.

* Remove commented code.
2018-10-23 22:59:11 -07:00
Bruno Tremblay
32de54fdee Update R-package/R/xgb.ggplot.R (#3820)
Changed the width parameter of the variable importance ggplot from 0.05 to 0.5 to make it more visible when displaying more variables.
2018-10-23 20:52:33 -07:00
Philip Hyunsu Cho
02130af47d
Enable auto-locking of issues closed long ago (#3821)
* Enable auto-locking of issues closed long ago

Issues that were closed more than 90 days ago will be locked automatically so
that no additional comments would be allowed. We will use a bot to do
this: https://probot.github.io/apps/lock/

Background: As a maintainer, I often see people leaving comments on old issue
posts that were closed long ago. Those comments are hard to discover and assist
with, since they get buried under the list of other active issues.

With the change, users who want to follow up on an old issue would be asked
to file a new issue.

* Exempt `feature-request` from auto locking

* Disable comment to avoid triggering notification
2018-10-23 19:21:58 -07:00
Nan Zhu
4ae225a08d
[Blocking][jvm-packages] fix the early stopping feature (#3808)
* add back train method but mark as deprecated

* fix scalastyle error

* temp

* add method for classifier and regressor

* update tutorial

* address the comments

* update
2018-10-23 14:53:13 -07:00
Philip Hyunsu Cho
e26b5d63b2 [jvm-packages] Upgrade Scala to 2.11.12 to address CVE-2017-15288 (#3816)
A privilege escalation vulnerability (CVE-2017-15288) has been
identified in the Scala compilation daemon. See
https://nvd.nist.gov/vuln/detail/CVE-2017-15288

Fix: Upgrade Scala to 2.11.12.
2018-10-22 10:15:30 -07:00
Philip Hyunsu Cho
abf2f661be
Fix #3708: Use dmlc::TemporaryDirectory to handle temporaries in cross-platform way (#3783)
* Fix #3708: Use dmlc::TemporaryDirectory to handle temporaries in cross-platform way

Also install git inside NVIDIA GPU container

* Update dmlc-core
2018-10-18 10:16:04 -07:00
Philip Hyunsu Cho
55ee9a92a1
Fix Python environment for distributed unit tests (#3806) 2018-10-18 00:12:02 -07:00
Philip Hyunsu Cho
b38c636d05
Fix #3523: Fix CustomGlobalRandomEngine for R (#3781)
**Symptom** Apple Clang's implementation of `std::shuffle` doesn't work
correctly when it is run with the random bit generator for the R package:
```cpp
CustomGlobalRandomEngine::result_type
CustomGlobalRandomEngine::operator()() {
  return static_cast<result_type>(
      std::floor(unif_rand() * CustomGlobalRandomEngine::max()));
}
```

Minimal reproduction of failure (compile using Apple Clang 10.0):
```cpp
std::vector<int> feature_set(100);
// Initialize with 0, 1, 2, 3, ..., 99
std::iota(feature_set.begin(), feature_set.end(), 0);
std::shuffle(feature_set.begin(), feature_set.end(), common::GlobalRandom());
// feature_set is still 0, 1, 2, ..., 99, so the content didn't get shuffled at all!
```

Note that this bug is platform-dependent; it does not appear when GCC or
upstream LLVM Clang is used.

**Diagnosis** Apple Clang's `std::shuffle` expects 32-bit integer
inputs, whereas `CustomGlobalRandomEngine::operator()` produces 64-bit
integers.

**Fix** Have `CustomGlobalRandomEngine::operator()` produce 32-bit integers.

Closes #3523.
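
A minimal sketch of the fix described above, assuming a stand-in uniform source in place of R's `unif_rand()`. The names `FakeUnifRand`, `Engine32`, and `ShuffledIota` are hypothetical and are not taken from the XGBoost sources; the only point is that `result_type`, `min()`, and `max()` all describe a 32-bit range.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical stand-in for R's unif_rand(): returns a double in [0, 1).
inline double FakeUnifRand() {
  static std::mt19937 gen(42);
  static std::uniform_real_distribution<double> dist(0.0, 1.0);
  return dist(gen);
}

// Hypothetical engine that satisfies UniformRandomBitGenerator
// and advertises a 32-bit output range.
struct Engine32 {
  using result_type = std::uint32_t;
  static constexpr result_type min() { return 0; }
  static constexpr result_type max() {
    return std::numeric_limits<result_type>::max();
  }
  result_type operator()() {
    // Scale the uniform draw to [0, max()] and truncate to 32 bits.
    return static_cast<result_type>(
        std::floor(FakeUnifRand() * static_cast<double>(max())));
  }
};

// Shuffle the sequence 0, 1, ..., n-1 using the 32-bit engine.
inline std::vector<int> ShuffledIota(int n) {
  assert(n >= 0);
  std::vector<int> v(static_cast<std::size_t>(n));
  std::iota(v.begin(), v.end(), 0);
  Engine32 engine;
  std::shuffle(v.begin(), v.end(), engine);
  return v;
}
```

Calling `ShuffledIota(100)` should return a permutation of 0 to 99 that differs from the sorted order.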
2018-10-15 09:39:13 -07:00
Philip Hyunsu Cho
4302fc4027
Update committer list (#3788)
* Update committer list

* Update CONTRIBUTORS.md

* Minor format fix
2018-10-14 23:41:03 -07:00
Rory Mitchell
f00fd87b36
Address #2754, accuracy issues with gpu_hist (#3793)
* Address windows compilation error

* Do not allow divide by zero in weight calculation

* Update tests
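
The divide-by-zero guard can be illustrated with the usual second-order leaf weight, w = -G / (H + lambda). This is a hedged sketch, not the gpu_hist code: the function name `SafeLeafWeight`, its parameters, and the choice to return 0 when the denominator is not positive are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical helper: leaf weight from the summed gradient and hessian.
// Returns 0 instead of dividing when the denominator is not positive.
inline double SafeLeafWeight(double sum_grad, double sum_hess, double lambda) {
  assert(lambda >= 0.0);
  const double denom = sum_hess + lambda;
  if (denom <= 0.0) {
    // Assumption: an empty or degenerate node contributes no weight.
    return 0.0;
  }
  return -sum_grad / denom;
}
```

With this guard, a node whose hessian sum is zero and whose `lambda` is zero yields a weight of 0 rather than an infinity or NaN.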
2018-10-15 17:50:31 +13:00
trivialfis
516457fadc Add basic unittests for gpu-hist method. (#3785)
* Split histogram building into a separate class.
* Extract `InitCompressedRow` definition.
* Basic tests for gpu-hist.
* Document the code more verbosely.
* Removed `HistCutUnit`.
* Removed some duplicated copies in `GPUHistMaker`.
* Implement LCG and use it in tests.
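
For readers unfamiliar with the last item, a linear congruential generator (LCG) produces a reproducible sequence via x_{n+1} = (a * x_n + c) mod m, which makes test data deterministic across platforms. The sketch below is illustrative only: the class name `SimpleLCG` and the constants (the widely published a = 1664525, c = 1013904223, m = 2^32) are assumptions, not values taken from the XGBoost tests.

```cpp
#include <cstdint>

// Hypothetical LCG. The modulus 2^32 comes from unsigned 32-bit wraparound.
class SimpleLCG {
 public:
  using result_type = std::uint32_t;

  explicit SimpleLCG(result_type seed) : state_(seed) {}

  static constexpr result_type min() { return 0; }
  static constexpr result_type max() { return 0xFFFFFFFFu; }

  // Advance the state and return it as the next pseudo-random value.
  result_type operator()() {
    state_ = 1664525u * state_ + 1013904223u;
    return state_;
  }

 private:
  result_type state_;
};
```

Two generators constructed with the same seed produce identical sequences, which is the property a unit test would rely on.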
2018-10-15 15:47:00 +13:00
trivialfis
184efff9f9 Remove NoConstraint. (#3792) 2018-10-15 15:43:06 +13:00
Rory Mitchell
5d6baed998
Allow sklearn grid search over parameters specified as kwargs (#3791) 2018-10-14 12:44:53 +13:00
Juzer Shakir
1db28b8718 Typo fixed (#3784)
The word 'make' was repeated twice; reduced it to a single occurrence.
2018-10-10 10:23:27 -07:00
KOLANICH
5480e05173 Added some instructions on using MinGW-built XGBoost with python. (#3774)
* Added some instructions on using MinGW-built XGBoost with python.

* Changes according to the discussion and some additions

* Fixed wording and removed redundancy.

* Even more fixes

* Fixed links. Removed redundancy.

* Some fixes according to the discussion

* fixes

* Some fixes

* fixes
2018-10-09 09:07:00 -07:00
weitian
9504f411c1 [jvm-packages] For training data with group, empty RDD partition threw exception (#3749) (#3750) 2018-10-09 09:03:22 -07:00
Philip Hyunsu Cho
ca33bf6476
Document gblinear parameters: feature_selector and top_k (#3780) 2018-10-08 22:41:54 -07:00
Philip Hyunsu Cho
133b8d94df
Fix Jenkins syntax (#3777) 2018-10-08 14:56:42 -07:00
Philip Hyunsu Cho
11eaf3eed1
Retry Jenkins CI tests up to 3 times to improve reliability (redux) (#3776) 2018-10-08 11:39:00 -07:00
Philip Hyunsu Cho
6d42e56c85
Retry Jenkins CI tests up to 3 times to improve reliability (redux) (#3775) 2018-10-08 11:24:01 -07:00
Philip Hyunsu Cho
7a7269e983
Retry Jenkins CI tests up to 3 times to improve reliability (#3769) 2018-10-08 09:55:39 -07:00
Philip Hyunsu Cho
ea99b53d8e
Document behavior of get_fscore() for zero-importance features (#3763) 2018-10-08 01:52:25 -07:00
Philip Hyunsu Cho
10cd7c8447
Fix #3714: preserve feature names when slicing DMatrix (#3766)
* Fix #3714: preserve feature names when slicing DMatrix

* Add test
2018-10-08 01:04:33 -07:00
Philip Hyunsu Cho
813d2436d3
Produce xgboost.so for XGBoost-R on Mac OSX, so that make install works (#3767)
* Produce xgboost.so for XGBoost-R on Mac OSX, so that `make install` works

* Modernize R build instructions

* Fix crossref
2018-10-07 14:09:54 -07:00
Philip Hyunsu Cho
c23783a0d1
Add notes to doc (#3765) 2018-10-07 14:09:09 -07:00
Philip Hyunsu Cho
91903ac5d4
Fix broken doc build due to Matplotlib 3.0 release (#3764) 2018-10-07 13:34:37 -07:00
Philip Hyunsu Cho
ae7e58b96e
Test wheel compatibility on CPU containers, for all pull requests (#3762)
* Test wheel compatibility on CPU containers, for all pull requests

* Run wheel test only when multi-GPU flag is not set
2018-10-06 20:18:58 -07:00
Saumya Bhatnagar
e0fd60f4e5 [doc] Fix link in rank demo README.md . (#3759) 2018-10-06 12:12:54 -07:00
trivialfis
4b892c2b30 Remove obsoleted QuantileHistMaker. (#3761)
Fix #3755.
2018-10-06 11:22:15 -07:00
Nan Zhu
785094db53
[jvm-packages] fix issue when spark job execution thread cannot return before we execute first() (#3758)
* add back train method but mark as deprecated

* fix scalastyle error

* sparkJobThread

* update

* fix issue when spark job execution thread cannot return before we execute first()
2018-10-05 22:20:50 -07:00
zengxy
9e73087324 [jvm-packages] support specified feature names when getModelDump and getFeatureScore (#3733)
* [jvm-packages] support specified feature names for jvm when get ModelDump and get FeatureScore (#3725)

* typo and style fix
2018-10-04 09:05:42 -07:00
Rory Mitchell
34522d56f0
Allow plug-ins to be built by cmake (#3752)
* Remove references to AVX code.

* Allow plugins to be built by cmake
2018-10-04 22:03:52 +13:00
trivialfis
c6b5df67f6 Catch dmlc::Error. (#3751)
Fix #3643.
2018-10-04 16:51:38 +13:00
weitian
efc4f85505 [jvm-packages] Fix #3489: Spark repartitionForData can potentially shuffle all data and lose ordering required for ranking objectives (#3654) 2018-10-03 08:43:55 -07:00
trivialfis
d594b11f35 Implement transform to reduce CPU/GPU code duplication. (#3643)
* Implement Transform class.
* Add tests for softmax.
* Use Transform in regression, softmax and hinge objectives, except for Cox.
* Mark old gpu objective functions deprecated.
* static_assert for softmax.
* Split up multi-gpu tests.
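
To illustrate the idea behind the first item, the sketch below writes an element-wise operation once as a functor and hands it to a generic driver. It is a CPU-only sketch under stated assumptions: `SigmoidOp`, `TransformCPU`, and their signatures are hypothetical and do not mirror the actual Transform class. In a CUDA build, a second driver could in principle launch the same functor from a kernel, which is where the duplication would be saved.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical element-wise operation, written once and reusable by any driver.
struct SigmoidOp {
  void operator()(std::size_t i, std::vector<float>* preds) const {
    (*preds)[i] = 1.0f / (1.0f + std::exp(-(*preds)[i]));
  }
};

// Hypothetical CPU driver: calls op(i, data) for every index in [0, n).
template <typename Op>
void TransformCPU(Op op, std::size_t n, std::vector<float>* data) {
  assert(data != nullptr && n <= data->size());
  for (std::size_t i = 0; i < n; ++i) {
    op(i, data);
  }
}
```

A caller would apply the operation in place with `TransformCPU(SigmoidOp{}, preds.size(), &preds);`.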
2018-10-02 15:06:21 +13:00
Sergei Lebedev
87aca8c244 [jvm-packages] Fixed the distributed updater check (#3739)
The updater used in distributed training is grow_histmaker and not 
grow_colmaker as the error message stated prior to this commit.
2018-10-01 11:22:01 -07:00
Rory Mitchell
70d208d68c
Dmatrix refactor stage 2 (#3395)
* DMatrix refactor 2

* Remove buffered rowset usage where possible

* Transition to c++11 style iterators for row access

* Transition column iterators to C++ 11
2018-10-01 01:29:03 +13:00
Philip Hyunsu Cho
b50bc2c1d4
Add multi-GPU unit test environment (#3741)
* Add multi-GPU unit test environment

* Better assertion message

* Temporarily disable failing test

* Distinguish between multi-GPU and single-GPU CPP tests

* Consolidate Python tests. Use attributes to distinguish multi-GPU Python tests from single-GPU counterparts
2018-09-29 11:20:58 -07:00
Philip Hyunsu Cho
baef5741df
Separate out restricted and unrestricted tasks (#3736) 2018-09-27 23:06:14 -07:00
trivialfis
5a7f7e7d49 Implement devices to devices reshard. (#3721)
* Force clearing device memory before Reshard.
* Remove calculating row_segments for gpu_hist and gpu_sketch.
* Guard against changing device.
2018-09-28 17:40:23 +12:00
Tong He
0b7fd74138 fix R check warning (#3728) 2018-09-27 17:53:49 -07:00
Philip Hyunsu Cho
51478a39c9
Fix #3730: scikit-learn 0.20 compatibility fix (#3731)
* Fix #3730: scikit-learn 0.20 compatibility fix

sklearn.cross_validation has been removed from scikit-learn 0.20,
so replace it with sklearn.model_selection

* Display test names for Python tests for clarity
2018-09-27 15:03:05 -07:00
Philip Hyunsu Cho
fbe9d41dd0 Disable flaky tests in R-package/tests/testthat/test_update.R (#3723) 2018-09-26 14:21:41 -07:00
Nan Zhu
79d854c695
[jvm-packages] fix errors in example (#3719)
* add back train method but mark as deprecated

* fix scalastyle error

* instrumentation

* use log console

* better measurement

* fix errors in example

* update histmaker
2018-09-22 16:39:38 -07:00
BruceZhao
3b5a1f389a [R] add a demo of multi-class classification R version (#3695)
* add a demo of multi-class classification R version

* add a demo of multi-class classification result

* add intro to the demo readme

* Delete train.md

* Update README.md
2018-09-21 23:06:40 -07:00
Takahiro Kojima
2405c59352 remove extra of (#3713) 2018-09-21 11:55:39 -07:00