525 Commits

Author SHA1 Message Date
Philip Hyunsu Cho
9b0af6e882 Enable OpenMP with Apple Clang (Mac default compiler) (#5146)
* Add OpenMP as CMake target

* Require CMake 3.12, to allow linking OpenMP target to objxgboost

* Specify OpenMP compiler flag for CUDA host compiler

* Require CMake 3.16+ if the OS is Mac OSX

* Use AppleClang in Mac tests.

* Update dmlc-core
2019-12-26 16:53:12 +08:00
Jiaming Yuan
f3d7877802
Parameter validation (#5157)
* Unused code.

* Split up old colmaker parameters from train param.

* Fix dart.

* Better name.
2019-12-26 11:59:05 +08:00
Jiaming Yuan
ced3660f60
Tests for empty dmatrix. (#5159) 2019-12-26 11:51:54 +08:00
Jiaming Yuan
298ebe68ac
[Breaking] Remove learning_rates in Python. (#5155)
* Remove `learning_rates`.

It's been deprecated since we have callback.

* Set `before_iteration` of `reset_learning_rate` to False to preserve 
  the initial learning rate, and comply to the term "reset".

Closes #4709.

* Tests for various `tree_method`.
2019-12-24 14:25:48 +08:00
Jiaming Yuan
0202e04a8e
Add base margin to sklearn interface. (#5151) 2019-12-24 09:43:41 +08:00
Jiaming Yuan
1d0ca49761
Example JSON model parser and Schema. (#5137) 2019-12-23 19:47:35 +08:00
Jiaming Yuan
c8bdb652c4
Add check for length of weights. (#4872) 2019-12-21 11:30:58 +08:00
Rory Mitchell
3d04a8cc97
Use dynamic types for array interface columns instead of templates (#5108) 2019-12-21 16:08:10 +13:00
Jiaming Yuan
b915788708
Remove benchmark code in GPU test. (#5141)
* Update Jenkins script.
2019-12-21 11:00:21 +08:00
Philip Hyunsu Cho
74f545bde3 [CI] Repair download URL for Maven 3.6.1 (#5139) 2019-12-20 10:07:40 +08:00
Jiaming Yuan
27b3646d29
Tests and documents for new JSON routines. (#5120) 2019-12-18 08:44:27 +08:00
Jiaming Yuan
3136185bc5
JSON configuration IO. (#5111)
* Add saving/loading JSON configuration.
* Implement Python pickle interface with new IO routines.
* Basic tests for training continuation.
2019-12-15 17:31:53 +08:00
Jiaming Yuan
ad4a1c732c
Small refinements for JSON model. (#5112)
* Naming consistency.

* Remove duplicated test.
2019-12-11 19:49:01 +08:00
Jiaming Yuan
208ab3b1ff
Model IO in JSON. (#5110) 2019-12-11 11:20:40 +08:00
Rory Mitchell
c7cc657a4d
Use adapters for SparsePageDMatrix (#5092) 2019-12-11 15:59:23 +13:00
Jiaming Yuan
e089e16e3d
Pass pointer to model parameters. (#5101)
* Pass pointer to model parameters.

This PR de-duplicates most of the model parameters except the one in
`tree_model.h`.  One difficulty is `base_score` is a model property but can be
changed at runtime by objective function.  Hence when performing model IO, we
need to save the one provided by users, instead of the one transformed by
objective.  Here we created an immutable version of `LearnerModelParam` that
represents the value of model parameter after configuration.
2019-12-10 12:11:22 +08:00
Rory Mitchell
979f74d51a
Group builder modified for incremental building (#5098) 2019-12-10 14:33:56 +13:00
Jiaming Yuan
608ebbe444
Fix GPU ID and prediction cache from pickle (#5086)
* Hack for saving GPU ID.

* Declare prediction cache on GBTree.

* Add a simple test.

* Add `auto` option for GPU Predictor.
2019-12-07 16:02:06 +08:00
Jiaming Yuan
7ef5b78003
Implement JSON IO for updaters (#5094)
* Implement JSON IO for updaters.

* Remove parameters in split evaluator.
2019-12-07 00:24:00 +08:00
Jiaming Yuan
2dcb62ddfb
Add IO utilities. (#5091)
* Add fixed size stream for reading model stream.
* Add file extension.
2019-12-05 22:15:34 +08:00
Jiaming Yuan
64af1ecf86
[Breaking] Remove num roots. (#5059) 2019-12-05 21:58:43 +08:00
Jiaming Yuan
df9bdbbcb9
Fix parsing empty vector in parameter. (#5087) 2019-12-05 11:42:01 +08:00
Jiaming Yuan
f0ca53d9ec
Convenient methods for JSON integer. (#5089)
* Fix parsing empty object.
2019-12-05 11:01:12 +08:00
Rory Mitchell
e3c34c79be
External data adapters (#5044)
* Use external data adapters as lightweight intermediate layer between external data and DMatrix
2019-12-04 10:56:17 +13:00
Philip Hyunsu Cho
64f4361b47
[CI] Locate vcomp140.dll from System32 directory (#5078) 2019-12-02 02:09:32 -08:00
Jiaming Yuan
761e938dbe
Support dask dataframe as y for classifier. (#5077)
* Support dask dataframe as y for classifier.

* Lint.
2019-12-02 11:53:30 +08:00
Jiaming Yuan
d667ea9335
[CI] Fix Travis tests. (#5062)
- Install wget explicitly to match openssl.
- Install CMake explicitly.
- Use newer miniconda link.
- Reenable unittests.
- gcc@9 + xcode@10 for osx due to missing <_stdio.h>.  Other versions of gcc should also work.  But as homebrew pour gcc@9 after update by default, so I just stick with latest version.
- Disabled one external memory test for OSX.  Not sure about the thread implementation in there and fixing external memory is beyond the scope of this PR.
- Use Python3 with conda in jvm package.
2019-11-25 03:32:10 +08:00
Jiaming Yuan
a4f5c86276
Allow using RandomState object from Numpy in sklearn interface. (#5049) 2019-11-19 10:56:39 +08:00
Rong Ou
0afcc55d98 Support multiple batches in gpu_hist (#5014)
* Initial external memory training support for GPU Hist tree method.
2019-11-16 14:50:20 +08:00
Jiaming Yuan
97abcc7ee2
Extract interaction constraint from split evaluator. (#5034)
*  Extract interaction constraints from split evaluator.

The reason for doing so is mostly for model IO, where num_feature and interaction_constraints are copied in split evaluator. Also interaction constraint by itself is a feature selector, acting like column sampler and it's inefficient to bury it deep in the evaluator chain. Lastly removing one another copied parameter is a win.

*  Enable inc for approx tree method.

As now the implementation is spited up from evaluator class, it's also enabled for approx method.

*  Removing obsoleted code in colmaker.

They are never documented nor actually used in real world. Also there isn't a single test for those code blocks.

*  Unifying the types used for row and column.

As the size of input dataset is marching to billion, incorrect use of int is subject to overflow, also singed integer overflow is undefined behaviour. This PR starts the procedure for unifying used index type to unsigned integers. There's optimization that can utilize this undefined behaviour, but after some testings I don't see the optimization is beneficial to XGBoost.
2019-11-14 20:11:41 +08:00
sriramch
2abe69d774 - ndcg ltr implementation on gpu (#5004)
* - ndcg ltr implementation on gpu
  - this is a follow-up to the pairwise ltr implementation
2019-11-13 11:21:04 +13:00
Philip Hyunsu Cho
f4e7b707c9
Revert #4529 (#5008)
* Revert " Optimize ‘hist’ for multi-core CPU (#4529)"

This reverts commit 4d6590be3c9a043d44d9e4fe0a456a9f8179ec72.

* Fix build
2019-11-12 09:35:03 -08:00
Jiaming Yuan
7663de956c
Run training with empty DMatrix. (#4990)
This makes GPU Hist robust in distributed environment as some workers might not
be associated with any data in either training or evaluation.

* Disable rabit mock test for now: See #5012 .

* Disable dask-cudf test at prediction for now: See #5003

* Launch dask job for all workers despite they might not have any data.
* Check 0 rows in elementwise evaluation metrics.

   Using AUC and AUC-PR still throws an error.  See #4663 for a robust fix.

* Add tests for edge cases.
* Add `LaunchKernel` wrapper handling zero sized grid.
* Move some parts of allreducer into a cu file.
* Don't validate feature names when the booster is empty.

* Sync number of columns in DMatrix.

  As num_feature is required to be the same across all workers in data split
  mode.

* Filtering in dask interface now by default syncs all booster that's not
empty, instead of using rank 0.

* Fix Jenkins' GPU tests.

* Install dask-cuda from source in Jenkins' test.

  Now all tests are actually running.

* Restore GPU Hist tree synchronization test.

* Check UUID of running devices.

  The check is only performed on CUDA version >= 10.x, as 9.x doesn't have UUID field.

* Fix CMake policy and project variables.

  Use xgboost_SOURCE_DIR uniformly, add policy for CMake >= 3.13.

* Fix copying data to CPU

* Fix race condition in cpu predictor.

* Fix duplicated DMatrix construction.

* Don't download extra nccl in CI script.
2019-11-06 16:13:13 +08:00
Chen Qin
b29b8c2f34 [jvm-packages] update rabit, surface new changes to spark, add parity and failure tests (#4966)
* [phase 1] expose sets of rabit configurations to spark layer

* add back mutable import

* disable ring_mincount till https://github.com/dmlc/rabit/pull/106d

* Revert "disable ring_mincount till https://github.com/dmlc/rabit/pull/106d"

This reverts commit 65e95a98e24f5eb53c6ba9ef9b2379524258984d.

* apply latest rabit

* fix build error

* apply https://github.com/dmlc/xgboost/pull/4880

* downgrade cmake in rabit

* point to rabit with DMLC_ROOT fix

* relative path of rabit install prefix

* split rabit parameters to another trait

* misc

* misc

* Delete .classpath

* Delete .classpath

* Delete .classpath

* Update XGBoostClassifier.scala

* Update XGBoostRegressor.scala

* Update GeneralParams.scala

* Update GeneralParams.scala

* Update GeneralParams.scala

* Update GeneralParams.scala

* Delete .classpath

* Update RabitParams.scala

* Update .gitignore

* Update .gitignore

* apply rabitParams to training

* use string as rabit parameter value type

* cleanup

* add rabitEnv check

* point to dmlc/rabit

* per feedback

* update private scope

* misc

* update rabit

* add rabit_timtout, fix failing test.

* split tests

* allow build jvm with rabit mock

* pass mock failures to rabit with test

* add mock error and graceful handle rabit assertion error test

* split mvn test

* remove sign for test

* update rabit

* build jvm_packages with rabit mock

* point back to dmlc/rabit

* per feedback, update scala header

* cleanup pom

* per feedback

* try fix lint

* fix lint

* per feedback, remove bootstrap_cache

* per feedback 2

* try replace dev profile with passing mvn property

* fix build error

* remove mvn property and replace with env setting to build test jar

* per feedback

* revert copyright headlines, point to dmlc/rabit

* revert python lint

* remove multiple failure test case as retry is not enabled in spark

* Update core.py

* Update core.py

* per feedback, style fix
2019-11-01 14:21:19 -07:00
Philip Hyunsu Cho
da6e74f7bb
[CI] Upload nightly builds to S3 (#4976)
* Do not store built artifacts in the Jenkins master

* Add wheel renaming script

* Upload wheels to S3 bucket

* Use env.GIT_COMMIT

* Capture git hash correctly

* Add missing import in Jenkinsfile

* Address reviewer's comments

* Put artifacts for pull requests in separate directory

* No wildcard expansion in Windows CMD
2019-10-23 21:16:05 -07:00
Jiaming Yuan
ac457c56a2
Use `UpdateAllowUnknown' for non-model related parameter. (#4961)
* Use `UpdateAllowUnknown' for non-model related parameter.

Model parameter can not pack an additional boolean value due to binary IO
format.  This commit deals only with non-model related parameter configuration.

* Add tidy command line arg for use-dmlc-gtest.
2019-10-23 05:50:12 -04:00
Jiaming Yuan
f24be2efb4 Use configure_file() to configure version only (#4974)
* Avoid writing build_config.h

* Remove build_config.h all together.

* Lint.
2019-10-22 23:47:00 -07:00
Rong Ou
5b1715d97c Write ELLPACK pages to disk (#4879)
* add ellpack source
* add batch param
* extract function to parse cache info
* construct ellpack info separately
* push batch to ellpack page
* write ellpack page.
* make sparse page source reusable
2019-10-22 23:44:32 -04:00
sriramch
310fe60b35 Pairwise ranking objective implementation on gpu (#4873)
* - pairwise ranking objective implementation on gpu
   - there are couple of more algorithms (ndcg and map) for which support will be added
     as follow-up pr's
   - with no label groups defined, get gradient is 90x faster on gpu (120m instance
     mortgage dataset)
   - it can perform by an order of magnitude faster with ~ 10 groups (and adequate cores
     for the cpu implementation)

* Add JSON config to rank obj.
2019-10-22 23:40:07 -04:00
Jiaming Yuan
5620322a48
[Breaking] Add global versioning. (#4936)
* Use CMake config file for representing version.

* Generate c and Python version file with CMake.

The generated file is written into source tree.  But unless XGBoost upgrades
its version, there will be no actual modification.  This retains compatibility
with Makefiles for R.

* Add XGBoost version the DMatrix binaries.
* Simplify prefetch detection in CMakeLists.txt
2019-10-22 23:27:26 -04:00
Jiaming Yuan
7e477a2adb
Fix data loading (#4862)
* Fix loading text data.
* Fix config regex.
* Try to explain the error better in exception.
* Update doc.
2019-10-22 12:33:14 -04:00
Philip Hyunsu Cho
95295ce026 [CI] Use latest dask (#4973)
* Remove version spec, to use latest dask always
2019-10-22 07:00:13 -04:00
Jiaming Yuan
4771bb0d41
Catch exception in transform function omp context. (#4960) 2019-10-21 17:03:38 +08:00
Jiaming Yuan
010b8f1428 Revert "[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests (#4876)" (#4965)
This reverts commit 86ed01c4bbecef66e1bc4d02fb13116bd6130fae.
2019-10-18 14:02:35 -07:00
Chen Qin
86ed01c4bb [jvm-packages] update rabit, surface new changes to spark, add parity and failure tests (#4876)
* Expose sets of rabit configurations to spark layer
2019-10-18 15:07:31 -04:00
Jiaming Yuan
31030a8d3a
Set correct file permission. (#4964) 2019-10-18 12:54:29 -04:00
Jiaming Yuan
ae536756ae
Add Model and Configurable interface. (#4945)
* Apply Configurable to objective functions.
* Apply Model to Learner and Regtree, gbm.
* Add Load/SaveConfig to objs.
* Refactor obj tests to use smart pointer.
* Dummy methods for Save/Load Model.
2019-10-18 01:56:02 -04:00
Rory Mitchell
60748b2071
Use heuristic to select histogram node, avoid rabit call (#4951) 2019-10-18 11:33:54 +13:00
Jiaming Yuan
7e72a12871
Don't set_params at the end of set_state. (#4947)
* Don't set_params at the end of set_state.

* Also fix another issue found in dask prediction.

* Add note about prediction.

Don't support other prediction modes at the moment.
2019-10-15 10:08:26 -04:00
Jiaming Yuan
2ebdec8aa6
Fix dask prediction. (#4941)
* Fix dask prediction.

* Add better error messages for wrong partition.
2019-10-14 23:19:34 -04:00