498 Commits

Author SHA1 Message Date
Jiaming Yuan
ef19480eda
Add dart to JSON schema. (#5218)
* Add dart to JSON schema.

* Use spaces instead of tab.
2020-01-28 13:29:09 +08:00
Rory Mitchell
1b3947d929
Make some GPU tests deterministic (#5229) 2020-01-26 11:53:07 +13:00
Jiaming Yuan
3eb1279bbf
Config for linear updaters. (#5222) 2020-01-25 11:26:46 +08:00
Jiaming Yuan
40680368cf
Add constraint parameters to Scikit-Learn interface. (#5227)
* Add document for constraints.

* Fix a format error in doc for objective function.
2020-01-25 11:12:02 +08:00
Philip Hyunsu Cho
44469a0ca9
Extensible binary serialization format for DMatrix::MetaInfo (#5187)
* Turn xgboost::DataType into C++11 enum class

* New binary serialization format for DMatrix::MetaInfo

* Fix clang-tidy

* Fix c++ test

* Implement new format proposal

* Move helper functions to anonymous namespace; remove unneeded field

* Fix lint

* Add shape.

* Keep only roundtrip test.

* Fix test.

* various fixes

* Update data.cc

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2020-01-23 11:33:17 -08:00
OrdoAbChao
b4f952bd22
[Breaking] Remove Scikit-Learn default parameters (#5130)
* Simplify Scikit-Learn parameter management.

* Copy base class for removing duplicated parameter signatures.
* Set all parameters to None.
* Handle None in set_param.
* Extract the doc.

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2020-01-23 20:25:20 +08:00
Rory Mitchell
aa9a68010b
uint not supported in cudf (#5225) 2020-01-23 16:59:18 +13:00
Jiaming Yuan
1891cc766d
Fix metainfo from DataFrame. (#5216)
* Fix metainfo from DataFrame.

* Unify helper functions for data and meta.
2020-01-22 16:29:44 +08:00
Rory Mitchell
9c56480c61
Support dmatrix construction from cupy array (#5206) 2020-01-22 13:15:27 +13:00
Philip Hyunsu Cho
0184f2e9f7
Explicitly use UTF-8 codepage when using MSVC (#5197)
* Explicitly use UTF-8 codepage when using MSVC

* Fix build with CUDA enabled
2020-01-14 13:30:34 -08:00
Rory Mitchell
a73e25e15f
Implement slice via adapters (#5198) 2020-01-14 12:55:41 +13:00
Kodi Arfer
f100b8d878
[Breaking] Don't drop trees during DART prediction by default (#5115)
* Simplify DropTrees calling logic

* Add `training` parameter for prediction method.

* [Breaking]: Add `training` to C API.

* Change for R and Python custom objective.

* Correct comment.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2020-01-13 21:48:30 +08:00
Jiaming Yuan
7b65698187
Enforce correct data shape. (#5191)
* Fix syncing DMatrix columns.
* notes for tree method.
* Enable feature validation for all interfaces except for jvm.
* Better tests for boosting from predictions.
* Disable validation on JVM.
2020-01-13 15:48:17 +08:00
Rory Mitchell
8cbcc53ccb
Remove old cudf constructor code (#5194) 2020-01-10 16:35:23 +13:00
Rory Mitchell
87ebfc1315
Implement cudf construction with adapters. (#5189) 2020-01-09 20:23:06 +13:00
Jiaming Yuan
ee287808fb
Lazy initialization of device vector. (#5173)
* Lazy initialization of device vector.

* Fix #5162.

* Disable copy constructor of HostDeviceVector.  Prevents implicit copying.

* Fix CPU build.

* Bring back move assignment operator.
2020-01-07 11:23:05 +08:00
Jiaming Yuan
ebc86a3afa
Disable parameter validation for Scikit-Learn interface. (#5167)
* Disable parameter validation for now.

Scikit-Learn passes all parameters down to XGBoost, whether they are used or
not.

* Add option `validate_parameters`.
2020-01-07 11:17:31 +08:00
Egor Smirnov
7b17e76c5b
Optimized EvaluateSplit function (#5138)
* Add block based threading utilities.
2019-12-31 18:18:42 +08:00
Jiaming Yuan
04db125699
Quick fix for memory leak in CPU Hist. (#5153)
Closes https://github.com/dmlc/xgboost/issues/3579 .

* Don't use map.
2019-12-31 14:05:53 +08:00
K.O
018df6004e
Fix feature_name created from int64index dataframe. (#5081) 2019-12-30 12:26:22 +08:00
Jiaming Yuan
6848d0426f
Clean up Python 2 compatibility code. (#5161) 2019-12-27 18:34:53 +08:00
Jiaming Yuan
61286c6e8f
Fix wrapping GPU ID and prevent data copying. (#5160)
* Removed some data copying.

* Make sure gpu_id is valid before any configuration is carried out.
2019-12-27 16:51:08 +08:00
sriramch
ee81ba8e1f
Implementation of MAP ranking algorithm on GPU (#5129)
* Implement the MAP ranking algorithm.
* Apply the suggestions made in the earlier ranking PRs.
* Make some performance improvements to the NDCG algorithm as well.
2019-12-27 12:05:37 +13:00
Philip Hyunsu Cho
9b0af6e882
Enable OpenMP with Apple Clang (Mac default compiler) (#5146)
* Add OpenMP as CMake target

* Require CMake 3.12, to allow linking OpenMP target to objxgboost

* Specify OpenMP compiler flag for CUDA host compiler

* Require CMake 3.16+ if the OS is Mac OSX

* Use AppleClang in Mac tests.

* Update dmlc-core
2019-12-26 16:53:12 +08:00
Jiaming Yuan
f3d7877802
Parameter validation (#5157)
* Unused code.

* Split up old colmaker parameters from train param.

* Fix dart.

* Better name.
2019-12-26 11:59:05 +08:00
Jiaming Yuan
ced3660f60
Tests for empty dmatrix. (#5159) 2019-12-26 11:51:54 +08:00
Jiaming Yuan
298ebe68ac
[Breaking] Remove learning_rates in Python. (#5155)
* Remove `learning_rates`.

It has been deprecated since callbacks became available.

* Set `before_iteration` of `reset_learning_rate` to False to preserve
  the initial learning rate, and to comply with the term "reset".

Closes #4709.

* Tests for various `tree_method`.
2019-12-24 14:25:48 +08:00
Jiaming Yuan
0202e04a8e
Add base margin to sklearn interface. (#5151) 2019-12-24 09:43:41 +08:00
Jiaming Yuan
1d0ca49761
Example JSON model parser and Schema. (#5137) 2019-12-23 19:47:35 +08:00
Jiaming Yuan
c8bdb652c4
Add check for length of weights. (#4872) 2019-12-21 11:30:58 +08:00
Rory Mitchell
3d04a8cc97
Use dynamic types for array interface columns instead of templates (#5108) 2019-12-21 16:08:10 +13:00
Jiaming Yuan
b915788708
Remove benchmark code in GPU test. (#5141)
* Update Jenkins script.
2019-12-21 11:00:21 +08:00
Philip Hyunsu Cho
74f545bde3
[CI] Repair download URL for Maven 3.6.1 (#5139) 2019-12-20 10:07:40 +08:00
Jiaming Yuan
27b3646d29
Tests and documents for new JSON routines. (#5120) 2019-12-18 08:44:27 +08:00
Jiaming Yuan
3136185bc5
JSON configuration IO. (#5111)
* Add saving/loading JSON configuration.
* Implement Python pickle interface with new IO routines.
* Basic tests for training continuation.
2019-12-15 17:31:53 +08:00
Jiaming Yuan
ad4a1c732c
Small refinements for JSON model. (#5112)
* Naming consistency.

* Remove duplicated test.
2019-12-11 19:49:01 +08:00
Jiaming Yuan
208ab3b1ff
Model IO in JSON. (#5110) 2019-12-11 11:20:40 +08:00
Rory Mitchell
c7cc657a4d
Use adapters for SparsePageDMatrix (#5092) 2019-12-11 15:59:23 +13:00
Jiaming Yuan
e089e16e3d
Pass pointer to model parameters. (#5101)
* Pass pointer to model parameters.

This PR de-duplicates most of the model parameters except the one in
`tree_model.h`.  One difficulty is that `base_score` is a model property but
can be changed at runtime by the objective function.  Hence, when performing
model IO, we need to save the value provided by the user instead of the one
transformed by the objective.  Here we create an immutable version of
`LearnerModelParam` that represents the value of the model parameter after
configuration.
2019-12-10 12:11:22 +08:00
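The `base_score` handling described in e089e16e3d (#5101) can be illustrated with a minimal Python sketch. It is not the XGBoost implementation: the class names `UserModelParam` and `ConfiguredModelParam`, the helpers `configure` and `save_model`, and the choice of a logit transform are all hypothetical, picked only to show the idea of keeping the user-supplied value for model IO while exposing an immutable, transformed value after configuration.

```python
import math
from dataclasses import dataclass, FrozenInstanceError


@dataclass
class UserModelParam:
    """Hypothetical stand-in for the parameter exactly as the user supplied it."""
    base_score: float = 0.5


@dataclass(frozen=True)
class ConfiguredModelParam:
    """Hypothetical immutable view of the parameter after configuration."""
    base_score: float


def logit(p: float) -> float:
    # Example of a transform an objective might apply (an assumption here).
    return math.log(p / (1.0 - p))


def configure(user: UserModelParam, transform) -> ConfiguredModelParam:
    # The objective transforms base_score; the user's copy is left untouched.
    return ConfiguredModelParam(base_score=transform(user.base_score))


def save_model(user: UserModelParam) -> dict:
    # Model IO stores the user-provided value, not the transformed one.
    return {"base_score": user.base_score}


user = UserModelParam(base_score=0.5)
configured = configure(user, logit)
saved = save_model(user)

# The configured view cannot be modified after it has been created.
try:
    configured.base_score = 1.0
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Keeping the two values in separate objects means a saved model can be reloaded and reconfigured without applying the transform twice.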
Rory Mitchell
979f74d51a
Group builder modified for incremental building (#5098) 2019-12-10 14:33:56 +13:00
Jiaming Yuan
608ebbe444
Fix GPU ID and prediction cache from pickle (#5086)
* Hack for saving GPU ID.

* Declare prediction cache on GBTree.

* Add a simple test.

* Add `auto` option for GPU Predictor.
2019-12-07 16:02:06 +08:00
Jiaming Yuan
7ef5b78003
Implement JSON IO for updaters (#5094)
* Implement JSON IO for updaters.

* Remove parameters in split evaluator.
2019-12-07 00:24:00 +08:00
Jiaming Yuan
2dcb62ddfb
Add IO utilities. (#5091)
* Add fixed size stream for reading model stream.
* Add file extension.
2019-12-05 22:15:34 +08:00
Jiaming Yuan
64af1ecf86
[Breaking] Remove num roots. (#5059) 2019-12-05 21:58:43 +08:00
Jiaming Yuan
df9bdbbcb9
Fix parsing empty vector in parameter. (#5087) 2019-12-05 11:42:01 +08:00
Jiaming Yuan
f0ca53d9ec
Convenient methods for JSON integer. (#5089)
* Fix parsing empty object.
2019-12-05 11:01:12 +08:00
Rory Mitchell
e3c34c79be
External data adapters (#5044)
* Use external data adapters as lightweight intermediate layer between external data and DMatrix
2019-12-04 10:56:17 +13:00
Philip Hyunsu Cho
64f4361b47
[CI] Locate vcomp140.dll from System32 directory (#5078) 2019-12-02 02:09:32 -08:00
Jiaming Yuan
761e938dbe
Support dask dataframe as y for classifier. (#5077)
* Support dask dataframe as y for classifier.

* Lint.
2019-12-02 11:53:30 +08:00
Jiaming Yuan
d667ea9335
[CI] Fix Travis tests. (#5062)
- Install wget explicitly to match openssl.
- Install CMake explicitly.
- Use newer miniconda link.
- Reenable unittests.
- gcc@9 + xcode@10 for osx due to missing <_stdio.h>.  Other versions of gcc should also work, but since Homebrew pours gcc@9 by default after an update, this PR sticks with the latest version.
- Disabled one external memory test for OSX.  Not sure about the thread implementation in there and fixing external memory is beyond the scope of this PR.
- Use Python3 with conda in jvm package.
2019-11-25 03:32:10 +08:00