995 Commits

Author SHA1 Message Date
Jiaming Yuan
4771bb0d41
Catch exception in transform function omp context. (#4960) 2019-10-21 17:03:38 +08:00
Jiaming Yuan
31030a8d3a
Set correct file permission. (#4964) 2019-10-18 12:54:29 -04:00
Jiaming Yuan
ae536756ae
Add Model and Configurable interface. (#4945)
* Apply Configurable to objective functions.
* Apply Model to Learner and Regtree, gbm.
* Add Load/SaveConfig to objs.
* Refactor obj tests to use smart pointer.
* Dummy methods for Save/Load Model.
2019-10-18 01:56:02 -04:00
Rory Mitchell
60748b2071
Use heuristic to select histogram node, avoid rabit call (#4951) 2019-10-18 11:33:54 +13:00
Jiaming Yuan
2ebdec8aa6
Fix dask prediction. (#4941)
* Fix dask prediction.

* Add better error messages for wrong partition.
2019-10-14 23:19:34 -04:00
Jiaming Yuan
b61d534472
Span: use `size_t' for index_type, add `front' and `back'. (#4935)
* Use `size_t' for index_type.  Add `front' and `back'.

* Remove a batch of `static_cast'.
2019-10-14 09:13:33 -04:00
Jiaming Yuan
3d46bd0fa5
Ignore columnar alignment requirement. (#4928)
* Better error message for wrong type.
* Fix stride size.
2019-10-13 06:41:43 -04:00
Jiaming Yuan
4bbf062ed3
[Breaking] Update sklearn interface. (#4929)
* Remove nthread, seed, silent. Add tree_method, gpu_id, num_parallel_tree. Fix #4909.
* Check data shape. Fix #4896.
* Check element of eval_set is tuple. Fix #4875
* Add doc for random_state with hogwild. Fixes #4919
2019-10-12 02:50:09 -04:00
Rory Mitchell
aefb1e5c2f
Resolve dask performance issues (#4914)
* Set dask client.map as impure function

* Remove nrows

* Remove slow check in verbose mode
2019-10-10 16:01:30 +13:00
Jiaming Yuan
095de3bf5f
Export c++ headers in CMake installation. (#4897)
* Move get transpose into cc.

* Clean up headers in host device vector, remove thrust dependency.

* Move span and host device vector into public.

* Install c++ headers.

* Short notes for c and c++.

Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2019-10-06 23:53:09 -04:00
Jiaming Yuan
4ab1df5fe6
Check deprecated n_gpus. (#4908) 2019-10-02 02:05:14 -04:00
Jiaming Yuan
d30e63a0a5
Support feature names/types for cudf. (#4902)
* Implement most of the pandas procedure for cudf except for type conversion.
* Requires an array of interfaces in metainfo.
2019-09-29 15:07:51 -04:00
Rong Ou
562bb0ae31 remove device shards (#4867) 2019-09-25 13:15:46 +08:00
Jiaming Yuan
0b89cd1dfa
Support gamma in GPU_Hist. (#4874)
* Just prevent building the tree instead of using an explicit pruner.
2019-09-24 10:16:08 +08:00
Jiaming Yuan
a40b72d127
Workaround isnan across different environments. (#4883) 2019-09-23 21:34:27 -04:00
Jiaming Yuan
57106a3459
Fix parsing empty json object. (#4868)
* Fix parsing empty json object.

* Better error message.
2019-09-18 03:31:46 -04:00
Jiaming Yuan
d669ea1eaa
Deprecate set group (#4864)
* Convert jvm package and R package.

* Restore for compatibility.
2019-09-17 21:26:54 -04:00
Jiaming Yuan
5374f52531
Complete cudf support. (#4850)
* Handles missing value.
* Accept all floating point and integer types.
* Move to cudf 9.0 API.
* Remove requirement on `null_count`.
* Arbitrary column types support.
2019-09-16 23:52:00 -04:00
Rong Ou
125bcec62e Move ellpack page construction into DMatrix (#4833) 2019-09-16 23:50:55 -04:00
Chen Qin
512f037e55 [rabit_bootstrap_cache] failed xgb worker recover from other workers (#4808)
* Better recovery support.  Restarting only the failed workers.
2019-09-16 23:31:52 -04:00
Xu Xiao
c89bcc4de5 [blocking] fix parallel eval_split of hist updater (#4851)
* Don't call rabit functions inside parallel loop.
2019-09-13 09:35:03 -04:00
Jiaming Yuan
f90e7f9aa8
Some comments for row partitioner. (#4832) 2019-09-06 03:01:42 -04:00
Jiaming Yuan
a5f232feb8
Fix calling GPU predictor (#4836)
* Fix calling GPU predictor
2019-09-05 19:09:38 -04:00
Jiaming Yuan
52d44e07fe
monitor for distributed environment. (#4829)
* Collect statistics from other ranks in monitor.

* Workaround old GCC bug.
2019-09-05 13:18:09 +08:00
Jiaming Yuan
c0fbeff0ab
Restrict access to cfg_ in gbm. (#4801)
* Restrict access to `cfg_` in gbm.

* Verify having correct updaters.

* Remove `grow_global_histmaker`

This updater is the same as `grow_histmaker`.  The former is not in our
documentation, so we just remove it.
2019-09-02 00:43:19 -04:00
TinkleG
2aed0ae230 Fix auc error in distributed mode (#4798)
Need more work for a complete fix.  See #4663.
2019-09-01 02:54:14 -04:00
Rong Ou
733ed24dd9 further cleanup of single process multi-GPU code (#4810)
* use subspan in gpu predictor instead of copying
* Revise `HostDeviceVector`
2019-08-30 05:27:23 -04:00
Rong Ou
38ab79f889 Make HostDeviceVector single gpu only (#4773)
* Make HostDeviceVector single gpu only
2019-08-26 09:51:13 +12:00
Jiaming Yuan
fba298fecb
Prevent copying data to host. (#4795) 2019-08-20 23:06:27 -04:00
Jiaming Yuan
9700776597 Cudf support. (#4745)
* Initial support for cudf integration.

* Add two C APIs for consuming data and metainfo.

* Add CopyFrom for SimpleCSRSource as a generic function to consume the data.

* Add FromDeviceColumnar for consuming device data.

* Add new MetaInfo::SetInfo for consuming label, weight etc.
2019-08-19 16:51:40 +12:00
Jiaming Yuan
ab357dd41c
Remove plugin, cuda related code in automake & autoconf files (#4789)
* Build plugin example with CMake.

* Remove plugin, cuda related code in automake & autoconf files.

* Fix typo in GPU doc.
2019-08-18 16:54:34 -04:00
Jiaming Yuan
c358d95c44
Remove initializing stringstream reference. (#4788) 2019-08-18 09:59:47 -04:00
Jiaming Yuan
c81238b5c4
Clean up after removing gpu_exact. (#4777)
* Removed unused functions.
* Removed unused parameters.
* Move ValueConstraints into constraints.cuh since it's now only used in GPU_Hist.
2019-08-17 01:05:57 -04:00
Xu Xiao
ef9af33a00 [HOTFIX] distributed training with hist method (#4716)
* add parallel test for hist.EvaluateSplit

* update test_openmp.py

* update test_openmp.py

* update test_openmp.py

* update test_openmp.py

* update test_openmp.py

* fix OMP schedule policy

* fix clang-tidy

* add logging: total_num_bins

* fix

* fix

* test

* replace guided OpenMP policy with static in updater_quantile_hist.cc
2019-08-13 11:27:29 -07:00
Jiaming Yuan
c0ffe65f5c
Mimic cuda assert output in span check. (#4762) 2019-08-13 01:44:54 -04:00
Rong Ou
c5b229632d [BREAKING] prevent multi-gpu usage (#4749)
* prevent multi-gpu usage

* fix distributed test

* combine gpu predictor tests

* set upper bound on n_gpus
2019-08-13 09:11:35 +12:00
sriramch
198f3a6c4a Enable natural copies of the batch iterators without the need of the clone method (#4748)
- the synthesized copy constructor should do the appropriate job
2019-08-09 11:47:35 -04:00
Rong Ou
19f9fd5de9 remove the qids_ field in MetaInfo (#4744) 2019-08-08 10:01:59 +08:00
Rong Ou
602484e19f Remove some unused functions as reported by cppcheck (#4743) 2019-08-07 02:42:33 -04:00
Bobby
3e2c472944 Fix model parameter recovery (#4738) 2019-08-07 02:32:10 -04:00
Rong Ou
851b5b3808 Remove gpu_exact tree method (#4742) 2019-08-07 11:43:20 +12:00
Jiaming Yuan
2a4df8e29f
Add Json integer, remove specialization. (#4739) 2019-08-06 03:10:49 -04:00
Jiaming Yuan
9c469b3844
Move bitfield into common. (#4737)
* Prepare for columnar format support.
2019-08-06 02:49:32 -04:00
Rong Ou
6edddd7966 Refactor DMatrix to return batches of different page types (#4686)
* Use explicit template parameter for specifying page type.
2019-08-03 15:10:34 -04:00
Jiaming Yuan
d2e1e4d5b4
A simple Json implementation for future use. (#4708)
* A simple Json implementation for future use.
2019-07-29 21:17:27 -04:00
Jiaming Yuan
001aaaee5f
Removed deprecated gpu objectives. (#4690) 2019-07-20 23:18:34 -04:00
Jiaming Yuan
f0064c07ab
Refactor configuration [Part II]. (#4577)
* Refactor configuration [Part II].

* General changes:
** Remove `Init` methods to avoid ambiguity.
** Remove `Configure(std::map<>)` to avoid redundant copying and prepare for
   parameter validation. (`std::vector` is returned from `InitAllowUnknown`).
** Add name to tree updaters for easier debugging.

* Learner changes:
** Make `LearnerImpl` the only source of configuration.

    All configurations are stored and carried out by `LearnerImpl::Configure()`.

** Remove booster in C API.

    Originally kept for "compatibility reasons", but the reason was never stated,
    so we just remove it.

** Add a `metric_names_` field in `LearnerImpl`.
** Remove `LazyInit`.  Configuration will always be lazy.
** Run `Configure` before every iteration.

* Predictor changes:
** Allocate both cpu and gpu predictor.
** Remove cpu_predictor from gpu_predictor.

    `GBTree` is now used to dispatch the predictor.

** Remove some GPU Predictor tests.

* IO

No IO changes.  The binary model format stability is tested by comparing
hash values of saved models between two commits.
2019-07-20 08:34:56 -04:00
sriramch
7a388cbf8b Modify caching allocator/vector and fix issues relating to inability to train large datasets (#4615) 2019-07-09 18:33:27 +12:00
Xu Xiao
cd1526d3b1 fix auc error in distributed mode caused by unbalanced dataset (#4645) 2019-07-08 16:01:52 +08:00
Jiaming Yuan
d9a47794a5 Fix CPU hist init for sparse dataset. (#4625)
* Fix CPU hist init for sparse dataset.

* Implement sparse histogram cut.
* Allow empty features.

* Fix windows build, don't use sparse in distributed environment.

* Comments.

* Smaller threshold.

* Fix windows omp.

* Fix msvc lambda capture.

* Fix MSVC macro.

* Fix MSVC initialization list.

* Fix MSVC initialization list x2.

* Preserve categorical feature behavior.

* Rename matrix to sparse cuts.
* Reuse UseGroup.
* Check for categorical data when adding cut.

Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>

* Sanity check.

* Fix comments.

* Fix comment.
2019-07-04 16:27:03 -07:00