15 Commits

Author SHA1 Message Date
Jiaming Yuan
2ebdec8aa6
Fix dask prediction. (#4941)
* Fix dask prediction.

* Add better error messages for wrong partition.
2019-10-14 23:19:34 -04:00
Jiaming Yuan
5374f52531
Complete cudf support. (#4850)
* Handles missing value.
* Accept all floating point and integer types.
* Move to cudf 9.0 API.
* Remove requirement on `null_count`.
* Arbitrary column types support.
2019-09-16 23:52:00 -04:00
Jiaming Yuan
9700776597 Cudf support. (#4745)
* Initial support for cudf integration.

* Add two C APIs for consuming data and metainfo.

* Add CopyFrom for SimpleCSRSource as a generic function to consume the data.

* Add FromDeviceColumnar for consuming device data.

* Add new MetaInfo::SetInfo for consuming label, weight etc.
2019-08-19 16:51:40 +12:00
Rong Ou
19f9fd5de9 remove the qids_ field in MetaInfo (#4744) 2019-08-08 10:01:59 +08:00
Rong Ou
6edddd7966 Refactor DMatrix to return batches of different page types (#4686)
* Use explicit template parameter for specifying page type.
2019-08-03 15:10:34 -04:00
Rory Mitchell
70d208d68c
Dmatrix refactor stage 2 (#3395)
* DMatrix refactor 2

* Remove buffered rowset usage where possible

* Transition to c++11 style iterators for row access

* Transition column iterators to C++ 11
2018-10-01 01:29:03 +13:00
Andy Adinets
72cd1517d6 Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage. (#3446)
* Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage.

- added distributions to HostDeviceVector
- using HostDeviceVector for labels, weights and base margings in MetaInfo
- using HostDeviceVector for offset and data in SparsePage
- other necessary refactoring

* Added const version of HostDeviceVector API calls.

- const versions added to calls that can trigger data transfers, e.g. DevicePointer()
- updated the code that uses HostDeviceVector
- objective functions now accept const HostDeviceVector<bst_float>& for predictions

* Updated src/linear/updater_gpu_coordinate.cu.

* Added read-only state for HostDeviceVector sync.

- this means no copies are performed if both host and devices access
  the HostDeviceVector read-only

* Fixed linter and test errors.

- updated the lz4 plugin
- added ConstDeviceSpan to HostDeviceVector
- using device % dh::NVisibleDevices() for the physical device number,
  e.g. in calls to cudaSetDevice()

* Fixed explicit template instantiation errors for HostDeviceVector.

- replaced HostDeviceVector<unsigned int> with HostDeviceVector<int>

* Fixed HostDeviceVector tests that require multiple GPUs.

- added a mock set device handler; when set, it is called instead of cudaSetDevice()
2018-08-30 14:28:47 +12:00
liuliang01
0cf88d036f Add qid like ranklib format (#2749)
* add qid for https://github.com/dmlc/xgboost/issues/2748

* change names

* change spaces

* change qid to bst_uint type

* change qid type to size_t

* change qid first to SIZE_MAX

* change qid type from size_t to uint64_t

* update dmlc-core

* fix qids name error

* fix group_ptr_ error

* Style fix

* Add qid handling logic to SparsePage

* New MetaInfo format + backward compatibility fix

Old MetaInfo format (1.0) doesn't contain qid field. We still want to be able
to read from MetaInfo files saved in old format. Also, define a new format
(2.0) that contains the qid field. This way, we can distinguish files that
contain qid and those that do not.

* Update MetaInfo test

* Simply group assignment logic

* Explicitly set qid=nullptr in NativeDataIter

NativeDataIter's callback does not support qid field. Users of NativeDataIter
will need to call setGroup() function separately to set group information.

* Save qids_ in SaveBinary()

* Upgrade dmlc-core submodule

* Add a test for reading qid

* Add contributor

* Check the size of qids_

* Document qid format
2018-06-30 20:24:03 +00:00
Rory Mitchell
a96039141a
Dmatrix refactor stage 1 (#3301)
* Use sparse page as singular CSR matrix representation

* Simplify dmatrix methods

* Reduce statefullness of batch iterators

* BREAKING CHANGE: Remove prob_buffer_row parameter. Users are instead recommended to sample their dataset as a preprocessing step before using XGBoost.
2018-06-07 10:25:58 +12:00
Rory Mitchell
ccf80703ef
Clang-tidy static analysis (#3222)
* Clang-tidy static analysis

* Modernise checks

* Google coding standard checks

* Identifier renaming according to Google style
2018-04-19 18:57:13 +12:00
Alex Bain
dbaa5d0bdf Disable invalid check for completely sparse batch that results in failed assertion for issue #1827 (#2213) 2017-04-21 09:28:02 -07:00
Ubuntu
724eda2435 remove reserve for more aggressive memory generation 2016-02-10 11:17:18 -08:00
tqchen
2dc6c2dc52 [R] enable R compile
[R] Enable R build for windows and linux
2016-01-16 10:24:02 -08:00
tqchen
d75e3ed05d [LIBXGBOOST] pass demo running. 2016-01-16 10:24:01 -08:00
tqchen
46bcba7173 [DATA] basic data refactor done, basic version of csr source. 2016-01-16 10:24:00 -08:00