10 Commits

Author SHA1 Message Date
Jiaming Yuan
3515931305
Initial support for external memory in gradient index. (#7183)
* Add hessian to batch param in preparation of new approx impl.
* Extract a push method for gradient index matrix.
* Use span instead of vector ref for hessian in sketching.
* Create a binary format for gradient index.
2021-09-13 12:40:56 +08:00
Jiaming Yuan
bd1f3a38f0
Rewrite sparse dmatrix using callbacks. (#7092)
- Reduce dependency on dmlc parsers and provide an interface for users to load data by themselves.
- Remove use of threaded iterator and IO queue.
- Remove `page_size`.
- Make sure the number of pages in memory is bounded.
- Make sure the cache can not be violated.
- Provide an interface for internal algorithms to process data asynchronously.
2021-07-16 12:33:31 +08:00
Jiaming Yuan
c766f143ab
Refactor external memory formats. (#7089)
* Save base_rowid.
* Return write size.
* Remove unused function.
2021-07-08 04:04:51 +08:00
Philip Hyunsu Cho
1d22a9be1c
Revert "Reorder includes. (#5749)" (#5771)
This reverts commit d3a0efbf162f3dceaaf684109e1178c150b32de3.
2020-06-09 10:29:28 -07:00
Jiaming Yuan
d3a0efbf16
Reorder includes. (#5749)
* Reorder includes.

* R.
2020-06-03 17:30:47 +12:00
Rory Mitchell
c7cc657a4d
Use adapters for SparsePageDMatrix (#5092) 2019-12-11 15:59:23 +13:00
Jiaming Yuan
6ec7e300bd
Fix external memory race in colmaker. (#4980)
* Move `GetColDensity` out of omp parallel block.
2019-10-25 04:11:13 -04:00
Rong Ou
5b1715d97c Write ELLPACK pages to disk (#4879)
* add ellpack source
* add batch param
* extract function to parse cache info
* construct ellpack info separately
* push batch to ellpack page
* write ellpack page.
* make sparse page source reusable
2019-10-22 23:44:32 -04:00
Jiaming Yuan
2e618af743
Fix cpplint. (#4157)
* Add comment after #endif.
* Add missing headers.
2019-02-18 00:16:29 +08:00
Rory Mitchell
a96039141a
Dmatrix refactor stage 1 (#3301)
* Use sparse page as singular CSR matrix representation

* Simplify dmatrix methods

* Reduce statefullness of batch iterators

* BREAKING CHANGE: Remove prob_buffer_row parameter. Users are instead recommended to sample their dataset as a preprocessing step before using XGBoost.
2018-06-07 10:25:58 +12:00