xgboost

Author	SHA1	Message	Date
Jiaming Yuan	5817840858	Remove `omp_get_max_threads` in data. (#7588 )	2022-01-24 02:44:07 +08:00
Jiaming Yuan	58a6723eb1	Initial support for multioutput regression. (#7514 ) * Add num target model parameter, which is configured from input labels. * Change elementwise metric and indexing for weights. * Add demo. * Add tests.	2021-12-18 09:28:38 +08:00
Jiaming Yuan	5b1161bb64	Convert labels into tensor. (#7456 ) * Add a new ctor to tensor for `initilizer_list`. * Change labels from host device vector to tensor. * Rename the field from `labels_` to `labels` since it's a public member.	2021-12-17 00:58:35 +08:00
Jiaming Yuan	70b12d898a	[dask] Fix ddqdm with empty partition. (#7510 ) * Fix empty partition. * war.	2021-12-16 20:37:29 +08:00
Jiaming Yuan	85cbd32c5a	Add range-based slicing to tensor view. (#7453 )	2021-11-27 13:42:36 +08:00
Jiaming Yuan	557ffc4bf5	Reduce base margin to 2 dim for now. (#7455 )	2021-11-27 00:46:13 +08:00
Jiaming Yuan	d33854af1b	[Breaking] Accept multi-dim meta info. (#7405 ) This PR changes base_margin into a 3-dim array, with one of them being reserved for multi-target classification. Also, a breaking change is made for binary serialization due to extra dimension along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on array interface.	2021-11-18 23:02:54 +08:00
Jiaming Yuan	55ee272ea8	Extend array interface to handle ndarray. (#7434 ) * Extend array interface to handle ndarray. The `ArrayInterface` class is extended to support multi-dim array inputs. Previously this class handles only 2-dim (vector is also matrix). This PR specifies the expected dimension at compile-time and the array interface can perform various checks automatically for input data. Also, adapters like CSR are more rigorous about their input. Lastly, row vector and column vector are handled without intervention from the caller.	2021-11-16 09:52:15 +08:00
Jiaming Yuan	d4274bc556	Fix typo. (#7433 )	2021-11-15 01:28:11 +08:00
Jiaming Yuan	ac9bfaa4f2	Handle missing values in dataframe with category dtype. (#7331 ) * Replace -1 in pandas initializer. * Unify `IsValid` functor. * Mimic pandas data handling in cuDF glue code. * Check invalid categories. * Fix DDM sketching.	2021-10-28 03:33:54 +08:00
Jiaming Yuan	d1f00fb0b7	Stricter validation for group. (#7345 )	2021-10-21 12:13:33 +08:00
Jiaming Yuan	0ed979b096	Support more input types for categorical data. (#7220 ) * Support more input types for categorical data. * Shorten the type name from "categorical" to "c". * Tests for np/cp array and scipy csr/csc/coo. * Specify the type for feature info.	2021-09-16 20:39:30 +08:00
Jiaming Yuan	3515931305	Initial support for external memory in gradient index. (#7183 ) * Add hessian to batch param in preparation of new approx impl. * Extract a push method for gradient index matrix. * Use span instead of vector ref for hessian in sketching. * Create a binary format for gradient index.	2021-09-13 12:40:56 +08:00
Philip Hyunsu Cho	336af4f974	Work around a segfault observed in SparsePage::Push() (#7161 ) * Work around a segfault observed in SparsePage::Push() * Revert "Work around a segfault observed in SparsePage::Push()" This reverts commit 30934844d00908750a5442082eb4769b1489f6a9. * Don't call vector::resize() inside OpenMP block * Set GITHUB_PAT env var to fix R tests * Use built-in GITHUB_TOKEN	2021-08-08 02:12:30 -07:00
Jiaming Yuan	8a84be37b8	Pass scikit learn estimator checks for regressor. (#7130 ) * Check data shape. * Check labels.	2021-08-03 18:58:20 +08:00
Jiaming Yuan	e6088366df	Export Python Interface for external memory. (#7070 ) * Add Python iterator interface. * Add tests. * Add demo. * Add documents. * Handle empty dataset.	2021-07-22 15:15:53 +08:00
Jiaming Yuan	bd1f3a38f0	Rewrite sparse dmatrix using callbacks. (#7092 ) - Reduce dependency on dmlc parsers and provide an interface for users to load data by themselves. - Remove use of threaded iterator and IO queue. - Remove `page_size`. - Make sure the number of pages in memory is bounded. - Make sure the cache can not be violated. - Provide an interface for internal algorithms to process data asynchronously.	2021-07-16 12:33:31 +08:00
Jiaming Yuan	4cf95a6041	Support numpy array interface (#6998 )	2021-05-27 16:08:22 +08:00
ShvetsKS	8825670c9c	Memory consumption fix for row-major adapters (#6779 ) Co-authored-by: Kirill Shvets <kirill.shvets@intel.com> Co-authored-by: fis <jm.yuan@outlook.com>	2021-03-26 08:44:30 +08:00
Jiaming Yuan	bcc0277338	Re-implement ROC-AUC. (#6747 ) * Re-implement ROC-AUC. * Binary * MultiClass * LTR * Add documents. This PR resolves a few issues: - Define a value when the dataset is invalid, which can happen if there's an empty dataset, or when the dataset contains only positive or negative values. - Define ROC-AUC for multi-class classification. - Define weighted average value for distributed setting. - A correct implementation for learning to rank task. Previous implementation is just binary classification with averaging across groups, which doesn't measure ordered learning to rank.	2021-03-20 16:52:40 +08:00
Jiaming Yuan	f20074e826	Check for invalid data. (#6742 )	2021-03-04 14:37:20 +08:00
Louis Desreumaux	9b530e5697	Improve OpenMP exception handling (#6680 )	2021-02-25 13:56:16 +08:00
Jiaming Yuan	5d48d40d9a	Fix DMatrix slice with feature types. (#6689 )	2021-02-09 08:13:51 +08:00
Jiaming Yuan	dbb5208a0a	Use __array_interface__ for creating DMatrix from CSR. (#6675 ) * Use __array_interface__ for creating DMatrix from CSR. * Add configuration.	2021-02-05 21:09:47 +08:00
Jiaming Yuan	f2f7dd87b8	Use view for `SparsePage` exclusively. (#6590 )	2021-01-11 18:04:55 +08:00
Jiaming Yuan	80065d571e	[dask] Add DaskXGBRanker (#6576 ) * Initial support for distributed LTR using dask. * Support `qid` in libxgboost. * Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]` to avoid duplicated code. * Define `DaskXGBRanker`. The dask ranker doesn't support group structure, instead it uses query id and convert to group ptr internally.	2021-01-08 18:35:09 +08:00
Jiaming Yuan	347f593169	Accept numpy array for DMatrix slice index. (#6368 )	2020-12-16 14:42:52 +08:00
ShvetsKS	512b464cfa	Disable HT for DMatrix creation (#6386 ) Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>	2020-11-14 22:18:33 +08:00
Jiaming Yuan	43efadea2e	Deterministic data partitioning for external memory (#6317 ) * Make external memory data partitioning deterministic. * Change the meaning of `page_size` from bytes to number of rows. * Design a data pool. * Note for external memory. * Enable unity build on Windows CI. * Force garbage collect on test.	2020-11-11 06:11:06 +08:00
Jiaming Yuan	b180223d18	Cleanup RABIT. (#6290 ) * Remove recovery and MPI speed tests. * Remove readme. * Remove Python binding. * Add checks in C API.	2020-10-27 08:48:22 +08:00
Jiaming Yuan	ddf37cca30	Unify thread configuration. (#6186 )	2020-10-19 16:05:42 +08:00
Jiaming Yuan	b5f52f0b1b	Validate weights are positive values. (#6115 )	2020-09-15 09:03:55 +08:00
Jiaming Yuan	20c95be625	Expand categorical node. (#6028 ) Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-08-25 18:53:57 +08:00
ShvetsKS	24f2e6c97e	Optimize DMatrix build time. (#5877 ) Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>	2020-08-20 01:37:03 +08:00
Qi Zhang	989ddd036f	Swap byte-order in binary serializer to support big-endian arch (#5813 ) * fixed some endian issues * Use dmlc::ByteSwap() to simplify code * Fix lint check * [CI] Add test for s390x * Download latest CMake on s390x * Fix a bug in my code * Save magic number in dmatrix with byteswap on big-endian machine * Save version in binary with byteswap on big-endian machine * Load scalar with byteswap in MetaInfo * Add a debugging message * Handle arrays correctly when byteswapping * EOF can also be 255 * Handle magic number in MetaInfo carefully * Skip Tree.Load test for big-endian, since the test manually builds little-endian binary model * Handle missing packages in Python tests * Don't use boto3 in model compatibility tests * Add s390 Docker file for local testing * Add model compatibility tests * Add R compatibility test * Revert "Add R compatibility test" This reverts commit c2d2bdcb7dbae133cbb927fcd20f7e83ee2b18a8. Co-authored-by: Qi Zhang <q.zhang@ibm.com> Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-08-18 14:47:17 -07:00
Jiaming Yuan	4d99c58a5f	Feature weights (#5962 )	2020-08-18 19:55:41 +08:00
Philip Hyunsu Cho	487ab0ce73	[BLOCKING] Handle empty rows in data iterators correctly (#5929 ) * [jvm-packages] Handle empty rows in data iterators correctly * Fix clang-tidy error * last empty row * Add comments [skip ci] Co-authored-by: Nan Zhu <nanzhu@uber.com>	2020-07-25 13:46:19 -07:00
Philip Hyunsu Cho	4af857f95d	Add explicit template specialization for portability (#5921 ) * Add explicit template specializations * Adding Specialization for FileAdapterBatch	2020-07-22 12:31:17 -07:00
Jiaming Yuan	7c2686146e	Dask device dmatrix (#5901 ) * Fix softprob with empty dmatrix.	2020-07-17 13:17:43 +08:00
Jiaming Yuan	93c44a9a64	Move feature names and types of DMatrix from Python to C++. (#5858 ) * Add thread local return entry for DMatrix. * Save feature name and feature type in binary file. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-07-07 09:40:13 +08:00
Jiaming Yuan	1a0801238e	Implement iterative DMatrix. (#5837 )	2020-07-03 11:44:52 +08:00
Jiaming Yuan	c4d721200a	Implement extend method for meta info. (#5800 ) * Implement extend for host device vector.	2020-06-20 03:32:03 +08:00
Jiaming Yuan	306e38ff31	Avoid including `c_api.h` in header files. (#5782 )	2020-06-12 16:24:24 +08:00
Philip Hyunsu Cho	1d22a9be1c	Revert "Reorder includes. (#5749 )" (#5771 ) This reverts commit d3a0efbf162f3dceaaf684109e1178c150b32de3.	2020-06-09 10:29:28 -07:00
Jiaming Yuan	d3a0efbf16	Reorder includes. (#5749 ) * Reorder includes. * R.	2020-06-03 17:30:47 +12:00
Philip Hyunsu Cho	8de7f1928e	Fix build on big endian CPUs (#5617 ) * Fix build on big endian CPUs * Clang-tidy	2020-04-29 21:56:34 -07:00
Jiaming Yuan	e726dd9902	Set device in device dmatrix. (#5596 )	2020-04-25 13:42:53 +08:00
Jiaming Yuan	29a4cfe400	Group aware GPU sketching. (#5551 ) * Group aware GPU weighted sketching. * Distribute group weights to each data point. * Relax the test. * Validate input meta info. * Fix metainfo copy ctor.	2020-04-20 17:18:52 +08:00
Jiaming Yuan	e1f22baf8c	Fix slice and get info. (#5552 )	2020-04-18 18:00:13 +08:00
Jiaming Yuan	6671b42dd4	Use ellpack for prediction only when sparsepage doesn't exist. (#5504 )	2020-04-10 12:15:46 +08:00

159 Commits