* [jvm-packages] Bump rapids version to 22.12.0
This PR bumps the Spark version to 3.1.1 and the rapids version
to 22.12.0, which means the latest XGBoost can no longer run
with the old rapids packages.
* [jvm-packages] fix spark-rapids compatibility issue
spark-rapids (from 22.10 on) has shimmed GpuColumnVector, which means
we can't call it directly. So this PR calls UnshimmedGpuColumnVector instead.
* [jvm-packages] fix executor crashing issue when transforming on xgboost4j-spark-gpu
the API XGBoosterSetParam is not thread-safe. During the transform phase,
XGBoost runs several transform tasks at a time, and each of them sets
the "gpu_id" and "predictor" parameters. If several tasks (multiple threads)
call XGBoosterSetParam simultaneously, the memory may be corrupted,
causing a SIGSEGV.
This PR first gets the booster from the broadcast variable, sets the correct gpu_id
and predictor on it, and then has all transform tasks use that same booster to
do the transform, as sketched below.
With the introduction of the barrier execution mode, we don't need to kill the SparkContext when some XGBoost tasks fail; instead, Spark handles the errors for us. So in this PR, the `killSparkContextOnWorkerFailure` parameter is removed.
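For context, a minimal sketch of barrier execution mode using the standard Spark API (not the actual XGBoost code): all tasks of a barrier stage are scheduled together, and Spark retries the whole stage when a task fails, so stopping the SparkContext is unnecessary.

```scala
import org.apache.spark.BarrierTaskContext
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("barrier-sketch").getOrCreate()
val rdd = spark.sparkContext.parallelize(1 to 8, numSlices = 4)

// All partitions start together; a failure retries the stage rather than
// leaving a half-finished distributed training job behind.
val doubled = rdd.barrier().mapPartitions { iter =>
  val ctx = BarrierTaskContext.get()
  ctx.barrier() // wait until every task has reached this point
  iter.map(_ * 2)
}.collect()
```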
xgboost4j-spark provides two sets of APIs for setting features, one for CPU and another for GPU, which may cause confusion.
This PR removes the GPU API and adds an overloaded CPU function setFeaturesCol that accepts an Array[String] parameter.
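A hedged usage sketch (column names are illustrative) showing the unified setter after this change:

```scala
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

// CPU path: a single assembled vector column, as before.
val cpuClassifier = new XGBoostClassifier(Map("objective" -> "binary:logistic"))
  .setFeaturesCol("features")

// GPU path: the same method name, overloaded to take the raw column names
// directly (no separate GPU-only API). Column names are hypothetical.
val gpuClassifier = new XGBoostClassifier(Map(
  "objective" -> "binary:logistic",
  "tree_method" -> "gpu_hist"
)).setFeaturesCol(Array("f0", "f1", "f2"))
```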
This PR rewrites the approx tree method to reuse the hist codebase for better performance and code sharing.
The rewrite has many benefits (a short usage sketch follows the list below):
- Support for both `max_leaves` and `max_depth`.
- Support for `grow_policy`.
- Support for monotone constraints.
- Support for feature weights.
- Support for easier bin configuration (`max_bin`).
- Support for categorical data.
- Faster performance for most datasets, often many times faster.
- Support for prediction cache.
- Significantly better performance for external memory.
- Unites the code base between approx and hist.
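A hedged sketch (parameter values are illustrative) of driving the rewritten approx method from xgboost4j-spark and exercising a few of the options listed above:

```scala
import ml.dmlc.xgboost4j.scala.spark.XGBoostRegressor

// Illustrative values only; the parameter names are the standard XGBoost ones.
val regressor = new XGBoostRegressor(Map(
  "tree_method" -> "approx",
  "grow_policy" -> "lossguide",     // now honoured by approx
  "max_depth" -> 0,                 // unlimited depth, rely on max_leaves
  "max_leaves" -> 64,
  "max_bin" -> 128,
  "monotone_constraints" -> "(1,-1,0)",
  "num_round" -> 100
)).setFeaturesCol("features").setLabelCol("label")
```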
Spark 3.2 depends on json4s 3.7.0-M11, which has changed some implicit functions'
signatures. As a result, xgboost4j built against Spark 3.0/3.1 fails
when saving the model.
* Now it's built as part of libxgboost.
* Set correct C API error in RABIT initialization and finalization.
* Remove redundant message.
* Guard the tracker print C API.
* Change DefaultEvalMetric of classification from error to logloss (a usage sketch follows this list)
* Change default binary metric in plugin/example/custom_obj.cc
* Set old error metric in python tests
* Set old error metric in R tests
* Fix missed eval metrics and typos in R tests
* Fix setting eval_metric twice in R tests
* Add warning for empty eval_metric for classification
* Fix Dask tests
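For users who relied on the old default, a hedged Scala sketch (values are illustrative) of pinning the previous metric explicitly instead of taking the new logloss default:

```scala
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

// Binary classification now defaults to logloss; request the old metric explicitly.
val classifier = new XGBoostClassifier(Map(
  "objective" -> "binary:logistic",
  "eval_metric" -> "error",
  "num_round" -> 50
))
```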
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
* cancel job instead of killing SparkContext
This PR changes the default behavior of killing the SparkContext. Instead, it
cancels the jobs that the failed tasks belong to, which means the SparkContext is
still alive even when some exceptions happen (a short sketch follows the list below).
* add a parameter to control whether to kill the SparkContext
* cancel the jobs that the failed task belongs to
* remove the jobId from the map when a job fails
* resolve comments
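A minimal sketch, assuming a hypothetical listener-based approach (the class and method names here are made up for illustration): track the jobs spawned by training and cancel them on failure rather than stopping the SparkContext.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd}
import scala.collection.concurrent.TrieMap

// Hypothetical sketch: remember active training job IDs and cancel them
// (instead of calling sc.stop()) when a training task fails.
class TrainingJobCanceller(sc: SparkContext) extends SparkListener {
  private val activeJobs = TrieMap.empty[Int, Unit]

  def register(jobId: Int): Unit = activeJobs.put(jobId, ())

  // Drop the jobId from the map once the job has finished or failed.
  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit =
    activeJobs.remove(jobEnd.jobId)

  // Called from the failure handler: cancel only the affected jobs.
  def onTrainingFailure(reason: String): Unit =
    activeJobs.keys.foreach(jobId => sc.cancelJob(jobId, reason))
}
```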
We propose to only use the rowHashCode to compute the partitionKey; adding the FeatureValue hashCode does not bring more value and would make the computation slower. Even though collisions occur at roughly 0.2% with MurmurHash3, this is bearable for partitioning and won't have any impact on data balancing.
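A hedged sketch of the idea (the row type and field names are hypothetical): the partition key is derived from the row hash alone, skipping the feature-value hash.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical row reference; only the row identity feeds the partition key.
final case class RowRef(datasetId: Int, rowIndex: Long)

def partitionKey(row: RowRef, numPartitions: Int): Int = {
  val rowHash = MurmurHash3.productHash(row) // no FeatureValue hash mixed in
  // floorMod keeps the key non-negative even when the hash is negative.
  java.lang.Math.floorMod(rowHash, numPartitions)
}
```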