xgboost

Author	SHA1	Message	Date
Bobby Wang	6275cdc486	[jvm-packages] add format option when saving a model (#7940)	2022-05-30 15:49:59 +08:00
Bobby Wang	fbc3d861bb	[jvm-packages] remove default parameters (#7938)	2022-05-28 10:31:19 +08:00
Bobby Wang	5ef33adf68	[jvm-packges] set the correct objective if user doesn't explicitly set it (#7781)	2022-05-18 14:05:18 +08:00
Bobby Wang	b41cf92dc2	[jvm-packages] move dmatrix building into rabit context for cpu pipeline (#7908)	2022-05-17 14:52:25 +08:00
Bobby Wang	11e46e4bc0	[Breaking][jvm-packages] make classification model be xgboost-compatible (#7896)	2022-05-14 15:43:05 +08:00
Bobby Wang	9fa7ed1743	[Breaking][jvm-packages] remove timeoutRequestWorkers parameter (#7839)	2022-05-13 16:26:25 +08:00
Bobby Wang	a94e1b172e	[jvm-packages] Fix model compatibility (#7845)	2022-04-28 02:05:38 +08:00
Bobby Wang	dc2e699656	[Breaking][jvm-packages] Use barrier execution mode (#7836) With the introduction of the barrier execution mode. we don't need to kill SparkContext when some xgboost tasks failed. Instead, Spark will handle the errors for us. So in this PR, `killSparkContextOnWorkerFailure` parameter is deleted.	2022-04-25 17:09:52 +08:00
Bobby Wang	c45665a55a	[jvm-packages] move the dmatrix building into rabit context (#7823) This fixes the QuantileDeviceDMatrix in distributed environment.	2022-04-23 00:06:50 +08:00
Bobby Wang	2d83b2ad8f	[jvm-packages] add hostIp and python exec for rabit tracker (#7808)	2022-04-15 16:28:43 +08:00
Bobby Wang	3f536b5308	[jvm-packages] fix evaluation when featuresCols is used (#7798)	2022-04-13 12:52:50 +08:00
Bobby Wang	118192f116	[jvm-packages] xgboost4j-spark should work when featuresCols is specified (#7789)	2022-04-08 13:21:04 +08:00
Bobby Wang	2454407f3a	[jvm-packages] unify setFeaturesCol API for XGBoostRegressor (#7784)	2022-04-05 13:35:33 +08:00
Jiaming Yuan	522636cb52	Bump version. (#7769)	2022-03-31 06:33:22 +08:00
Bobby Wang	89aa8ddf52	[jvm-packages] fix the prediction issue for multi:softmax (#7694)	2022-02-24 01:09:45 +08:00
Bobby Wang	e3e6de5ed9	[jvm-packages] unify the set features API (#7692) xgboost4j-spark provides 2 sets of API for setting features, one for CPU, another for GPU, which may cause confusion. This PR removes the GPU API and adds an override CPU function setFeaturesCol to accept Array[String] parameters.	2022-02-23 03:37:25 +08:00
Jiaming Yuan	ac7a36367c	[jvm-packages] Implement new `save_raw` in jvm-packages. (#7570) * New `toByteArray` that accepts a parameter for format.	2022-01-19 16:00:14 +08:00
Jiaming Yuan	001503186c	Rewrite approx (#7214) This PR rewrites the approx tree method to use codebase from hist for better performance and code sharing. The rewrite has many benefits: - Support for both `max_leaves` and `max_depth`. - Support for `grow_policy`. - Support for mono constraint. - Support for feature weights. - Support for easier bin configuration (`max_bin`). - Support for categorical data. - Faster performance for most of the datasets. (many times faster) - Support for prediction cache. - Significantly better performance for external memory. - Unites the code base between approx and hist.	2022-01-10 21:15:05 +08:00
Bobby Wang	e8c1eb99e4	[jvm-package] Clean up the legacy gpu support tests (#7523)	2021-12-21 09:15:51 +08:00
Bobby Wang	24e25802a7	[jvm-packages] Add Rapids plugin support (#7491) * Add GPU pre-processing pipeline.	2021-12-17 13:11:12 +08:00
Bobby Wang	7cfb310eb4	Rework transform (#7440) extract the common part of transform code from XGBoostClassifier and XGBoostRegressor	2021-11-18 15:48:57 +08:00
Bobby Wang	cb685607b2	[jvm-packages] Rework the train pipeline (#7401) 1. Add PreXGBoost to build RDD[Watches] from Dataset 2. Feed RDD[Watches] built from PreXGBoost to XGBoost to train	2021-11-10 17:51:38 +08:00
Bobby Wang	b81ebbef62	[jvm-packages] Fix json4s binary compatibility issue (#7376) Spark 3.2 depends on 3.7.0-M11 which has changed some implicited functions' signatures. And it will result the xgboost4j built against spark 3.0/3.1 failed when saving the model.	2021-10-30 03:20:57 +08:00
nicovdijk	31a307cf6b	[XGBoost4J-Spark] Serialization for custom objective and eval (#7274) * added type hints to custom_obj and custom_eval for Spark persistence Co-authored-by: Bobby Wang <wbo4958@gmail.com>	2021-10-21 16:22:23 +08:00
Bobby Wang	4fd149b3a2	[jvm-packages] update checkstyle (#7335) * [jvm-packages] update scalastyle 1. bump scalastyle-maven-plugin and maven-checkstyle-plugin to latest 2. remove unused imports * fix code style check	2021-10-18 18:42:01 +08:00
Jiaming Yuan	f7caac2563	Bump version to 1.6.0 in master. (#7259)	2021-10-07 16:09:26 +08:00
Jiaming Yuan	146549260a	Bump version to 1.5.0 snapshot in master. (#6875)	2021-04-22 01:53:44 +08:00
Bobby Wang	2c684ffd32	[jvm-packages] fix "key not found: train" issue (#6842) * [jvm-packages] fix "key not found: train" issue * fix bug	2021-04-18 23:28:39 -07:00
Bobby Wang	49c22c23b4	[jvm-packages] fix early stopping doesn't work even without custom_eval setting (#6738) * [jvm-packages] fix early stopping doesn't work even without custom_eval setting * remove debug info * resolve comment	2021-03-06 20:19:40 -08:00
Bobby Wang	9d2832a3a3	fix potential TaskFailedListener's callback won't be called (#6612) there is possibility that onJobStart of TaskFailedListener won't be called, if the job is submitted before the other thread adds addSparkListener. detail can be found at https://github.com/dmlc/xgboost/pull/6019#issuecomment-760937628	2021-01-21 14:20:32 +08:00
Philip Hyunsu Cho	0d483cb7c1	Bump version to 1.4.0 snapshot in master (#6486)	2020-12-10 07:38:08 -08:00
zhang_jf	cc581b3b6b	Misleading exception information: no such param of "allow_non_zero_missing" (#6418)	2020-11-20 19:33:34 +08:00
Nan Zhu	4d1d5d4010	[jvm-packages] fix potential unit test suites aborted issue (#6373) * fix race conditio * code cleaning rm pom.xml-e * clean again * fix compilation issue * recover * avoid using getOrCreate * interrupt zombie threads * safe guard * fix deadlock * Update SparkParallelismTracker.scala	2020-11-17 10:59:26 -08:00
Jiaming Yuan	d61b628bf5	Remove RABIT CMake targets. (#6275) * Now it's built as part of libxgboost. * Set correct C API error in RABIT initialization and finalization. * Remove redundant message. * Guard the tracker print C API.	2020-10-27 01:30:20 +08:00
Jiaming Yuan	b5c2a47b20	Drop single point model recovery (#6262) * Pass rabit params in JVM package. * Implement timeout using poll timeout parameter. * Remove OOB data check.	2020-10-21 15:27:03 +08:00
Christian Lorentzen	cf4f019ed6	[Breaking] Change default evaluation metric for classification to logloss / mlogloss (#6183) * Change DefaultEvalMetric of classification from error to logloss * Change default binary metric in plugin/example/custom_obj.cc * Set old error metric in python tests * Set old error metric in R tests * Fix missed eval metrics and typos in R tests * Fix setting eval_metric twice in R tests * Add warning for empty eval_metric for classification * Fix Dask tests Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-10-02 12:06:47 -07:00
Philip Hyunsu Cho	33577ef5d3	Add MAPE metric (#6119)	2020-09-14 18:45:27 -07:00
Bobby Wang	0e2d5669f6	[jvm-packages] cancel job instead of killing SparkContext (#6019) * cancel job instead of killing SparkContext This PR changes the default behavior that kills SparkContext. Instead, This PR cancels jobs when coming across task failed. That means the SparkContext is still alive even some exceptions happen. * add a parameter to control if killing SparkContext * cancel the jobs the failed task belongs to * remove the jobId from the map when one job failed. * resolve comments	2020-09-02 14:20:59 -07:00
Anthony D'Amato	ada964f16e	Clean the way deterministic paritioning is computed (#6033) We propose to only use the rowHashCode to compute the partitionKey, adding the FeatureValue hashCode does not bring more value and would make the computation slower. Even though a collision would appear at 0.2% with MurmurHash3 this is bearable for partitioning, this won't have any impact on the data balancing.	2020-08-30 14:38:23 -07:00
FelixYBW	3a990433f9	set maxBins to 256. Align with c code in src/tree/param.h (#6066)	2020-08-28 15:06:11 +03:00
Philip Hyunsu Cho	b3193052b3	Bump version to 1.3.0 snapshot in master (#6052)	2020-08-23 17:13:46 -07:00
Anthony D'Amato	f58e41bad8	Fix deterministic partitioning with dataset containing Double.NaN (#5996) The functions featureValueOfSparseVector or featureValueOfDenseVector could return a Float.NaN if the input vectore was containing any missing values. This would make fail the partition key computation and most of the vectors would end up in the same partition. We fix this by avoid returning a NaN and simply use the row HashCode in this case. We added a test to ensure that the repartition is indeed now uniform on input dataset containing values by checking that the partitions size variance is below a certain threshold. Signed-off-by: Anthony D'Amato <anthony.damato@hotmail.fr>	2020-08-18 18:55:37 -07:00
Jiaming Yuan	f93f1c03fc	Rabit update. (#5978) * Remove parameter on JVM Packages.	2020-08-11 09:17:32 +08:00
Shaochen Shi	71197d1dfa	[jvm-packages] Fix wrong method name `setAllowZeroForMissingValue`. (#5740) * Allow non-zero for missing value when training. * Fix wrong method names. * Add a unit test * Move the getter/setter unit test to MissingValueHandlingSuite Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-08-01 17:16:42 -07:00
Jiaming Yuan	75b8c22b0b	Fix prediction heuristic (#5955) * Relax check for prediction. * Relax test in spark test. * Add tests in C++.	2020-07-29 19:24:07 +08:00
Bobby Wang	8943eb4314	[BLOCKING] [jvm-packages] add gpu_hist and enable gpu scheduling (#5171) * [jvm-packages] add gpu_hist tree method * change updater hist to grow_quantile_histmaker * add gpu scheduling * pass correct parameters to xgboost library * remove debug info * add use.cuda for pom * add CI for gpu_hist for jvm * add gpu unit tests * use gpu node to build jvm * use nvidia-docker * Add CLI interface to create_jni.py using argparse Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-07-26 21:53:24 -07:00
Philip Hyunsu Cho	487ab0ce73	[BLOCKING] Handle empty rows in data iterators correctly (#5929) * [jvm-packages] Handle empty rows in data iterators correctly * Fix clang-tidy error * last empty row * Add comments [skip ci] Co-authored-by: Nan Zhu <nanzhu@uber.com>	2020-07-25 13:46:19 -07:00
Bobby Wang	9f85e92602	[jvm-packages] update spark dependency to 3.0.0 (#5836)	2020-07-12 20:58:30 -07:00
Zhang Zhang	1813804e36	Add new parameter singlePrecisionHistogram to xgboost4j-spark (#5811) Expose the existing 'singlePrecisionHistogram' param to the Spark layer.	2020-07-08 16:29:35 -07:00
Philip Hyunsu Cho	073b625bde	Bump version to 1.2.0 snapshot in master (#5733)	2020-05-31 00:11:34 -07:00

243 Commits