The following classes are added to support dataframes in the Java binding:
- `Column` is an abstract type for a single column in tabular data.
- `ColumnBatch` is an abstract type for dataframe.
- `CuDFColumn` is an implementation of `Column` that consumes a cuDF column.
- `CudfColumnBatch` is an implementation of `ColumnBatch` that consumes a cuDF dataframe.
- `DeviceQuantileDMatrix` is the interface for quantized data.
The Java implementation mimics the Python interface and uses the `__cuda_array_interface__` protocol for memory indexing. One difference is that in the JVM package the data batch is staged on the host, since Java iterators cannot be reset.
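A minimal sketch of how these pieces might fit together, assuming the `DeviceQuantileDMatrix` constructor takes a batch iterator plus missing value, bin count, and thread count (the exact signature and package layout are assumptions here):

```java
import java.util.Iterator;
import java.util.List;

import ml.dmlc.xgboost4j.java.ColumnBatch;           // package layout assumed
import ml.dmlc.xgboost4j.java.DeviceQuantileDMatrix; // package layout assumed
import ml.dmlc.xgboost4j.java.XGBoostError;

public class QuantileDMatrixSketch {
  // Batches are staged in a host-side list because a Java iterator, unlike a
  // Python one, cannot be reset once exhausted.
  public static DeviceQuantileDMatrix fromBatches(List<ColumnBatch> batches)
      throws XGBoostError {
    Iterator<ColumnBatch> it = batches.iterator();
    float missing = Float.NaN; // value treated as missing
    int maxBin = 256;          // number of quantile bins
    int nthread = 1;
    return new DeviceQuantileDMatrix(it, missing, maxBin, nthread);
  }
}
```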
Co-authored-by: jiamingy <jm.yuan@outlook.com>
Fix a bug introduced in 17913713b554d820a8ce94226d854b4a5f1d8bbc (allow loading from byte array).
When loading a model from a stream, only the last buffer read from the input stream was used to construct the model.
This may work for models smaller than 1 MiB (if you are lucky enough to read the whole model in one call), but it will always fail if the model is larger.
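The shape of the fix, sketched below in plain Java; the 1 MiB chunk size mirrors the buffer mentioned above:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamReadSketch {
  // Accumulate every chunk read from the stream. The buggy version kept only
  // the last chunk, so any model spanning more than one read() was truncated.
  static byte[] readAll(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[1 << 20]; // 1 MiB per read
    int n;
    while ((n = in.read(buf)) != -1) {
      out.write(buf, 0, n);
    }
    return out.toByteArray();
  }
}
```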
* Add feature score support for linear model.
* Port R interface to the new implementation.
* Add linear model support in Python.
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
* Add `XGBOOST_RABIT_TRACKER_IP_FOR_TEST` to set rabit tracker IP
* change spark and rabit tracker IP to 127.0.0.1on GitHub Action.
Co-authored-by: fis <jm.yuan@outlook.com>
* Add the ability to load a booster directly from a byte array (usage sketch after this list)
* Fix compiler error
* Move the InputStream-to-byte-buffer conversion
- move it from Booster to the XGBoost facade class
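A sketch of the two entry points after this change, assuming the `byte[]` overload lands on the `XGBoost` facade as described; file names are illustrative:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import ml.dmlc.xgboost4j.java.Booster;
import ml.dmlc.xgboost4j.java.XGBoost;
import ml.dmlc.xgboost4j.java.XGBoostError;

public class LoadModelSketch {
  static Booster fromBytes(String path) throws IOException, XGBoostError {
    // New entry point: build a booster directly from an in-memory buffer.
    byte[] raw = Files.readAllBytes(Paths.get(path));
    return XGBoost.loadModel(raw);
  }

  static Booster fromStream(String path) throws IOException, XGBoostError {
    // Streams still work; the stream-to-byte-buffer conversion now lives in
    // the XGBoost facade rather than in Booster.
    try (InputStream in = new FileInputStream(path)) {
      return XGBoost.loadModel(in);
    }
  }
}
```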
* [java] Extend the library loader to use both OS and CPU architecture (detection sketch after this list).
* Simplifying create_jni.py's architecture detection.
* Tidying up the architecture detection in create_jni.py
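Architecture detection on the Java side might look like this sketch; the resource-path layout is an assumption:

```java
public class LibLoaderSketch {
  // Pick the bundled native library by OS and CPU architecture.
  static String nativeLibDir() {
    String os = System.getProperty("os.name").toLowerCase();
    String arch = System.getProperty("os.arch").toLowerCase();
    String osDir = os.contains("win") ? "windows"
                 : os.contains("mac") ? "macos"
                 : "linux";
    // e.g. "/lib/linux/amd64/" (hypothetical resource layout)
    return "/lib/" + osDir + "/" + arch + "/";
  }
}
```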
The CLI is not the most developed interface. Putting it into the correct directory helps new users avoid it, since most use cases go through a language binding.
* Now it's built as part of libxgboost.
* Set correct C API error in RABIT initialization and finalization.
* Remove redundant message.
* Guard the tracker print C API.
* [CI] Clean up build for JVM packages
* Use correct path for saving native lib
* Fix groupId of maven-surefire-plugin
* Fix stashing of xgboost4j_jar_gpu
* [CI] Don't run xgboost4j-tester with GPU, since it doesn't use gpu_hist
* Change DefaultEvalMetric of classification from error to logloss (see the sketch after this list)
* Change default binary metric in plugin/example/custom_obj.cc
* Set old error metric in python tests
* Set old error metric in R tests
* Fix missed eval metrics and typos in R tests
* Fix setting eval_metric twice in R tests
* Add warning for empty eval_metric for classification
* Fix Dask tests
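Callers that relied on the old default can request it explicitly; a minimal sketch using the standard parameter names:

```java
import java.util.HashMap;
import java.util.Map;

public class OldErrorMetricSketch {
  // The classification default is now logloss; setting eval_metric to
  // "error" restores the pre-change behavior.
  static Map<String, Object> paramsWithOldMetric() {
    Map<String, Object> params = new HashMap<>();
    params.put("objective", "binary:logistic");
    params.put("eval_metric", "error"); // old default
    return params;
  }
}
```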
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
* Add getNumFeature to the Java API (usage sketch after this list)
* Add getNumFeature to the Scala API
* Add unit tests for getNumFeature
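Usage is a one-liner; a sketch assuming the accessor throws the binding's usual `XGBoostError`:

```java
import ml.dmlc.xgboost4j.java.Booster;
import ml.dmlc.xgboost4j.java.XGBoostError;

public class NumFeatureSketch {
  // Query a trained booster for the number of features it expects, e.g. to
  // validate the width of prediction input.
  static long featureCount(Booster booster) throws XGBoostError {
    return booster.getNumFeature();
  }
}
```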
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
* cancel job instead of killing SparkContext
This PR changes the default behavior of killing the SparkContext. Instead, it
cancels the affected jobs when a task fails, which means the SparkContext
stays alive even when exceptions occur (see the sketch after this list).
* add a parameter to control whether to kill the SparkContext
* cancel the jobs that the failed task belongs to
* remove the jobId from the map when a job fails.
* resolve comments
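A sketch of the branch this introduces; `killSparkContext` stands in for the new control parameter, whose real name is defined in the PR:

```java
import org.apache.spark.SparkContext;

public class TaskFailureSketch {
  // On a failed task, either keep the old behavior (stop the whole
  // SparkContext) or cancel only the job the task belongs to.
  static void onTaskFailure(SparkContext sc, int jobId, boolean killSparkContext) {
    if (killSparkContext) {
      sc.stop();           // old default: everything on this context dies
    } else {
      sc.cancelJob(jobId); // new default: the context stays alive
    }
  }
}
```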
We propose to use only the rowHashCode to compute the partitionKey; adding the FeatureValue hashCode does not bring more value and would make the computation slower. Even though collisions occur at a rate of about 0.2% with MurmurHash3, this is bearable for partitioning and has no impact on data balancing.
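In effect the key computation reduces to the sketch below, where `rowHashCode` stands for the MurmurHash3 row hash mentioned above:

```java
public class PartitionKeySketch {
  // Derive the partition key from the row hash alone; mixing in the
  // feature-value hash added cost without improving balance.
  static int partitionKey(int rowHashCode, int numPartitions) {
    return Math.floorMod(rowHashCode, numPartitions);
  }
}
```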
* [CI] Improve JVM test in GitHub Actions
* Use env var for Wagon options [skip ci]
* Move the retry flag to pom.xml [skip ci]
* Export env var RABIT_MOCK to run Spark tests [skip ci]
* Correct location of env var
* Re-try up to 5 times [skip ci]
* Don't run distributed training test on Windows
* Fix typo
* Update main.yml
The functions featureValueOfSparseVector and featureValueOfDenseVector could return Float.NaN if the input vector contained any missing values. This made the partition key computation fail, so most of the vectors ended up in the same partition. We fix this by avoiding the NaN and simply using the row hashCode in this case.
We added a test to ensure that the repartition is now indeed uniform for input datasets containing missing values, by checking that the variance of the partition sizes stays below a certain threshold.
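The guard might look like this sketch; the hash combination for the non-missing case is illustrative:

```java
public class MissingAwareHashSketch {
  // A missing feature surfaces as Float.NaN; hashing NaN into the key
  // previously sent most rows to one partition, so fall back to the row
  // hash alone in that case.
  static int partitionHash(int rowHashCode, float featureValue) {
    if (Float.isNaN(featureValue)) {
      return rowHashCode;
    }
    return Float.hashCode(featureValue) ^ rowHashCode;
  }
}
```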
Signed-off-by: Anthony D'Amato <anthony.damato@hotmail.fr>
* Allow a non-zero missing value when training (sketch after this list).
* Fix wrong method names.
* Add a unit test
* Move the getter/setter unit test to MissingValueHandlingSuite
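What this enables on the Spark estimator, as a sketch; the `setMissing` setter name is an assumption here:

```java
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier;

public class MissingSentinelSketch {
  // Designate a non-zero sentinel (here -999) as the value to treat as
  // missing during training; setMissing is an assumed setter name.
  static XGBoostClassifier withSentinel() {
    XGBoostClassifier clf = new XGBoostClassifier();
    clf.setMissing(-999.0f);
    return clf;
  }
}
```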
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>