xgboost

Author	SHA1	Message	Date
Philip Hyunsu Cho	6852d0afd5	Separate out restricted and unrestricted tasks (#3736 )	2018-09-27 23:20:06 -07:00
Philip Hyunsu Cho	c0bd296354	Fix #3730 : scikit-learn 0.20 compatibility fix (#3731 ) * Fix #3730: scikit-learn 0.20 compatibility fix sklearn.cross_validation has been removed from scikit-learn 0.20, so replace it with sklearn.model_selection * Display test names for Python tests for clarity	2018-09-27 15:04:25 -07:00
Philip Hyunsu Cho	09142c94f5	Disable flaky tests in R-package/tests/testthat/test_update.R (#3723 )	2018-09-26 14:57:46 -07:00
Philip Cho	ba4244ef51	Mask tests for 32-bit Windows that fail due to difference between x87 and SSE	2018-09-05 16:18:20 -07:00
Philip Hyunsu Cho	a46b0ac2d2	Fix CRAN check by removing reference to std::cerr (#3660 ) * Fix CRAN check by removing reference to std::cerr * Mask tests that fail on 32-bit Windows R	2018-09-05 12:05:31 -07:00
gorogm	4bc7e94603	Link fixed. (#3640 )	2018-09-05 12:04:46 -07:00
Philip Hyunsu Cho	a899e8f4cd	Document CUDA requirement, lack of external memory on GPU (#3624 ) * Document fact that GPU doesn't support external memory * Document CUDA requirement	2018-09-05 12:03:46 -07:00
Philip Cho	f9a833f525	Update Python API doc (#3619 ) * Show inherited members of XGBRegressor in API doc, since XGBRegressor uses default methods from XGBModel * Add table of contents to Python API doc * Skip JVM doc download if not available * Show inherited members for XGBRegressor * Add docstring to XGBRegressor.predict() * Fix rendering errors in Python docstrings * Fix lint	2018-09-05 12:02:11 -07:00
Grant W Schneider	1afd2f1b2d	Remove errant $ (#3618 )	2018-09-05 11:55:14 -07:00
Philip Hyunsu Cho	b1d76d533d	Fix #3609 : Removed unused parameter 'use_buffer' (#3610 )	2018-09-05 11:54:54 -07:00
Philip Hyunsu Cho	9d70655c42	Fix #3598 : document that custom objective can't contain colon (:) (#3601 )	2018-09-05 11:54:19 -07:00
Grace Lam	dd1fda449c	Add JSON dump functionality documentation (#3600 )	2018-09-05 11:53:31 -07:00
Jakob Richter	324f3b5259	replace nround with nrounds to match actual parameter (#3592 )	2018-09-05 11:52:34 -07:00
Philip Cho	24e08c2638	Add version to doc sidebar	2018-08-13 01:46:05 -07:00
Philip Hyunsu Cho	96826a3515	Release version 0.80 (#3541 ) * Up versions * Write release note for 0.80 v0.80	2018-08-13 01:38:37 -07:00
Mathew	06ef4db4cc	Fix Spark 2.2 Support (Amending #3062 ) (#3325 ) This pull request amends the broken #3062 to allow Spark 2.2 to work. Please note this won't work in Spark <=2.1 as sc.removeSparkListener was implemented in Spark 2.2. (So perhaps a more general method is better, although that is what was attempted in #3062) This PR fixes: #3208, #3151 and the discussion in #1927. I do find it strange that #3062 does not work in Spark 2.2, it's probably due to some sort of public/private issue in the org.apache.spark.scheduler.LiveListenerBus class inheritance (In Spark itself). The error is: `java.lang.NoSuchMethodError: org.apache.spark.scheduler.LiveListenerBus.removeListener(Ljava/lang/Object;)V`	2018-08-12 18:35:20 -07:00
Rory Mitchell	645996b12f	Remove accidental SparsePage copies (#3583 )	2018-08-12 17:49:38 -07:00
Philip Hyunsu Cho	0b607fb884	Add link to XGBoost4J-Spark tutorial on AWS Yarn tutorial (#3582 )	2018-08-12 07:27:28 -07:00
Philip Hyunsu Cho	4202332783	Clarify multi-GPU training, binary wheels, Pandas integration (#3581 ) * Clarify multi-GPU training, binary wheels, Pandas integration * Add a note about multi-GPU on gpu/index.rst	2018-08-11 19:21:28 -07:00
Matthew Tovbin	7300002516	[jvm-packages] Use treeLimit param in getTreeLimit (#3575 )	2018-08-10 09:38:58 -07:00
Philip Hyunsu Cho	9c647d8130	Bring XGBoost4J Intro up-to-date (#3574 )	2018-08-10 09:08:19 -07:00
Philip Hyunsu Cho	2e7c3a0ed5	Refined logic for locating git branch inside ReadTheDocs (#3573 )	2018-08-09 15:28:12 -07:00
Philip Hyunsu Cho	aa4ee6a0e4	[BLOCKING] Adding JVM doc build to Jenkins CI (#3567 ) * Adding Java/Scala doc build to Jenkins CI * Deploy built doc to S3 bucket * Build doc only for branches * Build doc first, to get doc faster for branch updates * Have ReadTheDocs download doc tarball from S3 * Update JVM doc links * Put doc build commands in a script * Specify Spark 2.3+ requirement for XGBoost4J-Spark * Build GPU wheel without NCCL, to reduce binary size	2018-08-09 13:27:01 -07:00
Matthew Tovbin	bad76048d1	Eliminate use of System.out + proper error logging (#3572 )	2018-08-09 10:06:17 -07:00
Rory Mitchell	bbb771f32e	Refactor parts of fast histogram utilities (#3564 ) * Refactor parts of fast histogram utilities * Removed byte packing from column matrix	2018-08-09 17:59:57 +12:00
Philip Hyunsu Cho	3c72654e3b	Revert "Fix #3485 , #3540 : Don't use dropout for predicting test sets" (#3563 ) * Revert "Fix #3485, #3540: Don't use dropout for predicting test sets (#3556)" This reverts commit `44811f2330`. * Document behavior of predict() for DART booster * Add notice to parameter.rst	2018-08-08 09:48:55 -07:00
Zeno Gantner	e3e776bd58	grammar fixes and typos (#3568 )	2018-08-08 09:48:27 -07:00
Nan Zhu	1c08b3b2ea	[jvm-packages] enable predictLeaf/predictContrib/treeLimit in 0.8 (#3532 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * partial finish * no test * add test cases * add test cases * address comments * add test for regressor * fix typo	2018-08-07 14:01:18 -07:00
Philip Hyunsu Cho	246ec92163	Update broken links (#3565 ) Fix #3559 Fix #3562	2018-08-07 05:27:39 -07:00
trivialfis	55caad6e49	Remove redundant FindGTest.cmake. (#3533 ) During removal of FindGTest.cmake, also * Fix gtest include dirs. * Remove some blanks and use PWD for gtest dir.	2018-08-07 10:08:08 +12:00
Henry Gouk	69454d9487	Implementation of hinge loss for binary classification (#3477 )	2018-08-07 10:06:42 +12:00
Philip Hyunsu Cho	44811f2330	Fix #3485 , #3540 : Don't use dropout for predicting test sets (#3556 ) * Fix #3485, #3540: Don't use dropout for predicting test sets Dropout (for DART) should only be used at training time. * Add regression test	2018-08-05 10:17:21 -07:00
Philip Hyunsu Cho	109473dae2	Fix #3545 : XGDMatrixCreateFromCSCEx silently discards empty trailing rows (#3553 ) * Fix #3545: XGDMatrixCreateFromCSCEx silently discards empty trailing rows Description: The bug is triggered when 1. The data matrix has empty rows at the bottom. More precisely, the rows `n-k+1`, `n-k+2`, ..., `n` of the matrix have missing values in all dimensions (`n` number of instances, `k` number of trailing rows) 2. The data matrix is given as Compressed Sparse Column (CSC) format. Diagnosis: When the CSC matrix is converted to Compressed Sparse Row (CSR) format (this is common format used for DMatrix), the trailing empty rows are silently ignored. More specifically, the row pointer (`offset`) of the newly created CSR matrix does not take account of these rows. Fix: Modify the row pointer. * Add regression test	2018-08-05 10:15:42 -07:00
Philip Hyunsu Cho	8c633d1ca3	Fix #3505 : Prevent undefined behavior due to incorrectly sized base_margin (#3555 ) The base margin will need to have length `[num_class] * [number of data points]`. Otherwise, the array holding prediction results will be only partially initialized, causing undefined behavior. Fix: check the length of the base margin. If the length is not correct, use the global bias (`base_score`) instead. Warn the user about the substitution.	2018-08-05 10:14:07 -07:00
Philip Hyunsu Cho	4a429a7c4f	Add reg:tweedie to supported objectives in XGBoost4J-Spark (#3552 )	2018-08-05 07:42:59 -07:00
Philip Hyunsu Cho	7fefd6865d	Fix #3402 : wrong fid crashes distributed algorithm (#3535 ) * Fix #3402: wrong fid crashes distributed algorithm The bug was introduced by the recent DMatrix refactor (#3301). It was partially fixed by #3408 but the example in #3402 was still failing. The example in #3402 will succeed after this fix is applied. * Explicitly specify "this" to prevent compile error * Add regression test * Add distributed test to Travis matrix * Install kubernetes Python package as dependency of dmlc tracker * Add Python dependencies * Add compile step * Reduce size of regression test case * Further reduce size of test	2018-08-04 19:20:04 -07:00
Nan Zhu	31d1baba3d	[jvm-packages] Tutorial of XGBoost4J-Spark (#3534 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * add new * update doc * finish Gang Scheduling * more * intro * Add sections: Prediction, Model persistence and ML pipeline. * Add XGBoost4j-Spark MLlib pipeline example * partial finished version * finish the doc * adjust code * fix the doc * use rst * Convert XGBoost4J-Spark tutorial to reST * Bring XGBoost4J up to date * add note about using hdfs * remove duplicate file * fix descriptions * update doc * Wrap HDFS/S3 export support as a note * update * wrap indexing_mode example in code block	2018-08-03 21:17:50 -07:00
trivialfis	34dc9155ab	Use __CUDA__ macro with __NVCC__. (#3539 ) * __CUDA__ is defined in clang. Making the change won't make clang compile xgboost, but syntax checking from clang is at least partially working.	2018-08-02 22:04:23 +12:00
Philip Hyunsu Cho	70026655b0	Clarify supported OSes for XGBoost4J published JARs (#3547 )	2018-08-01 19:51:44 -07:00
Philip Hyunsu Cho	437b368b1f	Update dmlc-core submodule (#3546 ) This brings many goodies, including: * Ability to specify delimiter and weight_column for CSV files: ```python dtrain = xgboost.DMatrix('train.csv?format=csv&label_column=0&weight_column=1&delimiter= ') ``` * Ability to choose between 0-based and 1-based indexing for LIBSVM/LIBFM files: ```python dtrain = xgboost.DMatrix('train.libsvm?indexing_mode=1') # use 1-based indexing dtest = xgboost.DMatrix('test.libsvm') # use 0-based indexing (default) dtest2 = xgboost.DMatrix('test2.libsvm?indexing_mode=-1') # use heuristic to detect 0-based / 1-based ``` * Fix a bug in float parsing (issue dmlc/dmlc-core#440)	2018-08-01 15:15:40 -07:00
Nan Zhu	6cf97b4eae	[jvm-packages] consider spark.task.cpus when controlling parallelism (#3530 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * consider spark.task.cpus when controlling parallelism * fix bug * fix conf setup * calculate requestedCores within ParallelismController * enforce spark.task.cpus = 1 * unify unit test case framework * enable spark ui	2018-07-31 06:19:45 -07:00
trivialfis	860263f814	Enable building with sanitizers. (#3525 )	2018-07-31 17:25:47 +12:00
Nan Zhu	b546321c83	[jvm-packages] the current version of xgboost does not consider missing value in prediction (#3529 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * consider missing value in prediction * handle single prediction instance * fix type conversion	2018-07-30 14:16:24 -07:00
wenduowang	3b62e75f2e	Fix bug of using list(x) function when x is string (#3432 ) * Fix bug of using list(x) function when x is string list('abcdcba') = ['a', 'b', 'c', 'd', 'c', 'b', 'a'] * Allow feature_names/feature_types to be of any type If feature_names/feature_types is iterable, e.g. tuple, list, then convert the value to list, except for string; otherwise construct a list with a single value * Delete excess whitespace * Fix whitespace to pass lint	2018-07-30 07:36:34 -07:00
jqmp	dd07c25d12	Fix typo in ElasticNet threshold function (#3527 )	2018-07-30 14:08:14 +12:00
Philip Hyunsu Cho	2bb9b9d3db	Fix typo in parameter.rst, gblinear section (#3518 )	2018-07-28 18:58:15 -07:00
Nan Zhu	b5178d3d99	[jvm-packages] a better explanation about the inconsistent issue (#3524 )	2018-07-28 17:34:39 -07:00
hlsc	5850a2558a	fix DMatrix load_row_split bug (#3431 )	2018-07-28 17:21:30 -07:00
trivialfis	8973f2cb0e	Fix building dmlc-core from xgboost. (#3522 ) Move building dmlc-core before adding DMLC_LOG_CUSTOMIZE. Fix #3520.	2018-07-28 10:35:11 -07:00
Uddeshya Singh	3363b9142e	Update faq.rst (#3521 ) Just fixing a minor typo	2018-07-28 10:34:14 -07:00


3396 Commits