xgboost

Author	SHA1	Message	Date
Andy Adinets	72cd1517d6	Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage. (#3446 ) * Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage. - added distributions to HostDeviceVector - using HostDeviceVector for labels, weights and base margins in MetaInfo - using HostDeviceVector for offset and data in SparsePage - other necessary refactoring * Added const version of HostDeviceVector API calls. - const versions added to calls that can trigger data transfers, e.g. DevicePointer() - updated the code that uses HostDeviceVector - objective functions now accept const HostDeviceVector<bst_float>& for predictions * Updated src/linear/updater_gpu_coordinate.cu. * Added read-only state for HostDeviceVector sync. - this means no copies are performed if both host and devices access the HostDeviceVector read-only * Fixed linter and test errors. - updated the lz4 plugin - added ConstDeviceSpan to HostDeviceVector - using device % dh::NVisibleDevices() for the physical device number, e.g. in calls to cudaSetDevice() * Fixed explicit template instantiation errors for HostDeviceVector. - replaced HostDeviceVector<unsigned int> with HostDeviceVector<int> * Fixed HostDeviceVector tests that require multiple GPUs. - added a mock set device handler; when set, it is called instead of cudaSetDevice()	2018-08-30 14:28:47 +12:00
Andy Adinets	58d783df16	Fixed issue 3605. (#3628 ) * Fixed issue 3605. - https://github.com/dmlc/xgboost/issues/3605 * Fixed the bug in a better way. * Added a test to catch the bug. * Fixed linter errors.	2018-08-28 10:50:52 -07:00
Rory Mitchell	78bea0d204	Add google test for a column sampling, restore metainfo tests (#3637 ) * Add google test for a column sampling, restore metainfo tests * Update metainfo test for visual studio * Fix multi-GPU bug introduced in #3635	2018-08-28 16:10:26 +12:00
gorogm	7ef2b599c7	Link fixed. (#3640 )	2018-08-27 20:25:50 -07:00
Rory Mitchell	686e990ffc	GPU memory usage fixes + column sampling refactor (#3635 ) * Remove thrust copy calls * Fix histogram memory usage * Cap extreme histogram memory usage * More efficient column sampling * Use column sampler across updaters * More efficient split evaluation on GPU with column sampling	2018-08-27 16:26:46 +12:00
trivialfis	60787ecebc	Merge generic device helper functions into gpu set. (#3626 ) * Remove the use of old NDevices* functions. * Use GPUSet in timer.h.	2018-08-26 18:14:23 +12:00
Nan Zhu	3261002099	[jvm-packages] throw ControlThrowable instead of InterruptedException (#3632 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * interrupted exception is not rethrown	2018-08-25 20:30:21 -07:00
Philip Hyunsu Cho	cb4de521c1	Document CUDA requirement, lack of external memory on GPU (#3624 ) * Document fact that GPU doesn't support external memory * Document CUDA requirement	2018-08-22 22:47:10 -07:00
Philip Hyunsu Cho	4ed8a88240	Update Python API doc (#3619 ) * Add XGBRanker to Python API doc * Show inherited members of XGBRegressor in API doc, since XGBRegressor uses default methods from XGBModel * Add table of contents to Python API doc * Skip JVM doc download if not available * Show inherited members for XGBRegressor and XGBRanker * Expose XGBRanker to Python XGBoost module directory * Add docstring to XGBRegressor.predict() and XGBRanker.predict() * Fix rendering errors in Python docstrings * Fix lint	2018-08-22 18:59:30 -07:00
Nan Zhu	4912c1f9c6	[jvm-packages] fix checkpoint save/load (#3614 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix update checkpoint func	2018-08-21 12:34:24 -07:00
Grant W Schneider	57f3c2f252	Remove errant $ (#3618 )	2018-08-21 12:32:38 -07:00
Shiki-H	24a268a2e3	sklearn api for ranking (#3560 ) * added xgbranker * fixed predict method and ranking test * reformatted code in accordance with pep8 * fixed lint error * fixed docstring and added checks on objective * added ranking demo for python * fixed suffix in rank.py	2018-08-21 08:26:48 -07:00
Philip Hyunsu Cho	b13c3a8bcc	Fix #3609 : Removed unused parameter 'use_buffer' (#3610 )	2018-08-21 07:54:15 -07:00
trivialfis	cf2d86a4f6	Add travis sanitizers tests. (#3557 ) * Add travis sanitizers tests. * Add gcc-7 in Travis. * Add SANITIZER_PATH for CMake. * Enable sanitizer tests in Travis. * Fix memory leaks in tests. * Fix all memory leaks reported by Address Sanitizer. * tests/cpp/helpers.h/CreateDMatrix now returns raw pointer.	2018-08-19 16:40:30 +12:00
Philip Hyunsu Cho	983cb0b374	Add option to disable default metric (#3606 )	2018-08-18 11:39:20 -07:00
Grace Lam	993e62b9e7	Add JSON model dump functionality (#3603 ) * Add JSON model dump functionality * Fix lint	2018-08-17 16:18:43 -07:00
Matthew Tovbin	b53a5a262c	[jvm-packages] getTreeLimit return type should be Int	2018-08-17 09:36:00 -07:00
Philip Hyunsu Cho	ac7fc1306b	Fix #3598 : document that custom objective can't contain colon (:) (#3601 )	2018-08-16 19:05:40 -07:00
Grace Lam	caf4a756bf	Add JSON dump functionality documentation (#3600 )	2018-08-16 16:32:04 -07:00
trivialfis	7c82dc92b2	Fix accessing DMatrix.handle before set. (#3599 ) Close #3597.	2018-08-16 15:26:06 -07:00
Jakob Richter	725f4c36f2	replace nround with nrounds to match actual parameter (#3592 )	2018-08-15 11:13:53 -07:00
Nan Zhu	73bd590a1d	[jvm-packages] add the missing scm urls (#3589 ) for some reason this part was missing in master branch????	2018-08-14 15:05:23 -07:00
trivialfis	9265964ee7	Fix ptrdiff_t namespace in Span. (#3588 ) Fix #3587.	2018-08-15 10:04:55 +12:00
trivialfis	2c502784ff	Span class. (#3548 ) * Add basic Span class based on ISO++20. * Use Span<Entry const> instead of Inst in SparsePage. * Add DeviceSpan in HostDeviceVector, use it in regression obj.	2018-08-14 17:58:11 +12:00
Matthew Tovbin	2b7a1c5780	[jvm-packages] Avoid losing precision when computing probabilities by converting to Double early (#3576 )	2018-08-13 14:05:07 -07:00
Matthew Tovbin	ce0f0568a6	Make sure 'thresholds' are considered when executing predict method (#3577 )	2018-08-13 14:04:47 -07:00
Philip Hyunsu Cho	6288f6d563	Update JVM packages version to 0.81-SNAPSHOT (#3584 )	2018-08-13 10:17:52 -07:00
Philip Hyunsu Cho	96826a3515	Release version 0.80 (#3541 ) * Up versions * Write release note for 0.80 v0.80	2018-08-13 01:38:37 -07:00
Mathew	06ef4db4cc	Fix Spark 2.2 Support (Amending #3062 ) (#3325 ) This pull request amends the broken #3062 to allow Spark 2.2 to work. Please note this won't work in Spark <=2.1 as sc.removeSparkListener was implemented in Spark 2.2. (So perhaps a more general method is better, although that is what was attempted in #3062) This PR fixes: #3208, #3151 and the discussion in #1927. I do find it strange that #3062 does not work in Spark 2.2, it's probably due to some sort of public/private issue in the org.apache.spark.scheduler.LiveListenerBus class inheritance (In Spark itself). The error is: `java.lang.NoSuchMethodError: org.apache.spark.scheduler.LiveListenerBus.removeListener(Ljava/lang/Object;)V`	2018-08-12 18:35:20 -07:00
Rory Mitchell	645996b12f	Remove accidental SparsePage copies (#3583 )	2018-08-12 17:49:38 -07:00
Philip Hyunsu Cho	0b607fb884	Add link to XGBoost4J-Spark tutorial on AWS Yarn tutorial (#3582 )	2018-08-12 07:27:28 -07:00
Philip Hyunsu Cho	4202332783	Clarify multi-GPU training, binary wheels, Pandas integration (#3581 ) * Clarify multi-GPU training, binary wheels, Pandas integration * Add a note about multi-GPU on gpu/index.rst	2018-08-11 19:21:28 -07:00
Matthew Tovbin	7300002516	[jvm-packages] Use treeLimit param in getTreeLimit (#3575 )	2018-08-10 09:38:58 -07:00
Philip Hyunsu Cho	9c647d8130	Bring XGBoost4J Intro up-to-date (#3574 )	2018-08-10 09:08:19 -07:00
Philip Hyunsu Cho	2e7c3a0ed5	Refined logic for locating git branch inside ReadTheDocs (#3573 )	2018-08-09 15:28:12 -07:00
Philip Hyunsu Cho	aa4ee6a0e4	[BLOCKING] Adding JVM doc build to Jenkins CI (#3567 ) * Adding Java/Scala doc build to Jenkins CI * Deploy built doc to S3 bucket * Build doc only for branches * Build doc first, to get doc faster for branch updates * Have ReadTheDocs download doc tarball from S3 * Update JVM doc links * Put doc build commands in a script * Specify Spark 2.3+ requirement for XGBoost4J-Spark * Build GPU wheel without NCCL, to reduce binary size	2018-08-09 13:27:01 -07:00
Matthew Tovbin	bad76048d1	Eliminate use of System.out + proper error logging (#3572 )	2018-08-09 10:06:17 -07:00
Rory Mitchell	bbb771f32e	Refactor parts of fast histogram utilities (#3564 ) * Refactor parts of fast histogram utilities * Removed byte packing from column matrix	2018-08-09 17:59:57 +12:00
Philip Hyunsu Cho	3c72654e3b	Revert "Fix #3485 , #3540 : Don't use dropout for predicting test sets" (#3563 ) * Revert "Fix #3485, #3540: Don't use dropout for predicting test sets (#3556)" This reverts commit `44811f2330`. * Document behavior of predict() for DART booster * Add notice to parameter.rst	2018-08-08 09:48:55 -07:00
Zeno Gantner	e3e776bd58	grammar fixes and typos (#3568 )	2018-08-08 09:48:27 -07:00
Nan Zhu	1c08b3b2ea	[jvm-packages] enable predictLeaf/predictContrib/treeLimit in 0.8 (#3532 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * partial finish * no test * add test cases * add test cases * address comments * add test for regressor * fix typo	2018-08-07 14:01:18 -07:00
Philip Hyunsu Cho	246ec92163	Update broken links (#3565 ) Fix #3559 Fix #3562	2018-08-07 05:27:39 -07:00
trivialfis	55caad6e49	Remove redundant FindGTest.cmake. (#3533 ) During removal of FindGTest.cmake, also * Fix gtest include dirs. * Remove some blanks and use PWD for gtest dir.	2018-08-07 10:08:08 +12:00
Henry Gouk	69454d9487	Implementation of hinge loss for binary classification (#3477 )	2018-08-07 10:06:42 +12:00
Philip Hyunsu Cho	44811f2330	Fix #3485 , #3540 : Don't use dropout for predicting test sets (#3556 ) * Fix #3485, #3540: Don't use dropout for predicting test sets Dropout (for DART) should only be used at training time. * Add regression test	2018-08-05 10:17:21 -07:00
Philip Hyunsu Cho	109473dae2	Fix #3545 : XGDMatrixCreateFromCSCEx silently discards empty trailing rows (#3553 ) * Fix #3545: XGDMatrixCreateFromCSCEx silently discards empty trailing rows Description: The bug is triggered when 1. The data matrix has empty rows at the bottom. More precisely, the rows `n-k+1`, `n-k+2`, ..., `n` of the matrix have missing values in all dimensions (`n` is the number of instances, `k` is the number of trailing empty rows) 2. The data matrix is given in Compressed Sparse Column (CSC) format. Diagnosis: When the CSC matrix is converted to Compressed Sparse Row (CSR) format (the common format used for DMatrix), the trailing empty rows are silently ignored. More specifically, the row pointer (`offset`) of the newly created CSR matrix does not take account of these rows. Fix: Modify the row pointer. * Add regression test	2018-08-05 10:15:42 -07:00
Philip Hyunsu Cho	8c633d1ca3	Fix #3505 : Prevent undefined behavior due to incorrectly sized base_margin (#3555 ) The base margin will need to have length `[num_class] * [number of data points]`. Otherwise, the array holding prediction results will be only partially initialized, causing undefined behavior. Fix: check the length of the base margin. If the length is not correct, use the global bias (`base_score`) instead. Warn the user about the substitution.	2018-08-05 10:14:07 -07:00
Philip Hyunsu Cho	4a429a7c4f	Add reg:tweedie to supported objectives in XGBoost4J-Spark (#3552 )	2018-08-05 07:42:59 -07:00
Philip Hyunsu Cho	7fefd6865d	Fix #3402 : wrong fid crashes distributed algorithm (#3535 ) * Fix #3402: wrong fid crashes distributed algorithm The bug was introduced by the recent DMatrix refactor (#3301). It was partially fixed by #3408 but the example in #3402 was still failing. The example in #3402 will succeed after this fix is applied. * Explicitly specify "this" to prevent compile error * Add regression test * Add distributed test to Travis matrix * Install kubernetes Python package as dependency of dmlc tracker * Add Python dependencies * Add compile step * Reduce size of regression test case * Further reduce size of test	2018-08-04 19:20:04 -07:00
Nan Zhu	31d1baba3d	[jvm-packages] Tutorial of XGBoost4J-Spark (#3534 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * add new * update doc * finish Gang Scheduling * more * intro * Add sections: Prediction, Model persistence and ML pipeline. * Add XGBoost4j-Spark MLlib pipeline example * partial finished version * finish the doc * adjust code * fix the doc * use rst * Convert XGBoost4J-Spark tutorial to reST * Bring XGBoost4J up to date * add note about using hdfs * remove duplicate file * fix descriptions * update doc * Wrap HDFS/S3 export support as a note * update * wrap indexing_mode example in code block	2018-08-03 21:17:50 -07:00


3409 Commits