* simplify the config.h file
* revise config.h
* revised config.h
* revise format
* revise format issues
* revise whitespace issues
* revise whitespace and namespace format issues
* revise namespace format issues
* format issues
* format issues
* format issues
* format issues
* Revert submodule changes
* minor change
* Update src/common/config.h
Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
* Address format issue from trivialfis
* Use correct cub submodule
* Training with external memory, part 1 of 2
- This PR focuses on computing the quantiles using multiple GPUs on a dataset that uses the external cache capabilities (see the sketch after this list).
- A follow-up PR soon after this one will support creation of histogram indices on large datasets as well.
- Both of these changes are required to support training with external memory.
- The sparse pages in DMatrix are taken in batches and the cut matrices are incrementally built.
- Also includes some performance changes related to sketch aggregation across multiple features and multiple sparse page batches: instead of building the summary inside each device and merging later, it is aggregated in place while the device works on different rows of the same feature.
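A minimal sketch of how a caller opts into that external-memory path through the C API; the file name and cache prefix are illustrative:

```cpp
#include <cstdio>
#include <xgboost/c_api.h>

int main() {
  DMatrixHandle dmat;
  // Appending "#<name>.cache" to the URI turns on external memory: sparse
  // pages are streamed from disk in batches, and the quantile cuts are built
  // incrementally over those batches instead of over the whole dataset.
  if (XGDMatrixCreateFromFile("train.libsvm#train.cache", 1, &dmat) != 0) {
    std::fprintf(stderr, "%s\n", XGBGetLastError());
    return 1;
  }
  XGDMatrixFree(dmat);
  return 0;
}
```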
* Only define `gpu_id` and `n_gpus` in `LearnerTrainParam`
* Pass LearnerTrainParam through XGBoost via factory method.
* Disable all GPU usage when GPU-related parameters are not specified (fixes XGBoost choosing the GPU over-aggressively).
* Test learner train param io.
* Fix gpu pickling.
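To illustrate the new behavior, a minimal C-API sketch (file name and parameter values are illustrative): with no GPU-related parameters set, training stays on the CPU, and GPU execution now has to be requested explicitly:

```cpp
#include <xgboost/c_api.h>

int main() {
  DMatrixHandle dmat;
  XGDMatrixCreateFromFile("train.libsvm", 1, &dmat);
  BoosterHandle booster;
  XGBoosterCreate(&dmat, 1, &booster);
  // Without these parameters the learner defaults to the CPU; it no longer
  // grabs a GPU merely because one is present.
  XGBoosterSetParam(booster, "tree_method", "gpu_hist");
  XGBoosterSetParam(booster, "gpu_id", "0");
  XGBoosterSetParam(booster, "n_gpus", "1");
  XGBoosterUpdateOneIter(booster, 0, dmat);
  XGBoosterFree(booster);
  XGDMatrixFree(dmat);
  return 0;
}
```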
* Fix issues with training with external memory on CPU
- Use the batch size to determine the correct number of rows in a batch.
- Use the right number of threads in OMP parallelization if the batch size is less than the default OMP max threads (applicable to the last batch).
* Handle scenarios where the last batch size is smaller than the available number of threads (see the sketch after this list).
- Augment tests such that all scenarios are covered (batch size <, >, or = the number of threads).
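A sketch of the thread-count guard described above, with illustrative names; the point is that the OpenMP team is sized by the batch when the last batch is short:

```cpp
#include <algorithm>
#include <omp.h>

void ProcessBatch(int batch_size) {
  // For the last (short) batch, omp_get_max_threads() may exceed the number
  // of rows; clamp so every thread has at least one row to work on.
  int const nthreads = std::min(omp_get_max_threads(), batch_size);
#pragma omp parallel for num_threads(nthreads) schedule(static)
  for (int i = 0; i < batch_size; ++i) {
    // ... per-row work ...
  }
}
```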
* Add support for matrix slicing with query ID for cross-validation
* Hail-mary test of unrar installation for Windows tests
* Try modifying tests to run in GitHub CI
* Remove dependency on wget and unrar
* Save error log from R test
* Relax assertion in test_training
* Use int instead of bool in C function interface
* Revise R interface
* Add XGDMatrixSliceDMatrixEx and keep old XGDMatrixSliceDMatrix for API compatibility
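A sketch of the extended C-API call, assuming a ranking dataset whose rows carry query IDs; the index values are illustrative, and the final `int` argument follows the bool-to-int convention noted above:

```cpp
#include <xgboost/c_api.h>

int main() {
  DMatrixHandle dmat, fold;
  XGDMatrixCreateFromFile("rank.libsvm", 1, &dmat);
  // Row indices making up one cross-validation fold.
  int const idx[] = {0, 1, 2, 5, 8};
  // allow_groups = 1 lets the slice proceed on a DMatrix carrying
  // query/group information, which the original XGDMatrixSliceDMatrix
  // refuses.
  XGDMatrixSliceDMatrixEx(dmat, idx, sizeof(idx) / sizeof(idx[0]), &fold, 1);
  XGDMatrixFree(fold);
  XGDMatrixFree(dmat);
  return 0;
}
```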
* Add CMake option to use bundled gtest from dmlc-core, so that it is easy to build XGBoost with gtest on Windows
* Consistently apply OpenMP flag to all targets. Force enable OpenMP when USE_CUDA is turned on.
* Insert vcomp140.dll into Windows wheels
* Add C++ and Python tests for CPU and GPU targets (CUDA 9.0, 10.0, 10.1)
* Prevent spurious msbuild failure
* Add GPU tests
* Upgrade dmlc-core
* Fix #4462: Use /MT flag consistently for MSVC target
* First attempt at Windows CI
* Distinguish stages in Linux and Windows pipelines
* Try running CMake in Windows pipeline
* Add build step
* Automatically set maximize_evaluation_metrics if not explicitly given.
* When custom_eval is set, require maximize_evaluation_metrics.
* Update documents on early stop in XGBoost4J-Spark.
* Fix code error.
* Make CMakeLists.txt compatible with CMake 3.3; require CMake 3.11 for MSVC
* Use CMake 3.12 when sanitizer is enabled
* Disable -funroll-loops for MSVC
* Use cmake version in container name
* Add missing arg
* Fix egrep use in ci_build.sh
* Display CMake version
* Do not set OpenMP_CXX_LIBRARIES for MSVC
* Use cmake_minimum_required()
* Use feature interaction constraints to narrow search space for split candidates.
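For context, a minimal C-API sketch of the parameter involved; the constraint groups and file name are illustrative. A split may only use features from the same declared group as features already on the path, which is what lets the updater prune candidates instead of scanning every feature:

```cpp
#include <xgboost/c_api.h>

int main() {
  DMatrixHandle dmat;
  XGDMatrixCreateFromFile("train.libsvm", 1, &dmat);
  BoosterHandle booster;
  XGBoosterCreate(&dmat, 1, &booster);
  // Only features within the same group may appear together on a tree path,
  // so split enumeration can skip features outside the allowed set.
  XGBoosterSetParam(booster, "interaction_constraints", "[[0, 1], [2, 3, 4]]");
  XGBoosterSetParam(booster, "tree_method", "hist");
  XGBoosterUpdateOneIter(booster, 0, dmat);
  XGBoosterFree(booster);
  XGDMatrixFree(dmat);
  return 0;
}
```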
* Fix clang-tidy breakage at updater_quantile_hist.cc:535:3
* Make const
* Fix
* Try to fix exception thrown in java_test
* Fix suspected mistake that caused EvaluateSplit error
* Try fix
* Fix bug: feature ID and node ID swapped in argument
* Rename CheckValidation() to CheckFeatureConstraint() for clarity
* Do not create temporary vector validFeatures, to enable parallelism
* Combine thread launches into a single launch per tree for the gpu_hist algorithm.
* Address deprecation warning
* Add manual column sampler constructor
* Turn off omp dynamic to get a guaranteed number of threads
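The idea in a two-line sketch (function name illustrative): with dynamic adjustment enabled, the OpenMP runtime may deliver fewer threads than requested, so it is disabled before pinning the count:

```cpp
#include <omp.h>

void PinThreads(int nthreads) {
  omp_set_dynamic(0);             // treat the requested count as a guarantee,
  omp_set_num_threads(nthreads);  // not merely a hint
}
```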
* Enable openmp in cuda code