Includes:
- Dockerfile changes
- Dockerfile clean up
- Fix execution privileges of files used from Dockerfile.
- New Dockerfile entrypoint to replace with_user script
- Defined placeholders for CPU testing (script and Dockerfile)
- Jenkinsfile
- Jenkinsfile milestone defined
- Single source code checkout and propagation via stash/unstash
- Bash needs to be used explicitly when launching the make build, since we need
access to the environment
- Jenkinsfile build factory for cmake and make style of jobs
- Archiving of artifacts (*.so, *.whl, *.egg) produced by the cmake build
Missing:
- CPU testing
- Python3 env build and testing
* [jvm-packages] Deduplicated train/test data access in tests
All datasets are now available via a unified API, e.g. Agaricus.test.
The only exception is the dermatology data, which requires parsing a
CSV file.
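A minimal sketch of what such a unified accessor can look like; only the name
Agaricus.test comes from the text above, while the file paths, the SparseSample
type, and the LIBSVM parsing are illustrative assumptions:

    import scala.io.Source

    case class SparseSample(label: Float, indices: Array[Int], values: Array[Float])

    object Agaricus {
      private def load(path: String): Seq[SparseSample] =
        Source.fromFile(path).getLines().map { line =>
          val tokens = line.split(" ")
          val (indices, values) = tokens.tail.map { kv =>
            val Array(i, v) = kv.split(":")
            (i.toInt, v.toFloat)
          }.unzip
          SparseSample(tokens.head.toFloat, indices, values)
        }.toSeq

      lazy val train: Seq[SparseSample] = load("agaricus.txt.train")
      lazy val test: Seq[SparseSample] = load("agaricus.txt.test")
    }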
* Inlined Utils.buildTrainingRDD
The default number of partitions for local mode is equal to the number
of available CPUs.
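A small Scala sketch of the Spark behaviour this relies on, assuming a local[*]
master: parallelize with no explicit slice count uses defaultParallelism, which
in local[*] mode equals the number of available cores:

    import org.apache.spark.{SparkConf, SparkContext}

    object PartitionCountDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setMaster("local[*]").setAppName("partition-count-demo"))
        val rdd = sc.parallelize(1 to 1000) // no explicit numSlices
        println(s"partitions = ${rdd.getNumPartitions}, " +
          s"cores = ${Runtime.getRuntime.availableProcessors()}")
        sc.stop()
      }
    }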
* Replaced dataset names with problem types
It has been reported that the new parallel algorithm (#2493) results in excessive
memory usage (see issue #2326). Until the issues are resolved, XGBoost should use
the old parallel algorithm by default. The user has to specify
`enable_feature_grouping=1` manually to enable the new algorithm.
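A minimal sketch of opting back into the new algorithm; only the parameter name
enable_feature_grouping comes from the text above, and the other parameters and
the train call mentioned in the comment are assumptions shown for context:

    object FeatureGroupingOptIn {
      val params: Map[String, Any] = Map(
        "objective" -> "binary:logistic",
        "tree_method" -> "hist",          // assumed: the fast-histogram updater
        "enable_feature_grouping" -> 1    // off by default until the issues above are resolved
      )
      // params would then be passed to whichever train entry point is in use,
      // e.g. something like XGBoost.trainWithRDD(trainingData, params, numRound, nWorkers).
    }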
* Patch to improve multithreaded performance scaling
Change parallel strategy for histogram construction.
Instead of partitioning data rows among multiple threads, partition feature
columns. Useful heuristics for assigning partitions have been adopted from the
LightGBM project.
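A conceptual sketch, written in Scala purely for illustration (the real
implementation is the C++ fast-histogram updater, and all names here are made
up): each worker owns a disjoint block of feature columns and scans all rows
for those columns, so no two workers ever write to the same histogram slice:

    // Scala 2.13+ needs the scala-parallel-collections module for .par;
    // on 2.12 this import is unnecessary.
    import scala.collection.parallel.CollectionConverters._

    object ColumnParallelHistogram {
      def buildHistograms(
          binnedData: Array[Array[Int]], // binnedData(row)(feature) = bin index
          gradients: Array[Double],
          numBins: Int): Array[Array[Double]] = {
        val numFeatures = binnedData.head.length
        val hist = Array.fill(numFeatures, numBins)(0.0)
        // Partition *feature columns* (not rows) among threads.
        (0 until numFeatures).par.foreach { f =>
          var row = 0
          while (row < binnedData.length) {
            hist(f)(binnedData(row)(f)) += gradients(row)
            row += 1
          }
        }
        hist
      }
    }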
* Add missing header to satisfy MSVC
* Restore max_bin and related parameters to TrainParam
* Fix lint error
* inline functions do not require static keyword
* Feature grouping algorithm accepting FastHistParam
The feature grouping algorithm accepts many parameters (3+), and it gets annoying
to pass them one by one. Instead, simply pass a reference to FastHistParam. The
definition of FastHistParam has been moved to a separate header file to
accommodate this change.
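An illustrative sketch of the parameter-object refactoring, in Scala for brevity
(the real FastHistParam is a C++ struct); the field names are just examples
borrowed from nearby entries (max_bin, enable_feature_grouping):

    case class FastHistParamSketch(maxBin: Int, enableFeatureGrouping: Int)

    object FeatureGroupingSketch {
      // before: groupFeatures(maxBin: Int, enableFeatureGrouping: Int, ...): Unit
      def groupFeatures(param: FastHistParamSketch): Unit = {
        // the grouping algorithm reads everything it needs from `param`
      }
    }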
Prior to this commit XGBoostModel.predict produced an RDD with
an array of predictions for each partition, effectively changing
the shape with respect to the input RDD. A more natural contract for
the prediction API is that, given an RDD, it returns a new RDD with
the same number of elements. This allows users to easily match inputs
with predictions.
This commit removes one layer of nesting in XGBoostModel.predict output.
Even though the change is clearly non-backward compatible, I still
think it is well justified. See discussion in 06bd5dca for motivation.
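A sketch of the contract change, with assumed names and a stand-in for the real
per-partition booster call:

    import org.apache.spark.rdd.RDD

    object PredictShapeSketch {
      // Stand-in for the actual per-partition prediction; the Float element
      // type is an assumption.
      private def predictPartition(rows: Iterator[Array[Float]]): Array[Float] =
        rows.map(_.sum).toArray

      // Before: one Array of predictions per partition, so the output RDD has
      // as many elements as there are partitions.
      def predictNested(input: RDD[Array[Float]]): RDD[Array[Float]] =
        input.mapPartitions(rows => Iterator(predictPartition(rows)))

      // After: one prediction per input element; inputs and predictions can be
      // matched one-to-one because the shapes agree.
      def predictFlat(input: RDD[Array[Float]]): RDD[Float] =
        input.mapPartitions(rows => predictPartition(rows).iterator)
    }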
* Disabled excessive Spark logging in tests
* Fixed a signature of XGBoostModel.predict
* Removed boxing in XGBoost.fromDenseToSparseLabeledPoints
* Inlined XGBoost.repartitionData
An if is more explicit than an opaque method name.
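What the inlined code amounts to, roughly (names and the exact condition are
assumptions):

    import org.apache.spark.rdd.RDD

    object RepartitionSketch {
      // A plain `if` states the intent directly instead of hiding it behind a helper.
      def ensurePartitions[T](data: RDD[T], nWorkers: Int): RDD[T] =
        if (data.getNumPartitions == nWorkers) data else data.repartition(nWorkers)
    }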
* Moved XGBoost.convertBoosterToXGBoostModel to XGBoostModel
* Check the input dimension in DMatrix.setBaseMargin
Prior to this commit providing an array of incorrect dimensions would
have resulted in memory corruption. Maybe backport this to C++?
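A minimal sketch of the guard, with assumed class and field names; the point is
to validate the array length against the number of rows before the data reaches
native code:

    class DMatrixSketch(val rowNum: Long) {
      private var baseMargin: Array[Float] = _

      def setBaseMargin(margin: Array[Float]): Unit = {
        require(margin.length == rowNum,
          s"base margin has ${margin.length} entries, but DMatrix has $rowNum rows")
        baseMargin = margin // in the real code the array is handed to JNI here
      }
    }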
* Reduced nesting in XGBoost.buildDistributedBoosters
* Ensured consistent naming of the params map
* Cleaned up DataBatch to make it easier to comprehend
* Made scalastyle happy
* Added baseMargin to XGBoost.train and trainWithRDD
* Deprecated XGBoost.train
It is ambiguous and works only for RDDs.
* Addressed review comments
* Revert "Fixed a singature of XGBoostModel.predict"
This reverts commit 06bd5dcae7780265dd57e93ed7d4135f4e78f9b4.
* Addressed more review comments
* Fixed NullPointerException in buildDistributedBoosters
* Fixed DLL name on Windows in ``xgboost.libpath``
* Added support for OS X to ``xgboost.libpath``
* Use .dylib for shared library on OS X
This does not affect the JNI library, because it is not truly
cross-platform in the Makefile build anyway.
* Exposed prediction feature contribution on the Java side
* was not supplying the newly added argument
* Exposed from Scala-side as well
* formatting (keep declaration in one line unless exceeding 100 chars)
* [jvm-packages] Ensure the native library is loaded once
Previously any class using XGBoostJNI queried NativeLibLoader to make
sure the native library was loaded. This commit moves the initXGBoost
call to XGBoostJNI, effectively delegating the initialization to the class
loader.
Note also that XGBoostJNI now does NOT suppress an IOException if it
occurs in initXGBoost.
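A conceptual sketch in Scala (the real XGBoostJNI is a Java class with a static
initializer); NativeLibLoaderSketch and its method body are placeholders:

    object NativeLibLoaderSketch {
      @throws[java.io.IOException]
      def initXGBoost(): Unit = {
        // extract the bundled native library to a temporary file and System.load it
      }
    }

    object XGBoostJNISketch {
      // Runs exactly once, when the object is first referenced by any caller;
      // a failure in initXGBoost propagates instead of being silently suppressed.
      NativeLibLoaderSketch.initXGBoost()

      def boosterCreate(): Long = 0L // stand-in for an actual native method
    }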
* [jvm-packages] Fused JNIErrorHandle with XGBoostJNI
There was no reason for having a separate class.
When using xgboost4j-spark I had executors getting killed by YARN for
overrunning their memory limits much more often than I would expect,
based on the memoryOverhead provided. It looks like a significant
amount of this is because DMatrix objects were being created but not
released: they were only released when the GC decided it was time to
clean up the references.
Rather than waiting for the GC, release the DMatrix objects when we know
they are no longer necessary.
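A minimal sketch of the eager-release pattern; the exact disposal method on
DMatrix is not spelled out here, so the sketch is written against a generic
acquire/release pair:

    object NativeResourceSketch {
      def withNativeResource[R, A](acquire: => R)(release: R => Unit)(body: R => A): A = {
        val resource = acquire
        try body(resource)
        finally release(resource) // free native memory now, not whenever the GC runs
      }
      // usage (illustrative): wrap DMatrix construction and release its native
      // memory as soon as training or prediction with it is finished.
    }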
* [jvm-packages] Fixed compilation on Windows
* [jvm-packages] Build the JNI bindings on Appveyor
* [jvm-packages] Build & test on OS X
* [jvm-packages] Re-applied the CMake build changes reverted by #2395
* Fixed Appveyor JVM build
* Muted Maven on Travis
* Don't link with libawt
* "linux2"->"linux"
Python 2.x and 3.x use slightly different values for ``sys.platform``.
* Support for building gpu-plugins for specific GPU architectures
1. Option GPU_COMPUTE_VER exposed from both Makefile and CMakeLists.txt
2. updater_gpu documentation updated accordingly
* Re-introduced GPU_COMPUTE_VER option in the cmake flow.
This seems to fix the compile-time, rdc=true, and copy-constructor-related
errors seen and discussed in PR #2390.
* [jvm-packages] Fixed JNI_OnLoad overload
It does not compile on Windows without proper export flags.
* [jvm-packages] Use JNI types directly where appropriate
* Removed lib hack from CMake build
Prior to this commit the CMake build used a hardcoded lib prefix for
libxgboost and libxgboost4j. Unfortunately this did not play well with
Windows, which does not use the lib- prefix.
* [jvm-packages] Replaced create_jni.{bat,sh} with a Python version
This makes it possible to have a single script for all platforms.
* [jvm-packages] Added all configuration options to create_jni.py
Use int32_t explicitly when serializing the version field of DMatrix in binary
format. On ILP64 architectures, although rare, the size of int is 64 bits.
* Integrating a faster version of grow_gpu plugin
1. Removed the older files to reduce duplication
2. Moved all of the grow_gpu files under 'exact' folder
3. All of them are inside 'exact' namespace to avoid any conflicts
4. Fixed a bug in benchmark.py while running only 'grow_gpu' plugin
5. Added cub and googletest submodules to ease integration and unit-testing
6. Updates to CMakeLists.txt to directly build cuda objects into libxgboost
* Added support for building gpu plugins through make flow
1. updated makefile and config.mk to add right targets
2. added unit-tests for gpu exact plugin code
* 1. Added support for building gpu plugin using 'make' flow as well
2. Updated instructions for building and testing gpu plugin
* Fix travis-ci errors for PR#2360
1. lint errors on unit-tests
2. removed googletest; instead depend on the gtest cache provided by dmlc-core
* Some more fixes to travis-ci lint failures PR#2360
* Added Rory's copyright to the files containing code from both authors.
* updated copyright statement as per Rory's request
* moved the static datasets into a script to generate them at runtime
* 1. memory usage print when silent=0
2. tests/ and test/ folder organization
3. removal of the dependency of googletest for just building xgboost
4. coding style updates for .cuh as well
* Fixes for compilation warnings
* add cuda object files as well when JVM_BINDINGS=ON
* [jvm-packages] Added libxgboost4j to CMake build
* [jvm-packages] Wired CMake build into create_jni.sh
* Use newer CMake version on Travis
* Lowered CMake version constraints
* Fixed various quirks in the new CMake build
Don't use implicit conversions to c_int, which incidentally happen to work
on (some) 64-bit platforms, but:
* may lead to truncation of the input value to a 32-bit signed int,
* cause segfaults on some 32-bit architectures (tested on Ubuntu ARM;
this is also the likely cause of issue #1707).
Also, when passing references, use explicit 64-bit integers where needed,
instead of c_ulong, which is not guaranteed to be that large.
* Specified 'exec-maven-plugin' version
* Changed 'create_jni.sh' to fail on error
and also report each of the executed commands, which makes it easier
to debug.
The for loop in create.new.tree.features was referencing length(trees) as its
upper bound. trees is a base R dataset, not the model that the code is
generating. Changed the loop boundary to model$niter, which should be the
number of trees.