* [WIP] Add lower and upper bounds on the label for survival analysis
* Update test MetaInfo.SaveLoadBinary to account for extra two fields
* Don't clear qids_ for version 2 of MetaInfo
* Add SetInfo() and GetInfo() method for lower and upper bounds
* changes to aft
* Add parameter class for AFT; use enums to represent distribution and event type
* Add AFT metric
* changes to neg grad to grad
* changes to binomial loss
* changes to overflow
* changes to eps
* changes to code refactoring
* Re-factor survival analysis
* Remove aft namespace
* Move function bodies out of AFTNormal and AFTLogistic, to reduce clutter
* Move function bodies out of AFTLoss, to reduce clutter
* Use smart pointer to store AFTDistribution and AFTLoss
* Rename AFTNoiseDistribution enum to AFTDistributionType for clarity
The enum class was not a distribution itself but a distribution type
* Add AFTDistribution::Create() method for convenience
* changes to extreme distribution
* changes to extreme
* changes to extreme distribution
* changes to left censored
* deleted cout
* changes to x,mu and sd and code refactoring
* changes to print
* changes to hessian formula in censored and uncensored
* changes to variable names and pow
* changes to Logistic Pdf
* changes to parameter
* Expose lower and upper bound labels to R package
* Use example weights; normalize log likelihood metric
* changes to CHECK
* changes to logistic hessian to standard formula
* changes to logistic formula
* Comply with coding style guideline
* Revert back Rabit submodule
* Revert dmlc-core submodule
* Comply with coding style guideline (clang-tidy)
* Fix an error in AFTLoss::Gradient()
* Add missing files to amalgamation
* Address @RAMitchell's comment: minimize future change in MetaInfo interface
* Fix lint
* Fix compilation error on 32-bit target, when size_t == bst_uint
* Allocate sufficient memory to hold extra label info
* Use OpenMP to speed up
* Fix compilation on Windows
* Address reviewer's feedback
* Add unit tests for probability distributions
* Make Metric subclass of Configurable
* Address reviewer's feedback: Configure() AFT metric
* Add a dummy test for AFT metric configuration
* Complete AFT configuration test; remove debugging print
* Rename AFT parameters
* Clarify test comment
* Add a dummy test for AFT loss for uncensored case
* Fix a bug in AFT loss for uncensored labels
* Complete unit test for AFT loss metric
* Simplify unit tests for AFT metric
* Add unit test to verify aggregate output from AFT metric
* Use EXPECT_* instead of ASSERT_*, so that we run all unit tests
* Use aft_loss_param when serializing AFTObj
This is to be consistent with AFT metric
* Add unit tests for AFT Objective
* Fix OpenMP bug; clarify semantics for shared variables used in OpenMP loops
* Add comments
* Remove AFT prefix from probability distribution; put probability distribution in separate source file
* Add comments
* Define kPI and kEulerMascheroni in probability_distribution.h
* Add probability_distribution.cc to amalgamation
* Remove unnecessary diff
* Address reviewer's feedback: define variables where they're used
* Eliminate all INFs and NANs from AFT loss and gradient
* Add demo
* Add tutorial
* Fix lint
* Use 'survival:aft' to be consistent with 'survival:cox'
* Move sample data to demo/data
* Add visual demo with 1D toy data
* Add Python tests
Co-authored-by: Philip Cho <chohyu01@cs.washington.edu>
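
For illustration, the per-label loss that the lower and upper label bounds feed into might be sketched as below. This is a minimal sketch, not the code from this change: it assumes a normal noise distribution, a prediction `pred` on the log-time scale, and a hypothetical `eps` floor that keeps the result finite. The names `aft_nll`, `normal_pdf` and `normal_cdf` are made up for this example.

```python
import math


def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def aft_nll(y_lower, y_upper, pred, sigma=1.0, eps=1e-12):
    """Negative log likelihood of one label under a normal AFT model.

    y_lower == y_upper      -> uncensored label
    y_upper == inf          -> right-censored label
    y_lower == 0            -> left-censored label
    otherwise               -> interval-censored label
    eps is an assumed floor that avoids log(0); it is not taken from the source.
    """
    if y_lower == y_upper:
        # Uncensored: use the density of the observed log-time.
        z = (math.log(y_lower) - pred) / sigma
        likelihood = normal_pdf(z) / (sigma * y_lower)
    else:
        # Censored: use the probability mass between the two bounds.
        if math.isinf(y_upper):
            cdf_upper = 1.0
        else:
            cdf_upper = normal_cdf((math.log(y_upper) - pred) / sigma)
        if y_lower <= 0.0:
            cdf_lower = 0.0
        else:
            cdf_lower = normal_cdf((math.log(y_lower) - pred) / sigma)
        likelihood = cdf_upper - cdf_lower
    return -math.log(max(likelihood, eps))
```

The floor on the likelihood is one simple way to keep the loss free of INF and NAN values when a prediction is far from the label bounds.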
* Simplify DropTrees calling logic
* Add `training` parameter for prediction method.
* [Breaking]: Add `training` to C API.
* Change for R and Python custom objective.
* Correct comment.
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
* Fix syncing DMatrix columns.
* notes for tree method.
* Enable feature validation for all interfaces except for jvm.
* Better tests for boosting from predictions.
* Disable validation on JVM.
- Install wget explicitly to match openssl.
- Install CMake explicitly.
- Use newer miniconda link.
- Reenable unittests.
- Use gcc@9 + xcode@10 on OSX due to missing <_stdio.h>. Other versions of gcc should also work, but Homebrew pours gcc@9 by default after an update, so I just stick with the latest version.
- Disabled one external memory test for OSX. Not sure about the thread implementation in there and fixing external memory is beyond the scope of this PR.
- Use Python3 with conda in jvm package.
* Extract interaction constraints from split evaluator.
This is mostly for model IO, where num_feature and interaction_constraints are copied into the split evaluator. Also, the interaction constraint is by itself a feature selector, acting like a column sampler, and it's inefficient to bury it deep in the evaluator chain. Lastly, removing one more copied parameter is a win.
* Enable interaction constraints for approx tree method.
As the implementation is now split out of the evaluator class, it's also enabled for the approx method.
* Removing obsolete code in colmaker.
These code blocks were never documented nor actually used in the real world, and there isn't a single test for them.
* Unifying the types used for row and column.
As the size of input datasets approaches the billions, incorrect use of int is subject to overflow; also, signed integer overflow is undefined behaviour. This PR starts the process of unifying the index types to unsigned integers. There are optimizations that can exploit this undefined behaviour, but after some testing I don't see them being beneficial to XGBoost.
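
To make the overflow concern concrete, here is a small sketch in Python. It only models the value ranges of fixed-width integers. The helper names `fits` and `wrap_to_int32` and the dataset sizes are hypothetical, chosen for illustration. In C++ an overflowing signed computation is undefined, so the wrapped value shown is just one outcome that can be observed on two's-complement hardware.

```python
INT32_MAX = 2**31 - 1
UINT32_MAX = 2**32 - 1
UINT64_MAX = 2**64 - 1


def fits(value, max_value):
    """True if a non-negative index can be stored in a type with this maximum."""
    return 0 <= value <= max_value


def wrap_to_int32(value):
    """Truncate to 32 bits and reinterpret as a two's-complement signed value."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


# Hypothetical dataset: 3 billion rows with 100 columns each.
n_rows = 3_000_000_000
n_entries = n_rows * 100

print(fits(n_rows, INT32_MAX))      # row count vs. a signed 32-bit int
print(fits(n_rows, UINT32_MAX))     # row count vs. an unsigned 32-bit int
print(fits(n_entries, UINT64_MAX))  # entry count vs. an unsigned 64-bit int
print(wrap_to_int32(n_rows))        # what a truncated 32-bit value could look like
```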
* provide the readme
* update for format
* reformat
* reformat -2
* update again
* update format
* update w.r.t yinlou's comments
* Add kubernetes tutorial to Table of Contents
* Style edit
* add interaction constraints
* enable both interaction and monotonic constraints at the same time
* fix lint
* add R test, fix lint, update demo
* Use dmlc::JSONReader to express interaction constraints as nested lists; Use sparse arrays for bookkeeping
* Add Python test for interaction constraints
* make R interaction constraints parameter based on feature index instead of column names, fix R coding style
* Fix lint
* Add BlueTea88 to CONTRIBUTORS.md
* Short circuit when no constraint is specified; address review comments
* Add tutorial for feature interaction constraints
* allow interaction constraints to be passed as string, remove redundant column_names argument
* Fix typo
* Address review comments
* Add comments to Python test
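
As a rough sketch of the nested-list form of interaction constraints mentioned above, the snippet below parses a constraint string and computes which features a split may use. The rule is a simplified assumption for illustration, not the project's actual logic: a feature is allowed if some constraint group contains it together with every feature already used on the path. The helpers `parse_constraints` and `allowed_features` are hypothetical names.

```python
import json


def parse_constraints(text):
    """Parse a string such as '[[0, 1], [2, 3, 4]]' into a list of feature sets."""
    return [set(group) for group in json.loads(text)]


def allowed_features(constraints, used_on_path, n_features):
    """Return the feature indices a new split may use (simplified rule)."""
    # Short circuit: with no constraints, or at the root, every feature is allowed.
    if not constraints or not used_on_path:
        return set(range(n_features))
    # Features already on the path can always be split on again.
    allowed = set(used_on_path)
    for group in constraints:
        # A group applies only if it covers every feature used so far.
        if used_on_path <= group:
            allowed |= group
    return allowed


constraints = parse_constraints("[[0, 1], [2, 3, 4]]")
print(allowed_features(constraints, {0}, 5))
print(allowed_features(constraints, {2, 3}, 5))
```

Accepting the constraints as a string keeps the parameter easy to pass across language bindings, since every binding can hand over the same JSON-style text.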
* Revert "Fix #3485, #3540: Don't use dropout for predicting test sets (#3556)"
This reverts commit 44811f233071c5805d70c287abd22b155b732727.
* Document behavior of predict() for DART booster
* Add notice to parameter.rst
* add back train method but mark as deprecated
* fix scalastyle error
* add new
* update doc
* finish Gang Scheduling
* more
* intro
* Add sections: Prediction, Model persistence and ML pipeline.
* Add XGBoost4j-Spark MLlib pipeline example
* partially finished version
* finish the doc
* adjust code
* fix the doc
* use rst
* Convert XGBoost4J-Spark tutorial to reST
* Bring XGBoost4J up to date
* add note about using hdfs
* remove duplicate file
* fix descriptions
* update doc
* Wrap HDFS/S3 export support as a note
* update
* wrap indexing_mode example in code block
* Change doc build to reST exclusively
* Rewrite Intro doc in reST; create toctree
* Update parameter and contribute
* Convert tutorials to reST
* Convert Python tutorials to reST
* Convert CLI and Julia docs to reST
* Enable markdown for R vignettes
* Done migrating to reST
* Add guzzle_sphinx_theme to requirements
* Add breathe to requirements
* Fix search bar
* Add link to user forum
* Extended monotonic constraints support to 'hist' tree method.
* Added monotonic constraints tests.
* Fix the signature of NoConstraint::CalcSplitGain()
* Document monotonic constraint support in 'hist'
* Update signature of Update to account for latest refactor