doc

parent f332750359
commit c7282acb2a

@@ -7,3 +7,22 @@ Rabit Documentation

Parameters
====
This section lists all the parameters that can be passed to the rabit::Init function through argv.
All parameters are passed in as strings of the form ```parameter-name=parameter-value```.
In most settings these parameters have default values or are detected automatically,
and do not need to be configured manually.

* rabit_tracker_uri [passed in automatically by tracker]
  - The URI/IP address of the rabit tracker
* rabit_tracker_port [passed in automatically by tracker]
  - The port of the rabit tracker
* rabit_task_id [automatically detected]
  - The unique identifier of the computing process
  - When running on Hadoop, this is extracted automatically from an environment variable
* rabit_reduce_buffer [default = 256MB]
  - The memory buffer used to store intermediate results of a reduction
  - Format is "digits + unit", for example 128M or 1G
* rabit_global_replica [default = 5]
  - Number of replicated copies of the result kept for each Allreduce/Broadcast call
* rabit_local_replica [default = 2]
  - Number of replicas of the local model kept in each checkpoint
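
To make the ```parameter-name=parameter-value``` convention and the "digits + unit" size format concrete, here is a minimal, self-contained sketch. It is not rabit's own parser: `ParseParams` and `ParseSizeBytes` are hypothetical helpers, and the sketch assumes binary units (K = 2^10, M = 2^20, G = 2^30) and an optional trailing B, neither of which the list above specifies. The defaults are seeded from the values listed above.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper: collect "name=value" entries from argv, starting from the
// documented defaults. Entries without '=' (e.g. the program name) are skipped.
std::map<std::string, std::string> ParseParams(int argc, const char* const argv[]) {
  std::map<std::string, std::string> params = {
      {"rabit_reduce_buffer", "256MB"},
      {"rabit_global_replica", "5"},
      {"rabit_local_replica", "2"},
  };
  for (int i = 0; i < argc; ++i) {
    const std::string arg = argv[i];
    const std::string::size_type eq = arg.find('=');
    if (eq == std::string::npos) continue;           // not a name=value pair
    params[arg.substr(0, eq)] = arg.substr(eq + 1);  // overrides any default
  }
  return params;
}

// Hypothetical helper: convert "digits + unit" (e.g. "128M", "1G", "256MB") to bytes.
// Assumes binary units and an optional trailing 'B'; throws on malformed input.
std::uint64_t ParseSizeBytes(const std::string& text) {
  std::size_t pos = 0;
  std::uint64_t value = 0;
  while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos]))) {
    value = value * 10 + static_cast<std::uint64_t>(text[pos] - '0');
    ++pos;
  }
  if (pos == 0) throw std::invalid_argument("size must start with digits: " + text);
  std::string unit = text.substr(pos);
  if (!unit.empty() && (unit.back() == 'B' || unit.back() == 'b')) unit.pop_back();
  if (unit.empty()) return value;  // plain number of bytes
  if (unit.size() != 1) throw std::invalid_argument("unknown unit in: " + text);
  switch (std::toupper(static_cast<unsigned char>(unit[0]))) {
    case 'K': return value << 10;
    case 'M': return value << 20;
    case 'G': return value << 30;
    default: throw std::invalid_argument("unknown unit in: " + text);
  }
}
```

In a real program the strings would come straight from `main`'s argv, and values passed on the command line replace the seeded defaults.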

@@ -268,6 +268,9 @@ Because the different nature of the two types of models, different strategy will

nodes (selected using a ring replication strategy). The checkpoint is saved only in memory, without touching the disk, which makes rabit programs more efficient.
Users are encouraged to use only ```global_model``` whenever that is sufficient, for better efficiency.
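
As a simplified illustration of ring-based replica selection, the sketch below assumes that the node with rank r in a world of n nodes keeps in-memory copies of its local model on its k successors on the ring, (r+1) mod n through (r+k) mod n, with k playing the role of rabit_local_replica. This placement rule is an assumption made for illustration; rabit's exact rule may differ, and `RingReplicaTargets` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: ranks that hold in-memory copies of `rank`'s local model,
// assuming each node replicates to its next `num_replica` successors on a ring.
std::vector<int> RingReplicaTargets(int rank, int world_size, int num_replica) {
  assert(world_size > 0 && rank >= 0 && rank < world_size);
  std::vector<int> targets;
  // A node cannot place more distinct replicas than there are other nodes.
  const int count = num_replica < world_size - 1 ? num_replica : world_size - 1;
  for (int step = 1; step <= count; ++step) {
    targets.push_back((rank + step) % world_size);
  }
  return targets;
}
```

Under this assumption, the local model of a failed node remains available as long as at least one of its k successors survives.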
|
||||||
|
|
||||||
|
To enable a model class to be checked pointed, user can implement a [serialization interface](../include/rabit_serialization.h). The serialization interface already
|
||||||
|
provide serialization functions of STL vector and string. For python API, user can checkpoint any python object that can be pickled.

There is a special checkpoint function called [LazyCheckpoint](http://homes.cs.washington.edu/~tqchen/rabit/doc/namespacerabit.html#a99f74c357afa5fba2c80cc0363e4e459),
which can be used in cases where only ```global_model``` is used, under certain conditions.
When LazyCheckpoint is called, no action is taken and the rabit engine only remembers a pointer to the model.

@@ -282,6 +285,7 @@ LazyCheckPoint, code1, Allreduce, code2, Broadcast, code3, LazyCheckPoint

The user must modify the model only in code3. This condition is satisfied in many scenarios, and in those cases the user can use LazyCheckpoint to further
improve the efficiency of the program.
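
Why the model may change only in code3 can be illustrated with a toy engine. In the hypothetical `ToyEngine` below, `LazyCheckpoint` only records a pointer, and the model is serialized later, only if a recovering peer asks for it during a collective call. The returned bytes therefore reflect the model at the time of the request, not at the time of the checkpoint call. The class, its methods, and the use of a plain string as the serialized form are assumptions for illustration, not rabit's implementation.

```cpp
#include <string>

// Hypothetical model: its serialized form is simply a copy of its state string.
struct Model {
  std::string state;
  std::string Serialize() const { return state; }
};

// Hypothetical engine illustrating the lazy strategy: remember a pointer at
// checkpoint time and serialize only when a recovering peer requests it.
class ToyEngine {
 public:
  void LazyCheckpoint(const Model* model) {
    lazy_model_ = model;  // no copy and no serialization here
    ++version_;
  }
  // Imagined to run inside a collective (e.g. Allreduce) when a restarted peer
  // needs the latest checkpoint; serialization happens only at this point.
  std::string ServeRecoveryRequest() const {
    return lazy_model_ != nullptr ? lazy_model_->Serialize() : std::string();
  }
  int version() const { return version_; }

 private:
  const Model* lazy_model_ = nullptr;
  int version_ = 0;
};
```

If the model were modified in code1 instead, a request served during the following Allreduce would return the modified state under the old checkpoint version, which is the situation the code3 restriction prevents.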

Compile Programs with Rabit
====
Rabit is a portable library; to use it, you only need to include the rabit header file.