56aad86231
adding incomplete kmeans. I'm having a problem with the broadcast, and still need to implement the logic
nachocano
2014-12-03 01:16:13 -08:00
ed1de6df80
change AllReduce to Allreduce
tqchen
2014-12-02 21:11:48 -08:00
2e536eda29
check in the recover strategy
tqchen
2014-11-30 11:42:59 -08:00
155ed3a814
seems an OK version of reset, starting to work on decide exec
tqchen
2014-11-29 22:22:51 -08:00
5b0bb53184
refactor code style, reset link still needs thought
tqchen
2014-11-29 20:15:27 -08:00
42505f473d
finish reset link log
tqchen
2014-11-29 15:14:43 -08:00
98756c068a
livelock in oob send recv
tqchen
2014-11-28 21:58:15 -08:00
aa54a038f2
livelock in oob send recv
tqchen
2014-11-28 21:56:58 -08:00
a30075794b
initial version of robust engine, add discard link, needs more random mock tests, next milestone will be recovery
tqchen
2014-11-28 15:56:12 -08:00
a8128493c2
execute it like this: ./test.sh 4 4000 testcase0.conf ./
nachocano
2014-11-28 01:48:26 -08:00
faed8285cd
execute it like ./test.sh 4 4000 testcase0.conf to obtain a successful execution
nachocano
2014-11-28 00:16:35 -08:00
21f3f3eec4
adding const to variable to comply with Google code convention... may need to change more stuff though. Taint what else do you mean? Spaces, tabs, names?
nachocano
2014-11-27 17:03:31 -08:00
2f1ba40786
change in socket, to pass out error code
tqchen
2014-11-27 16:17:07 -08:00
c565104491
adding some references to mock inside TEST preprocessor directive. It shouldn't be an assert because that shuts down the process. Instead, it should check the value and return some sort of error, so that we can recover. The mock contains queues, indexed by the rank of the process. For each node, you can configure the behavior you expect (success or failure for now) when you call any of the methods (AllReduce, Broadcast, LoadCheckPoint and CheckPoint)... If you call AllReduce several times, the outputs will pop from the queue, i.e., first you can retrieve a success, then a failure and so on. Pretty basic for now, need to tune it better
nachocano
2014-11-26 17:24:29 -08:00
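Note on c565104491: the mock described in that message (per-rank queues of configured outcomes, popped in order on each call, with an error returned instead of an assert) could look roughly like the sketch below. This is only an illustration under assumptions: `MockEngine`, `Outcome`, `Expect` and `Next` are hypothetical names, not the identifiers used in the repository, and defaulting to success when a queue is empty is a choice made here, not something the commit states.

```cpp
#include <map>
#include <queue>
#include <string>
#include <utility>

// Hypothetical sketch of the mock described in the commit message.
// All names here are illustrative assumptions, not the repository's code.
enum class Outcome { kSuccess, kFailure };

class MockEngine {
 public:
  // Queue the outcome that a later call to `method` on node `rank` returns.
  void Expect(int rank, const std::string &method, Outcome outcome) {
    queues_[std::make_pair(rank, method)].push(outcome);
  }

  // Pop the next configured outcome for (rank, method).
  // Assumption: an empty or missing queue means success.
  Outcome Next(int rank, const std::string &method) {
    auto it = queues_.find(std::make_pair(rank, method));
    if (it == queues_.end() || it->second.empty()) return Outcome::kSuccess;
    Outcome out = it->second.front();
    it->second.pop();
    return out;
  }

  // Report failure through the return value rather than an assert,
  // so the caller stays alive and can attempt recovery.
  bool AllReduce(int rank) {
    return Next(rank, "AllReduce") == Outcome::kSuccess;
  }

 private:
  // One FIFO queue of outcomes per (rank, method name) pair.
  std::map<std::pair<int, std::string>, std::queue<Outcome>> queues_;
};
```

Queuing a success followed by a failure for rank 0 makes the first `AllReduce(0)` call return true and the second return false, matching the "first a success, then a failure" behavior in the message.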
54fcff189f
dummy mock for now
nachocano
2014-11-26 16:37:23 -08:00