[doc] Troubleshoot nccl shared memory. [skip ci] (#9206)
parent 62e9387cd5
commit 7f20eaed93
@@ -519,6 +519,9 @@ Troubleshooting
   the ``NCCL_SOCKET_IFNAME``. In addition, you can use ``NCCL_DEBUG`` to obtain debug
   logs.
 
+- If NCCL fails to initialize in a container environment, it might be caused by limited
+  system shared memory. With docker, one can try the flag: `--shm-size=4g`.
+
 - MIG (Multi-Instance GPU) is not yet supported by NCCL. You will receive an error message
   that includes `Multiple processes within a communication group ...` upon initialization.
 
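Review note: the shared-memory hint added in this commit can be checked from inside the container before launching a job. The sketch below is not part of the commit. It assumes the container's shared memory is mounted at `/dev/shm` (the usual location on Linux), and the helper name `shared_memory_report` and the 4 GiB threshold are illustrative choices that simply mirror the `--shm-size=4g` flag suggested in the diff. NCCL does not document 4 GiB as a hard requirement.

```python
import shutil

GIB = 1024 ** 3


def shared_memory_report(path="/dev/shm", min_total_bytes=4 * GIB):
    """Summarise the size of the filesystem mounted at `path`.

    `path` and `min_total_bytes` are assumptions for illustration:
    /dev/shm is the conventional shared-memory mount on Linux, and the
    threshold mirrors the `--shm-size=4g` suggestion, nothing more.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "path": path,
        "total_gib": round(usage.total / GIB, 2),
        "free_gib": round(usage.free / GIB, 2),
        "looks_sufficient": usage.total >= min_total_bytes,
    }


if __name__ == "__main__":
    try:
        report = shared_memory_report()
    except OSError as err:
        # /dev/shm may be absent on non-Linux hosts or minimal images.
        print(f"could not inspect shared memory: {err}")
    else:
        print(report)
        if not report["looks_sufficient"]:
            print("shared memory is small; consider a larger --shm-size")
```

Run inside a container started without the flag, this would typically report a total far below the threshold, since Docker's default `/dev/shm` size is 64 MB. Restarting the container with `--shm-size=4g` and running it again is a quick way to confirm the flag took effect.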