[doc] Troubleshoot nccl shared memory. [skip ci] (#9206)

Jiaming Yuan 2023-05-31 05:00:02 +08:00 committed by GitHub
parent 62e9387cd5
commit 7f20eaed93

@@ -519,6 +519,9 @@ Troubleshooting
the ``NCCL_SOCKET_IFNAME``. In addition, you can use ``NCCL_DEBUG`` to obtain debug
logs.
- If NCCL fails to initialize in a container environment, the cause might be limited
system shared memory. With Docker, try passing the flag ``--shm-size=4g``.
- MIG (Multi-Instance GPU) is not yet supported by NCCL. Initialization fails with an
error message that includes ``Multiple processes within a communication group ...``.
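
The shared-memory and debug-log hints above can be checked from inside the container before training starts. The following is a minimal sketch, not part of the commit: the helper names ``shm_usage_gib`` and ``enable_nccl_debug`` are hypothetical, ``/dev/shm`` is assumed to be the shared-memory mount (the usual location on Linux), and ``INFO`` is assumed as the desired ``NCCL_DEBUG`` level.

```python
import os
import shutil


def shm_usage_gib(path="/dev/shm"):
    """Return (total, free) size of `path` in GiB, or None if it does not exist.

    Assumption: /dev/shm is where the container's shared memory is mounted.
    """
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return usage.total / gib, usage.free / gib


def enable_nccl_debug(env=None, level="INFO"):
    """Set NCCL_DEBUG in `env` unless the caller already chose a level.

    Hypothetical helper: it only fills in a default and never overrides
    an existing value. Call it before NCCL is initialized.
    """
    env = os.environ if env is None else env
    env.setdefault("NCCL_DEBUG", level)
    return env


if __name__ == "__main__":
    enable_nccl_debug()
    report = shm_usage_gib()
    if report is None:
        print("No /dev/shm found; shared memory may be mounted elsewhere.")
    else:
        total, free = report
        print(f"/dev/shm: {total:.2f} GiB total, {free:.2f} GiB free")
```

If the reported total is small, restarting the container with a larger ``--shm-size`` (the note above suggests ``4g`` as a value to try) is the first thing to test.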