These are notes to resolve specific errors with PyTorch 1.8.1 and CUDA 11.1 at this particular moment in time. As of May 11, 2021, the normal Detectron2 install steps are listed below (done in a virtualenv).
1. Install OpenCV. My notes are for OpenCV 4.5.2 with CUDA 11.2 and cuDNN 8.1 on Ubuntu 20.04. You could instead match the CUDA version that PyTorch and Detectron2 target (11.1), but there don’t appear to be any issues with using CUDA 11.2 instead of CUDA 11.1.
2. Install PyTorch and torchvision. This installs the stable PyTorch 1.8.1 build for CUDA 11.1:
pip3 install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
3. Install Detectron2 (currently v0.4) using CUDA 11.1 and torch 1.8:
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.8/index.html
If you try to run the balloon tutorial, training will fail with a message like:
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [13,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
A cursory search will show that this error occurs when cfg.MODEL.ROI_HEADS.NUM_CLASSES is not correctly set. However, that’s not the problem here. There is an issue with PyTorch 1.8.1 and CUDA 11.1 (Detectron2 issue 2837, PyTorch issue 55027). It has been fixed in the PyTorch 1.9 preview (nightly), so it’s necessary to install that instead.
pip3 install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu111/torch_nightly.html
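To rule out the more common cause before blaming the PyTorch build, it can help to confirm that every label fits within cfg.MODEL.ROI_HEADS.NUM_CLASSES. The sketch below assumes the dataset is a list of dicts in Detectron2's standard format, where each record has an "annotations" list whose entries carry an integer "category_id". The helper name find_out_of_range_labels and the sample records are made up for illustration.

```python
def find_out_of_range_labels(dataset_dicts, num_classes):
    """Return (record_index, category_id) pairs whose label is outside [0, num_classes)."""
    bad = []
    for i, record in enumerate(dataset_dicts):
        for ann in record.get("annotations", []):
            cid = ann["category_id"]
            if not 0 <= cid < num_classes:
                bad.append((i, cid))
    return bad


# Hypothetical records for a single-class dataset: only category_id 0 is valid.
sample = [
    {"file_name": "a.jpg", "annotations": [{"category_id": 0}]},
    {"file_name": "b.jpg", "annotations": [{"category_id": 1}]},
]
bad_labels = find_out_of_range_labels(sample, num_classes=1)
print(bad_labels)
```

An empty result suggests the labels agree with NUM_CLASSES, which points toward the PyTorch 1.8.1 and CUDA 11.1 issue rather than a configuration mistake.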
After upgrading PyTorch, you might encounter a message like the one below when training (Detectron2 issue 686):
cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN3c1012CUDATensorIdEv
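The missing symbol is a mangled C++ name, and decoding it hints at where the mismatch lies. The sketch below is a deliberately minimal decoder that only handles plain nested names of the form _ZN<len><name>...E, not the full mangling grammar. The function name demangle_nested_name is hypothetical, and a real tool such as c++filt is the better choice in practice.

```python
def demangle_nested_name(symbol):
    """Decode a simple '_ZN<len><name>...E' mangled symbol into 'a::b' form."""
    prefix = "_ZN"
    if not symbol.startswith(prefix):
        raise ValueError("only simple nested names are supported")
    pos = len(prefix)
    parts = []
    # Each component is a decimal length followed by that many characters.
    while pos < len(symbol) and symbol[pos].isdigit():
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        length = int(symbol[pos:end])
        parts.append(symbol[end:end + length])
        pos = end + length
    # A nested name must be closed by 'E'; anything after it encodes the parameter types.
    if pos >= len(symbol) or symbol[pos] != "E":
        raise ValueError("unsupported or malformed symbol")
    return "::".join(parts)


decoded = demangle_nested_name("_ZN3c1012CUDATensorIdEv")
print(decoded)  # c10::CUDATensorId
```

The c10 namespace belongs to PyTorch's core library, so the compiled Detectron2 extension is looking for a function that the newly installed PyTorch no longer provides. That is consistent with Detectron2 having been built against a different PyTorch version.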
The commands below are helpful to get diagnostic information.
import torch
torch.__version__
import detectron2
detectron2.__version__
python -m detectron2.utils.collect_env
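If importing detectron2 itself fails, the installed versions can still be read from package metadata without importing anything. This is a small sketch, assuming Python 3.8 or later for importlib.metadata (the error message above comes from a Python 3.7 environment, where the importlib_metadata backport would be needed instead). The helper name installed_versions is made up here.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    report = installed_versions(["torch", "torchvision", "torchaudio", "detectron2"])
    for name, version in report.items():
        print(f"{name}: {version if version is not None else 'not installed'}")
```

A nightly torch version alongside a detectron2 wheel that was installed for torch 1.8 is the combination that calls for a rebuild.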
It’s necessary to rebuild Detectron2 to use the upgraded PyTorch nightly. From the Detectron2 install instructions:
To rebuild detectron2 that’s built from a local clone, use rm -rf build/ **/*.so to clean the old build first. You often need to rebuild detectron2 after reinstalling PyTorch.
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
# (add --user if you don't have permission)

# Or, to install it from a local clone:
git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2
After rebuilding Detectron2 (I rebuilt from the local git clone), the balloon tutorial works without error.
Once the PyTorch fix makes it into stable and Detectron2 and PyTorch are back in sync, this post won’t be necessary. But hopefully this helps in the interim.