The article How to Enable GPU Passthrough to LXC Containers in Proxmox indicates that passing a GPU through to both an LXC container and a Virtual Machine at the same time is not possible, as the two types of configuration conflict with each other.
As my own preference is to run whatever I can in LXC containers, I'll summarize the configuration I used, which is an amalgamation of configurations from several sites.
My current installation is Proxmox VE 9.1.6 with:
- ProArt Z890-CREATOR WIFI
- Intel(R) Core(TM) Ultra 9 285K
- Corsair CMP64GX5M2X6600C32 (128G 4400 MT/s) - ECC would have been nice
- NVIDIA Corporation AD103 [GeForce RTX 4070] (rev a1)
In BIOS/UEFI, enable these:
- VT-d / IOMMU
- Above 4G Decoding
- PCIe Native Power Management (if available)
Proxmox kernel parameters:
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction"
update-grub
reboot
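After the reboot, a quick way to confirm the IOMMU actually came up (a sanity check only; the exact messages vary by kernel):
dmesg | grep -e DMAR -e IOMMU
# look for something along the lines of "DMAR: IOMMU enabled"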
VFIO Binding - optional but recommended:
# /etc/modprobe.d/vfio.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1
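If you add this file, regenerate the initramfs so the option is also present at boot (assuming a standard initramfs-tools setup):
update-initramfs -u -k all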
Obtain the Linux drivers from NVIDIA. The CUDA toolkit is not required; only the drivers are needed on the Proxmox host. Toolkits and add-ons are installed within the container.
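For reference, the installer can be pulled straight to the host; the URL below is an assumption based on the usual NVIDIA download layout, so substitute the actual link and version from NVIDIA's site:
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/595.58.03/NVIDIA-Linux-x86_64-595.58.03.run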
Install the drivers:
apt install build-essential
apt install pve-headers-$(uname -r)
sh NVIDIA-Linux-x86_64-595.58.03.run
# note: when prompted, choose the open kernel modules rather than the proprietary ones
Blacklist nouveau:
cat > /etc/modprobe.d/blacklist-nouveau.conf << EOF
blacklist nouveau
options nouveau modeset=0
EOF
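After adding the blacklist, regenerate the initramfs so nouveau stays out of the early boot environment:
update-initramfs -u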
Test that the card is accessible:
nvidia-smi
Enable persistence mode to prevent the GPU from re-initializing with each use.
nvidia-persistenced --persistence-mode
systemctl enable nvidia-persistenced
Then restart:
reboot
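After the restart, a quick sanity check that the persistence daemon came back up (optional; assumes the service unit was installed as above):
systemctl status nvidia-persistenced
nvidia-smi -q | grep -i persistence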
Identify the nvidia devices requiring passthrough:
root@host02:~# ls -al /dev/nvidia*
crw-rw-rw- 1 root root 195, 0 Mar 28 11:58 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Mar 28 11:58 /dev/nvidiactl
crw-rw-rw- 1 root root 505, 0 Mar 28 11:58 /dev/nvidia-uvm
crw-rw-rw- 1 root root 505, 1 Mar 28 11:58 /dev/nvidia-uvm-tools
/dev/nvidia-caps:
total 0
drwxr-xr-x 2 root root 80 Mar 28 11:58 .
drwxr-xr-x 21 root root 5060 Mar 28 11:58 ..
cr-------- 1 root root 508, 1 Mar 28 11:58 nvidia-cap1
cr--r--r-- 1 root root 508, 2 Mar 28 11:58 nvidia-cap2
Note the major numbers 195, 505 and 508 in this list (yours may differ).
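The same major numbers can be cross-checked against the kernel's device table (a sketch; the exact names listed depend on the driver version):
grep nvidia /proc/devices
# the majors shown here should match the ones in the device listing above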
Create a container and, prior to starting it, place the following into /etc/pve/lxc/<vmid>.conf (adjusted to match the device listing above):
dev0: /dev/nvidia0
dev1: /dev/nvidiactl
dev2: /dev/nvidia-modeset
dev3: /dev/nvidia-uvm
dev4: /dev/nvidia-uvm-tools
dev5: /dev/nvidia-caps/nvidia-cap1
dev6: /dev/nvidia-caps/nvidia-cap2
These lines are optional in the config file; one site mentions them, but my container seems to work without them (they appear to be the older cgroup2-style passthrough rather than the device-oriented passthrough above; a fuller sketch of that older style follows these lines):
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 505:* rwm
lxc.cgroup2.devices.allow: c 508:* rwm
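If you do go the cgroup2 route, the older writeups typically pair those allow rules with bind-mount entries like these (a sketch based on that style; not needed alongside the devN entries above):
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file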
Start the container and push the driver file into the container:
pct push <vmid> downloads/NVIDIA-Linux-x86_64-595.58.03.run /root/NVIDIA-Linux-x86_64-595.58.03.run
In the container, install the driver without the kernel modules:
apt install kmod
sh NVIDIA-Linux-x86_64-595.58.03.run --no-kernel-modules
Run nvidia-smi in the container to confirm the card is reachable.
Add nvtop at the host or the container level to chart live GPU utilization:
apt install nvtop
Additional notes:
Another test to run once PyTorch is installed in the container:
python -c "import torch; print(torch.cuda.is_available())"
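For completeness, a minimal way to get a CUDA-enabled PyTorch into the container for that test (the index URL and CUDA version here are assumptions; match them to your driver):
apt install python3-pip
pip3 install torch --index-url https://download.pytorch.org/whl/cu121
# on newer Debian-based templates, pip may want a venv or --break-system-packages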
NOTE: for running in an unprivileged container, there are some ideas in Plex GPU transcoding in Docker on LXC on Proxmox v2:
# if you're running in unprivileged mode, you also need to add permissions
# you either add the lines above, or the lines below -- not both
# gid/uid might need to be changed to suit your lxc-setup
dev0: /dev/nvidia0,gid=1000,uid=1000
dev1: /dev/nvidiactl,gid=1000,uid=1000
dev2: /dev/nvidia-modeset,gid=1000,uid=1000
dev3: /dev/nvidia-uvm,gid=1000,uid=1000
dev4: /dev/nvidia-uvm-tools,gid=1000,uid=1000
dev5: /dev/nvidia-caps/nvidia-cap1,gid=1000,uid=1000
dev6: /dev/nvidia-caps/nvidia-cap2,gid=1000,uid=1000
dev7: /dev/dri/card0,gid=1000,uid=1000
dev8: /dev/dri/renderD128,gid=1000,uid=1000
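Once the unprivileged container is started, a quick way to confirm the device nodes arrived with the expected ownership (1000:1000 in the example above) is to run, inside the container:
ls -l /dev/nvidia* /dev/dri/
nvidia-smi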