To try out some LSTM machine learning algorithms with my TradeFrame Algorithmic Trading Library, I wanted to install LibTorch with NVidia/Cuda support for hardware-accelerated learning.
Do not install the nvidia-driver yet; it is part of the cuda deployment package. Only install the kernel headers, which are necessary for building kernel modules.
$ sudo apt install linux-headers-$(uname -r)
I used Installing C++ Distributions of PyTorch as a starting point. However, their example is CPU based, and my desire was for a Cuda based installation. This meant going to the CUDA Zone and starting the download process. My configuration options were: Linux, x86_64, Debian, 12, deb (local).
Using the "deb (local)" with a complete file seemed to be the only way to ensure all components were available.
The steps, as of this writing, were:
wget https://developer.download.nvidia.com/compute/cuda/12.9.0/local_installers/cuda-repo-debian12-12-9-local_12.9.0-575.51.03-1_amd64.deb
sudo dpkg -i cuda-repo-debian12-12-9-local_12.9.0-575.51.03-1_amd64.deb
sudo cp /var/cuda-repo-debian12-12-9-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-9
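To confirm the toolkit is in place, query the compiler version (nvcc lives under /usr/local/cuda/bin if it is not already on the PATH); it should report release 12.9:
$ /usr/local/cuda/bin/nvcc --version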
Install the open version of the nvidia drivers:
sudo apt-get install -y nvidia-open
See if the nouveau driver is loaded:
$ lsmod | grep nouveau
If so, run these commands to enable the nvidia driver and blacklist the nouveau driver, then reboot:
sudo mv /etc/modprobe.d/nvidia.conf.dpkg-new /etc/modprobe.d/nvidia.conf
sudo update-initramfs -u
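After the reboot, nvidia-smi is a quick way to confirm that the nvidia driver (rather than nouveau) is managing the cards; it should list each gpu along with the driver and cuda versions:
$ nvidia-smi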
There is also the NVIDIA CUDA Installation Guide for Linux for further information.
The following changes to the installed CUDA header math_functions.h were required for a successful compile of the example application below; the noexcept (true) qualifiers appear to be needed to match the corresponding declarations in recent glibc math.h:
$ diff math_functions.h /etc/alternatives/cuda/bin/../targets/x86_64-linux/include/crt/math_functions.h
2556c2556
< extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ double sinpi(double x);
---
> extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ double sinpi(double x) noexcept (true);
2579c2579
< extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ float sinpif(float x);
---
> extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ float sinpif(float x) noexcept (true);
2601c2601
< extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ double cospi(double x);
---
> extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ double cospi(double x) noexcept (true);
2623c2623
< extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ float cospif(float x);
---
> extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ float cospif(float x) noexcept (true);
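Rather than editing the four declarations by hand, a sed invocation along these lines should apply the same change (a sketch; -i.bak keeps a backup of the original header, and the path is the resolved form of the /etc/alternatives link used above):
$ sudo sed -i.bak -E 's/(double sinpi\(double x\)|float sinpif\(float x\)|double cospi\(double x\)|float cospif\(float x\));/\1 noexcept (true);/' /etc/alternatives/cuda/targets/x86_64-linux/include/crt/math_functions.h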
The PyTorch LibTorch library can be downloaded from PyTorch Start Locally. Choose the C++/Java option with Cuda 12.8 (as of this writing), and an appropriate link is presented. Download and expand the file into a development directory. LibTorch does not at the moment have a build for Cuda 12.9; the Cuda 12.8 build is the one referenced here.
The most recent can be found at https://download.pytorch.org/libtorch/nightly/cu128/libtorch-shared-with-deps-latest.zip.
It is probably advisable to NOT use the Debian package (pytorch-cuda), as it may be out of date.
Expand the libtorch package and deploy to /usr/local/share/libtorch.
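Assuming the zip file was downloaded into the current directory, the expand and deploy step looks something like:
$ unzip libtorch-shared-with-deps-latest.zip
$ sudo mv libtorch /usr/local/share/libtorch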
To test out the installation, I then created a subdirectory containing a couple of files. The first is the test code example-app.cpp:
#include <torch/torch.h>
#include <iostream>
int main() {
  // create a 2x3 tensor of uniform random values (on the CPU by default)
  torch::Tensor tensor = torch::rand({2, 3});
  std::cout << tensor << std::endl;
}
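This example only exercises the CPU path (note the CPUFloatType in the output further below). As an extra sanity check that LibTorch actually sees the cards, a variant along these lines (a sketch, not the code used for the output below) allocates the tensor on the GPU when torch::cuda::is_available() reports true:
#include <torch/torch.h>
#include <iostream>
int main() {
  if (torch::cuda::is_available()) {
    // allocate the tensor directly on the GPU; it prints as a CUDAFloatType
    torch::Tensor tensor = torch::rand({2, 3}, torch::kCUDA);
    std::cout << tensor << std::endl;
  } else {
    std::cout << "cuda is not available" << std::endl;
  }
}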
The second file is the CMakeLists.txt file. This is my version:
cmake_minimum_required(VERSION 3.18 FATAL_ERROR)
cmake_policy(SET CMP0104 NEW)
cmake_policy(SET CMP0105 NEW)
project(example-app)
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
set(CMAKE_CUDA_STANDARD 17)
add_executable(example-app example-app.cpp)
target_link_libraries(example-app "${TORCH_LIBRARIES}")
set_property(TARGET example-app PROPERTY CXX_STANDARD 17)
Then to build the example:
mkdir build
cd build
cmake \
-DCMAKE_PREFIX_PATH=/usr/local/share/libtorch \
-DCMAKE_CUDA_ARCHITECTURES=native \
-DCMAKE_BUILD_TYPE=DEBUG \
-DCMAKE_CUDA_COMPILER=/etc/alternatives/cuda/bin/nvcc \
-Dnvtx3_dir=/usr/local/cuda/targets/x86_64-linux/include/nvtx3 \
..
make
./example-app
Notes:
- CMAKE_PREFIX_PATH points to the directory of your expanded libtorch download
- CMAKE_CUDA_ARCHITECTURES set to 'native' lets the build process determine the specific gpu(s) for which to build
- CMAKE_BUILD_TYPE can be DEBUG or RELEASE
- CMAKE_CUDA_COMPILER needs to be set; the /etc/alternatives entries are softlinks to the version you desire (as installed by the cuda packages); see the readlink check after this list
- nvtx3_dir is required, as the current libtorch library seems to still refer to nvtx rather than nvtx3
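To see where the /etc/alternatives/cuda softlink actually points (it should resolve to the cuda 12.9 installation under /usr/local):
$ readlink -f /etc/alternatives/cuda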
If you get output along the lines of:
-- Automatic GPU detection failed. Building for common architectures.
-- Autodetected CUDA architecture(s): 5.0;8.0;8.6;8.9;9.0;9.0a;10.0;10.0a;10.1a;12.0;12.0a
then cmake could not see the gpu, and the resulting binary will be built for a generic set of architectures; check that the nvidia driver is loaded (nvidia-smi) before continuing.
My system has two RTX 4070 cards, which can be verified as follows (an extract showing the important parts; notice that the nvidia driver is properly in use):
$ sudo lshw -c video
*-display
product: Arrow Lake-S [Intel Graphics]
configuration: depth=32 driver=i915 latency=0 mode=3840x2160 resolution=3840,2160 visual=truecolor xres=3840 yres=2160
*-display
product: AD103 [GeForce RTX 4070]
configuration: driver=nvidia latency=0
*-display
product: AD103 [GeForce RTX 4070]
configuration: driver=nvidia latency=0
Therefore, the output of my cmake process includes gpu-specific selections:
-- Autodetected CUDA architecture(s): 8.9 8.9
-- Added CUDA NVCC flags for: -gencode;arch=compute_89,code=sm_89
And running the generated binary results in valid output:
$ ./example-app
0.7141 0.9744 0.3179
0.7794 0.9281 0.7529
[ CPUFloatType{2,3} ]