Inference with an RX580

艾米心amihart
Jan 31, 2025


I would not recommend purchasing an RX580 for anything AI-related. If you want a budget card for this purpose, a 3060 12GB is probably the best way to go at the moment. I tried getting the RX580 to run LLMs and Stable Diffusion purely out of morbid curiosity, to see whether it could work at all. Despite how ancient this card is, it is technically possible, and I will explain here how I managed to get it working on Debian.

Large Language Models

Sadly, you cannot just install AMD’s ROCm drivers, because AMD has dropped ROCm support for this card. The simplest way to get the RX580 to work with LLMs is to abandon ROCm entirely and use Vulkan drivers instead, which still support the RX580. Ollama does not support Vulkan, so we will use llama.cpp instead.

First, install the Vulkan drivers and some other prerequisites.

sudo apt install vulkan-tools libtcmalloc-minimal4 libcurl4-openssl-dev glslc cmake make git pkg-config libvulkan-dev
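
Before going any further, it is worth checking that the Vulkan driver actually sees the card. A quick sanity check using vulkan-tools (the exact device name reported may vary by driver):

#Verify the RX580 shows up as a Vulkan device
vulkaninfo --summary | grep -i deviceName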

Then, compile llama.cpp with Vulkan.

cd ~
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake .. -DGGML_VULKAN=on -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=ON
cmake --build . -j$(nproc)

Now, add the compiled binaries (in build/bin) to your PATH, then log out and log back in.

echo 'export PATH=$PATH:'$(realpath bin) >> ~/.bashrc
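
If you would rather not log out, you can reload your shell configuration in the current terminal and confirm llama.cpp sees the GPU (this assumes a reasonably recent build that supports the --list-devices flag):

source ~/.bashrc
#Should list Vulkan0 as an available device
llama-cli --list-devices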

When you log back in, you should be able to run models on your GPU. The first time you run a model it has to be downloaded, which can take a bit of time. Below is an example of running DeepSeek R1 8B, the largest variant that will fit inside the RX580’s 8GB of VRAM.

#Use this to download a model and test it
llama-run deepseek-r1:8B
#Run in the CLI, gives you tokens per second rating after you Ctrl+C
llama-cli -m deepseek-r1:8B --device Vulkan0 -ngl 100
#Run in the browser, similar to ChatGPT
llama-server -m deepseek-r1:8B --device Vulkan0 -ngl 100 --host 0.0.0.0

If I run it without GPU acceleration, I get only about 5.45 tokens per second on my Celeron G6900. However, when I run it using the Vulkan driver with 100 GPU layers, I get 24.56 tokens per second! That is a decent uplift; it takes the model from painful to use to actually very nice to use.

Of course, this is far from the performance you’d get from even a lower-end Nvidia card with tensor cores, like the 3060. On my 3060, I get 59 tokens per second with the same model. Additionally, on the 3060, due to the extra 4GB of VRAM, you can also run the 14B model at 33 tokens per second.

Sadly, even though Vulkan seems to do a pretty good job with the RX580, I am unaware of any way to get Vulkan to work with Stable Diffusion. If you want to use Stable Diffusion, you will need ROCm.

Stable Diffusion

To get ROCm to work, we can use a sandbox that is isolated from the rest of the system and already has the drivers configured. This can be done using Docker. Below is how to install Docker; it is a bit of a lengthy process.

#Docker (https://docs.docker.com/engine/install/debian/)
sudo bash
apt update
apt install -y ca-certificates curl
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
chmod a+r /etc/apt/keyrings/docker.asc
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
tee /etc/apt/sources.list.d/docker.list > /dev/null
apt update
apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
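
To confirm Docker is working before moving on, you can run its hello-world image:

#Prints a greeting if the Docker engine is working
sudo docker run hello-world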

Once Docker is installed, we can use woodrex’s Docker containers. The first is a container with just ROCm configured; the second has both ROCm and Stable Diffusion configured, and launches Stable Diffusion automatically. I recommend both, since the first one is useful for running rocm-smi, which reports GPU usage and is helpful for verifying that Stable Diffusion is actually running on your GPU. Below are the commands for running the containers. The first time you run them, the container images have to download, which can take some time.

#rocm-for-gfx803-dev
sudo docker run -it \
--network=host \
--device=/dev/kfd \
--device=/dev/dri \
--ipc=host \
--shm-size 16G \
--group-add video \
--cap-add=SYS_PTRACE \
--rm woodrex/rocm-for-gfx803-dev:latest
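
Inside this first container, you can keep an eye on GPU utilization while generating images; rocm-smi ships with ROCm, and watch simply re-runs it every second (assuming watch is available in the container; otherwise just re-run rocm-smi by hand):

#Refresh GPU usage statistics every second
watch -n 1 rocm-smi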

#woodrex/sd-webui-for-gfx803:latest
cd ~
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd ~/stable-diffusion-webui
sudo docker run -it \
--network=host \
--device=/dev/kfd \
--device=/dev/dri \
--ipc=host \
--shm-size 16G \
--group-add video \
--cap-add=SYS_PTRACE \
--rm -v $(pwd)/cache:/root/.cache \
-v $(pwd)/data:/stable-diffusion-webui/data \
woodrex/sd-webui-for-gfx803:latest
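
If everything starts up, the web UI should be reachable from a browser on the same machine. AUTOMATIC1111’s webui listens on port 7860 by default, assuming the woodrex container does not override it:

#Open the Stable Diffusion web UI (default AUTOMATIC1111 port)
xdg-open http://localhost:7860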

Once it is up, rocm-smi shows Stable Diffusion using 100% of my GPU. It is not fast: it takes a full minute to generate an image with the default model and default settings. Keep in mind that a 3060 only takes about 4 seconds to generate an image with these same settings.

One thing I recommend, once you have Stable Diffusion running, is to commit the container to a new image. This prevents it from having to redownload some libraries every time you run it. First start Stable Diffusion and make sure it is running in the browser. Then, in another terminal, use sudo docker ps -l to list the running container and copy its CONTAINER ID, then run sudo docker commit with that ID (without brackets) to save it. Running sudo docker images then shows the new IMAGE ID; copy that ID and replace the very last line of the Stable Diffusion run command above with it. Now, when you run it, the libraries are already installed, so it simply launches Stable Diffusion without having to install anything.
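
Concretely, the workflow looks like this (the IDs are placeholders; substitute the ones Docker prints on your system):

#While Stable Diffusion is running, find the container's ID
sudo docker ps -l
#Save the running container as a new image, using the CONTAINER ID from above
sudo docker commit [CONTAINER ID]
#List images to find the new IMAGE ID, then use it in place of
#woodrex/sd-webui-for-gfx803:latest in the run command above
sudo docker images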
