vLLM + openai/gpt-oss-20b on 3× RTX 3090 (CUDA 12.8) — FlashAttention Error
#10 by robinhassan - opened
Hi,
Trying to run openai/gpt-oss-20b on vLLM with:
3 × RTX 3090 (24 GB each)
CUDA 12.8, Driver 570.xx
Python 3, PyTorch 2.7.0
I want to split the model across all three GPUs, but vLLM fails with a FlashAttention error (it looks like it wants FlashAttention 3).
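
For reference, here's roughly how I'm launching it through the Python API (a minimal sketch; the tensor-parallel and memory settings are my guesses, not a known-good config):

```python
# Rough sketch of my launch script -- the parallelism value is a guess,
# I'm not sure whether 3-way tensor parallelism is even valid for this model.
from vllm import LLM, SamplingParams

llm = LLM(
    model="openai/gpt-oss-20b",
    tensor_parallel_size=2,        # tried 3 as well; both hit the FlashAttention error
    gpu_memory_utilization=0.90,   # leave a little headroom on each 3090
)

prompts = ["Hello, my name is"]
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.outputs[0].text)
```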
Any tips for multi-GPU vLLM setup and installing FlashAttention 3 for this environment?
Thanks!