---
license: mit
---
Pull the prebuilt CUDA-enabled server image:

```sh
docker pull ghcr.io/ggerganov/llama.cpp:server-cuda
```
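
Running this image with `--gpus all` requires the NVIDIA Container Toolkit on the host. As a quick sanity check (a minimal sketch; the exact `nvidia/cuda` tag is an assumption, so substitute any CUDA base image you have locally), confirm that containers can see the GPU:

```sh
# Verify GPU passthrough into containers; requires the NVIDIA Container Toolkit.
# The nvidia/cuda tag below is illustrative -- any CUDA base image works.
docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi
```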

Assuming the mistral-7B-instruct-v0.2-fp16.gguf file has been downloaded to the /path/to/models directory on the local machine, run the container and serve the model with:

```sh
docker run --gpus all \
  -v /path/to/models:/models \
  -p 8000:8000 \
  ghcr.io/ggerganov/llama.cpp:server-cuda \
  -m /models/mistral-7B-instruct-v0.2-fp16.gguf \
  --port 8000 --host 0.0.0.0 \
  -n 512 --n-gpu-layers 50
```
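
Once the container is up, you can exercise the server from the host. A minimal sketch using the llama.cpp server's built-in `/completion` endpoint (the `[INST] ... [/INST]` wrapping below assumes Mistral's instruct prompt template):

```sh
# Request a completion; n_predict caps the number of generated tokens.
curl -s http://localhost:8000/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "[INST] Briefly explain what the GGUF format is. [/INST]",
    "n_predict": 128
  }'
```

The server also exposes an OpenAI-compatible `/v1/chat/completions` route, so standard OpenAI client libraries can be pointed at `http://localhost:8000/v1` instead.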