itod
/

mistral-7B-instruct-v0.2-f16

Model card Files Files and versions

mistral-7B-instruct-v0.2-f16 / README.md

itod's picture

Update README.md

8b1c1c5 verified over 1 year ago

|

history blame contribute delete

1.2 kB

	---
	license: mit
	---
	- Source Mistral 7B model: </br>https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/
	- This model is converted from Bfloat16 datatype to Float16 datatype with convert tool from: </br>https://github.com/ggerganov/llama.cpp
	- Deployment on CUDA GPU: </br>Pull the ready-made llama.cpp container:
	```
	docker pull ghcr.io/ggerganov/llama.cpp:server-cuda
	```
	Assuming mistral-7B-instruct-v0.2-fp16.gguf file is downloaded to /path/to/models directory on the local machine, run the container accesing the model with:
	```
	docker run --gpus all -v /path/to/models:/models -p 8000:8000 ghcr.io/ggerganov/llama.cpp:server-cuda -m /models/mistral-7B-instruct-v0.2-fp16.gguf --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 50
	```
	- Test the deployment accessing the model with the browser at http://localhost:8000
	- llama.cpp server also provides OpenAI compatible API
	- If CUDA GPU is not available, the version of the model converted to int8 may be interesting, available in this repo: </br>https://huggingface.co/itod/mistral-7B-instruct-v0.2-q8
	- More details about usage is avalable in llama.cpp documentation: </br>https://github.com/ggerganov/llama.cpp/tree/master/examples/server