WARNING: This language model has been instruction-tuned for conversational settings that make use of function-calling capabilities. It has not been aligned with human preferences. As a result, it may generate outputs that are inappropriate, misleading, biased, or unsafe. These risks can be mitigated through additional post-training stages, which are strongly recommended before deployment in any production system, especially for high-stakes applications.
NOTE: This is a GATED model, intended only for internal and external testing. Do not request access unless you have already contacted us and been given permission to test it. Please write to carlos.rodriguez1(at)bsc.es to justify your use case, and we can grant access.
How to use
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "BSC-LT/salamandra-7b-instruct-tools"
text = "What is the weather like in Paris today?"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
message = [{"role": "user", "content": text}]

# Tool definitions use the JSON-schema format expected by `apply_chat_template`
# (name/description/parameters nested under "function").
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country e.g. Bogotá, Colombia"
                }
            },
            "required": ["location"],
            "additionalProperties": False
        }
    }
}]
# Render the prompt with the tool definitions injected by the chat template.
prompt = tokenizer.apply_chat_template(
    message,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools,
)

inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=1000)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
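When the model decides to call the tool, the generated text contains a tool call rather than a direct answer. Below is a minimal sketch of parsing and executing it; it assumes Hermes-style <tool_call> JSON markup, consistent with the hermes tool-call parser used in the vLLM deployment below, and the get_weather implementation is a hypothetical stand-in.

import json
import re

# Hypothetical stand-in for a real weather lookup.
def get_weather(location: str) -> str:
    return f"22°C and sunny in {location}"

# Decode only the newly generated tokens, keeping special tokens so the
# <tool_call> markers survive (assumption: Hermes-style markup).
raw = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=False)
match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", raw, re.DOTALL)
if match:
    call = json.loads(match.group(1))  # {"name": ..., "arguments": {...}}
    if call["name"] == "get_weather":
        result = get_weather(**call["arguments"])
        print(result)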
Deploy with vLLM
Deploy the model using the vLLM Docker image:
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    -p 80:80 \
    vllm/vllm-openai:latest \
    --model BSC-LT/salamandra-7b-instruct-tools \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --max-model-len 8192 \
    --port 80
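Once the container is up, you can verify that the OpenAI-compatible server is serving the model. A minimal check (the URL assumes the -p 80:80 mapping above):

import requests

# List the models exposed by the vLLM OpenAI-compatible endpoint.
resp = requests.get("http://localhost:80/v1/models")
print(resp.json()["data"][0]["id"])  # expect BSC-LT/salamandra-7b-instruct-tools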
Then query it with the OpenAI Python client:
pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:80/v1/",  # matches the -p 80:80 mapping above
    api_key="hf_xxxx",
)

models = client.models.list()
model = models.data[0].id

system_message = ""
messages = [{"role": "system", "content": system_message}] if system_message else []
messages.append({"role": "user", "content": "What is the weather like in Paris today?"})
print(messages)

# `tools` is the same list defined in the transformers example above.
chat_completion = client.chat.completions.create(
    model=model,
    tools=tools,
    messages=messages,
    stream=False,
    max_tokens=1000,
    temperature=0.1,
    frequency_penalty=0.2,
)
print(chat_completion)
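If the model chooses to call the tool, the response carries tool_calls instead of plain text. A minimal follow-up sketch, again using the hypothetical get_weather implementation from the example above:

import json

msg = chat_completion.choices[0].message
if msg.tool_calls:
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)
    result = get_weather(**args)  # hypothetical stand-in implementation

    # Feed the tool result back so the model can produce a final answer.
    messages.append(msg)
    messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    final = client.chat.completions.create(model=model, messages=messages, tools=tools)
    print(final.choices[0].message.content)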