
🧠 Model Description

The mcp-tool-use-quality-ranger-0.6b is a sequence classification model, fine-tuned from Qwen/Qwen3-0.6B, that evaluates the quality of function calls in conversational AI systems. It is designed for evaluating function calls in the context of Model Context Protocol (MCP) tools: it can assess whether a function call is correct, uses the wrong tool, has incorrect parameter names, or has incorrect parameter values.

Max Context Length: 32,768 tokens

It determines whether a given function call:

  • Selects the correct tool
  • Has correct parameter names and structure
  • Contains correct parameter values

It produces one of four possible classification labels:

| Label | Meaning |
|---|---|
| VALID_CALL ✅ | The tool name, parameters, and values are all correct, or no suitable tool exists and no function call is made. |
| TOOL_ERROR ❌ | The tool name does not exist or does not match the user intent. |
| PARAM_NAME_ERROR ❌ | The correct tool is used, but parameter names are missing, extra, or incorrect. |
| PARAM_VALUE_ERROR ❌ | Tool and parameter names are correct, but parameter values are wrong or incorrectly formatted. |

📊 Benchmark Evaluation

The mcp-tool-use-quality-ranger-0.6b was evaluated in a binary classification setting: a prediction counts as Correct if the function-call evaluation matches the gold label, and Incorrect otherwise.

| Model | #Params | Avg. Latency (sec) | Avg. Binary Accuracy | Qualifire TUQ Benchmark Binary Accuracy | Limbic Benchmark Binary Accuracy |
|---|---|---|---|---|---|
| qualifire/mcp-tool-use-quality-ranger-4b [private] | 4B | 0.30 | 0.978 | 0.997 | 0.960 |
| qualifire/mcp-tool-use-quality-ranger-0.6b | 0.6B | 0.09 | 0.958 | 0.993 | 0.924 |
| gemini-2.5-flash | – | 4.87 | 0.890 | 0.936 | 0.845 |
| quotientai/limbic-tool-use-0.5B-32K | 0.5B | 0.79 | 0.818 | 0.749 | 0.887 |

📌 Metrics Definitions

  • Avg. Binary Accuracy – Mean accuracy across all evaluated benchmarks, where the model's predictions are mapped to binary outcomes as follows (see the sketch after this list):

    • Qualifire TUQ Benchmark

      • Correct → VALID_CALL
      • Incorrect → TOOL_ERROR, PARAM_NAME_ERROR, or PARAM_VALUE_ERROR
    • Limbic Benchmark

      • Correct → correct
      • Incorrect → incorrect_tool, incorrect_parameter_names, or incorrect_parameter_values
  • Qualifire TUQ Benchmark – the Qualifire Tool Selection Quality Benchmark.

  • Limbic Benchmark – the Limbic Eval Tool Use MCP Benchmark.
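
As a concrete illustration, here is a minimal sketch (not the official evaluation harness) of one plausible reading of this mapping: both predicted and gold labels are collapsed to Correct/Incorrect before computing accuracy. The example data at the bottom is hypothetical.

```python
# Illustrative binarization of labels, following the mapping above.

def tuq_to_binary(label: str) -> str:
    """Qualifire TUQ labels: only VALID_CALL counts as Correct."""
    return "Correct" if label == "VALID_CALL" else "Incorrect"

def limbic_to_binary(label: str) -> str:
    """Limbic labels: only 'correct' counts as Correct."""
    return "Correct" if label == "correct" else "Incorrect"

def binary_accuracy(pred_labels: list[str], gold_labels: list[str], to_binary) -> float:
    """Share of examples where the binarized prediction and gold label agree."""
    pairs = zip(pred_labels, gold_labels)
    return sum(to_binary(p) == to_binary(g) for p, g in pairs) / len(gold_labels)

# Hypothetical example with three TUQ-style predictions:
preds = ["VALID_CALL", "TOOL_ERROR", "PARAM_VALUE_ERROR"]
gold  = ["VALID_CALL", "PARAM_NAME_ERROR", "VALID_CALL"]
print(binary_accuracy(preds, gold, tuq_to_binary))  # 2/3 ≈ 0.667 (the third pair disagrees)
```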


📜 Evaluation Prompt Template

The model applies the following structured evaluation process (mirrored by the rule-based sketch after this list):

  1. TOOL SELECTION

    • Check that the tool name exists in available_tools
    • Check that the tool's purpose matches the user intent
    • Fail → TOOL_ERROR ❌
  2. PARAMETER STRUCTURE

    • All required parameters are present
    • No extra parameters
    • Parameter names exactly match the schema
    • Fail → PARAM_NAME_ERROR ❌
  3. PARAMETER VALUES

    • Values have correct data types
    • Values match the user request
    • No fabricated or incorrect values
    • Fail → PARAM_VALUE_ERROR ❌

If all checks pass → VALID_CALL ✅
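
For intuition, the cascade can be mirrored as a rule-based sketch over the JSON shapes used in the usage example below. This is only illustrative: the model makes these judgments end-to-end, including intent and value correctness that plain schema checks cannot capture.

```python
# Rule-based mirror of the three-step cascade (illustrative only; the model
# itself also judges intent and value correctness, which this sketch cannot).

def classify_call(call: dict, available_tools: list[dict]) -> str:
    """Return the first failing label, or VALID_CALL if every check passes."""
    schemas = {t["function"]["name"]: t["function"]["parameters"]
               for t in available_tools}

    # 1. TOOL SELECTION: the tool name must exist in available_tools.
    name = call["function"]["name"]
    if name not in schemas:
        return "TOOL_ERROR"

    # 2. PARAMETER STRUCTURE: all required names present, no extras.
    schema = schemas[name]
    args = call["function"]["arguments"]
    if not set(schema.get("required", [])) <= set(args) \
            or not set(args) <= set(schema.get("properties", {})):
        return "PARAM_NAME_ERROR"

    # 3. PARAMETER VALUES: types must match the declared JSON Schema types.
    json_types = {"string": str, "number": (int, float),
                  "integer": int, "boolean": bool}
    for key, value in args.items():
        expected = json_types.get(schema["properties"][key].get("type"))
        if expected is not None and not isinstance(value, expected):
            return "PARAM_VALUE_ERROR"

    return "VALID_CALL"

# The card's example call (missing the required "to") fails at step 2:
tools = [{"type": "function", "function": {"name": "send-email", "parameters": {
    "properties": {"to": {"type": "string"}, "subject": {"type": "string"},
                   "content": {"type": "string"}, "scheduledAt": {"type": "string"}},
    "required": ["to", "subject", "content"]}}}]
call = {"function": {"name": "send-email", "arguments": {
    "subject": "Meeting Follow-Up", "content": "Hi Jane, ...",
    "scheduledAt": "tomorrow at 10am"}}}
print(classify_call(call, tools))  # -> PARAM_NAME_ERROR
```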


📦 Requirements

  • transformers>=4.51.0
  • huggingface_hub
  • torch
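
For reference, the dependencies can be installed in one step (assuming a standard pip environment; pick a CUDA-matched torch wheel if you want GPU inference):

```bash
pip install "transformers>=4.51.0" huggingface_hub torch
```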

💻 Usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
import torch
from huggingface_hub import hf_hub_download

# Model name
model_name = "qualifire/mcp-tool-use-quality-ranger-0.6b"

# Map raw labels to human-readable labels
map_id_to_label = {
    'LABEL_0': 'VALID_CALL',
    'LABEL_1': 'TOOL_ERROR',
    'LABEL_2': 'PARAM_NAME_ERROR',
    'LABEL_3': 'PARAM_VALUE_ERROR'
}

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create pipeline
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Load prompt template
file_path = hf_hub_download(repo_id=model_name, filename="tsq_prompt_template.txt")
with open(file_path, encoding="utf-8") as f:
    PROMPT_TEMPLATE = f.read()

# Example inputs
example_tools_list = '''[
  {
    "type": "function",
    "function": {
      "name": "send-email",
      "description": "Send an email using Resend",
      "parameters": {
        "properties": {
          "to": {
            "type": "string",
            "format": "email",
            "description": "Recipient email address"
          },
          "content": {
            "type": "string",
            "description": "Plain text email content"
          },
          "subject": {
            "type": "string",
            "description": "Email subject line"
          },
          "scheduledAt": {
            "type": "string",
            "description": "Optional parameter to schedule the email. This uses natural language. Examples would be 'tomorrow at 10am' or 'in 2 hours' or 'next day at 9am PST' or 'Friday at 3pm ET'."
          }
        },
        "required": ["to", "subject", "content"]
      }
    }
  }
]'''

example_message_history = '''[
  {
    "role": "user",
    "content": "Please send an email to 'jane.doe@example.com' with the subject 'Meeting Follow-Up'. The content should be 'Hi Jane, just following up on our meeting from yesterday. Please find the attached notes.' and schedule it for tomorrow at 10am."
  },
  {
    "completion_message": {
      "content": {
        "type": "text",
        "text": ""
      },
      "role": "assistant",
      "stop_reason": "tool_calls",
      "tool_calls": [
        {
          "id": "call_le25efmhltxx9o7n4rfe",
          "function": {
            "name": "send-email",
            "arguments": {
              "subject": "Meeting Follow-Up",
              "content": "Hi Jane, just following up on our meeting from yesterday. Please find the attached notes.",
              "scheduledAt": "tomorrow at 10am"
            }
          }
        }
      ]
    }
  }
]'''

# Format input
example_input = PROMPT_TEMPLATE.format(
    message_history=example_message_history,
    available_tools=example_tools_list
)

# Get prediction
result = pipe(example_input)[0]
result['label'] = map_id_to_label[result['label']]
print(result)
```

✨ Example Output

```
{'label': 'PARAM_NAME_ERROR', 'score': 0.9999843835830688}
```

The tool name and parameter values are correct, but the required parameter 'to' is missing from the function call, so the label is PARAM_NAME_ERROR.
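
Beyond the top label, the full score distribution can help when thresholding borderline calls. A minimal sketch reusing `pipe` and `map_id_to_label` from the usage example above (`top_k=None` asks the text-classification pipeline for scores on every class):

```python
# Sketch: inspect scores for all four labels, not just the argmax.
all_scores = pipe(example_input, top_k=None)  # list of {label, score} dicts
for entry in all_scores:
    print(map_id_to_label[entry['label']], f"{entry['score']:.6f}")
```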
