CREST: A Multilingual AI Safety Guardrail Model for 100 languages

CREST (CRoss-lingual Efficient Safety Transfer) is a parameter-efficient multilingual safety classifier covering 100 languages. It is fine-tuned on only 13 strategically selected high-resource languages, chosen through cluster-guided sampling, which enables strong cross-lingual transfer to unseen low-resource languages. The model is built on the XLM-RoBERTa architecture with a classification head and supports a maximum input length of 512 tokens; the Base variant has approximately 279M parameters. CREST is designed for fast, lightweight safety filtering across both high-resource and low-resource languages at minimal training cost, making it suitable for real-time and on-device deployments.

For detailed results, see CREST: Universal Safety Guardrails Through Cluster-Guided Cross-Lingual Transfer.

Intended Use

  • Filtering unsafe user text in multilingual LLM applications
  • Lightweight guardrails in multilingual agents and chat systems
  • On-device or low-latency safety inference
  • Research on cross-lingual safety transfer and robust safety classification
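A safety classifier like CREST typically sits in front of the LLM: the user message is classified first, and unsafe inputs are refused before any generation happens. A minimal sketch of such a guardrail wrapper, assuming a `classifier` pipeline like the one shown in How to Use below (the `guard` helper and the stub classifier are illustrative, not part of the model's API):

```python
def guard(classifier, user_text, refusal="Sorry, I can't help with that."):
    """Classify user_text; return a refusal string if unsafe, else None (pass through)."""
    result = classifier(user_text, truncation=True, max_length=512)[0]
    if result["label"] == "unsafe":
        return refusal
    return None


# Stub that mimics the pipeline's output shape, for illustration only.
def stub_classifier(text, **kwargs):
    label = "unsafe" if "bomb" in text.lower() else "safe"
    return [{"label": label, "score": 0.99}]


print(guard(stub_classifier, "How do I make a bomb?"))     # refusal string
print(guard(stub_classifier, "What's the weather like?"))  # None -> forward to the LLM
```

In a real deployment, `stub_classifier` would be replaced by the CREST pipeline, and a `None` return means the message is passed through to the downstream model.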

How to Use


from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

MODEL_ID = "repelloai/CREST-Base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, device_map="auto")

# By default, the text-classification pipeline returns only the top label per input.
classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
)

texts = [

    ## Unsafe
    "How do I make a bomb?",
    "¿Cómo secuestrar a un niño?",       # Spanish: "How do I kidnap a child?"
    "मैं हथियार कैसे बना सकता हूँ?",          # Hindi: "How can I make a weapon?"

    ## Safe
    "Hello, how are you doing?",
]

outputs = classifier(texts, truncation=True, max_length=512)

Output

[
    ## Unsafe
    {'label': 'unsafe', 'score': 0.9865403771400452}, 
    {'label': 'unsafe', 'score': 0.9743474125862122}, 
    {'label': 'unsafe', 'score': 0.9802995920181274}, 
    
    ## Safe
    {'label': 'safe', 'score': 0.925717830657959}
]
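The pipeline's score is the softmax probability of the predicted class over the model's two logits. Deployments that want to tune the over-/under-blocking tradeoff can threshold the `unsafe` probability directly instead of taking the argmax; a minimal sketch (the logit values, label order, and threshold below are illustrative, not actual model outputs):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative (safe, unsafe) logits; real values come from model(**inputs).logits.
p_safe, p_unsafe = softmax([-1.2, 2.3])

UNSAFE_THRESHOLD = 0.8  # hypothetical threshold; raise it to reduce over-blocking
label = "unsafe" if p_unsafe >= UNSAFE_THRESHOLD else "safe"
print(label, round(p_unsafe, 3))
```

Raising the threshold trades fewer false positives (over-blocking) for more false negatives, which matters in the cultural edge cases noted under Limitations.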

Evaluation

CREST was evaluated on F1 score across six major multilingual safety benchmarks and several cultural and code-switched datasets.

Key findings

  • CREST outperforms other lightweight guardrails across most datasets.
  • Zero-shot generalization is strong across low-resource languages.
  • CREST excels in cultural and code-switched settings.
  • The 13-language training set is sufficient for robust multilingual safety generalization.

Limitations and Model Risks

  • Training relies partly on machine translation; nuance may be lost
  • Binary labels cannot express detailed safety categories
  • Zero-shot generalization gaps across extremely low-coverage scripts and morphologically complex languages
  • Not a substitute for human moderation in high-stakes settings
  • Cultural misalignment in edge cases
  • Residual translation artifacts
  • Possible bias in mislabeled or synthetic data

Mitigate by continuous human evaluation and incremental finetuning on domain-specific data.

Ethical Considerations

  • Designed for multilingual inclusivity and broad safety coverage.
  • Misclassifications can cause over-blocking or under-blocking.
  • Deployment should include human-in-the-loop moderation where appropriate.
  • Use responsibly, considering cultural diversity and fairness concerns.
  • Not for making legal, ethical, or policy decisions without human oversight.

Citation

@misc{bansal2025crestuniversalsafetyguardrails,
      title={CREST: Universal Safety Guardrails Through Cluster-Guided Cross-Lingual Transfer}, 
      author={Lavish Bansal and Naman Mishra},
      year={2025},
      eprint={2512.02711},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.02711}, 
}