
My Dummy Model


language: fr
license: apache-2.0
tags:
  - masked-lm
  - camembert
  - transformers
  - tf
  - french
  - fill-mask

CamemBERT MLM - TensorFlow Model

This is a TensorFlow masked language model (MLM) built on the camembert-base checkpoint, a RoBERTa-like model trained on French text.

Model description

This model uses the CamemBERT architecture, a RoBERTa-based transformer trained on large-scale French web corpora (OSCAR; later CamemBERT variants also used CCNet). It is designed to perform Masked Language Modeling (MLM) tasks.

It was loaded and saved using the transformers library in TensorFlow (TFAutoModelForMaskedLM). It can be used for fill-in-the-blank tasks in French.

Intended uses & limitations

Intended uses

  • Fill-mask predictions in French
  • Feature extraction for NLP tasks
  • Fine-tuning on downstream tasks like text classification, NER, etc.
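As a sketch of the feature-extraction use case above, the encoder's hidden states can be pulled out with TFAutoModel instead of the MLM head (the example sentence is illustrative; camembert-base has a hidden size of 768):

```python
from transformers import TFAutoModel, AutoTokenizer

# Load the base encoder (no MLM head) to use it as a feature extractor.
model = TFAutoModel.from_pretrained("Mhammad2023/my-dummy-model")
tokenizer = AutoTokenizer.from_pretrained("Mhammad2023/my-dummy-model")

inputs = tokenizer("Le fromage est délicieux.", return_tensors="tf")
outputs = model(**inputs)

# last_hidden_state has shape (batch_size, sequence_length, hidden_size):
# one 768-dimensional vector per token for camembert-base.
token_embeddings = outputs.last_hidden_state
print(token_embeddings.shape)
```

These per-token vectors (or a pooled version of them) can then feed a downstream classifier.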

Limitations

  • Works best with French text
  • May not generalize well to other languages
  • Cannot be used for generative tasks (e.g., translation, text generation)

How to use

from transformers import TFAutoModelForMaskedLM, AutoTokenizer
import tensorflow as tf

model = TFAutoModelForMaskedLM.from_pretrained("Mhammad2023/my-dummy-model")
tokenizer = AutoTokenizer.from_pretrained("Mhammad2023/my-dummy-model")

# CamemBERT uses "<mask>" (not "[MASK]") as its mask token, so use tokenizer.mask_token
inputs = tokenizer(f"J'aime le {tokenizer.mask_token} rouge.", return_tensors="tf")
outputs = model(**inputs)
logits = outputs.logits

# Locate the mask token in the input sequence (tf.argmax does not accept bool tensors)
masked_index = int(tf.where(inputs.input_ids[0] == tokenizer.mask_token_id)[0, 0])
predicted_token_id = int(tf.argmax(logits[0, masked_index]))
predicted_token = tokenizer.decode([predicted_token_id])

print(f"Predicted word: {predicted_token}")
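To inspect more than one candidate for the masked position, the top-k logits can be decoded. A self-contained sketch using tf.math.top_k (the example sentence is illustrative):

```python
import tensorflow as tf
from transformers import TFAutoModelForMaskedLM, AutoTokenizer

model = TFAutoModelForMaskedLM.from_pretrained("Mhammad2023/my-dummy-model")
tokenizer = AutoTokenizer.from_pretrained("Mhammad2023/my-dummy-model")

# tokenizer.mask_token is "<mask>" for CamemBERT; avoid hard-coding it.
inputs = tokenizer(f"J'aime le {tokenizer.mask_token} rouge.", return_tensors="tf")
logits = model(**inputs).logits

# Position of the mask token in the input sequence.
masked_index = int(tf.where(inputs.input_ids[0] == tokenizer.mask_token_id)[0, 0])

# Top-5 candidate tokens for the masked position.
top5 = tf.math.top_k(logits[0, masked_index], k=5)
for token_id in top5.indices.numpy():
    print(tokenizer.decode([token_id]))
```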

Limitations and bias

This model inherits the limitations and biases of the camembert-base checkpoint, including:

  • Potential biases from the training data (e.g., internet corpora)
  • Inappropriate predictions for sensitive topics

Use with caution in production or sensitive applications.

Training data

The model was not further fine-tuned; it is based directly on camembert-base, which was trained on:

  • OSCAR (Open Super-large Crawled ALMAnaCH coRpus)
  • CCNet (a corpus extracted from Common Crawl with the CCNet filtering pipeline, used for later CamemBERT variants)

Training procedure

No additional training was applied for this version. You can load it and fine-tune it on your own task with the Keras API (or the PyTorch Trainer after loading the weights in PyTorch).

Evaluation results

This version has not been evaluated on downstream tasks. For evaluation metrics and benchmarks, refer to the original camembert-base model card.
