This model is a fine-tuned version of distilbert-base-uncased on the Fitness-Intent dataset (harshmakwana/fitness-intent). It achieves the following results on the evaluation set:
- Loss: 0.0660
- Accuracy: 0.975
- Macro F1: 0.9753
## Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 32
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 10
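These hyperparameters map onto a `TrainingArguments` configuration roughly as follows. This is a minimal sketch, not the original training script: the output directory, the per-epoch evaluation/save strategies, and the `metric_for_best_model` key are assumptions (the card only confirms `load_best_model_at_end=True` in the evaluation summary below).

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
)

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=6)

args = TrainingArguments(
    output_dir="distilbert-fitness-intent",  # assumed output path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=10,
    eval_strategy="epoch",             # assumed: metrics below are reported per epoch
    save_strategy="epoch",             # assumed; required for load_best_model_at_end
    load_best_model_at_end=True,       # stated in the evaluation summary
    metric_for_best_model="macro_f1",  # assumed metric key
)
```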
## Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | Macro F1 |
|---|---|---|---|---|---|
| 0.9307 | 1.0 | 60 | 0.8233 | 0.8417 | 0.8310 |
| 0.3312 | 2.0 | 120 | 0.2507 | 0.9583 | 0.9585 |
| 0.1212 | 3.0 | 180 | 0.1077 | 0.9667 | 0.9670 |
| 0.0316 | 4.0 | 240 | 0.0713 | 0.9667 | 0.9670 |
| 0.0173 | 5.0 | 300 | 0.0724 | 0.9667 | 0.9666 |
| 0.0114 | 6.0 | 360 | 0.0646 | 0.9667 | 0.9669 |
| 0.0098 | 7.0 | 420 | 0.0711 | 0.9833 | 0.9833 |
| 0.0084 | 8.0 | 480 | 0.0701 | 0.975 | 0.9753 |
| 0.0079 | 9.0 | 540 | 0.0686 | 0.975 | 0.9753 |
| 0.0077 | 10.0 | 600 | 0.0660 | 0.975 | 0.9753 |
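The Accuracy and Macro F1 columns correspond to a `compute_metrics` callback along these lines; a sketch assuming scikit-learn, since the card does not include the metric code.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    # The Trainer passes (logits, labels) for the evaluation set.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "macro_f1": f1_score(labels, preds, average="macro"),
    }
```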
## Framework versions
- Transformers 4.53.2
- PyTorch 2.6.0+cu124
- Datasets 4.0.0
- Tokenizers 0.21.2
# DistilBERT – Fitness-Intent Classifier 🏋️‍♂️
Lightweight DistilBERT fine-tuned on the Fitness-Intent dataset to detect six user intents for a conversational fitness assistant.

| Intent label | Example (short) |
|---|---|
| `find_exercise` | “How do I do a proper squat?” |
| `general_chat` | “Hey coach, good morning!” |
| `generate_plan` | “Make me a 3-day split.” |
| `get_nutrition_info` | “How much protein do I need?” |
| `log_feeling` | “I felt tired after today’s workout.” |
| `out_of_scope` | Non-fitness or irrelevant queries |
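For downstream code, the six intents can be exposed through the model config's `id2label` mapping. The index order below is an assumption; check the model's `config.json` for the actual mapping.

```python
# Assumed label order (alphabetical); verify against config.json.
id2label = {
    0: "find_exercise",
    1: "general_chat",
    2: "generate_plan",
    3: "get_nutrition_info",
    4: "log_feeling",
    5: "out_of_scope",
}
label2id = {label: idx for idx, label in id2label.items()}
```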
## ✨ Key Points
| | Fine-tuned DistilBERT | Zero-shot BART-large-MNLI |
|---|---|---|
| Params | 66 M | 407 M |
| Test Accuracy | 93.3 % | 61.7 % |
| Macro F1 | 0.93 | 0.57 |
| Colab free-GPU VRAM | < 4 GB | ≈ 12 GB (risk of CPU fallback) |
| Train time | < 4 min (10 epochs) | n/a |
Fine-tuning a small model beats a 6× larger zero-shot model in this domain, demonstrating the value of task-specific fine-tuning for resource-constrained use cases.
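For reference, the zero-shot baseline can be run along these lines; a sketch that assumes the six intent names are passed directly as candidate labels (the exact label phrasing behind the reported numbers is not documented here).

```python
from transformers import pipeline

zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

candidate_labels = [
    "find_exercise", "general_chat", "generate_plan",
    "get_nutrition_info", "log_feeling", "out_of_scope",
]
result = zero_shot("Make me a 3-day split.", candidate_labels=candidate_labels)
print(result["labels"][0], result["scores"][0])  # top predicted intent and its score
```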
## 📊 Evaluation Summary
| Metric (test set, 120 examples) | Score |
|---|---|
| Accuracy | 0.9333 |
| Macro F1 | 0.9330 |
| Per-class F1 range | 0.90 – 1.00 |
| Confusion Matrix | see notebook / repo |
The best checkpoint occurred at epoch 7 (validation macro F1 0.9833); the `Trainer` automatically reloaded it at the end of training (`load_best_model_at_end=True`).
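The per-class numbers can be reproduced by scoring the test split with the pipeline; a sketch that assumes the dataset exposes a `test` split with `text` and `label` columns (all three names are assumptions).

```python
from datasets import load_dataset
from sklearn.metrics import classification_report
from transformers import pipeline

classifier = pipeline("text-classification", model="harshmakwana/distilbert-fitness-intent")

test = load_dataset("harshmakwana/fitness-intent", split="test")  # assumed split name
preds = [classifier(text)[0]["label"] for text in test["text"]]   # assumed column name

labels = test["label"]
# If labels are stored as class indices rather than strings, map them first:
# labels = [test.features["label"].int2str(i) for i in labels]
print(classification_report(labels, preds))
```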
## 🔧 Usage Example

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="harshmakwana/distilbert-fitness-intent",
    tokenizer="harshmakwana/distilbert-fitness-intent",
    top_k=None,  # return scores for all labels, not just the top one
)

classifier("Can you show me exercises to strengthen my core?")
# → [{'label': 'find_exercise', 'score': 0.97}, ...]
```
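With `top_k=None` the pipeline returns a score for every label, sorted from most to least likely, which makes it easy to apply a confidence threshold before acting on a prediction (for example, routing low-confidence queries to `out_of_scope`).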
## 📚 Citation

```bibtex
@misc{distilfitnessintent2025,
  title        = {DistilBERT Fine-tuned on Fitness-Intent},
  author       = {Harsh Makwana},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/harshmakwana/distilbert-fitness-intent}}
}
```