🚨 UNDERFITTING β€” When your AI pretends to learn πŸ€–πŸ’₯

Community Article · Published September 10, 2025

📖 Definition

Underfitting = your AI model hasn't learned enough to be effective.

Signs:

  • High train_loss (the model doesn't even fit its training data)
  • High eval_loss (poor generalization)
  • Garbage responses (repeated tokens, weird characters)
  • Disappointing performance on known AND unknown data

⚑ Advantages / Disadvantages / Limitations

✅ "Advantages" (if we can call them that)

  • No overfitting (at least there's that...)
  • Fast training (but useless)
  • Low resource consumption

❌ Disadvantages

  • Unusable model in production
  • Waste of time/money on compute
  • Maximum developer frustration
  • Catastrophic performance

⚠️ Limitations

  • Often detected late (only after a full training run)
  • Easily confused with other issues (data quality, bugs)

🛠️ Practical tutorial: My real case

📊 Setup

  • Model: GPT-2 Small (124M parameters)
  • Dataset: 80 MB, 125,705 texts, 123,518 Q&A
  • Config: 1 epoch, LR=5e-5, batch_size=8, max_length=512
  • Hardware: GTX 1080 Ti, Ryzen 5600G, 48 GB RAM
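
For reference, loading this exact model and tokenizer takes only a few lines. This is a minimal sketch assuming the Hugging Face transformers library and the stock "gpt2" checkpoint; the preparation of the 80 MB Q&A corpus itself is only hinted at with a single example text.

from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# GPT-2 Small: 124M parameters, matching the setup above
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# How each Q&A text would be tokenized (full dataset pipeline not shown)
encoded = tokenizer(
    "Q: What's DHCP? A: ...",
    truncation=True,
    max_length=512,          # matches the max_length in the config above
    padding="max_length",
    return_tensors="pt",
)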

📈 Results obtained

train_loss: 1.63
eval_loss: 1.42  
perplexity: 4.16
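
Perplexity here is simply the exponential of the evaluation loss (cross-entropy in nats), so the number is easy to sanity-check:

import math

eval_loss = 1.42
perplexity = math.exp(eval_loss)
print(round(perplexity, 2))  # ~4.14, in line with the ~4.16 reported above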

🧪 Real-world testing

Input:  "Hi there!"
Output: "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"

Input:  "What's DHCP?"  
Output: "DHCP ??????????????????????????????"

Input:  "Natural satellites of Earth?"
Output: "Earth???????????????????????????"

Verdict: 🚨 UNDERFITTING CONFIRMED
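
These spot checks were done by hand; here is a minimal sketch of how to reproduce them. The checkpoint path is an assumption: point it at whatever your Trainer actually saved (for example a checkpoint-* folder under the output_dir used later in this article).

from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("./results")   # assumed path to the fine-tuned checkpoint
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

for prompt in ["Hi there!", "What's DHCP?", "Natural satellites of Earth?"]:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,  # silences the missing-pad-token warning
    )
    print(prompt, "->", tokenizer.decode(output_ids[0], skip_special_tokens=True))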


💡 Concrete examples

Typical underfitting cases

  • Insufficient epochs (1 instead of 6-10)
  • Learning rate too low (1e-6 instead of 5e-5)
  • Dataset too complex for the model
  • Inadequate architecture (model too simple)

Affected models

  • GPT-2 Small trained for only 1 epoch
  • BERT Base fine-tuned for 1 epoch on a massive dataset
  • Custom transformers that are too small for the task

📋 Cheat sheet: Diagnosing underfitting

🔍 Warning signals

  • Train_loss > 2.0 (a rough heuristic for NLP; see the quick check after this list)
  • Eval_loss close to train_loss while both stay high (the model isn't learning, it just isn't overfitting either)
  • Perplexity > 10 (confused model)
  • Repetitive/incoherent outputs
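
A quick way to run those threshold checks on your own metrics. This is a sketch: the cut-offs come from the list above and are heuristics, not universal rules.

import math

def looks_underfitted(train_loss, eval_loss, gap=0.3):
    """Heuristic flags based on the warning signals listed above."""
    flags = []
    if train_loss > 2.0:
        flags.append("train_loss > 2.0: the model struggles even on its training data")
    if abs(train_loss - eval_loss) < gap and eval_loss > 2.0:
        flags.append("eval_loss ~ train_loss and both high: under-trained rather than overfitted")
    if math.exp(eval_loss) > 10:
        flags.append(f"perplexity {math.exp(eval_loss):.1f} > 10: the model is still confused")
    return flags

print(looks_underfitted(train_loss=1.63, eval_loss=1.42))  # [] -> the losses from the run above trip no threshold

Note that for the run described above this check returns nothing: the raw losses looked acceptable, and it was the garbage generations that exposed the underfitting. Numeric thresholds help, but they don't replace testing real prompts.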

🛠️ Solutions

  • More epochs (6-10 minimum)
  • Higher learning rate (5e-5 → 1e-4)
  • More data (if possible)
  • More complex architecture

⚙️ Recommended config

epochs: 6-10
learning_rate: 5e-5 to 1e-4  
warmup: 10%
batch_size: 8-16
max_length: 512

💻 Code example

from transformers import Trainer, TrainingArguments

# model, train_dataset and eval_dataset are assumed to be defined earlier
# (the GPT-2 checkpoint and the tokenized Q&A dataset from the setup above).
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=6,                  # 6-10 epochs instead of a single one
    learning_rate=5e-5,
    warmup_ratio=0.1,                    # 10% warmup
    per_device_train_batch_size=8,
    evaluation_strategy="steps",         # renamed to eval_strategy in recent transformers versions
    eval_steps=500,                      # evaluate often enough to catch a stagnating loss early
    save_steps=1000,
    logging_steps=100
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset
)

trainer.train()
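
The frequent evaluation is deliberate: one of the limitations noted above is that underfitting is often detected only after a full training run, so checking eval_loss every 500 steps gives you a chance to stop and adjust early.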

📝 Summary

Underfitting = under-trained model producing catastrophic results. 1 epoch on 125k examples = recipe for failure. Solution: more epochs, metrics monitoring, patience.


🎯 Conclusion

My underfitted GPT-2 taught me a valuable lesson: data quantity doesn't replace training quality. Next step: 6 epochs, tight monitoring, enriched InfiniGPT dataset.


❓ Q&A

Q: How many epochs, at minimum, to avoid underfitting? A: 6-10 epochs for a 125k-example dataset; watch the loss curve and adjust.

Q: My model spams characters. Is it necessarily underfitting?
A: Very likely, yes. But also check data quality and tokenization.

Q: How do you tell underfitting apart from bad data? A: Train on a small, clean mini-dataset (see the sketch below). If the model learns it, your data was the problem; if it still fails, it's underfitting.
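
A minimal sketch of that mini-dataset test, reusing the model and the already-tokenized train_dataset from the code example above. The 32-example subset size and the ./sanity_check directory are placeholders.

from torch.utils.data import Subset
from transformers import Trainer, TrainingArguments

# Hypothetical: ~32 clean, hand-verified examples taken from the training set
tiny_dataset = Subset(train_dataset, range(32))

sanity_args = TrainingArguments(
    output_dir="./sanity_check",
    num_train_epochs=30,              # deliberately try to overfit the tiny subset
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    logging_steps=5,
)

sanity_trainer = Trainer(model=model, args=sanity_args, train_dataset=tiny_dataset)
sanity_trainer.train()

# If the training loss collapses toward 0 here, the model and pipeline are fine
# and your full dataset is the suspect; if it doesn't, you are looking at
# underfitting (or a bug in the training setup).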


🤓 Did you know?

Underfitting was the main problem of early neural networks in the 1960s! Researchers thought their models were "too dumb" to learn, when in reality they simply didn't have enough computing power to do more than 1-2 epochs. It took GPUs and the 2010s to discover that these architectures were viable with sufficient training! 🚀


Théo CHARLET
IT Systems & Networks Student - AI/ML Specialization

Creator of AG-BPE (Attention-Guided Tokenization)

🔗 LinkedIn: https://www.linkedin.com/in/théo-charlet

🚀 Seeking internship opportunities
