Model Card for AutoML Books Classification

This model card documents the AutoML Books Classification model trained with AutoGluon AutoML on a classmate’s dataset of fiction and nonfiction books.
The task is to predict whether a book is recommended to everyone based on tabular features.


Model Details

  • Developed by: Bareethul Kader
  • Model type: Tabular binary classifier (AutoGluon ensemble)
  • Framework: AutoGluon TabularPredictor

Intended Use

Direct Use

  • Educational demonstration of AutoML on a small tabular dataset.
  • Comparison of multiple classical ML models through automated search.
  • Understanding validation vs. test performance tradeoffs.

Out of Scope Use

  • Not designed for production or book recommendation engines.
  • Dataset too small to generalize beyond classroom context.

Dataset

  • Source: https://huggingface.co/datasets/jennifee/HW1-tabular-dataset.
  • Task: Classification (RecommendToEveryone = 0/1).
  • Size: 30 original samples + ~300 augmented rows.
  • Features:
    • Pages (integer)
    • Thickness (float)
    • ReadStatus (categorical: read/started/not read)
    • Genre (categorical: fiction/nonfiction)
    • RecommendToEveryone (binary target)
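The feature schema above can be sketched as plain Python rows. The values below are hypothetical examples, not actual dataset entries, and the `jitter` function is only an illustration of one possible way the ~300 augmented rows could be produced (this card does not specify the augmentation method used).

```python
import random

# Toy rows matching the feature schema above (values are made up,
# not taken from the actual dataset).
rows = [
    {"Pages": 320, "Thickness": 2.5, "ReadStatus": "read",
     "Genre": "fiction", "RecommendToEveryone": 1},
    {"Pages": 150, "Thickness": 1.1, "ReadStatus": "not read",
     "Genre": "nonfiction", "RecommendToEveryone": 0},
]

def jitter(row, scale=0.05, rng=random):
    """Hypothetical augmentation step: perturb the numeric features
    slightly while keeping categorical features and the label fixed."""
    new = dict(row)
    new["Pages"] = max(1, round(row["Pages"] * (1 + rng.uniform(-scale, scale))))
    new["Thickness"] = round(row["Thickness"] * (1 + rng.uniform(-scale, scale)), 2)
    return new

# e.g. expand each original row into ten perturbed copies
augmented = [jitter(r) for r in rows for _ in range(10)]
```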

Training Setup

  • AutoML framework: AutoGluon TabularPredictor
  • Evaluation metric: Accuracy
  • Budget: 300 seconds of training time with a small-scale search
  • Hardware: Google Colab (T4 GPU not required, CPU sufficient)
  • Search Space:
    • Tree-based models: LightGBM, XGBoost, ExtraTrees, RandomForest
    • Neural networks: PyTorch and FastAI (small MLPs)
    • Bagging and ensembling across layers (L1, L2, L3)
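The setup above can be sketched with the AutoGluon `TabularPredictor` API. The label, metric, and time budget come from this card; the `presets` value and the function wrapper are assumptions for illustration, not the exact call used.

```python
# Constants taken from the training setup described above.
LABEL = "RecommendToEveryone"
EVAL_METRIC = "accuracy"
TIME_LIMIT = 300  # seconds

def train_and_rank(train_df, test_df):
    # Imported inside the function so the constants above can be
    # inspected without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=LABEL, eval_metric=EVAL_METRIC)
    predictor.fit(train_df, time_limit=TIME_LIMIT, presets="medium_quality")
    # Rank all trained models on the held-out test data.
    return predictor.leaderboard(test_df)
```

AutoGluon handles the model search, bagging, and multi-layer stacking internally; the `_BAG_L1`/`_BAG_L2` suffixes in the results below reflect which stacking layer a model belongs to.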

Results

Mini Leaderboard (Top 3 Models)

| Rank | Model                   | Test Accuracy | Validation Accuracy |
|------|-------------------------|---------------|---------------------|
| 1    | RandomForestEntr_BAG_L1 | 0.55          | 0.65                |
| 2    | LightGBM_r96_BAG_L2     | 0.53          | 0.72                |
| 3    | LightGBMLarge_BAG_L2    | 0.53          | 0.74                |

  • Best model (AutoGluon selected): RandomForestEntr_BAG_L1
  • Test Accuracy: ~0.55
  • Validation Accuracy (best across runs): up to 0.75 (LightGBM variants)

Note: the “best model” may vary with random splits and seeds.
While AutoGluon selected RandomForestEntr_BAG_L1 as best in this run, LightGBM variants sometimes achieved higher validation accuracy but generalized less well to the test set.
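The validation/test gap noted above is easy to see with accuracy computed by hand. The label vectors below are toy data chosen to mimic the pattern in the leaderboard (high validation score, lower test score), not the actual splits.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels illustrating an overfit model: strong on the
# validation split it was tuned against, weaker on held-out test data.
val_true  = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
val_pred  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # 9/10 correct
test_true = [0, 1, 1, 0, 1, 0, 1, 0, 1, 1]
test_pred = [0, 0, 1, 0, 0, 1, 1, 0, 0, 1]   # 6/10 correct

print(accuracy(val_true, val_pred))    # 0.9
print(accuracy(test_true, test_pred))  # 0.6
```

On a dataset this small, a gap of this size between splits is expected noise as much as overfitting, which is why the leaderboard ranking can change between runs.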


Limitations, Biases, and Ethical Notes

  • Small dataset size → models may overfit, and performance metrics are unstable.
  • Augmented data → synthetic rows may not reflect true variability.
  • Task scope → purely educational, not for real world recommendation.

AI Usage Disclosure

  • ChatGPT (GPT-5) assisted in:
    • Assisting with coding and the AutoGluon AutoML workflow during development
    • Polishing the Colab notebook for clarity
    • Refining this model card

Citation

BibTeX:

@misc{bareethul_books_classification,
  author       = {Kader, Bareethul},
  title        = {AutoML Books Classification},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/bareethul/AutoML-books-classification}},
  note         = {Trained with AutoGluon}
}