# Model Card for AutoML Books Classification
This model card documents the AutoML Books Classification model trained with AutoGluon AutoML on a classmate’s dataset of fiction and nonfiction books.
The task is to predict whether a book is recommended to everyone based on tabular features.
## Model Details
- Developed by: Bareethul Kader
- Framework: AutoGluon
- Repository: bareethul/AutoML-books-classification
- License: CC BY 4.0
## Intended Use
### Direct Use
- Educational demonstration of AutoML on a small tabular dataset.
- Comparison of multiple classical ML models through automated search.
- Understanding validation vs. test performance tradeoffs.
### Out-of-Scope Use
- Not designed for production or book recommendation engines.
- Dataset too small to generalize beyond classroom context.
## Dataset
- Source: https://huggingface.co/datasets/jennifee/HW1-tabular-dataset
- Task: Binary classification (`RecommendToEveryone` = 0/1)
- Size: 30 original samples + ~300 augmented rows
- Features:
  - `Pages` (integer)
  - `Thickness` (float)
  - `ReadStatus` (categorical: read/started/not read)
  - `Genre` (categorical: fiction/nonfiction)
  - `RecommendToEveryone` (binary target)
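For concreteness, a few rows with the schema above can be built in pandas; the values here are invented for illustration, not taken from the actual dataset.

```python
import pandas as pd

# Tiny illustrative sample matching the feature list above (values invented).
books = pd.DataFrame({
    "Pages": [320, 180, 540],
    "Thickness": [2.1, 1.2, 3.4],
    "ReadStatus": ["read", "started", "not read"],
    "Genre": ["fiction", "nonfiction", "fiction"],
    "RecommendToEveryone": [1, 0, 1],  # binary target
})
# Mark the categorical columns explicitly.
books["ReadStatus"] = books["ReadStatus"].astype("category")
books["Genre"] = books["Genre"].astype("category")
```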
## Training Setup
- AutoML framework: AutoGluon TabularPredictor
- Evaluation metric: Accuracy
- Budget: 300 seconds of training time, small-scale search
- Hardware: Google Colab (T4 GPU not required, CPU sufficient)
- Search Space:
  - Tree-based models: LightGBM, XGBoost, ExtraTrees, RandomForest
  - Neural nets: PyTorch and FastAI tabular networks (small MLPs)
- Bagging and ensembling across layers (L1, L2, L3)
## Results
### Mini Leaderboard (Top 3 Models)
| Rank | Model | Test Accuracy | Validation Accuracy |
|---|---|---|---|
| 1 | RandomForestEntr_BAG_L1 | 0.55 | 0.65 |
| 2 | LightGBM_r96_BAG_L2 | 0.53 | 0.72 |
| 3 | LightGBMLarge_BAG_L2 | 0.53 | 0.74 |
- Best model (AutoGluon selected): `RandomForestEntr_BAG_L1`
- Test Accuracy: ~0.55
- Validation Accuracy (best across runs): up to 0.75 (LightGBM variants)
Note: The “best model” may vary with random splits and seeds. While AutoGluon reported `RandomForestEntr_BAG_L1` as best in this run, LightGBM models sometimes achieved higher validation accuracy but generalized less well to the test set.
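The validation–test gap behind that note can be quantified directly from the leaderboard; this short pandas sketch just copies the accuracies from the table above.

```python
import pandas as pd

# Accuracies copied from the mini leaderboard; a larger validation-test
# gap suggests stronger overfitting to the validation split.
lb = pd.DataFrame({
    "model": ["RandomForestEntr_BAG_L1", "LightGBM_r96_BAG_L2", "LightGBMLarge_BAG_L2"],
    "test_acc": [0.55, 0.53, 0.53],
    "val_acc": [0.65, 0.72, 0.74],
})
lb["gap"] = lb["val_acc"] - lb["test_acc"]  # validation minus test accuracy
print(lb.sort_values("gap"))
```

The selected RandomForest model has the smallest gap, which is consistent with AutoGluon preferring it despite the LightGBM variants' higher validation scores.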
## Limitations, Biases, and Ethical Notes
- Small dataset → models may overfit; performance metrics are unstable.
- Augmented data → synthetic rows may not reflect true variability.
- Task scope → purely educational; not for real-world recommendation.
## AI Usage Disclosure
- ChatGPT (GPT-5) assisted in:
  - Helping with coding and the AutoGluon AutoML workflow
- Polishing the Colab notebook for clarity
- Refining this model card
## Citation
BibTeX:

```bibtex
@misc{bareethul_books_classification,
  author = {Kader, Bareethul},
  title  = {AutoML Books Classification},
  year   = {2025},
  note   = {AutoGluon AutoML model},
  url    = {https://huggingface.co/bareethul/AutoML-books-classification}
}
```