# Model Card for AutoML Books Classification
This model card documents the AutoML Books Classification model trained with AutoGluon AutoML on a classmate’s dataset of fiction and nonfiction books.
The task is to predict whether a book is recommended to everyone based on tabular features.
## Model Details
- Developed by: Bareethul Kader
- Framework: AutoGluon
- Repository: bareethul/AutoML-books-classification
- License: CC BY 4.0
## Intended Use
### Direct Use
- Educational demonstration of AutoML on a small tabular dataset.
- Comparison of multiple classical ML models through automated search.
- Understanding validation vs. test performance tradeoffs.
### Out-of-Scope Use
- Not designed for production or book recommendation engines.
- Dataset too small to generalize beyond classroom context.
## Dataset
- Source: https://huggingface.co/datasets/jennifee/HW1-tabular-dataset
- Task: Binary classification (`RecommendToEveryone` = 0/1)
- Size: 30 original samples + ~300 augmented rows
- Features:
  - `Pages` (integer)
  - `Thickness` (float)
  - `ReadStatus` (categorical: read/started/not read)
  - `Genre` (categorical: fiction/nonfiction)
  - `RecommendToEveryone` (binary target)
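For concreteness, a few rows with the schema above can be built in pandas; the values here are invented for illustration, not taken from the actual dataset.

```python
import pandas as pd

# Tiny illustrative sample matching the feature list above (values invented).
books = pd.DataFrame({
    "Pages": [320, 180, 540],
    "Thickness": [2.1, 1.2, 3.4],
    "ReadStatus": ["read", "started", "not read"],
    "Genre": ["fiction", "nonfiction", "fiction"],
    "RecommendToEveryone": [1, 0, 1],  # binary target
})
# Mark the categorical columns explicitly.
books["ReadStatus"] = books["ReadStatus"].astype("category")
books["Genre"] = books["Genre"].astype("category")
```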
## Training Setup
- AutoML framework: AutoGluon TabularPredictor
- Evaluation metric: Accuracy
- Budget: 300 seconds of training time, small-scale search
- Hardware: Google Colab (T4 GPU not required, CPU sufficient)
- Search Space:
  - Tree-based models: LightGBM, XGBoost, ExtraTrees, RandomForest
  - Neural nets: PyTorch and FastAI tabular networks (small MLPs)
- Bagging and ensembling across layers (L1, L2, L3)
## Results
### Mini Leaderboard (Top 3 Models)
| Rank | Model | Test Accuracy | Validation Accuracy |
|---|---|---|---|
| 1 | RandomForestEntr_BAG_L1 | 0.55 | 0.65 |
| 2 | LightGBM_r96_BAG_L2 | 0.53 | 0.72 |
| 3 | LightGBMLarge_BAG_L2 | 0.53 | 0.74 |
- Best model (AutoGluon selected): `RandomForestEntr_BAG_L1`
- Test Accuracy: ~0.55
- Validation Accuracy (best across runs): up to 0.75 (LightGBM variants)
Note: The “best model” may vary with random splits and seeds. While AutoGluon reported `RandomForestEntr_BAG_L1` as best in this run, LightGBM models sometimes achieved higher validation accuracy but generalized less well to the test set.
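The validation–test gap behind that note can be quantified directly from the leaderboard; this short pandas sketch just copies the accuracies from the table above.

```python
import pandas as pd

# Accuracies copied from the mini leaderboard; a larger validation-test
# gap suggests stronger overfitting to the validation split.
lb = pd.DataFrame({
    "model": ["RandomForestEntr_BAG_L1", "LightGBM_r96_BAG_L2", "LightGBMLarge_BAG_L2"],
    "test_acc": [0.55, 0.53, 0.53],
    "val_acc": [0.65, 0.72, 0.74],
})
lb["gap"] = lb["val_acc"] - lb["test_acc"]  # validation minus test accuracy
print(lb.sort_values("gap"))
```

The selected RandomForest model has the smallest gap, which is consistent with AutoGluon preferring it despite the LightGBM variants' higher validation scores.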
## Limitations, Biases, and Ethical Notes
- Small dataset → models may overfit; performance metrics are unstable.
- Augmented data → synthetic rows may not reflect true variability.
- Task scope → purely educational; not for real-world recommendation.
## AI Usage Disclosure
- ChatGPT (GPT-5) assisted in:
  - Helping with coding and the AutoGluon AutoML workflow
- Polishing the Colab notebook for clarity
- Refining this model card
## Citation
BibTeX:

```bibtex
@misc{bareethul_books_classification,
  author = {Kader, Bareethul},
  title  = {AutoML Books Classification},
  year   = {2025},
  note   = {AutoGluon AutoML model},
  url    = {https://huggingface.co/bareethul/AutoML-books-classification}
}
```