---
license: cc-by-4.0
datasets:
- jennifee/HW1-tabular-dataset
language:
- en
metrics:
- accuracy
base_model:
- autogluon/tabpfn-mix-1.0-classifier
pipeline_tag: tabular-classification
tags:
- automl
- classification
- books
- tabular
- autogluon
---
# Model Card for AutoML Books Classification
This model card documents the **AutoML Books Classification** model trained with **AutoGluon AutoML** on a classmate’s dataset of fiction and nonfiction books.
The task is to predict whether a book is **recommended to everyone** based on tabular features.
---
## Model Details
- **Developed by:** Bareethul Kader
- **Framework:** AutoGluon
- **Repository:** [bareethul/AutoML-books-classification](https://huggingface.co/bareethul/AutoML-books-classification)
- **License:** CC BY 4.0
---
## Intended Use
### Direct Use
- Educational demonstration of AutoML on a small tabular dataset.
- Comparison of multiple classical ML models through automated search.
- Understanding validation vs. test performance tradeoffs.
### Out of Scope Use
- Not designed for production or book recommendation engines.
- Dataset too small to generalize beyond classroom context.
---
## Dataset
- **Source:** [jennifee/HW1-tabular-dataset](https://huggingface.co/datasets/jennifee/HW1-tabular-dataset)
- **Task:** Classification (`RecommendToEveryone` = 0/1).
- **Size:** 30 original samples + ~300 augmented rows.
- **Features:**
- `Pages` (integer)
- `Thickness` (float)
- `ReadStatus` (categorical: read/started/not read)
- `Genre` (categorical: fiction/nonfiction)
- `RecommendToEveryone` (binary target)
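The schema above can be sketched as a small pandas frame; the values here are illustrative stand-ins, not rows from the actual dataset:

```python
import pandas as pd

# Minimal sketch of the documented schema (values are made up, not real data).
df = pd.DataFrame(
    {
        "Pages": [320, 150, 480],                       # integer
        "Thickness": [2.5, 1.1, 3.8],                   # float
        "ReadStatus": ["read", "started", "not read"],  # categorical
        "Genre": ["fiction", "nonfiction", "fiction"],  # categorical
        "RecommendToEveryone": [1, 0, 1],               # binary target
    }
)
print(df.dtypes)
```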
---
## Training Setup
- **AutoML framework:** AutoGluon TabularPredictor
- **Evaluation metric:** Accuracy
- **Budget:** 300 seconds of training time, small-scale search
- **Hardware:** Google Colab (CPU sufficient; no GPU required)
- **Search space:**
  - Tree-based models: LightGBM, XGBoost, ExtraTrees, RandomForest
  - Neural nets: PyTorch and FastAI (small MLPs)
  - Bagging and ensembling across stack layers (L1, L2, L3)
---
## Results
### Mini Leaderboard (Top 3 Models)
| Rank | Model | Test Accuracy | Validation Accuracy |
|------|---------------------------|---------------|----------------------|
| 1 | RandomForestEntr_BAG_L1 | **0.55** | 0.65 |
| 2 | LightGBM_r96_BAG_L2 | 0.53 | 0.72 |
| 3 | LightGBMLarge_BAG_L2 | 0.53 | 0.74 |
- **Best model (AutoGluon selected):** `RandomForestEntr_BAG_L1`
- **Test Accuracy:** ~0.55
- **Validation Accuracy (best across runs):** up to 0.75 (LightGBM variants)
Note: the **"best model"** can vary with random splits and seeds.
AutoGluon selected `RandomForestEntr_BAG_L1` as best in this run; LightGBM models sometimes reached higher validation accuracy but generalized less well to the test set.
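The validation-test gap behind this observation can be computed directly from the mini leaderboard above (plain Python, figures taken from the table):

```python
# Accuracy figures from the mini leaderboard above.
leaderboard = {
    "RandomForestEntr_BAG_L1": {"test": 0.55, "val": 0.65},
    "LightGBM_r96_BAG_L2":     {"test": 0.53, "val": 0.72},
    "LightGBMLarge_BAG_L2":    {"test": 0.53, "val": 0.74},
}

# A larger validation-test gap suggests stronger overfitting to the validation split.
gaps = {m: round(s["val"] - s["test"], 2) for m, s in leaderboard.items()}
for model, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{model}: val-test gap = {gap:.2f}")
```

The random forest has the smallest gap (0.10 vs. 0.19 and 0.21 for the LightGBM variants), which is consistent with AutoGluon's choice of it as the best model despite its lower validation score.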
---
## Limitations, Biases, and Ethical Notes
- **Small dataset size** → models may overfit, and performance metrics are unstable.
- **Augmented data** → synthetic rows may not reflect real-world variability.
- **Task scope** → purely educational; not intended for real-world recommendation.
---
## AI Usage Disclosure
- ChatGPT (GPT-5) assisted in:
  - Assisting with coding and the AutoGluon AutoML workflow
- Polishing the Colab notebook for clarity
- Refining this model card
---
## Citation
**BibTeX:**
```bibtex
@misc{bareethul_books_classification,
  author = {Kader, Bareethul},
  title  = {AutoML Books Classification},
  year   = {2025},
  note   = {Trained with AutoGluon AutoML},
  url    = {https://huggingface.co/bareethul/AutoML-books-classification}
}
```