|
--- |
|
license: cc-by-4.0
|
datasets: |
|
- jennifee/HW1-tabular-dataset |
|
language: |
|
- en |
|
metrics: |
|
- accuracy |
|
base_model: |
|
- autogluon/tabpfn-mix-1.0-classifier |
|
pipeline_tag: tabular-classification |
|
tags: |
|
- automl |
|
- classification |
|
- books |
|
- tabular |
|
- autogluon |
|
--- |
|
|
|
# Model Card for AutoML Books Classification |
|
|
|
This model card documents the **AutoML Books Classification** model trained with **AutoGluon AutoML** on a classmate’s dataset of fiction and nonfiction books. |
|
The task is to predict whether a book is **recommended to everyone** based on tabular features. |
|
|
|
--- |
|
|
|
## Model Details |
|
|
|
- **Developed by:** Bareethul Kader |
|
- **Framework:** AutoGluon |
|
- **Repository:** [bareethul/AutoML-books-classification](https://huggingface.co/bareethul/AutoML-books-classification) |
|
- **License:** CC BY 4.0 |
|
|
|
--- |
|
|
|
## Intended Use |
|
|
|
### Direct Use |
|
- Educational demonstration of AutoML on a small tabular dataset. |
|
- Comparison of multiple classical ML models through automated search. |
|
- Understanding validation vs. test performance tradeoffs. |
|
|
|
### Out of Scope Use |
|
- Not designed for production or book recommendation engines. |
|
- Dataset too small to generalize beyond classroom context. |
|
|
|
--- |
|
|
|
## Dataset |
|
|
|
- **Source:** https://huggingface.co/datasets/jennifee/HW1-tabular-dataset
|
- **Task:** Classification (`RecommendToEveryone` = 0/1). |
|
- **Size:** 30 original samples + ~300 augmented rows. |
|
- **Features:** |
|
- `Pages` (integer) |
|
- `Thickness` (float) |
|
- `ReadStatus` (categorical: read/started/not read) |
|
- `Genre` (categorical: fiction/nonfiction) |
|
- `RecommendToEveryone` (binary target) |
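
As a quick illustration of this schema, here is a hand-made sample. The values are invented for illustration; the real rows come from `jennifee/HW1-tabular-dataset`.

```python
# A toy sample matching the dataset schema (invented values).
import pandas as pd

sample = pd.DataFrame(
    {
        "Pages": [320, 180],                 # integer page count
        "Thickness": [2.5, 1.1],             # float thickness
        "ReadStatus": ["read", "not read"],  # read / started / not read
        "Genre": ["fiction", "nonfiction"],  # fiction / nonfiction
        "RecommendToEveryone": [1, 0],       # binary target
    }
)
print(sample.shape)  # (2, 5)
```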
|
|
|
--- |
|
|
|
## Training Setup |
|
|
|
- **AutoML framework:** AutoGluon TabularPredictor |
|
- **Evaluation metric:** Accuracy |
|
- **Budget:** 300 seconds of training time with a small-scale search
|
- **Hardware:** Google Colab (CPU is sufficient; a T4 GPU is not required)
|
- **Search Space:** |
|
- Tree-based models: LightGBM, XGBoost, ExtraTrees, RandomForest

- Neural networks: PyTorch and FastAI (small MLPs)

- Bagging and stack ensembling across levels (L1, L2, L3)
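
The setup above can be sketched in a few lines. The label and metric come from this card; the CSV path and the `presets` value are assumptions, not the exact configuration used.

```python
# Sketch of the training setup described above.
TIME_LIMIT = 300                  # seconds, the stated training budget
LABEL = "RecommendToEveryone"     # binary target column
EVAL_METRIC = "accuracy"          # stated evaluation metric

def train_books_model(train_csv="books_train.csv"):
    # Imports are deferred so the sketch reads without AutoGluon installed.
    import pandas as pd
    from autogluon.tabular import TabularPredictor

    train_data = pd.read_csv(train_csv)
    predictor = TabularPredictor(label=LABEL, eval_metric=EVAL_METRIC)
    # A modest preset keeps the search small-scale to fit the 300 s budget;
    # AutoGluon still tries tree models, small neural nets, and stacking.
    predictor.fit(train_data, time_limit=TIME_LIMIT, presets="medium_quality")
    return predictor
```

After `fit`, calling `predictor.leaderboard()` produces a table like the one in the Results section below.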
|
|
|
--- |
|
|
|
## Results |
|
|
|
### Mini Leaderboard (Top 3 Models) |
|
|
|
| Rank | Model | Test Accuracy | Validation Accuracy | |
|
|------|---------------------------|---------------|----------------------| |
|
| 1 | RandomForestEntr_BAG_L1 | **0.55** | 0.65 | |
|
| 2 | LightGBM_r96_BAG_L2 | 0.53 | 0.72 | |
|
| 3 | LightGBMLarge_BAG_L2 | 0.53 | 0.74 | |
|
|
|
- **Best model (AutoGluon selected):** `RandomForestEntr_BAG_L1` |
|
- **Test Accuracy:** ~0.55 |
|
- **Validation Accuracy (best across runs):** up to 0.75 (LightGBM variants) |
|
|
|
Note: The selected **"best model"** may vary with random splits and seeds.

In this run AutoGluon reported `RandomForestEntr_BAG_L1` as best; LightGBM models sometimes reached higher validation accuracy but generalized less well to the test set.
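
The test-vs-validation comparison above can be reproduced with a small helper. Here, `predictor` is assumed to be a fitted AutoGluon `TabularPredictor` and `test_data` a held-out DataFrame; the column names are AutoGluon's standard leaderboard columns.

```python
# Helper: list the top models with their test and validation scores.
def top_models(predictor, test_data, n=3):
    """Return the top-n leaderboard rows scored on held-out test data."""
    lb = predictor.leaderboard(test_data)
    # With eval_metric="accuracy", score_test / score_val are accuracies.
    return lb[["model", "score_test", "score_val"]].head(n)
```

A large gap between `score_val` and `score_test` (here 0.74 vs. 0.53 for the LightGBM variants) is the overfitting signal discussed below.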
|
|
|
--- |
|
|
|
## Limitations, Biases, and Ethical Notes |
|
|
|
- **Small dataset size** → models may overfit, and performance metrics are unstable.

- **Augmented data** → synthetic rows may not reflect true variability.

- **Task scope** → purely educational, not for real-world recommendation.
|
|
|
--- |
|
|
|
## AI Usage Disclosure |
|
|
|
- ChatGPT (GPT-5) assisted in: |
|
- Assisting with coding and the AutoGluon AutoML workflow during development
|
- Polishing the Colab notebook for clarity |
|
- Refining this model card |
|
|
|
--- |
|
|
|
## Citation |
|
|
|
**BibTeX:** |
|
```bibtex |
|
@misc{bareethul_books_classification,
  author       = {Kader, Bareethul},
  title        = {AutoML Books Classification},
  year         = {2025},
  howpublished = {AutoGluon model on the Hugging Face Hub},
  url          = {https://huggingface.co/bareethul/AutoML-books-classification}
}
```
|
|