---
license: cc-by-4.0
datasets:
- jennifee/HW1-tabular-dataset
language:
- en
metrics:
- accuracy
base_model:
- autogluon/tabpfn-mix-1.0-classifier
pipeline_tag: tabular-classification
tags:
- automl
- classification
- books
- tabular
- autogluon
---
# Model Card for AutoML Books Classification
This model card documents the **AutoML Books Classification** model trained with **AutoGluon AutoML** on a classmate’s dataset of fiction and nonfiction books.
The task is to predict whether a book is **recommended to everyone** based on tabular features.
---
## Model Details
- **Developed by:** Bareethul Kader
- **Framework:** AutoGluon
- **Repository:** [bareethul/AutoML-books-classification](https://huggingface.co/bareethul/AutoML-books-classification)
- **License:** CC BY 4.0
---
## Intended Use
### Direct Use
- Educational demonstration of AutoML on a small tabular dataset.
- Comparison of multiple classical ML models through automated search.
- Understanding validation vs. test performance tradeoffs.
### Out of Scope Use
- Not designed for production or book recommendation engines.
- Dataset too small to generalize beyond classroom context.
---
## Dataset
- **Source:** https://huggingface.co/datasets/jennifee/HW1-tabular-dataset
- **Task:** Classification (`RecommendToEveryone` = 0/1).
- **Size:** 30 original samples + ~300 augmented rows.
- **Features:**
- `Pages` (integer)
- `Thickness` (float)
- `ReadStatus` (categorical: read/started/not read)
- `Genre` (categorical: fiction/nonfiction)
- `RecommendToEveryone` (binary target)
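
As a rough illustration (not the exact preprocessing used for this model), the dataset can be loaded with the `datasets` library and converted to a pandas DataFrame; the split name `train` is an assumption.

```python
# Minimal sketch of loading the dataset; the "train" split name is an assumption.
from datasets import load_dataset

ds = load_dataset("jennifee/HW1-tabular-dataset", split="train")
df = ds.to_pandas()

# Columns expected per the feature list above:
# Pages (int), Thickness (float), ReadStatus, Genre, RecommendToEveryone (0/1)
print(df.dtypes)
print(df["RecommendToEveryone"].value_counts())
```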
---
## Training Setup
- **AutoML framework:** AutoGluon TabularPredictor
- **Evaluation metric:** Accuracy
- **Budget:** 300 seconds of training time, small-scale search
- **Hardware:** Google Colab (CPU is sufficient; no GPU required)
- **Search Space:**
  - Tree-based models: LightGBM, XGBoost, ExtraTrees, RandomForest
  - Neural networks: Torch and FastAI (small MLPs)
  - Bagging and stack ensembling across layers (L1, L2, L3); see the sketch below
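
A minimal sketch of what such a run could look like with AutoGluon's `TabularPredictor`, assuming the DataFrame `df` from the dataset sketch above; the preset and train/test split here are illustrative assumptions, not the exact configuration behind the reported results.

```python
# Minimal sketch of the training setup; preset and split are assumptions.
from autogluon.tabular import TabularPredictor
from sklearn.model_selection import train_test_split

train_df, test_df = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df["RecommendToEveryone"]
)

predictor = TabularPredictor(
    label="RecommendToEveryone",
    eval_metric="accuracy",
).fit(
    train_data=train_df,
    time_limit=300,            # 300-second training budget
    presets="medium_quality",  # assumption: actual preset/search scale may differ
)
```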
---
## Results
### Mini Leaderboard (Top 3 Models)
| Rank | Model | Test Accuracy | Validation Accuracy |
|------|---------------------------|---------------|----------------------|
| 1 | RandomForestEntr_BAG_L1 | **0.55** | 0.65 |
| 2 | LightGBM_r96_BAG_L2 | 0.53 | 0.72 |
| 3 | LightGBMLarge_BAG_L2 | 0.53 | 0.74 |
- **Best model (AutoGluon selected):** `RandomForestEntr_BAG_L1`
- **Test Accuracy:** ~0.55
- **Validation Accuracy (best across runs):** up to 0.75 (LightGBM variants)
Note: The **“best model”** may vary depending on random splits and seeds.
While AutoGluon reported `RandomForestEntr_BAG_L1` as best in this run, LightGBM models sometimes achieved higher validation accuracy but generalized less well to the test set.
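
For reference, a leaderboard like the one above can be reproduced with the `predictor` and `test_df` objects from the training sketch; exact scores will vary with splits and seeds.

```python
# Minimal sketch of scoring models on held-out data; assumes the objects
# from the training sketch above.
leaderboard = predictor.leaderboard(test_df)
print(leaderboard[["model", "score_test", "score_val"]].head(3))

# Accuracy of the AutoGluon-selected best model on the held-out set
print(predictor.evaluate(test_df))
```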
---
## Limitations, Biases, and Ethical Notes
- **Small dataset size** → models may overfit and performance metrics are unstable.
- **Augmented data** → synthetic rows may not reflect true variability.
- **Task scope** → purely educational, not for real-world recommendation.
---
## AI Usage Disclosure
- ChatGPT (GPT-5) assisted in:
  - Assisting with coding and the AutoGluon AutoML workflow
- Polishing the Colab notebook for clarity
- Refining this model card
---
## Citation
**BibTeX:**
```bibtex
@misc{bareethul_books_classification,
  author       = {Kader, Bareethul},
  title        = {AutoML Books Classification},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/bareethul/AutoML-books-classification}},
  note         = {Trained with the AutoGluon AutoML framework}
}
```