---
license: cc-by-4.0
datasets:
- jennifee/HW1-tabular-dataset
language:
- en
metrics:
- accuracy
base_model:
- autogluon/tabpfn-mix-1.0-classifier
pipeline_tag: tabular-classification
tags:
- automl
- classification
- books
- tabular
- autogluon
---
# Model Card for AutoML Books Classification
This model card documents the **AutoML Books Classification** model trained with **AutoGluon AutoML** on a classmate’s dataset of fiction and nonfiction books.
The task is to predict whether a book is **recommended to everyone** based on tabular features.
---
## Model Details
- **Developed by:** Bareethul Kader
- **Framework:** AutoGluon
- **Repository:** [bareethul/AutoML-books-classification](https://huggingface.co/bareethul/AutoML-books-classification)
- **License:** CC BY 4.0
---
## Intended Use
### Direct Use
- Educational demonstration of AutoML on a small tabular dataset.
- Comparison of multiple classical ML models through automated search.
- Understanding validation vs. test performance tradeoffs.
### Out of Scope Use
- Not designed for production or book recommendation engines.
- Dataset too small to generalize beyond classroom context.
---
## Dataset
- **Source:** https://huggingface.co/datasets/jennifee/HW1-tabular-dataset
- **Task:** Classification (`RecommendToEveryone` = 0/1).
- **Size:** 30 original samples + ~300 augmented rows.
- **Features:**
- `Pages` (integer)
- `Thickness` (float)
- `ReadStatus` (categorical: read/started/not read)
- `Genre` (categorical: fiction/nonfiction)
- `RecommendToEveryone` (binary target)
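
As a rough illustration (not the exact preprocessing used for this model), the dataset can be loaded with the `datasets` library and converted to a pandas DataFrame; the split name `train` is an assumption.

```python
# Minimal sketch of loading the dataset; the "train" split name is an assumption.
from datasets import load_dataset

ds = load_dataset("jennifee/HW1-tabular-dataset", split="train")
df = ds.to_pandas()

# Columns expected per the feature list above:
# Pages (int), Thickness (float), ReadStatus, Genre, RecommendToEveryone (0/1)
print(df.dtypes)
print(df["RecommendToEveryone"].value_counts())
```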
---
## Training Setup
- **AutoML framework:** AutoGluon TabularPredictor
- **Evaluation metric:** Accuracy
- **Budget:** 300 seconds of training time, small-scale search
- **Hardware:** Google Colab (CPU is sufficient; no GPU required)
- **Search Space:**
  - Tree-based models: LightGBM, XGBoost, ExtraTrees, RandomForest
  - Neural networks: Torch and FastAI (small MLPs)
  - Bagging and stack ensembling across layers (L1, L2, L3); see the sketch below
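
A minimal sketch of what such a run could look like with AutoGluon's `TabularPredictor`, assuming the DataFrame `df` from the dataset sketch above; the preset and train/test split here are illustrative assumptions, not the exact configuration behind the reported results.

```python
# Minimal sketch of the training setup; preset and split are assumptions.
from autogluon.tabular import TabularPredictor
from sklearn.model_selection import train_test_split

train_df, test_df = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df["RecommendToEveryone"]
)

predictor = TabularPredictor(
    label="RecommendToEveryone",
    eval_metric="accuracy",
).fit(
    train_data=train_df,
    time_limit=300,            # 300-second training budget
    presets="medium_quality",  # assumption: actual preset/search scale may differ
)
```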
---
## Results
### Mini Leaderboard (Top 3 Models)
| Rank | Model | Test Accuracy | Validation Accuracy |
|------|---------------------------|---------------|----------------------|
| 1 | RandomForestEntr_BAG_L1 | **0.55** | 0.65 |
| 2 | LightGBM_r96_BAG_L2 | 0.53 | 0.72 |
| 3 | LightGBMLarge_BAG_L2 | 0.53 | 0.74 |
- **Best model (AutoGluon selected):** `RandomForestEntr_BAG_L1`
- **Test Accuracy:** ~0.55
- **Validation Accuracy (best across runs):** up to 0.75 (LightGBM variants)
Note: The **“best model”** may vary depending on random splits and seeds.
While AutoGluon reported `RandomForestEntr_BAG_L1` as best in this run, LightGBM models sometimes achieved higher validation accuracy but generalized less well to the test set.
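
For reference, a leaderboard like the one above can be reproduced with the `predictor` and `test_df` objects from the training sketch; exact scores will vary with splits and seeds.

```python
# Minimal sketch of scoring models on held-out data; assumes the objects
# from the training sketch above.
leaderboard = predictor.leaderboard(test_df)
print(leaderboard[["model", "score_test", "score_val"]].head(3))

# Accuracy of the AutoGluon-selected best model on the held-out set
print(predictor.evaluate(test_df))
```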
---
## Limitations, Biases, and Ethical Notes
- **Small dataset size** → models may overfit and performance metrics are unstable.
- **Augmented data** → synthetic rows may not reflect true variability.
- **Task scope** → purely educational, not for real-world recommendation.
---
## AI Usage Disclosure
- ChatGPT (GPT-5) assisted in:
  - Assisting with coding and the AutoGluon AutoML workflow
- Polishing the Colab notebook for clarity
- Refining this model card
---
## Citation
**BibTeX:**
```bibtex
@misc{bareethul_books_classification,
  author       = {Kader, Bareethul},
  title        = {AutoML Books Classification},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/bareethul/AutoML-books-classification}},
  note         = {Trained with the AutoGluon AutoML framework}
}
```