---
title: Dinosaur Project
emoji: 👀
colorFrom: green
colorTo: pink
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: A simple dinosaur species classifier, fine-tuned on ConvNeXt
---
# Jurassic Park Dinosaur Species Classifier

## Introduction
This Hugging Face Space contains the source code for my project: a classification model that distinguishes the dinosaur species appearing in the Jurassic Park franchise. The model powers a simple web app built with Gradio. With this code, you can:

- Upload an image of a dinosaur and get a prediction of its species (the top-5 predictions are shown, along with their probabilities).
- Retrain the model with a different configuration (see the `src/train.py` script).
- Use this as a baseline and develop your own model.
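The top-5 output shown in the app can be produced with a plain softmax over the model's per-class logits. A minimal sketch (the `top5_probs` helper and the label names are illustrative, not the repo's actual code):

```python
import math

def top5_probs(logits: dict[str, float]) -> dict[str, float]:
    """Convert per-class logits into the top-5 {label: probability} dict
    that a Gradio Label component expects."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(score - m) for label, score in logits.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    top5 = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:5]
    return dict(top5)

example = {"T. rex": 4.1, "Velociraptor": 2.3, "Triceratops": 1.0,
           "Dilophosaurus": 0.2, "Brachiosaurus": -0.5, "Gallimimus": -1.2}
print(top5_probs(example))  # highest-probability species first
```

A dict like this can be returned directly from the prediction function and rendered by Gradio's `Label` component.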
## About the data
The original dataset can be found on Kaggle: https://www.kaggle.com/datasets/antaresl/jurassic-park-dinosaurs-dataset.
However, this dataset was crawled from the web, so it is very diverse in style, size, and quality, and there is a lot of noise in the data:
some visually similar species are labeled incorrectly, and some images are children's drawings or fossil photographs.
So I wrote the `src/filter_and_split.py` script to:

- Filter out children's drawings and fossil photographs (most of them), which I believe hurt the model's ability to learn features the most.
- Split the dataset into train/validation/test sets with a ratio of 0.8/0.1/0.1, respectively.
I use a CLIP model to quickly filter out invalid images, and I created a custom dataset class (see `src/dataset.py`) to leverage batch processing and speed up the filtering.
One caveat: I did not set a random seed in the script, which reduces its reproducibility.
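Making the split reproducible only requires a fixed seed. A minimal sketch of a seeded 0.8/0.1/0.1 split (the `seeded_split` function and its signature are illustrative, not the script's actual code):

```python
import random

def seeded_split(items, seed=42, ratios=(0.8, 0.1, 0.1)):
    """Deterministically shuffle and split a sequence into train/val/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded RNG -> same order every run
    n = len(shuffled)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = seeded_split(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```

Because the RNG is seeded, running the split twice yields identical partitions.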
## Training
After experimenting with different models (ResNet, EfficientNet, etc.), I found that ConvNeXt Tiny yields the best and most stable results. The model was trained on a P100 GPU on Kaggle.
The `src/train.py` script is used to train the model. The detailed process of exploring the data, setting up training, monitoring the training run, and evaluating the model on the test data
can be found in the `dinosaur_species_classification.ipynb` notebook.
## Trained model
You can find my final model for this project in the `model` directory.
## Final results
The model achieved an accuracy of 0.75 and a weighted F1 score of roughly 0.76, which to me is not too shabby, considering the quality of the data and the limited resources available.
You can see the confusion matrix and a table of the top-10 misclassified pairs at the end of the `dinosaur_species_classification.ipynb` notebook for a closer look at the model's performance.
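For readers unfamiliar with the weighted F1 score: it averages per-class F1 scores, weighting each class by its support. A small self-contained illustration (the toy labels below are made up, not the project's results):

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Weighted F1: per-class F1, averaged with class support as weights."""
    support = Counter(y_true)
    total = 0.0
    for c in set(y_true):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += f1 * support[c] / len(y_true)
    return total

y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
print(weighted_f1(y_true, y_pred))  # 0.7333... (class 0 F1 = 2/3, class 1 F1 = 0.8)
```

This matches what `sklearn.metrics.f1_score(..., average="weighted")` computes.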
## Suggestions for improvement

There are several ideas that I did not try, but that you could, in order to improve this model:
- Neural Style Transfer: Our data spans a variety of styles (2D hand drawings, 2D digital drawings, 3D renders, ...), so the model might be prone to learning image style rather than biological features. To address this, you can use a Neural Style Transfer model to generate stylized versions of the original dataset, then train on the new dataset to see if there is any improvement.
- Bilinear Pooling: This is a very interesting method: it uses two CNN backbones to generate two feature maps, then computes their outer product to create a new feature map that captures how the original features interact. It has proved effective in fine-grained classification tasks like ours, where some classes differ only subtly. However, it comes at a significant computational cost, so you may want to look into Compact Bilinear Pooling, a method that estimates the outer product of the feature maps instead of computing it exactly.
- Synthetic Data Generation: Our dataset is relatively small (only ~70 images per class), so you could consider using generative AI models to create new samples. Note, however, that the generated samples must preserve the distinguishing features of each class. With a larger dataset, you could then unfreeze more layers of ConvNeXt Tiny for fine-tuning, or try Vision Transformer models.
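Of these ideas, bilinear pooling is the easiest to sketch. The core operation, the outer product of two feature maps pooled over spatial positions, followed by the usual signed-square-root and L2 normalization, in NumPy (the feature-map sizes are arbitrary assumptions):

```python
import numpy as np

def bilinear_pool(feat_a, feat_b):
    """Bilinear pooling of two CNN feature maps of shape (C, H, W).

    Returns an L2-normalized (C_a * C_b,) descriptor capturing pairwise
    feature interactions, averaged over spatial positions.
    """
    ca, h, w = feat_a.shape
    cb = feat_b.shape[0]
    a = feat_a.reshape(ca, h * w)           # (C_a, HW)
    b = feat_b.reshape(cb, h * w)           # (C_b, HW)
    x = (a @ b.T) / (h * w)                 # mean of per-position outer products
    x = np.sign(x) * np.sqrt(np.abs(x))     # signed square root
    x = x.flatten()
    return x / (np.linalg.norm(x) + 1e-12)  # L2 normalization

rng = np.random.default_rng(0)
fa = rng.standard_normal((64, 7, 7))   # feature map from backbone A
fb = rng.standard_normal((32, 7, 7))   # feature map from backbone B
desc = bilinear_pool(fa, fb)
print(desc.shape)  # (2048,)
```

The quadratic descriptor size (C_a x C_b) is exactly the cost that Compact Bilinear Pooling approximates away.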
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference