Model Card for geralto/codet-classy
Vilnius University Deep Neural Networks course project.
Model Details
A transformer-based query classification model.
Model Description
This model was developed as part of a Deep Neural Networks (DNN) course project at Vilnius University.
It fine-tunes the Salesforce/codet5-base model to classify student queries about C programming into five categories: General Question, Question from Code, Help Fix Code, Help Write Code, and Explain Code.
- Developed by: Brigita Bruškytė, Artiom Hovhannisyan, Eglė Orinaitė
Faculty of Mathematics and Informatics, Vilnius University
Dataset
- Size: 6,776 student queries from a real C programming course.
- Structure: JSON entries with the fields user_id, time, feature type, feature version, input question, input code, input intention, and input task description.
- Note: The dataset does not include AI responses, only the student queries.
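As an illustration of the entry structure described above, the following minimal sketch parses one JSON entry and joins its text fields into a single input string. The key spelling (spaces rather than underscores), the `build_model_input` helper, and the sample record are all assumptions made for illustration; the project's actual preprocessing is not documented here.

```python
import json

# Text-bearing fields named in the dataset description. The exact key
# spelling (spaces vs. underscores) is an assumption.
TEXT_FIELDS = ("input question", "input code", "input task description")


def build_model_input(record: dict) -> str:
    """Join the non-empty text fields of one query into a single string."""
    parts = []
    for field in TEXT_FIELDS:
        value = record.get(field)
        # Skip fields that are missing, null, or blank.
        if isinstance(value, str) and value.strip():
            parts.append(f"{field}: {value.strip()}")
    return "\n".join(parts)


# Hypothetical record, invented for illustration only.
raw = '{"user_id": "u001", "input question": "Why does this loop never end?", "input code": "while (i < 10) { }"}'
record = json.loads(raw)
print(build_model_input(record))
```

Prefixing each part with its field name keeps the question, code, and task description distinguishable, but it also exposes which fields are present, which matters for the field-based hints discussed under Challenges.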
Challenges
- Class imbalance: e.g., “General Question” is much more frequent than the other categories.
- Field-based hints: Some classes have unique fields (like input task description), inadvertently helping classification.
- Token length: Some queries, especially those with code snippets, can be very long and exceed the transformer's maximum input length.
- Structural inconsistency: Dataset descriptions sometimes did not match actual data.
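Two of the challenges above, class imbalance and token length, have common mitigations that can be sketched in plain Python. This card does not state which mitigations, if any, the project used, so the inverse-frequency weighting, the head-and-tail truncation, the function names, and the toy data below are assumptions for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: total / (n_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}


def truncate_head_tail(tokens, max_len):
    """Keep the first and last tokens of an over-long sequence.

    Dropping the middle keeps both the start of the query and the end
    of a trailing code snippet.
    """
    if max_len <= 0:
        return []
    if len(tokens) <= max_len:
        return list(tokens)
    head = max_len // 2
    tail = max_len - head
    return list(tokens[:head]) + list(tokens[-tail:])


# Hypothetical label distribution, not the real dataset counts.
toy_labels = ["General Question"] * 6 + ["Help Fix Code"] * 3 + ["Help Write Code"]
print(balanced_class_weights(toy_labels))
print(truncate_head_tail(list(range(10)), 4))
```

Weights of this form are typically passed to a weighted cross-entropy loss so that errors on rare categories count for more during training.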
Per-Category F1 Scores
Category | Codet-classy |
---|---|
Explain Code | 0.90 |
General Question | 0.97 |
Help Fix Code | 0.85 |
Help Write Code | 0.63 |
Question from Code | 0.89 |
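For readers who want to reproduce this kind of table, the sketch below shows how per-category F1 can be computed from true and predicted labels, and how the five reported scores average out. The toy labels are hypothetical and are not the project's evaluation data; only the `REPORTED_F1` values come from the table above.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Per-class F1 = 2*TP / (2*TP + FP + FN) for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Scores copied from the table above.
REPORTED_F1 = {
    "Explain Code": 0.90,
    "General Question": 0.97,
    "Help Fix Code": 0.85,
    "Help Write Code": 0.63,
    "Question from Code": 0.89,
}

# Unweighted (macro) mean of the reported per-category scores.
macro_f1 = sum(REPORTED_F1.values()) / len(REPORTED_F1)
print(round(macro_f1, 3))

# Hypothetical toy labels, for illustration only.
toy_true = ["General Question", "General Question", "Help Fix Code", "Help Write Code"]
toy_pred = ["General Question", "Help Fix Code", "Help Fix Code", "Help Fix Code"]
print(per_class_f1(toy_true, toy_pred))
```

The macro mean weights every category equally, so the weaker Help Write Code score pulls it down more than it would affect an accuracy figure dominated by General Question.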
Model tree for geralto/codet-classy
Base model
Salesforce/codet5-base