Model Card for geralto/codet-classy

Vilnius University Deep Neural Networks course project.

Model Details

A transformer-based query classification model.

Model Description

This model was developed as part of a Deep Neural Networks (DNN) course project at Vilnius University. It fine-tunes the Salesforce/codet5-base model for classifying student queries related to C programming into five categories: General Question, Question from Code, Help Fix Code, Help Write Code, and Explain Code.

  • Developed by: Brigita Bruškytė, Artiom Hovhannisyan, Eglė Orinaitė
    Faculty of Mathematics and Informatics, Vilnius University
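
For fine-tuning, the five categories above can be encoded as integer class IDs. The following is a minimal sketch and not the authors' code: the alphabetical ordering and the names `CATEGORIES`, `ID2LABEL`, `LABEL2ID`, and `predict_label` are assumptions made for illustration, since the label order used in training is not stated here.

```python
from typing import Sequence

# The five categories named in the model description.
# Assumption: alphabetical order; the real training order is not documented.
CATEGORIES = sorted([
    "General Question",
    "Question from Code",
    "Help Fix Code",
    "Help Write Code",
    "Explain Code",
])

# Two-way mapping between class IDs and human-readable labels.
ID2LABEL = dict(enumerate(CATEGORIES))
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def predict_label(logits: Sequence[float]) -> str:
    """Return the category whose score is highest (hypothetical helper)."""
    if len(logits) != len(CATEGORIES):
        raise ValueError("expected one score per category")
    best = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[best]
```

A classifier head with five outputs would produce one score per category; `predict_label` simply picks the index of the largest score and maps it back to its name.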

Dataset

  • Size: 6,776 student queries from a real C programming course.
  • Structure: JSON entries with user_id, time, feature type, feature version, input question, input code, input intention, input task description.
  • Note: Dataset does not include AI responses — only the student queries.
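
To make the entry structure concrete, one record could be parsed roughly as follows. This is a sketch under assumptions: the snake_case key names (other than `user_id`), the choice of which fields are optional, and the sample record are all hypothetical, since the exact JSON keys are not given here.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class StudentQuery:
    # Key names are assumed from the field list above.
    user_id: str
    time: str
    feature_type: str
    feature_version: Optional[str] = None
    input_question: Optional[str] = None
    input_code: Optional[str] = None
    input_intention: Optional[str] = None
    input_task_description: Optional[str] = None


def parse_query(raw: str) -> StudentQuery:
    """Parse one JSON entry, ignoring keys the dataclass does not know."""
    record = json.loads(raw)
    known = {f.name for f in fields(StudentQuery)}
    return StudentQuery(**{k: v for k, v in record.items() if k in known})


# Made-up record for illustration only; not taken from the dataset.
sample = json.dumps({
    "user_id": "u001",
    "time": "2024-01-01T10:00:00",
    "feature_type": "General Question",
    "input_question": "What does malloc return on failure?",
})
query = parse_query(sample)
```

Fields that are absent from a record stay `None`, which mirrors the fact that not every query carries code or a task description.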

Challenges

  • Class imbalance: “General Question”, for example, is much more frequent than the other categories.
  • Field-based hints: Some classes have fields that no other class uses (such as input task description), so the mere presence of a field can reveal the class and inadvertently help classification.
  • Token length: Some queries, especially those with code snippets, are long enough to exceed the transformer's maximum input length.
  • Structural inconsistency: Dataset descriptions sometimes did not match actual data.
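
Two of the challenges above have common mitigations, sketched here for illustration only; nothing in this card states that the authors used them. The first function computes inverse-frequency class weights (the same formula scikit-learn uses for `class_weight="balanced"`), and the second keeps the head and tail of an over-long token sequence. The function names, the `head_fraction` parameter, and the toy inputs are assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence


def inverse_frequency_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


def truncate_head_tail(tokens: Sequence, max_len: int,
                       head_fraction: float = 0.5) -> List:
    """Keep the first and last parts of a sequence longer than max_len."""
    if len(tokens) <= max_len:
        return list(tokens)
    head = int(max_len * head_fraction)
    tail = max_len - head
    kept = list(tokens[:head])
    if tail > 0:  # guard: tokens[-0:] would return the whole sequence
        kept += list(tokens[-tail:])
    return kept


# Toy label list (made up) showing how the rarer class gets a larger weight.
toy_labels = ["General Question"] * 6 + ["Help Write Code"] * 2
weights = inverse_frequency_weights(toy_labels)
```

Weights like these are typically passed to a weighted cross-entropy loss so that mistakes on rare classes cost more. Head-and-tail truncation is one option when both the question at the start and the code at the end matter; plain right-truncation is the simpler alternative.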

Per-Category F1 Scores

Category            Codet-classy
Explain Code        0.90
General Question    0.97
Help Fix Code       0.85
Help Write Code     0.63
Question from Code  0.89
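
The per-category scores can be summarized with an unweighted (macro) average. The snippet below is only arithmetic on the values in the table; the macro figure is derived here and is not a number reported by the authors. The variable names are illustrative.

```python
# Per-category F1 scores copied from the table above.
f1_by_category = {
    "Explain Code": 0.90,
    "General Question": 0.97,
    "Help Fix Code": 0.85,
    "Help Write Code": 0.63,
    "Question from Code": 0.89,
}

# Macro F1: plain mean over categories, ignoring class frequency.
macro_f1 = sum(f1_by_category.values()) / len(f1_by_category)

# Category with the lowest score.
weakest = min(f1_by_category, key=f1_by_category.get)

print(f"macro F1 = {macro_f1:.3f}; weakest category = {weakest}")
```

Because the macro average treats every category equally, the low “Help Write Code” score pulls it down more than it would in a frequency-weighted average.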

Model size: 223M params (Safetensors, tensor type F32)

Model ID: geralto/codet-classy