geralto/codet-classy

Text Classification

text-generation-inference

	---
	library_name: transformers
	datasets:
	- majeedkazemi/students-coding-questions-from-ai-assistant
	language:
	- en
	base_model:
	- Salesforce/codet5-base
	---

	# Model Card for codet-classy

	Vilnius University Deep Neural Networks course project.


	## Model Details
	A transformer-based query classification model.


	### Model Description
	This model was developed as part of a Deep Neural Networks (DNN) course project at Vilnius University.
	It fine-tunes `Salesforce/codet5-base` to classify student queries about C programming into five categories: General Question, Question from Code, Help Fix Code, Help Write Code, and Explain Code.




	- Developed by: Brigita Bruškytė, Artiom Hovhannisyan, Eglė Orinaitė (Faculty of Mathematics and Informatics, Vilnius University)

	## Dataset
	- Size: 6,776 student queries from a real C programming course.
	- Structure: JSON entries with `user_id`, `time`, `feature type`, `feature version`, `input question`, `input code`, `input intention`, `input task description`.
	- Note: The dataset does not include AI responses, only the student queries.
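
To make the entry structure concrete, here is a minimal sketch of turning one JSON entry into a `(text, label)` pair. It rests on assumptions that this card does not confirm: that `feature type` holds the category label, and that the model input is built from `input question` and `input code` only. The sample entry and the helper name `build_example` are hypothetical.

```python
import json
from typing import Optional, Tuple


def build_example(entry: dict) -> Tuple[str, Optional[str]]:
    """Return (text, label) for one dataset entry.

    Assumption: `feature type` is the category label. Only the question
    and code fields are used as input text; missing or empty fields are skipped.
    """
    question = (entry.get("input question") or "").strip()
    code = (entry.get("input code") or "").strip()
    text = "\n\n".join(part for part in (question, code) if part)
    label = entry.get("feature type")
    return text, label


# Hypothetical entry that follows the field names listed above.
raw = json.dumps({
    "user_id": "u1",
    "time": "2023-01-01T10:00:00",
    "feature type": "Help Fix Code",
    "feature version": "v1",
    "input question": "Why does this loop never stop?",
    "input code": "while (i < 10) { printf(\"%d\", i); }",
    "input intention": "print numbers 0 to 9",
    "input task description": None,
})

text, label = build_example(json.loads(raw))
print(label)
print(text)
```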

	## Challenges
	- Class imbalance: some classes, such as “General Question”, are much more frequent than others.
	- Field-based hints: some classes have unique fields (such as `input task description`), so the mere presence of a field can reveal the class and inadvertently help classification.
	- Token length: some queries, especially those with code snippets, are long enough to exceed the transformer's input length limit.
	- Structural inconsistency: the dataset description sometimes did not match the actual data.
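
Two of these challenges have common mitigations, sketched below. This is not necessarily what the authors did: the card does not say how imbalance or long inputs were handled. The first helper computes inverse-frequency ("balanced") class weights that could be passed to a weighted loss. The second keeps the head and tail of an over-long token sequence. Both function names and the toy data are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def truncate_head_tail(token_ids, max_len):
    """Keep the first and last tokens when a sequence exceeds max_len."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if len(token_ids) <= max_len:
        return list(token_ids)
    head = max_len // 2
    tail = max_len - head
    return list(token_ids[:head]) + list(token_ids[-tail:])


# Toy label distribution, for illustration only (not the real class counts).
toy_labels = ["General Question"] * 6 + ["Help Write Code"] * 2
weights = balanced_class_weights(toy_labels)
print(weights)

# A 10-token sequence cut down to 4 tokens: 2 from the start, 2 from the end.
print(truncate_head_tail(list(range(10)), 4))
```

Rare classes receive weights above 1 and frequent classes weights below 1, so errors on rare classes count more during training.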


	### Per-Category F1 Scores

	| Category           | F1 (codet-classy) |
	|--------------------|-------------------|
	| Explain Code       | 0.90              |
	| General Question   | 0.97              |
	| Help Fix Code      | 0.85              |
	| Help Write Code    | 0.63              |
	| Question from Code | 0.89              |
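
As a reading aid, the sketch below shows how a per-class F1 score is derived from true-positive, false-positive, and false-negative counts, and computes the unweighted (macro) mean of the five scores reported above. The card itself does not report a macro average or the underlying counts. The mean here is plain arithmetic on the table, and the counts passed to `f1_from_counts` are made up for illustration.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Per-category F1 scores copied from the table above.
reported_f1 = {
    "Explain Code": 0.90,
    "General Question": 0.97,
    "Help Fix Code": 0.85,
    "Help Write Code": 0.63,
    "Question from Code": 0.89,
}

# Unweighted mean over categories (macro average).
macro_f1 = sum(reported_f1.values()) / len(reported_f1)
weakest = min(reported_f1, key=reported_f1.get)

print(f"macro F1 over reported scores: {macro_f1:.3f}")
print(f"weakest category: {weakest}")

# Illustrative counts only: 8 correct, 2 false alarms, 2 misses.
print(f1_from_counts(tp=8, fp=2, fn=2))
```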