---
license: mit
base_model:
- bigcode/starcoder2-3b
pipeline_tag: image-to-text
---
# Gesture-to-Code Adapter for StarCoder2-3B
## Model Description
This repository contains a **Gesture-to-Code Adapter** designed to work with the **StarCoder2-3B** language model. By injecting gesture embeddings into the StarCoder2-3B token space, the adapter enables real-time translation of recognized gestures into structured programming code. It leverages StarCoder2-3B’s powerful code generation capabilities, extending them to multimodal input.
### Key Features
- **Base Model**: [StarCoder2-3B](https://huggingface.co/bigcode/starcoder2-3b), a 3-billion-parameter LLM specialized in code.
- **Adapter**: A lightweight MLP-based projection layer that aligns gesture embeddings (from a CNN or other visual encoder) to StarCoder2-3B’s 3072-dim token embeddings; a minimal sketch follows this list.
- **Training Objective**: Mean-squared error (MSE) alignment of gesture–token pairs, plus optional contrastive alignment to refine embeddings.
- **Usage**: Real-time sign language to code snippet generation, focusing on accessibility for Deaf or hard-of-hearing programmers.
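In practice, the adapter can be as small as a two-layer MLP. Below is a minimal sketch, assuming a 512-dim gesture embedding; the class name `GestureToCodeAdapter` and the hidden width are illustrative assumptions, not the released weights.

```python
import torch
import torch.nn as nn

class GestureToCodeAdapter(nn.Module):
    """Hypothetical MLP projecting a CNN gesture embedding into
    StarCoder2-3B's 3072-dim token-embedding space."""

    def __init__(self, gesture_dim: int = 512, hidden_dim: int = 1024, token_dim: int = 3072):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(gesture_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, token_dim),
        )

    def forward(self, gesture_embedding: torch.Tensor) -> torch.Tensor:
        # (batch, gesture_dim) -> (batch, token_dim)
        return self.proj(gesture_embedding)
```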
## Dataset
- **Name**: A custom gesture dataset containing images for typical code-related gestures (e.g., “for loop,” “if statement,” “function definition”).
- **Format**: Each gesture is an image or short video snippet, which is converted to a fixed-size CNN embedding. The embedding is labeled with the intended code structure (see the loading sketch after this list).
- **Scale**: The dataset includes around XX,000 samples, covering ~XX discrete gestural instructions.
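For illustration only (the file name and field layout below are hypothetical), the gesture-embedding/code pairs can be wrapped in a standard PyTorch `Dataset`:

```python
import torch
from torch.utils.data import Dataset

class GestureCodeDataset(Dataset):
    """Hypothetical wrapper around precomputed CNN embeddings and their code labels."""

    def __init__(self, path: str = "gesture_pairs.pt"):
        # Assumed format: {"embeddings": FloatTensor [N, 512], "code": list of N code strings}
        data = torch.load(path)
        self.embeddings = data["embeddings"]
        self.code = data["code"]

    def __len__(self) -> int:
        return len(self.code)

    def __getitem__(self, idx: int):
        return self.embeddings[idx], self.code[idx]
```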
## Training Process
1. **Gesture Encoder**: A CNN-based classifier extracts 256- or 512-dimensional embeddings from sign images.
2. **Adapter Learning**: We train a simple projection (fully connected + activation) to map these embeddings into StarCoder2-3B’s input space; a minimal training sketch follows this list.
3. **Integration**: During code generation, the adapter’s output replaces a special token’s embedding (e.g., `<G>`). The code model then produces a relevant code snippet conditioned on the recognized gesture.
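A minimal MSE-alignment loop might look like the following. The tensors are random placeholders and the projection mirrors the adapter sketched earlier; this is not the exact training script used for the released adapter.

```python
import torch
import torch.nn as nn

# Placeholders for real data: CNN gesture embeddings and matched StarCoder2-3B token embeddings
gesture_embs = torch.randn(1024, 512)
target_tok_embs = torch.randn(1024, 3072)

# Simple projection (fully connected + activation), as described in step 2
adapter = nn.Sequential(nn.Linear(512, 1024), nn.GELU(), nn.Linear(1024, 3072))
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)
mse = nn.MSELoss()

for epoch in range(10):
    for i in range(0, gesture_embs.size(0), 64):
        g = gesture_embs[i:i + 64]
        t = target_tok_embs[i:i + 64]
        loss = mse(adapter(g), t)  # MSE alignment of gesture-token pairs
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```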
## Model Performance
- **Cosine Similarity** between the adapter’s outputs and the matched StarCoder2-3B token embeddings (a short computation sketch follows this list).
- **Accuracy/F1** on sign-to-code classification for recognized gestures.
- **Code Quality**: Preliminary tests show valid syntax ~XX% of the time, with advanced logic requiring additional prompt context or manual checks.
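For reference, the cosine-similarity check is a direct computation over adapter outputs and their matched token embeddings; the shapes below are assumed placeholders, not evaluation data.

```python
import torch
import torch.nn.functional as F

# Placeholders standing in for adapter outputs and matched token embeddings
adapter_out = torch.randn(100, 3072)
token_embs = torch.randn(100, 3072)

cos_sim = F.cosine_similarity(adapter_out, token_embs, dim=-1)  # per-pair similarity, shape [100]
print(f"mean cosine similarity: {cos_sim.mean().item():.3f}")
```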
## Intended Use
1. **Accessibility**: Provide a new input modality for coding, especially beneficial for Deaf/hard-of-hearing individuals.
2. **Educational Tools**: Enable sign-based code demonstrations in academic settings or coding bootcamps.
3. **Research**: Investigate multimodal alignment between visual gestures and textual code embeddings.
## Limitations
- **Limited Gesture Set**: Only covers a subset of sign language gestures and code constructs. Expanding coverage requires additional labeled data.
- **Hardware Requirements**: Real-time inference typically requires GPU acceleration for both CNN and StarCoder2-3B.
- **Complex Code**: While StarCoder2-3B is advanced, generating complicated multi-file or large-project code end-to-end may not be feasible.
## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# 1. Load StarCoder2-3B
tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder2-3b")
starcoder = AutoModelForCausalLM.from_pretrained("bigcode/starcoder2-3b")

# 2. Load the adapter weights
# e.g., adapter = load_adapter("YourName/gesture2code_adapter")

# 3. Integration: recognized gesture -> CNN embedding -> adapter -> StarCoder2-3B token
#    Replace the special token <G> embedding with the adapter output before generation.
```
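For a fuller picture, the sketch below shows one way to splice the adapter output into the prompt via `inputs_embeds`. The `<G>` token handling and the random gesture embedding are illustrative assumptions; in practice the embedding would come from the CNN encoder plus the trained adapter.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigcode/starcoder2-3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Register a placeholder token whose embedding will be overwritten at inference time
tokenizer.add_special_tokens({"additional_special_tokens": ["<G>"]})
model.resize_token_embeddings(len(tokenizer))
g_id = tokenizer.convert_tokens_to_ids("<G>")

# Hypothetical adapter output: one 3072-dim vector for the recognized gesture
gesture_embedding = torch.randn(3072, dtype=model.dtype)

prompt = "# Write the code indicated by the gesture: <G>\n"
ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    # Swap the <G> position's embedding for the adapter output, then generate
    inputs_embeds = model.get_input_embeddings()(ids)
    inputs_embeds[ids == g_id] = gesture_embedding
    out = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=64)

print(tokenizer.decode(out[0], skip_special_tokens=True))
```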