---
title: Code Similarity Visualization with GraphCodeBERT
emoji: 🧠
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
license: mit
short_description: Augmenting the Interpretability of GraphCodeBERT
---
# Code Similarity Visualization with GraphCodeBERT
This interactive application visualizes the token-level embeddings that [GraphCodeBERT](https://huggingface.co/microsoft/graphcodebert-base) generates for classical sorting algorithms. It compares two algorithms at a time based on where their tokens fall in the model’s embedding space, using PCA to reduce the embeddings to two dimensions.
## ✒️ Reference
Martinez-Gil, J. (2025).
**Augmenting the Interpretability of GraphCodeBERT for Code Similarity Tasks**.
*International Journal of Software Engineering and Knowledge Engineering*, 35(05), 657–678.
## 🚀 Features
- Select two classical sorting algorithms.
- Automatic tokenization and embedding via GraphCodeBERT.
- PCA-based projection into 2D space for visualization.
- Clear matplotlib plots showing token-level distribution differences.
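
The PCA projection step listed above can be sketched roughly as follows. This is a minimal illustration rather than the code in `app.py`: the helper name `project_pair` is hypothetical, the random arrays are stand-ins for real token embeddings (768 dimensions, the hidden size of the base model), and fitting a single PCA on both algorithms together is an assumption made so that the two point clouds share the same axes.

```python
import numpy as np
from sklearn.decomposition import PCA


def project_pair(emb_a: np.ndarray, emb_b: np.ndarray):
    """Project two (n_tokens, hidden_dim) matrices into one shared 2D space.

    Hypothetical helper: fitting PCA on the stacked matrices is an assumption,
    chosen so both algorithms are plotted on the same principal axes.
    """
    stacked = np.vstack([emb_a, emb_b])
    points = PCA(n_components=2).fit_transform(stacked)
    # Split the projected points back into the two original groups.
    return points[: len(emb_a)], points[len(emb_a):]


# Stand-in data: random vectors in place of real GraphCodeBERT outputs.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(12, 768))  # e.g. 12 tokens from the first algorithm
emb_b = rng.normal(size=(15, 768))  # e.g. 15 tokens from the second algorithm

pts_a, pts_b = project_pair(emb_a, emb_b)
print(pts_a.shape, pts_b.shape)  # (12, 2) (15, 2)
```

Each returned array holds one 2D point per token, so the two groups can be drawn in different colors on a single matplotlib axis.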
## 🧠 Technical Overview
- **Model**: [`microsoft/graphcodebert-base`](https://huggingface.co/microsoft/graphcodebert-base)
- **Embedding Layer**: Last hidden state
- **Reduction**: Principal Component Analysis (PCA)
- **Interface**: Gradio
- **Language**: Python 3.10+
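
As a rough sketch of how the model and embedding layer listed above fit together, the function below loads the checkpoint with the `transformers` auto classes and returns one vector per token from the last hidden state. The function name `embed_tokens`, the 512-token truncation limit, and the choice to pass only the source text (without the data-flow input that GraphCodeBERT can also consume) are assumptions for illustration; the actual `app.py` may differ.

```python
def embed_tokens(code: str, model_name: str = "microsoft/graphcodebert-base"):
    """Return (tokens, embeddings) for a source-code string.

    Hypothetical helper: embeddings has shape (n_tokens, hidden_dim) and is
    taken from the model's last hidden state.
    """
    # Heavy dependencies are imported here so that defining the function
    # does not require them; calling it downloads the model on first use.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # inference only, disables dropout

    # Assumption: truncate long inputs to 512 tokens.
    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    embeddings = outputs.last_hidden_state[0].numpy()  # drop the batch dimension
    return tokens, embeddings
```

Calling `embed_tokens` on the source of a sorting function would yield the token strings alongside a NumPy array with one row per token, which is the kind of input a PCA projection expects.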
## 🛠 Dependencies
All required libraries are listed in `requirements.txt`:
```
transformers
torch
scikit-learn
numpy
matplotlib
gradio
Pillow
```
## 🖥️ Intended Use
- Academic teaching and demonstration of code embeddings
- Qualitative evaluation of pretrained models for source code
- Supplementary visualization for software engineering publications
## 📬 Contact
**Jorge Martinez-Gil**
Senior Research Scientist in Computer Science