---
title: Code Similarity Visualization with GraphCodeBERT
emoji: 🧠
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
license: mit
short_description: Augmenting the Interpretability of GraphCodeBERT
---

# Code Similarity Visualization with GraphCodeBERT

This interactive application visualizes token-level embeddings generated by [GraphCodeBERT](https://huggingface.co/microsoft/graphcodebert-base) for classical sorting algorithms. It supports pairwise comparison of algorithms based on their representation in the model’s embedding space, using PCA for dimensionality reduction.

## ✒️ Reference

Martinez-Gil, J. (2025).  
**Augmenting the Interpretability of GraphCodeBERT for Code Similarity Tasks**.  
*International Journal of Software Engineering and Knowledge Engineering*, 35(05), 657–678.

## 🚀 Features

- Select two classical sorting algorithms.
- Automatic tokenization and embedding via GraphCodeBERT.
- PCA-based projection into 2D space for visualization.
- Clear matplotlib plots showing token-level distribution differences.

## 🧠 Technical Overview

- **Model**: [`microsoft/graphcodebert-base`](https://huggingface.co/microsoft/graphcodebert-base)
- **Embedding Layer**: Last hidden state
- **Reduction**: Principal Component Analysis (PCA)
- **Interface**: Gradio
- **Language**: Python 3.10+
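
The reduction step can be sketched as follows. This is a minimal illustration, not the code in `app.py`: it assumes the two snippets' token embeddings are projected together into one shared 2D space, it uses random arrays as stand-ins for the model's last hidden state (768 dimensions per token is assumed for the base model), and it computes PCA with NumPy's SVD rather than scikit-learn's `PCA` class. The function name `pca_2d` is hypothetical.

```python
import numpy as np


def pca_2d(embeddings_a, embeddings_b):
    """Project two (n_tokens, hidden_size) matrices into a shared 2D PCA space.

    Hypothetical helper: fitting on the stacked matrices is an assumption
    made so that both point clouds share the same axes.
    """
    stacked = np.vstack([embeddings_a, embeddings_b])
    centered = stacked - stacked.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:2].T
    n_a = embeddings_a.shape[0]
    return projected[:n_a], projected[n_a:]


# Stand-in data: in the app these would be token embeddings from the model.
rng = np.random.default_rng(0)
tokens_a = rng.normal(size=(40, 768))
tokens_b = rng.normal(size=(55, 768))

points_a, points_b = pca_2d(tokens_a, tokens_b)
print(points_a.shape, points_b.shape)
```

Fitting one projection on both matrices keeps the two point clouds directly comparable in a single scatter plot; fitting PCA separately per snippet would give each its own axes.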

## 🛠 Dependencies

All required libraries are listed in `requirements.txt`:

```
transformers
torch
scikit-learn
numpy
matplotlib
gradio
Pillow
```

## 🖥️ Intended Use

- Academic teaching and demonstration of code embeddings
- Qualitative evaluation of pretrained models for source code
- Supplementary visualization for software engineering publications

## 📬 Contact

**Jorge Martinez-Gil**  
Senior Research Scientist in Computer Science