metadata
title: Code Similarity Visualization with GraphCodeBERT
emoji: 🧠
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
license: mit
short_description: Augmenting the Interpretability of GraphCodeBERT
Code Similarity Visualization with GraphCodeBERT
This interactive application visualizes token-level embeddings generated by GraphCodeBERT for classical sorting algorithms. It supports pairwise comparison of algorithms based on their representation in the model's embedding space, using PCA for dimensionality reduction.
Reference
Martinez-Gil, J. (2025).
Augmenting the Interpretability of GraphCodeBERT for Code Similarity Tasks.
International Journal of Software Engineering and Knowledge Engineering, 35(05), 657–678.
Features
- Select two classical sorting algorithms.
- Automatic tokenization and embedding via GraphCodeBERT.
- PCA-based projection into 2D space for visualization.
- Clear matplotlib plots showing token-level distribution differences.
Technical Overview
- Model: microsoft/graphcodebert-base
- Embedding Layer: Last hidden state
- Reduction: Principal Component Analysis (PCA)
- Interface: Gradio
- Language: Python 3.10+
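The components listed above could fit together roughly as follows. This is a minimal sketch under stated assumptions, not the code in app.py: the function names embed_tokens and project_pair_2d are hypothetical, fitting the projection jointly on both snippets is an assumption, and PCA is computed here with a NumPy SVD rather than with scikit-learn's PCA class. The model is loaded as a plain encoder over code tokens, without GraphCodeBERT's data-flow inputs.

```python
import numpy as np

MODEL_NAME = "microsoft/graphcodebert-base"  # model named in the list above


def embed_tokens(code: str):
    """Return (tokens, embeddings); embeddings has shape (n_tokens, hidden_size)."""
    # Imported lazily so the PCA helper below can be used without the model weights.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME)
    model.eval()

    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    # Last hidden state: one vector per token (special tokens included).
    return tokens, outputs.last_hidden_state[0].numpy()


def project_pair_2d(emb_a, emb_b):
    """Project two token-embedding matrices into one shared 2D PCA space."""
    # Assumption: both snippets share one projection so their points are comparable.
    stacked = np.vstack([emb_a, emb_b])
    centered = stacked - stacked.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:2].T
    return coords[: len(emb_a)], coords[len(emb_a):]
```

Calling embed_tokens on the source of two sorting algorithms and passing the resulting matrices to project_pair_2d yields two sets of 2D points that can be drawn on the same matplotlib axes.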
Dependencies
All required libraries are listed in requirements.txt:
transformers
torch
scikit-learn
numpy
matplotlib
gradio
Pillow
Intended Use
- Academic teaching and demonstration of code embeddings
- Qualitative evaluation of pretrained models for source code
- Supplementary visualization for software engineering publications
Contact
Jorge Martinez-Gil
Senior Research Scientist in Computer Science