title: Code Similarity Visualization with GraphCodeBERT
emoji: 🧠
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
license: mit
short_description: Augmenting the Interpretability of GraphCodeBERT

Code Similarity Visualization with GraphCodeBERT

This interactive application visualizes token-level embeddings generated by GraphCodeBERT for classical sorting algorithms. It supports pairwise comparison of algorithms based on their representation in the model’s embedding space, using PCA for dimensionality reduction.

βœ’οΈ Reference

Martinez-Gil, J. (2025).
Augmenting the Interpretability of GraphCodeBERT for Code Similarity Tasks.
International Journal of Software Engineering and Knowledge Engineering, 35(05), 657–678.

🚀 Features

  • Select two classical sorting algorithms.
  • Automatic tokenization and embedding via GraphCodeBERT.
  • PCA-based projection into 2D space for visualization.
  • Matplotlib plots that show how the token-level embedding distributions of the two algorithms differ.
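
The plotting step can be sketched as follows. This is a minimal illustration, not the code in app.py: the function name `plot_pair`, the axis labels, and the marker styles are assumptions, and the inputs are taken to be two arrays of 2D coordinates that have already been projected with PCA.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, suitable for server-side rendering
import matplotlib.pyplot as plt
import numpy as np


def plot_pair(coords_a, coords_b, label_a="Algorithm A", label_b="Algorithm B"):
    """Scatter two sets of 2D token coordinates on shared axes (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(6, 5))
    # One scatter call per algorithm so each gets its own color and legend entry.
    ax.scatter(coords_a[:, 0], coords_a[:, 1], alpha=0.7, label=label_a)
    ax.scatter(coords_b[:, 0], coords_b[:, 1], alpha=0.7, marker="x", label=label_b)
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.set_title(f"{label_a} vs. {label_b}")
    ax.legend()
    fig.tight_layout()
    return fig
```

The returned figure can be handed to a Gradio plot output or saved with `fig.savefig(...)`. Drawing both algorithms on the same axes is what makes differences in token distribution visible at a glance.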

🧠 Technical Overview

  • Model: microsoft/graphcodebert-base
  • Embedding Layer: Last hidden state
  • Reduction: Principal Component Analysis (PCA)
  • Interface: Gradio
  • Language: Python 3.10+
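
The components listed above can be combined roughly as follows. This is a hedged sketch rather than the application's actual code: the helper names `embed_tokens` and `project_pair` are hypothetical, the model is fed plain source text without explicit data-flow inputs, and fitting a single PCA on the tokens of both algorithms is an assumption made here so that the two projections share the same axes.

```python
import numpy as np
from sklearn.decomposition import PCA

MODEL_NAME = "microsoft/graphcodebert-base"


def embed_tokens(code: str, model_name: str = MODEL_NAME):
    """Return (tokens, embeddings) taken from the model's last hidden state."""
    # Imported lazily so project_pair() can be used without torch installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)

    # last_hidden_state has shape (1, sequence_length, hidden_size).
    embeddings = outputs.last_hidden_state[0].numpy()
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return tokens, embeddings


def project_pair(emb_a: np.ndarray, emb_b: np.ndarray):
    """Fit one 2D PCA on both token sets and split the result back per algorithm."""
    stacked = np.vstack([emb_a, emb_b])
    coords = PCA(n_components=2).fit_transform(stacked)
    return coords[: len(emb_a)], coords[len(emb_a):]
```

A typical call would be `_, emb_a = embed_tokens(bubble_sort_source)` for each algorithm, followed by `project_pair(emb_a, emb_b)`. The first call to `embed_tokens` downloads the model weights, so it needs network access.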

🛠 Dependencies

All required libraries are listed in requirements.txt:


transformers
torch
scikit-learn
numpy
matplotlib
gradio
Pillow

🖥️ Intended Use

  • Academic teaching and demonstration of code embeddings
  • Qualitative evaluation of pretrained models for source code
  • Supplementary visualization for software engineering publications

📬 Contact

Jorge Martinez-Gil
Senior Research Scientist in Computer Science