---
title: Code Similarity Visualization with GraphCodeBERT
emoji: 🧠
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 4.30.0
app_file: app.py
pinned: false
---

Code Similarity Visualization with GraphCodeBERT

This interactive application visualizes token-level embeddings generated by GraphCodeBERT for classical sorting algorithms. It supports pairwise comparison of algorithms based on their representation in the model's embedding space, using PCA for dimensionality reduction.
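
Here is a minimal sketch of the pairwise comparison. It assumes each algorithm's token embeddings are already available as a NumPy array of shape (num_tokens, 768), and that one PCA is fitted jointly on both algorithms. This README does not say whether the app fits PCA jointly or per algorithm. The function name `project_pair` is hypothetical. PCA is computed here with NumPy's SVD rather than scikit-learn; scikit-learn's `PCA(n_components=2)` yields the same projection up to the sign of each component.

```python
import numpy as np


# Hypothetical helper for illustration; not taken from the app's source.
def project_pair(emb_a: np.ndarray, emb_b: np.ndarray, n_components: int = 2):
    """Project two sets of token embeddings into one shared PCA space.

    emb_a, emb_b: arrays of shape (num_tokens, hidden_size), e.g. 768 for
    GraphCodeBERT. Returns (proj_a, proj_b), each (num_tokens, n_components).
    """
    # Stack both algorithms so the principal axes are shared between them.
    stacked = np.vstack([emb_a, emb_b])
    # PCA operates on mean-centered data.
    centered = stacked - stacked.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    # Split the projected points back into the two algorithms.
    return projected[: len(emb_a)], projected[len(emb_a):]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for two algorithms' token embeddings (not model output).
    a = rng.normal(size=(40, 768))
    b = rng.normal(loc=0.5, size=(55, 768))
    pa, pb = project_pair(a, b)
    print(pa.shape, pb.shape)  # (40, 2) (55, 2)
```

Fitting the PCA jointly keeps both point clouds in one coordinate system. If the PCA were fitted separately for each algorithm, each plot would have its own axes, and positions could not be compared across the two plots.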

✒️ Reference

Martinez-Gil, J. (2025).
Augmenting the Interpretability of GraphCodeBERT for Code Similarity Tasks.
International Journal of Software Engineering and Knowledge Engineering, 35(05), 657–678.

🚀 Features

  • Select two classical sorting algorithms.
  • Automatic tokenization and embedding via GraphCodeBERT.
  • PCA-based projection into 2D space for visualization.
  • Matplotlib plots showing how the two algorithms' token embeddings are distributed in the projected 2D space.
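
The plotting step could look like the following matplotlib sketch. It draws two projected token clouds on shared axes, with one marker style per algorithm. The function `plot_pair`, the scatter-plot style, the axis labels, and the random stand-in data are all assumptions for illustration, not the app's actual plotting code. The Agg backend is selected so figures render without a display, as on a hosted Space.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: renders to files/buffers, no display needed
import matplotlib.pyplot as plt
import numpy as np


# Hypothetical helper for illustration; not taken from the app's source.
def plot_pair(proj_a: np.ndarray, proj_b: np.ndarray, label_a: str, label_b: str):
    """Scatter two sets of 2D token projections on shared axes and return the figure."""
    fig, ax = plt.subplots(figsize=(6, 5))
    # One scatter per algorithm so the legend distinguishes them.
    ax.scatter(proj_a[:, 0], proj_a[:, 1], alpha=0.7, label=label_a)
    ax.scatter(proj_b[:, 0], proj_b[:, 1], alpha=0.7, marker="x", label=label_b)
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.set_title(f"Token embeddings: {label_a} vs. {label_b}")
    ax.legend()
    fig.tight_layout()
    return fig


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random 2D points standing in for PCA output; not real embeddings.
    fig = plot_pair(rng.normal(size=(30, 2)), rng.normal(1.0, size=(25, 2)),
                    "bubble sort", "quick sort")
    fig.savefig("comparison.png", dpi=150)
    plt.close(fig)
```

Each call opens a new figure. A long-running app should therefore call `plt.close(fig)` once the image has been handed to the interface; otherwise open figures accumulate in memory.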

🧠 Technical Overview

  • Model: microsoft/graphcodebert-base
  • Embedding Layer: Last hidden state
  • Reduction: Principal Component Analysis (PCA)
  • Interface: Gradio
  • Language: Python 3.10+
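
The model and layer listed above could be wired together with the Hugging Face transformers API roughly as follows. This is a hedged sketch, not the app's code:

- `embed_code`, `_load_model`, and `drop_special_tokens` are hypothetical names.
- Truncation at 512 tokens assumes the usual limit of this RoBERTa-based encoder.
- Dropping the `<s>`/`</s>` sequence markers before plotting is an optional choice that this README does not specify.

torch and transformers are imported lazily, so the filtering helper works without them installed.

```python
from functools import lru_cache

import numpy as np

MODEL_NAME = "microsoft/graphcodebert-base"
# RoBERTa-style sequence markers produced by GraphCodeBERT's tokenizer.
SPECIAL_TOKENS = {"<s>", "</s>", "<pad>"}


@lru_cache(maxsize=1)
def _load_model():
    """Load tokenizer and model once; later calls reuse the cached pair (hypothetical helper)."""
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def embed_code(code: str):
    """Return (tokens, vectors); vectors[i] is the last-hidden-state row for tokens[i]."""
    import torch  # lazy import: only needed when the model actually runs

    tokenizer, model = _load_model()
    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # last_hidden_state has shape (1, seq_len, 768); keep the single sequence.
    vectors = outputs.last_hidden_state[0].numpy()
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return tokens, vectors


def drop_special_tokens(tokens, vectors):
    """Remove sequence markers such as <s> and </s> so only code tokens are plotted."""
    keep = [i for i, tok in enumerate(tokens) if tok not in SPECIAL_TOKENS]
    return [tokens[i] for i in keep], np.asarray(vectors)[keep]
```

The first call to `embed_code` downloads the `microsoft/graphcodebert-base` checkpoint from the Hugging Face Hub. `lru_cache` then keeps the tokenizer and model in memory, so later calls reuse them instead of reloading.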

🛠 Dependencies

All required libraries are listed in requirements.txt:

```
transformers
torch
scikit-learn
numpy
matplotlib
gradio
Pillow
```

🖥️ Intended Use

  • Academic teaching and demonstration of code embeddings
  • Qualitative evaluation of pretrained models for source code
  • Supplementary visualization for software engineering publications

📬 Contact

Jorge Martinez-Gil
Senior Research Scientist in Computer Science