Preprocessing
- Split on ; { } and newlines
- Merge tiny fragments (< 3 tokens)
- CamelCase / snake_case identifier splitting
- Java keyword + English stopword filtering
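The four preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the demo's actual code: the function names, the tiny keyword and stopword sets, and the choice to merge a short fragment into the preceding one are all assumptions made for the example.

```python
import re

# Illustrative subsets only (assumption): the real keyword and stopword lists are not shown.
JAVA_KEYWORDS = {"public", "private", "static", "void", "int", "return",
                 "new", "class", "if", "else", "for", "while"}
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in"}

MIN_TOKENS = 3  # fragments with fewer whitespace-separated tokens get merged


def split_statements(source):
    """Split on ';', '{', '}' and newlines, dropping empty pieces."""
    parts = re.split(r"[;{}\n]", source)
    return [p.strip() for p in parts if p.strip()]


def merge_tiny(fragments, min_tokens=MIN_TOKENS):
    """Append fragments shorter than min_tokens to the previous fragment.

    Merging backwards is an assumption; the source only says tiny fragments are merged.
    """
    merged = []
    for frag in fragments:
        if merged and len(frag.split()) < min_tokens:
            merged[-1] = merged[-1] + " " + frag
        else:
            merged.append(frag)
    return merged


def split_identifier(identifier):
    """Split camelCase and snake_case identifiers into lowercase words."""
    words = []
    for part in identifier.split("_"):
        # Acronym runs, capitalised or lowercase words, and digit runs.
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [w.lower() for w in words]


def tokenize(fragment):
    """Extract identifiers, split them, and drop Java keywords and stopwords."""
    tokens = []
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", fragment):
        tokens += split_identifier(ident)
    return [t for t in tokens if t not in JAVA_KEYWORDS and t not in STOPWORDS]
```

For example, `tokenize("public int getUserName()")` keeps only the words from the identifier, since `public` and `int` are filtered out as keywords.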
Java code summarization comparison
Upload a .java file and watch corpus-fitted extractive baselines,
a semantic embedding model, and a fine-tuned transformer each generate a
Java code summary, compared live side by side.
Extractive models summarize the whole file from split statements. CodeT5 runs once per Java method, the same setup used in the CodeXGLUE evaluation.
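How the file is cut into methods for CodeT5 is not described here. One rough way to do it, shown purely as a sketch, is to find method headers with a regular expression and then match braces to locate the end of each body. The names `METHOD_HEADER`, `split_methods`, and the word lists are hypothetical, and the approach ignores braces inside strings or comments and generic types that contain spaces; a proper Java parser would be more robust.

```python
import re

# Rough method header (assumption): optional modifiers, a return type, a name,
# a parameter list, an optional throws clause, then the opening brace.
METHOD_HEADER = re.compile(
    r"(?:(?:public|protected|private|static|final|abstract|synchronized)\s+)*"
    r"([\w<>\[\],.?]+)\s+(\w+)\s*\([^()]*\)\s*(?:throws\s+[\w.,\s]+)?\{"
)
# Words that look like a name or type to the regex but do not start a method.
CONTROL_WORDS = {"if", "for", "while", "switch", "catch", "synchronized"}
NOT_TYPES = {"new", "else", "return"}


def split_methods(source):
    """Return (name, text) pairs, one per method found in the source."""
    methods = []
    pos = 0
    while True:
        match = METHOD_HEADER.search(source, pos)
        if match is None:
            break
        if match.group(1) in NOT_TYPES or match.group(2) in CONTROL_WORDS:
            pos = match.end()  # control statement or anonymous class, skip it
            continue
        # Walk forward from the opening brace until it is balanced again.
        depth = 1
        i = match.end()
        while i < len(source) and depth > 0:
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
            i += 1
        methods.append((match.group(2), source[match.start():i]))
        pos = i
    return methods
```

Each returned method text could then be passed to the generative model on its own, while the extractive models keep working on the whole file.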
A single .java file
Split statements; CodeT5 splits by method
Term scoring
Graph centrality
Embeddings
Generation
Four summaries
; { } and newlines
cache/idf_weights_train_val.pkl
Each model represents a different tier of prior knowledge. Click a card to expand its step-by-step algorithm, strengths, and limitations.
Upload a .java file. Extractive models use the whole file; CodeT5 summarizes each method separately.