Auto-README — Java Summarizer

NLP · CODE SUMMARIZATION

Four ways to summarize Java source code

Upload a .java file and compare four summaries side by side, generated live by corpus-fitted extractive baselines, a semantic embedding model, and a fine-tuned transformer.

Try it now | See how it works

4 models

live inference

01 — Architecture

How a file becomes four summaries

Extractive models summarize the whole file from split statements. CodeT5 runs once per Java method, the same setup used in the CodeXGLUE evaluation.

Java upload

A single .java file

Preprocess

Split into statements; for CodeT5, split by method

TF-IDF

Term scoring

LexRank

Graph centrality

Sentence-T

Embeddings

CodeT5

Generation

Summary

Four summaries
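The "graph centrality" step for LexRank can be sketched as follows. This is a minimal illustration, not the project's code: it assumes each statement is already a sparse term-weight vector (a dict), uses the continuous LexRank variant with cosine similarity, and picks a damping factor of 0.85 and a fixed iteration count as illustrative defaults. The function names are hypothetical.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank_scores(vectors: list[dict[str, float]],
                   damping: float = 0.85,      # assumed default
                   iterations: int = 50) -> list[float]:
    """Score statements by centrality in a cosine-similarity graph."""
    n = len(vectors)
    if n == 0:
        return []
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    # Row-normalize similarities into a transition matrix; an empty vector
    # has no similar neighbours, so it jumps uniformly to any statement.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([v / total for v in row] if total else [1.0 / n] * n)
    # Power iteration with damping, as in PageRank.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores
```

A statement that shares vocabulary with many other statements receives a higher score, so the highest-scoring statements are the natural candidates for the extractive summary.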

①

Preprocessing

Split on ; { } and newlines
Merge tiny fragments (< 3 tokens)
CamelCase / snake_case identifier splitting
Java keyword + English stopword filtering
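The four preprocessing steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the keyword and stopword sets are deliberately tiny placeholders, tiny fragments are merged into the preceding statement (the page does not say which neighbour is used), and all function names are hypothetical.

```python
import re

# Placeholder sets for illustration; a real list would be much longer.
JAVA_KEYWORDS = {"public", "private", "static", "void", "int", "return",
                 "new", "class", "if", "else", "for", "while"}
ENGLISH_STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in"}
MIN_TOKENS = 3  # fragments with fewer whitespace-separated tokens get merged


def split_statements(source: str) -> list[str]:
    """Split on ; { } and newlines, then merge tiny fragments."""
    fragments = [f.strip() for f in re.split(r"[;{}\n]", source) if f.strip()]
    merged: list[str] = []
    for frag in fragments:
        # Assumption: a tiny fragment joins the previous statement;
        # a tiny first fragment is kept as-is.
        if merged and len(frag.split()) < MIN_TOKENS:
            merged[-1] = merged[-1] + " " + frag
        else:
            merged.append(frag)
    return merged


def split_identifier(identifier: str) -> list[str]:
    """Split camelCase and snake_case identifiers into lowercase words."""
    parts: list[str] = []
    for chunk in identifier.split("_"):
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [p.lower() for p in parts]


def tokenize(statement: str) -> list[str]:
    """Extract identifier words, dropping Java keywords and stopwords."""
    tokens: list[str] = []
    for identifier in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", statement):
        for part in split_identifier(identifier):
            if part not in JAVA_KEYWORDS and part not in ENGLISH_STOPWORDS:
                tokens.append(part)
    return tokens
```

For example, `split_identifier("parseHTTPResponse")` yields `["parse", "http", "response"]`, and `tokenize("public int getMaxValue()")` yields `["get", "max", "value"]`.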

②

Corpus fitting

IDF weights for TF-IDF and LexRank are fitted on the CodeXGLUE Java train + validation splits
Weights cached to cache/idf_weights_train_val.pkl
Neural models use frozen pretrained checkpoints
One-time load, then served from memory
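The fit-once, cache-to-pickle pattern above might look like the sketch below. It assumes the corpus is already tokenized into lists of terms and uses a smoothed IDF formula, log((1 + N) / (1 + df)) + 1, which is one common choice and not necessarily the one used here. The function names are hypothetical; only the cache path comes from the text above.

```python
import math
import pickle
from collections import Counter
from pathlib import Path


def fit_idf(tokenized_docs: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF per term: log((1 + N) / (1 + df)) + 1 (assumed formula)."""
    n_docs = len(tokenized_docs)
    doc_freq: Counter = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}


def load_or_fit_idf(cache_path: Path,
                    tokenized_docs: list[list[str]]) -> dict[str, float]:
    """Return cached weights if present; otherwise fit and write the cache."""
    if cache_path.exists():
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    weights = fit_idf(tokenized_docs)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as fh:
        pickle.dump(weights, fh)
    return weights
```

A server would typically call `load_or_fit_idf(Path("cache/idf_weights_train_val.pkl"), corpus)` once at startup and keep the returned dict in memory for every request.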

③

Output

Extractive models return top-N statements from the whole file
CodeT5 generates one English sentence per method (evaluation setup)
Per-model latency tracked for each run
Results compared in a single view
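A minimal sketch of top-N extractive selection with per-run latency follows. It assumes statements are already tokenized and that an IDF dict is available. The scoring rule (mean TF-IDF weight per token), the default IDF for unseen terms, and the function names are illustrative assumptions, not the project's confirmed implementation.

```python
import time
from collections import Counter


def score_statement(tokens: list[str], idf: dict[str, float],
                    default_idf: float = 1.0) -> float:
    """Mean TF-IDF weight per token; empty statements score 0."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = sum(count * idf.get(term, default_idf)
                for term, count in counts.items())
    return total / len(tokens)


def top_n_statements(statements: list[str], tokenized: list[list[str]],
                     idf: dict[str, float], n: int = 3) -> list[str]:
    """Pick the n highest-scoring statements, returned in source order."""
    scores = [score_statement(tokens, idf) for tokens in tokenized]
    best = sorted(range(len(statements)),
                  key=lambda i: scores[i], reverse=True)[:n]
    return [statements[i] for i in sorted(best)]


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000.0
```

Wrapping each model call as `summary, ms = timed(top_n_statements, statements, tokenized, idf)` gives one latency figure per model per run, which can then be shown next to each summary.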

02 — Models

The four summarizers

Each model represents a different tier of prior knowledge. Click a card to expand its step-by-step algorithm, strengths, and limitations.

{% for m in models %}

How it works

{{ loop.index }}{{ step }}

Strengths

{{ s }}

Limitations

{{ l }}

Input: {{ m.input }}

{% endfor %}

03 — Try it

Summarize your Java file

Upload a .java file. Extractive models use the whole file; CodeT5 summarizes each method separately.

Code Summarization