Python Docstring Generator
Generates docstrings for Python code snippets using a sequence-to-sequence model (e.g. T5 or CodeT5). Useful for code summarization and documentation.
Task
Given a Python function or code block (without a docstring), the model produces a short natural-language description suitable as a docstring.
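To make "suitable as a docstring" concrete, the sketch below attaches a generated summary to the function it describes, using only the standard-library `ast` module (Python 3.9+ for `ast.unparse`). `insert_docstring` is a hypothetical helper, not part of this project. Because the code is regenerated from the syntax tree, comments and original formatting are not preserved.

```python
import ast


def insert_docstring(source: str, summary: str) -> str:
    """Return `source` with `summary` set as the docstring of its first function.

    Hypothetical helper for illustration; comments and formatting are lost
    because the code is regenerated with ast.unparse (Python 3.9+).
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            docstring = ast.Expr(value=ast.Constant(value=summary))
            if ast.get_docstring(node) is None:
                node.body.insert(0, docstring)  # add a new docstring
            else:
                node.body[0] = docstring  # replace the existing docstring
            break
    else:
        raise ValueError("no function definition found in source")
    return ast.unparse(tree)


if __name__ == "__main__":
    print(insert_docstring("def add(a, b): return a + b",
                           "Return the sum of a and b."))
```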
Model
- Uses Hugging Face Transformers with a small T5 or CodeT5 checkpoint (e.g. t5-small, or Salesforce/codet5-small for code).
- The inference script loads the model and tokenizer and runs generation with configurable length and sampling.
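A minimal sketch of such an inference path, assuming the `transformers` and `torch` packages are installed. The helper names (`build_generation_kwargs`, `generate_docstring`), the `prefix` parameter and all default values are illustrative and not taken from inference.py. Two caveats: the original t5-small checkpoint was trained with task prefixes, so passing something like `prefix="summarize: "` may help, and a checkpoint that has not been fine-tuned for summarization may produce weak docstrings.

```python
# Minimal inference sketch; assumes `transformers` and `torch` are installed.


def build_generation_kwargs(max_new_tokens: int = 48, num_beams: int = 4,
                            do_sample: bool = False, top_p: float = 0.95,
                            temperature: float = 1.0) -> dict:
    """Collect keyword arguments for `model.generate`.

    Beam search is the default; sampling parameters are passed only when
    `do_sample` is True.
    """
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    kwargs = {"max_new_tokens": max_new_tokens, "do_sample": do_sample}
    if do_sample:
        kwargs.update(top_p=top_p, temperature=temperature)
    else:
        kwargs["num_beams"] = num_beams
    return kwargs


def generate_docstring(code: str, model_name: str = "Salesforce/codet5-small",
                       prefix: str = "", **gen_options) -> str:
    """Generate a one-line summary for `code` with a seq2seq checkpoint."""
    # Lazy imports keep build_generation_kwargs usable without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    # `prefix` is optional, e.g. "summarize: " for the original t5-small.
    inputs = tokenizer(prefix + code, return_tensors="pt",
                       truncation=True, max_length=512)
    output_ids = model.generate(**inputs, **build_generation_kwargs(**gen_options))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
```

Beam search tends to give more stable summaries; setting `do_sample=True` trades that stability for variety.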
Dataset
- Training (optional): datasets such as CodeXGLUE code-to-text, or DocstringGeneration-style data from Hugging Face Datasets.
- For inference only, no dataset is required; use pre-trained weights.
Usage
pip install -r requirements.txt
python inference.py --input "def add(a, b): return a + b"
For a quick demo in the browser, run the Gradio app:
python app.py
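A minimal sketch of what such an app could look like, assuming the `gradio` package is installed. `make_handler` and `build_demo` are hypothetical names, and the lambda passed to `build_demo` is only a placeholder for the real model call.

```python
from typing import Callable


def make_handler(generate: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a code-to-docstring function with basic input validation."""
    def handle(code: str) -> str:
        if not code or not code.strip():
            return "Please paste a Python function."
        return generate(code)
    return handle


def build_demo(generate: Callable[[str], str]):
    """Build a simple text-in, text-out Gradio interface."""
    import gradio as gr  # lazy import so make_handler works without gradio

    return gr.Interface(
        fn=make_handler(generate),
        inputs=gr.Textbox(lines=12, label="Python code"),
        outputs=gr.Textbox(label="Generated docstring"),
        title="Python Docstring Generator",
    )


if __name__ == "__main__":
    # Placeholder generator; replace it with an actual model call.
    build_demo(lambda code: "TODO: call the model here").launch()
```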
Example
Input:
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)
Output (example): "Compute the factorial of n recursively."
Files
- inference.py: loads T5 (or CodeT5) and runs generation; accepts either a file path or inline code.
- app.py: Gradio UI for pasting code and getting a docstring.
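Accepting "a file path or inline code" could work along these lines. This is a sketch only: the path-detection heuristic in `resolve_input` and the `--max-new-tokens` flag are assumptions, not documented options of inference.py.

```python
import argparse
from pathlib import Path


def resolve_input(value: str) -> str:
    """Read `value` as a file if it names an existing file, else treat it as code.

    Assumed heuristic for illustration; inference.py may decide differently.
    """
    try:
        path = Path(value)
        if path.is_file():
            return path.read_text(encoding="utf-8")
    except (OSError, ValueError):
        pass  # e.g. the string is too long or contains characters invalid in a path
    return value


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Generate a docstring for Python code.")
    parser.add_argument("--input", required=True,
                        help="inline Python code or a path to a .py file")
    # Hypothetical length option, shown to illustrate configurable generation.
    parser.add_argument("--max-new-tokens", type=int, default=48)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    code = resolve_input(args.input)
    print(code)  # pass `code` to the model here
```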
Limitations / future work
- Quality depends on the base model and any fine-tuning; out-of-domain code may get generic descriptions.
- Could be extended to multi-line docstrings or different styles (Google, NumPy, Sphinx).
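One possible starting point for multi-line, styled docstrings is sketched below: a hypothetical `render_docstring` helper keeps the model's one-line summary and adds a parameter section in Google, NumPy, or Sphinx layout, reading parameter names from the function with `ast`. Descriptions are left as `TODO.` placeholders; filling them in would need a model trained for that.

```python
import ast


def render_docstring(summary: str, source: str, style: str = "google") -> str:
    """Render a docstring skeleton in Google, NumPy, or Sphinx style.

    Hypothetical helper: parameter names come from the first function in
    `source`; *args/**kwargs are ignored for brevity.
    """
    fn = next((n for n in ast.walk(ast.parse(source))
               if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))), None)
    if fn is None:
        raise ValueError("no function definition found in source")
    params = [a.arg for a in fn.args.posonlyargs + fn.args.args + fn.args.kwonlyargs
              if a.arg not in ("self", "cls")]
    if not params:
        return summary  # nothing to document beyond the summary line
    lines = [summary, ""]
    if style == "google":
        lines.append("Args:")
        lines += [f"    {p}: TODO." for p in params]
    elif style == "numpy":
        lines += ["Parameters", "----------"]
        lines += [f"{p}\n    TODO." for p in params]
    elif style == "sphinx":
        lines += [f":param {p}: TODO." for p in params]
    else:
        raise ValueError(f"unknown style: {style!r}")
    return "\n".join(lines)
```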
Author
Alireza Aminzadeh
- Email: alireza.aminzadeh@hotmail.com
- Hugging Face: syeedalireza
- LinkedIn: alirezaaminzadeh