Python Docstring Generator

Generates docstrings for Python code snippets using a sequence-to-sequence model (e.g. T5 or CodeT5). Useful for code summarization and documentation.

Task

Given a Python function or code block (without a docstring), the model produces a short natural-language description suitable as a docstring.

Model

  • Uses Hugging Face Transformers with a small T5 or CodeT5 checkpoint (e.g. t5-small, or Salesforce/codet5-small for code).
  • The inference script loads the model and tokenizer, then runs generation with configurable output length and sampling settings.
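The loading and generation step described above could look roughly like the sketch below. The function names (`build_generation_kwargs`, `generate_docstring`), the default values, and the 512-token input limit are assumptions for illustration and are not taken from `inference.py`. The `transformers` import is kept inside the function so the option-handling part can be used without the library installed.

```python
def build_generation_kwargs(max_new_tokens=64, do_sample=False,
                            temperature=1.0, top_p=0.95, num_beams=4):
    """Translate user-facing options into keyword arguments for model.generate()."""
    kwargs = {"max_new_tokens": max_new_tokens, "do_sample": do_sample}
    if do_sample:
        # Sampling: temperature and nucleus (top-p) filtering apply.
        kwargs.update(temperature=temperature, top_p=top_p)
    else:
        # Deterministic decoding: use beam search instead.
        kwargs.update(num_beams=num_beams)
    return kwargs


def generate_docstring(code, model_name="Salesforce/codet5-small", **options):
    """Load a seq2seq checkpoint and generate a description for `code`."""
    # Imported lazily; requires `transformers` and a PyTorch install.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Assumed input limit of 512 tokens; longer code is truncated.
    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, **build_generation_kwargs(**options))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_docstring("def add(a, b): return a + b", do_sample=True, temperature=0.7)` would switch from beam search to sampling. Checkpoints that were not fine-tuned for code summarization may return poor descriptions.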

Dataset

  • Training (optional): datasets such as CodeXGLUE code-to-text, or DocstringGeneration-style data from Hugging Face Datasets.
  • For inference only, no dataset is required; use pre-trained weights.
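If you do fine-tune, each training example needs a (code without docstring, docstring) pair. One possible way to build such pairs from existing Python source is sketched below using only the standard library. The helper name `split_docstring` is hypothetical, and `ast.unparse` requires Python 3.9 or later; it also normalizes formatting and drops comments.

```python
import ast


def split_docstring(source):
    """Return (code_without_docstring, docstring) for the first top-level function."""
    tree = ast.parse(source)
    func = next(
        (n for n in tree.body if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))),
        None,
    )
    if func is None:
        raise ValueError("no top-level function found")

    docstring = ast.get_docstring(func)
    if docstring is not None:
        # The docstring is the first statement; drop it, keeping the body non-empty.
        func.body = func.body[1:] or [ast.Pass()]
    return ast.unparse(tree), docstring
```

The first element of the returned tuple would serve as the model input and the second as the target text.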

Usage

pip install -r requirements.txt
python inference.py --input "def add(a, b): return a + b"

For a quick demo in the browser, run the Gradio app:

python app.py
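A Gradio app of this kind can be quite small. The sketch below is an assumption about how such an app might be wired up, not the contents of `app.py`; `make_handler` and `build_demo` are hypothetical names, and `generate_fn` stands in for whatever function calls the model.

```python
def make_handler(generate_fn):
    """Wrap a code-to-docstring function with basic input checking for the UI."""
    def handler(code):
        code = (code or "").strip()
        if not code:
            return "Paste a Python function to get a docstring."
        return generate_fn(code)
    return handler


def build_demo(generate_fn):
    """Build a single-textbox Gradio interface around generate_fn."""
    import gradio as gr  # imported lazily; requires `pip install gradio`

    return gr.Interface(
        fn=make_handler(generate_fn),
        inputs=gr.Textbox(lines=12, label="Python code"),
        outputs=gr.Textbox(label="Generated docstring"),
        title="Python Docstring Generator",
    )
```

With a model function in hand, `build_demo(my_generate_fn).launch()` would start the local web server.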

Example

Input:

def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)

Output (example): "Compute the factorial of n recursively."
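To turn the generated text into an actual docstring, it has to be placed back into the function. The following is a minimal sketch of one way to do that with the standard library; `add_docstring` is a hypothetical helper, not part of this repository. Note that `ast.unparse` (Python 3.9+) reformats the code and discards comments, so a real tool might prefer text-based insertion.

```python
import ast


def add_docstring(source, docstring):
    """Insert `docstring` into the first top-level function that lacks one."""
    tree = ast.parse(source)
    func = next(
        (n for n in tree.body if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))),
        None,
    )
    if func is None:
        raise ValueError("no top-level function found")

    if ast.get_docstring(func) is None:
        # A docstring is a bare string expression as the first body statement.
        func.body.insert(0, ast.Expr(value=ast.Constant(value=docstring)))
    return ast.unparse(tree)
```

Applied to the `factorial` example with the output shown above, the result is the same function with the description as its first statement.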

Files

  • inference.py — loads T5 (or CodeT5), runs generation; can take a file path or inline code.
  • app.py — Gradio UI for pasting code and getting a docstring.
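Accepting either a file path or inline code through the same `--input` flag might be handled as in the sketch below. Only `--input` appears in the usage above; the `--max-new-tokens` flag, its default, and the helper names are assumptions for illustration.

```python
import argparse
from pathlib import Path


def read_source(value):
    """Return file contents if `value` names an existing file, else `value` itself."""
    try:
        path = Path(value)
        if path.is_file():
            return path.read_text(encoding="utf-8")
    except (OSError, ValueError):
        # Inline code can be too long or contain characters invalid in a path.
        pass
    return value


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Generate a docstring for Python code.")
    parser.add_argument("--input", required=True,
                        help="Inline Python code or a path to a .py file")
    # Hypothetical flag: upper bound on the generated description length.
    parser.add_argument("--max-new-tokens", type=int, default=64)
    return parser.parse_args(argv)
```

Checking for an existing file first means a string such as `my_module.py` is read from disk, while `"def add(a, b): return a + b"` falls through and is used as code directly.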

Limitations / future work

  • Quality depends on the base model and any fine-tuning; out-of-domain code may get generic descriptions.
  • Could be extended to multi-line docstrings or different styles (Google, NumPy, Sphinx).
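As a rough illustration of the style extension, a one-line summary could be expanded into a Google, NumPy, or Sphinx skeleton by reading parameter names from the function signature. This is a sketch of one possible approach, not an existing feature; `render_docstring` and the `TODO` placeholders are hypothetical, and only plain positional parameters are handled.

```python
import ast


def render_docstring(source, summary, style="google"):
    """Expand `summary` into a multi-line docstring skeleton for the first function."""
    func = next(n for n in ast.parse(source).body if isinstance(n, ast.FunctionDef))
    params = [arg.arg for arg in func.args.args]

    lines = [summary]
    if not params:
        return summary
    lines.append("")
    if style == "google":
        lines.append("Args:")
        lines += [f"    {p}: TODO" for p in params]
    elif style == "numpy":
        lines += ["Parameters", "----------"]
        for p in params:
            lines += [p, "    TODO"]
    elif style == "sphinx":
        lines += [f":param {p}: TODO" for p in params]
    else:
        raise ValueError(f"unknown style: {style}")
    return "\n".join(lines)
```

The parameter descriptions are left as placeholders here; filling them in would need either a model trained on multi-line docstrings or a separate generation pass per parameter.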

Author

Alireza Aminzadeh