syeedalireza commited on
Commit
5b30d83
·
verified ·
1 Parent(s): 4f1fa6a

Upload folder using huggingface_hub

Browse files
Files changed (4) hide show
  1. README.md +59 -0
  2. app.py +29 -0
  3. inference.py +65 -0
  4. requirements.txt +4 -0
README.md ADDED
@@ -0,0 +1,59 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Python Docstring Generator
2
+
3
+ Generates docstrings for Python code snippets using a sequence-to-sequence model (e.g. T5 or CodeT5). Useful for code summarization and documentation.
4
+
5
+ ## Task
6
+
7
+ Given a Python function or code block (without a docstring), the model produces a short natural-language description suitable as a docstring.
8
+
9
+ ## Model
10
+
11
+ - Uses **Hugging Face Transformers** with a small T5 or CodeT5 checkpoint (e.g. `t5-small`, or `Salesforce/codet5-small` for code).
12
+ - Inference script loads the model and tokenizer and runs generation with configurable length and sampling.
13
+
14
+ ## Dataset
15
+
16
+ - Training (optional): datasets like **CodeXGlue** code-to-text, or **DocstringGeneration**-style data from Hugging Face Datasets.
17
+ - For inference only, no dataset is required; use pre-trained weights.
18
+
19
+ ## Usage
20
+
21
+ ```bash
22
+ pip install -r requirements.txt
23
+ python inference.py --input "def add(a, b): return a + b"
24
+ ```
25
+
26
+ For a quick demo in the browser, run the Gradio app:
27
+
28
+ ```bash
29
+ python app.py
30
+ ```
31
+
32
+ ## Example
33
+
34
+ Input:
35
+ ```python
36
+ def factorial(n):
37
+ if n <= 1:
38
+ return 1
39
+ return n * factorial(n - 1)
40
+ ```
41
+
42
+ Output (example): `"Compute the factorial of n recursively."`
43
+
44
+ ## Files
45
+
46
+ - `inference.py` — loads T5 (or CodeT5), runs generation; can take a file path or inline code.
47
+ - `app.py` — Gradio UI for pasting code and getting a docstring.
48
+
49
+ ## Limitations / future work
50
+
51
+ - Quality depends on the base model and any fine-tuning; out-of-domain code may get generic descriptions.
52
+ - Could be extended to multi-line docstrings or different styles (Google, NumPy, Sphinx).
53
+
54
+ ## Author
55
+
56
+ **Alireza Aminzadeh**
57
+ - Email: [alireza.aminzadeh@hotmail.com](mailto:alireza.aminzadeh@hotmail.com)
58
+ - Hugging Face: [syeedalireza](https://huggingface.co/syeedalireza)
59
+ - LinkedIn: [alirezaaminzadeh](https://www.linkedin.com/in/alirezaaminzadeh)
app.py ADDED
@@ -0,0 +1,29 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Minimal Gradio app for docstring generation.
3
+ Run: python app.py
4
+ """
5
+
6
+ import gradio as gr
7
+ from inference import generate_docstring
8
+
9
+
10
def summarize_code(code: str) -> str:
    """Return a model-generated docstring summary for *code*.

    Falls back to a user-facing prompt when the input is empty or
    whitespace-only, so the UI never calls the model with nothing.
    """
    if code and code.strip():
        return generate_docstring(code, model_name="t5-small", max_length=128, num_beams=4)
    return "Paste a Python code snippet above."
14
+
15
+
16
# Module-level Gradio interface: one multi-line code text box in,
# one read-only text box out, wired to summarize_code above.
demo = gr.Interface(
    fn=summarize_code,
    inputs=gr.Textbox(
        label="Python code",
        placeholder="def add(a, b):\n return a + b",
        lines=8,
    ),
    outputs=gr.Textbox(label="Generated docstring"),
    title="Python Docstring Generator",
    description="Paste a Python function or snippet to get a short docstring summary.",
)

# Launch the local web server only when run as a script, not on import
# (inference.py and tests import this module without starting a server).
if __name__ == "__main__":
    demo.launch()
inference.py ADDED
@@ -0,0 +1,65 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Inference script for docstring generation from Python code.
3
+ Uses Hugging Face Transformers (T5 or CodeT5).
4
+ """
5
+
6
+ import argparse
7
+ from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
8
+ import torch
9
+
10
+
11
# Cache of loaded (tokenizer, model) pairs keyed by checkpoint name, so that
# repeated calls (e.g. one per Gradio request) do not reload weights from disk.
_MODEL_CACHE = {}


def generate_docstring(
    code: str,
    model_name: str = "t5-small",
    max_length: int = 128,
    num_beams: int = 4,
    device=None,
) -> str:
    """Generate a short natural-language docstring for a Python code snippet.

    Args:
        code: Python source (function or snippet) to summarize.
        model_name: Hugging Face seq2seq checkpoint (T5/CodeT5 family).
        max_length: Maximum length of the generated summary, in tokens.
        num_beams: Beam width for beam-search decoding.
        device: "cuda" or "cpu"; auto-detected from torch when None.

    Returns:
        The decoded summary string with special tokens stripped.
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"

    # Load each checkpoint at most once per process; reloading on every call
    # dominated latency in the original implementation.
    if model_name not in _MODEL_CACHE:
        _MODEL_CACHE[model_name] = (
            AutoTokenizer.from_pretrained(model_name),
            AutoModelForSeq2SeqLM.from_pretrained(model_name),
        )
    tokenizer, model = _MODEL_CACHE[model_name]
    model = model.to(device)

    # T5-style models expect a task prefix; "summarize:" selects summarization.
    input_text = "summarize: " + code
    inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=512).to(device)

    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_length=max_length,
            num_beams=num_beams,
            early_stopping=True,
        )

    return tokenizer.decode(out[0], skip_special_tokens=True)
37
+
38
+
39
def main():
    """CLI entry point: parse arguments, optionally read a .py file, print the result."""
    import os

    parser = argparse.ArgumentParser()
    parser.add_argument("--input", type=str, required=True, help="Python code snippet (or path to file)")
    parser.add_argument("--model_name", type=str, default="t5-small")
    parser.add_argument("--max_length", type=int, default=128)
    parser.add_argument("--num_beams", type=int, default=4)
    args = parser.parse_args()

    code = args.input
    # Treat the input as a file path only when it actually names an existing
    # .py file; the previous `len(code) < 260` heuristic could misclassify
    # short inline snippets ending in ".py" and skip genuinely long paths.
    if code.endswith(".py") and os.path.isfile(code):
        try:
            with open(code, "r", encoding="utf-8") as f:
                code = f.read()
        except OSError:
            # Unreadable file: fall back to summarizing the raw argument,
            # matching the original best-effort behavior.
            pass

    docstring = generate_docstring(
        code,
        model_name=args.model_name,
        max_length=args.max_length,
        num_beams=args.num_beams,
    )
    print(docstring)


if __name__ == "__main__":
    main()
requirements.txt ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ torch>=2.0.0
2
+ transformers>=4.30.0
3
+ datasets>=2.12.0
4
+ gradio>=4.0.0