# Python Docstring Generator
Generates docstrings for Python code snippets using a sequence-to-sequence model (e.g. T5 or CodeT5). Useful for code summarization and documentation.
## Task
Given a Python function or code block (without a docstring), the model produces a short natural-language description suitable as a docstring.
## Model
- Uses **Hugging Face Transformers** with a small T5 or CodeT5 checkpoint (e.g. `t5-small`, or `Salesforce/codet5-small` for code).
- Inference script loads the model and tokenizer and runs generation with configurable length and sampling.
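A minimal sketch of what the inference step could look like. The `summarize:` prefix, checkpoint name, and generation parameters below are illustrative assumptions, not the project's exact settings, and the heavy `transformers` import is deferred so the prompt helper stays usable on its own:

```python
def build_prompt(code: str) -> str:
    """Prepend the summarization prefix that T5-style models expect.

    The "summarize: " prefix is borrowed from t5-small's pre-training
    tasks; a fine-tuned CodeT5 checkpoint may not need it.
    """
    return "summarize: " + code.strip()


def generate_docstring(code: str,
                       model_name: str = "Salesforce/codet5-small",
                       max_length: int = 48) -> str:
    """Generate a one-line summary for `code` (downloads the model)."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(build_prompt(code), return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_length=max_length, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Beam search (`num_beams=4`) is one reasonable default for short summaries; sampling with `do_sample=True` trades determinism for variety.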
## Dataset
- Training (optional): datasets like **CodeXGlue** code-to-text, or **DocstringGeneration**-style data from Hugging Face Datasets.
- For inference only, no dataset is required; use pre-trained weights.
## Usage
```bash
pip install -r requirements.txt
python inference.py --input "def add(a, b): return a + b"
```
For a quick demo in the browser, run the Gradio app:
```bash
python app.py
```
## Example
Input:
```python
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)
```
Output (example): `"Compute the factorial of n recursively."`
## Files
- `inference.py` — loads T5 (or CodeT5), runs generation; can take a file path or inline code.
- `app.py` — Gradio UI for pasting code and getting a docstring.
## Limitations / future work
- Quality depends on the base model and any fine-tuning; out-of-domain code may get generic descriptions.
- Could be extended to multi-line docstrings or different styles (Google, NumPy, Sphinx).
## Author
**Alireza Aminzadeh**
- Email: [alireza.aminzadeh@hotmail.com](mailto:alireza.aminzadeh@hotmail.com)
- Hugging Face: [syeedalireza](https://huggingface.co/syeedalireza)
- LinkedIn: [alirezaaminzadeh](https://www.linkedin.com/in/alirezaaminzadeh)