research-draft / README.md
Arunvarma2565's picture
Add README.md with full documentation
f7c9709 verified
# ๐Ÿ“„ Research Draft
**AI-powered academic abstract generation โ€” 100 % local and private.**
Research Draft is a lightweight tool that generates high-quality research paper abstracts from uploaded PDFs. It runs entirely on your local machine using a small instruction-tuned language model served through [Ollama](https://ollama.com/), with a clean [Gradio](https://www.gradio.app/) web interface.
Built as a B.Tech / Data Science final-year project.
---
## โœจ Features
| Feature | Student | Researcher |
|---|:---:|:---:|
| Upload PDF | โœ… | โœ… |
| Generate abstract | โœ… | โœ… |
| Copy abstract | โœ… | โœ… |
| View generation history | โ€” | โœ… |
| Export latest result (.txt) | โ€” | โœ… |
| Export full history (.txt) | โ€” | โœ… |
| Clear history | โ€” | โœ… |
---
## ๐Ÿ—๏ธ Architecture
```
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Gradio UI โ”‚โ”€โ”€โ”€โ”€โ–ถโ”‚ abstract_ โ”‚โ”€โ”€โ”€โ”€โ–ถโ”‚ pdf_utils.py โ”‚ โ”‚ Ollama โ”‚
โ”‚ (app.py) โ”‚ โ”‚ service.py โ”‚ โ”‚ (extract/ โ”‚ โ”‚ Server โ”‚
โ”‚ โ”‚โ—€โ”€โ”€โ”€โ”€โ”‚ โ”‚โ”€โ”€โ”€โ”€โ–ถโ”‚ clean PDF) โ”‚ โ”‚ (local) โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ โ”‚โ”€โ”€โ”€โ”€โ–ถโ”‚ โ”‚ โ”‚ โ”‚
โ”‚ โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ โ”‚
โ”‚ โ”‚โ”€โ”€โ”€โ”€โ–ถโ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ โ”‚
โ”‚ โ”‚ โ”‚ llm_client.py โ”‚โ”€โ”€โ”€โ”€โ–ถโ”‚ /api/chat โ”‚
โ”‚ โ”‚โ—€โ”€โ”€โ”€โ”€โ”‚ (Ollama API) โ”‚โ—€โ”€โ”€โ”€โ”€โ”‚ โ”‚
โ”‚ โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚ โ”‚โ”€โ”€โ”€โ”€โ–ถโ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ โ”‚ โ”‚ history_ โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ manager.py โ”‚
โ”‚ (JSON store) โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
```
---
## ๐Ÿ“‚ Project Structure
```
research-draft/
โ”œโ”€โ”€ app.py # Gradio Blocks UI โ€” entry point
โ”œโ”€โ”€ pdf_utils.py # PDF text extraction and cleaning
โ”œโ”€โ”€ llm_client.py # Ollama API client
โ”œโ”€โ”€ history_manager.py # JSON-based history persistence
โ”œโ”€โ”€ abstract_service.py # Orchestration (PDF โ†’ LLM โ†’ history)
โ”œโ”€โ”€ requirements.txt # Python dependencies
โ”œโ”€โ”€ sample_modelfile.txt # Ollama Modelfile template
โ”œโ”€โ”€ data/
โ”‚ โ””โ”€โ”€ history.json # Persistent generation history
โ””โ”€โ”€ README.md # This file
```
---
## ๐Ÿš€ Setup Instructions
### Prerequisites
- **Python 3.10+**
- **Ollama** installed and running โ€” [Install Ollama](https://ollama.com/download)
- A GGUF model file (e.g., LFM2.5-1.2B-Instruct, Qwen2.5-1.5B-Instruct, or Phi-3-mini)
### Step 1 โ€” Clone or download the project
```bash
git clone https://huggingface.co/Arunvarma2565/research-draft
cd research-draft
```
### Step 2 โ€” Install Python dependencies
```bash
pip install -r requirements.txt
```
Or install manually:
```bash
pip install gradio PyMuPDF requests
```
### Step 3 โ€” Set up the Ollama model
1. **Download a GGUF model** (e.g., from Hugging Face). Place the `.gguf` file in the project directory or note its path.
2. **Edit `sample_modelfile.txt`** โ€” update the `FROM` line to point at your `.gguf` file:
```
FROM /path/to/your/model.gguf
```
3. **Create the model in Ollama:**
```bash
ollama create researchdraft -f sample_modelfile.txt
```
4. **Verify it works:**
```bash
ollama list # should show "researchdraft"
ollama run researchdraft "Hello" # quick sanity check
```
### Step 4 โ€” Start the Ollama server
If Ollama is not already running:
```bash
ollama serve
```
Leave this terminal open.
### Step 5 โ€” Launch Research Draft
In a **new terminal**:
```bash
cd research-draft
python app.py
```
Open your browser at **http://localhost:7860**.
---
## ๐ŸŽ“ How to Use
1. **Select your role** โ€” *Student* or *Researcher* โ€” from the dropdown.
2. **Upload a PDF** of a research paper.
3. Click **๐Ÿ” Generate Abstract**.
4. The generated abstract appears on the right. Use the copy button to grab it.
5. *(Researcher only)* Use the tools below to view history, export results, or clear history.
---
## โš™๏ธ Configuration
| Setting | Location | Default |
|---|---|---|
| Ollama URL | `llm_client.py` โ†’ `OLLAMA_BASE_URL` | `http://localhost:11434` |
| Model name | `llm_client.py` โ†’ `MODEL_NAME` | `researchdraft` |
| Temperature | `llm_client.py` โ†’ `generate_abstract()` | `0.3` |
| Max text chars | `pdf_utils.py` โ†’ `MAX_TEXT_CHARS` | `12 000` |
| History file | `history_manager.py` โ†’ `HISTORY_FILE` | `data/history.json` |
| Server port | `app.py` โ†’ `demo.launch()` | `7860` |
---
## ๐Ÿงฉ Tech Stack
| Component | Library / Tool |
|---|---|
| UI | Gradio (Blocks API) |
| PDF parsing | PyMuPDF (fitz) |
| LLM runtime | Ollama (local) |
| HTTP client | requests |
| History storage | JSON file |
| Language | Python 3.10+ |
---
## ๐Ÿ“ Sample Models That Work Well
| Model | Size | Notes |
|---|---|---|
| LFM2.5-1.2B-Instruct | ~1.2 B | Lightweight, good for CPU |
| Qwen2.5-1.5B-Instruct | ~1.5 B | Strong instruction following |
| Phi-3-mini-4k-instruct | ~3.8 B | Higher quality, needs more RAM |
| Llama-3.2-3B-Instruct | ~3.2 B | Good balance of speed and quality |
All models should be in **GGUF** format (Q4_K_M or Q5_K_M quantisation recommended).
---
## ๐Ÿ”ฎ Future Improvements
- **Multi-PDF batch processing** โ€” upload several papers and generate abstracts in bulk.
- **Abstract comparison** โ€” compare generated vs. original abstract side-by-side.
- **Keyword extraction** โ€” automatically extract key terms from the paper.
- **Citation-aware chunking** โ€” smarter text splitting that preserves section boundaries.
- **SQLite backend** โ€” replace JSON history with SQLite for better querying.
- **User authentication** โ€” simple login to separate Student/Researcher sessions.
- **PDF preview** โ€” render the first page of the uploaded PDF in the UI.
- **Streaming output** โ€” show the abstract being generated token by token.
- **Fine-tuned model** โ€” fine-tune a small model on abstract-generation pairs for better quality.
- **Evaluation metrics** โ€” add ROUGE / BERTScore comparison against original abstracts.
---
## ๐Ÿ“„ License
This project is for educational purposes (B.Tech final-year project). Use it freely for learning and research.
---
## ๐Ÿ™ Acknowledgements
- [Ollama](https://ollama.com/) โ€” local LLM serving
- [Gradio](https://www.gradio.app/) โ€” web UI framework
- [PyMuPDF](https://pymupdf.readthedocs.io/) โ€” PDF text extraction
- [Hugging Face](https://huggingface.co/) โ€” model hub and community