# 📄 Research Draft

**AI-powered academic abstract generation — 100 % local and private.**

Research Draft is a lightweight tool that generates high-quality research paper abstracts from uploaded PDFs. It runs entirely on your local machine using a small instruction-tuned language model served through [Ollama](https://ollama.com/), with a clean [Gradio](https://www.gradio.app/) web interface.

Built as a B.Tech / Data Science final-year project.

---

## ✨ Features

| Feature | Student | Researcher |
|---|:---:|:---:|
| Upload PDF | ✅ | ✅ |
| Generate abstract | ✅ | ✅ |
| Copy abstract | ✅ | ✅ |
| View generation history | — | ✅ |
| Export latest result (.txt) | — | ✅ |
| Export full history (.txt) | — | ✅ |
| Clear history | — | ✅ |

---

## 🏗️ Architecture

```
┌─────────────┐     ┌──────────────┐     ┌────────────────┐     ┌───────────┐
│  Gradio UI  │────▶│ abstract_    │────▶│  pdf_utils.py  │     │  Ollama   │
│  (app.py)   │     │ service.py   │     │  (extract/     │     │  Server   │
│             │◀────│              │────▶│   clean PDF)   │     │ (local)   │
└─────────────┘     │              │────▶│                │     │           │
                    │              │     └────────────────┘     │           │
                    │              │────▶┌────────────────┐     │           │
                    │              │     │  llm_client.py │────▶│ /api/chat │
                    │              │◀────│  (Ollama API)  │◀────│           │
                    │              │     └────────────────┘     └───────────┘
                    │              │────▶┌────────────────┐
                    │              │     │  history_       │
                    └──────────────┘     │  manager.py    │
                                        │  (JSON store)  │
                                        └────────────────┘
```

---

## 📂 Project Structure

```
research-draft/
├── app.py                  # Gradio Blocks UI — entry point
├── pdf_utils.py            # PDF text extraction and cleaning
├── llm_client.py           # Ollama API client
├── history_manager.py      # JSON-based history persistence
├── abstract_service.py     # Orchestration (PDF → LLM → history)
├── requirements.txt        # Python dependencies
├── sample_modelfile.txt    # Ollama Modelfile template
├── data/
│   └── history.json        # Persistent generation history
└── README.md               # This file
```

---

## 🚀 Setup Instructions

### Prerequisites

- **Python 3.10+**
- **Ollama** installed and running — [Install Ollama](https://ollama.com/download)
- A GGUF model file (e.g., LFM2.5-1.2B-Instruct, Qwen2.5-1.5B-Instruct, or Phi-3-mini)

### Step 1 — Clone or download the project

```bash
git clone https://huggingface.co/Arunvarma2565/research-draft
cd research-draft
```

### Step 2 — Install Python dependencies

```bash
pip install -r requirements.txt
```

Or install manually:

```bash
pip install gradio PyMuPDF requests
```

### Step 3 — Set up the Ollama model

1. **Download a GGUF model** (e.g., from Hugging Face). Place the `.gguf` file in the project directory or note its path.

2. **Edit `sample_modelfile.txt`** — update the `FROM` line to point at your `.gguf` file:
   ```
   FROM /path/to/your/model.gguf
   ```

3. **Create the model in Ollama:**
   ```bash
   ollama create researchdraft -f sample_modelfile.txt
   ```

4. **Verify it works:**
   ```bash
   ollama list                         # should show "researchdraft"
   ollama run researchdraft "Hello"    # quick sanity check
   ```

### Step 4 — Start the Ollama server

If Ollama is not already running:

```bash
ollama serve
```

Leave this terminal open.

### Step 5 — Launch Research Draft

In a **new terminal**:

```bash
cd research-draft
python app.py
```

Open your browser at **http://localhost:7860**.

---

## 🎓 How to Use

1. **Select your role** — *Student* or *Researcher* — from the dropdown.
2. **Upload a PDF** of a research paper.
3. Click **🔍 Generate Abstract**.
4. The generated abstract appears on the right. Use the copy button to grab it.
5. *(Researcher only)* Use the tools below to view history, export results, or clear history.

---

## ⚙️ Configuration

| Setting | Location | Default |
|---|---|---|
| Ollama URL | `llm_client.py` → `OLLAMA_BASE_URL` | `http://localhost:11434` |
| Model name | `llm_client.py` → `MODEL_NAME` | `researchdraft` |
| Temperature | `llm_client.py` → `generate_abstract()` | `0.3` |
| Max text chars | `pdf_utils.py` → `MAX_TEXT_CHARS` | `12 000` |
| History file | `history_manager.py` → `HISTORY_FILE` | `data/history.json` |
| Server port | `app.py` → `demo.launch()` | `7860` |

---

## 🧩 Tech Stack

| Component | Library / Tool |
|---|---|
| UI | Gradio (Blocks API) |
| PDF parsing | PyMuPDF (fitz) |
| LLM runtime | Ollama (local) |
| HTTP client | requests |
| History storage | JSON file |
| Language | Python 3.10+ |

---

## 📝 Sample Models That Work Well

| Model | Size | Notes |
|---|---|---|
| LFM2.5-1.2B-Instruct | ~1.2 B | Lightweight, good for CPU |
| Qwen2.5-1.5B-Instruct | ~1.5 B | Strong instruction following |
| Phi-3-mini-4k-instruct | ~3.8 B | Higher quality, needs more RAM |
| Llama-3.2-3B-Instruct | ~3.2 B | Good balance of speed and quality |

All models should be in **GGUF** format (Q4_K_M or Q5_K_M quantisation recommended).

---

## 🔮 Future Improvements

- **Multi-PDF batch processing** — upload several papers and generate abstracts in bulk.
- **Abstract comparison** — compare generated vs. original abstract side-by-side.
- **Keyword extraction** — automatically extract key terms from the paper.
- **Citation-aware chunking** — smarter text splitting that preserves section boundaries.
- **SQLite backend** — replace JSON history with SQLite for better querying.
- **User authentication** — simple login to separate Student/Researcher sessions.
- **PDF preview** — render the first page of the uploaded PDF in the UI.
- **Streaming output** — show the abstract being generated token by token.
- **Fine-tuned model** — fine-tune a small model on abstract-generation pairs for better quality.
- **Evaluation metrics** — add ROUGE / BERTScore comparison against original abstracts.

---

## 📄 License

This project is for educational purposes (B.Tech final-year project). Use it freely for learning and research.

---

## 🙏 Acknowledgements

- [Ollama](https://ollama.com/) — local LLM serving
- [Gradio](https://www.gradio.app/) — web UI framework
- [PyMuPDF](https://pymupdf.readthedocs.io/) — PDF text extraction
- [Hugging Face](https://huggingface.co/) — model hub and community