# 📄 Research Draft **AI-powered academic abstract generation — 100 % local and private.** Research Draft is a lightweight tool that generates high-quality research paper abstracts from uploaded PDFs. It runs entirely on your local machine using a small instruction-tuned language model served through [Ollama](https://ollama.com/), with a clean [Gradio](https://www.gradio.app/) web interface. Built as a B.Tech / Data Science final-year project. --- ## ✨ Features | Feature | Student | Researcher | |---|:---:|:---:| | Upload PDF | ✅ | ✅ | | Generate abstract | ✅ | ✅ | | Copy abstract | ✅ | ✅ | | View generation history | — | ✅ | | Export latest result (.txt) | — | ✅ | | Export full history (.txt) | — | ✅ | | Clear history | — | ✅ | --- ## 🏗️ Architecture ``` ┌─────────────┐ ┌──────────────┐ ┌────────────────┐ ┌───────────┐ │ Gradio UI │────▶│ abstract_ │────▶│ pdf_utils.py │ │ Ollama │ │ (app.py) │ │ service.py │ │ (extract/ │ │ Server │ │ │◀────│ │────▶│ clean PDF) │ │ (local) │ └─────────────┘ │ │────▶│ │ │ │ │ │ └────────────────┘ │ │ │ │────▶┌────────────────┐ │ │ │ │ │ llm_client.py │────▶│ /api/chat │ │ │◀────│ (Ollama API) │◀────│ │ │ │ └────────────────┘ └───────────┘ │ │────▶┌────────────────┐ │ │ │ history_ │ └──────────────┘ │ manager.py │ │ (JSON store) │ └────────────────┘ ``` --- ## 📂 Project Structure ``` research-draft/ ├── app.py # Gradio Blocks UI — entry point ├── pdf_utils.py # PDF text extraction and cleaning ├── llm_client.py # Ollama API client ├── history_manager.py # JSON-based history persistence ├── abstract_service.py # Orchestration (PDF → LLM → history) ├── requirements.txt # Python dependencies ├── sample_modelfile.txt # Ollama Modelfile template ├── data/ │ └── history.json # Persistent generation history └── README.md # This file ``` --- ## 🚀 Setup Instructions ### Prerequisites - **Python 3.10+** - **Ollama** installed and running — [Install Ollama](https://ollama.com/download) - A GGUF model file (e.g., LFM2.5-1.2B-Instruct, Qwen2.5-1.5B-Instruct, or Phi-3-mini) ### Step 1 — Clone or download the project ```bash git clone https://huggingface.co/Arunvarma2565/research-draft cd research-draft ``` ### Step 2 — Install Python dependencies ```bash pip install -r requirements.txt ``` Or install manually: ```bash pip install gradio PyMuPDF requests ``` ### Step 3 — Set up the Ollama model 1. **Download a GGUF model** (e.g., from Hugging Face). Place the `.gguf` file in the project directory or note its path. 2. **Edit `sample_modelfile.txt`** — update the `FROM` line to point at your `.gguf` file: ``` FROM /path/to/your/model.gguf ``` 3. **Create the model in Ollama:** ```bash ollama create researchdraft -f sample_modelfile.txt ``` 4. **Verify it works:** ```bash ollama list # should show "researchdraft" ollama run researchdraft "Hello" # quick sanity check ``` ### Step 4 — Start the Ollama server If Ollama is not already running: ```bash ollama serve ``` Leave this terminal open. ### Step 5 — Launch Research Draft In a **new terminal**: ```bash cd research-draft python app.py ``` Open your browser at **http://localhost:7860**. --- ## 🎓 How to Use 1. **Select your role** — *Student* or *Researcher* — from the dropdown. 2. **Upload a PDF** of a research paper. 3. Click **🔍 Generate Abstract**. 4. The generated abstract appears on the right. Use the copy button to grab it. 5. *(Researcher only)* Use the tools below to view history, export results, or clear history. --- ## ⚙️ Configuration | Setting | Location | Default | |---|---|---| | Ollama URL | `llm_client.py` → `OLLAMA_BASE_URL` | `http://localhost:11434` | | Model name | `llm_client.py` → `MODEL_NAME` | `researchdraft` | | Temperature | `llm_client.py` → `generate_abstract()` | `0.3` | | Max text chars | `pdf_utils.py` → `MAX_TEXT_CHARS` | `12 000` | | History file | `history_manager.py` → `HISTORY_FILE` | `data/history.json` | | Server port | `app.py` → `demo.launch()` | `7860` | --- ## 🧩 Tech Stack | Component | Library / Tool | |---|---| | UI | Gradio (Blocks API) | | PDF parsing | PyMuPDF (fitz) | | LLM runtime | Ollama (local) | | HTTP client | requests | | History storage | JSON file | | Language | Python 3.10+ | --- ## 📝 Sample Models That Work Well | Model | Size | Notes | |---|---|---| | LFM2.5-1.2B-Instruct | ~1.2 B | Lightweight, good for CPU | | Qwen2.5-1.5B-Instruct | ~1.5 B | Strong instruction following | | Phi-3-mini-4k-instruct | ~3.8 B | Higher quality, needs more RAM | | Llama-3.2-3B-Instruct | ~3.2 B | Good balance of speed and quality | All models should be in **GGUF** format (Q4_K_M or Q5_K_M quantisation recommended). --- ## 🔮 Future Improvements - **Multi-PDF batch processing** — upload several papers and generate abstracts in bulk. - **Abstract comparison** — compare generated vs. original abstract side-by-side. - **Keyword extraction** — automatically extract key terms from the paper. - **Citation-aware chunking** — smarter text splitting that preserves section boundaries. - **SQLite backend** — replace JSON history with SQLite for better querying. - **User authentication** — simple login to separate Student/Researcher sessions. - **PDF preview** — render the first page of the uploaded PDF in the UI. - **Streaming output** — show the abstract being generated token by token. - **Fine-tuned model** — fine-tune a small model on abstract-generation pairs for better quality. - **Evaluation metrics** — add ROUGE / BERTScore comparison against original abstracts. --- ## 📄 License This project is for educational purposes (B.Tech final-year project). Use it freely for learning and research. --- ## 🙏 Acknowledgements - [Ollama](https://ollama.com/) — local LLM serving - [Gradio](https://www.gradio.app/) — web UI framework - [PyMuPDF](https://pymupdf.readthedocs.io/) — PDF text extraction - [Hugging Face](https://huggingface.co/) — model hub and community