| # ๐ Research Draft |
|
|
| **AI-powered academic abstract generation โ 100 % local and private.** |
|
|
| Research Draft is a lightweight tool that generates high-quality research paper abstracts from uploaded PDFs. It runs entirely on your local machine using a small instruction-tuned language model served through [Ollama](https://ollama.com/), with a clean [Gradio](https://www.gradio.app/) web interface. |
|
|
| Built as a B.Tech / Data Science final-year project. |
|
|
| --- |
|
|
| ## โจ Features |
|
|
| | Feature | Student | Researcher | |
| |---|:---:|:---:| |
| | Upload PDF | โ
| โ
| |
| | Generate abstract | โ
| โ
| |
| | Copy abstract | โ
| โ
| |
| | View generation history | โ | โ
| |
| | Export latest result (.txt) | โ | โ
| |
| | Export full history (.txt) | โ | โ
| |
| | Clear history | โ | โ
| |
|
|
| --- |
|
|
| ## ๐๏ธ Architecture |
|
|
| ``` |
| โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโ |
| โ Gradio UI โโโโโโถโ abstract_ โโโโโโถโ pdf_utils.py โ โ Ollama โ |
| โ (app.py) โ โ service.py โ โ (extract/ โ โ Server โ |
| โ โโโโโโโ โโโโโโถโ clean PDF) โ โ (local) โ |
| โโโโโโโโโโโโโโโ โ โโโโโโถโ โ โ โ |
| โ โ โโโโโโโโโโโโโโโโโโ โ โ |
| โ โโโโโโถโโโโโโโโโโโโโโโโโโ โ โ |
| โ โ โ llm_client.py โโโโโโถโ /api/chat โ |
| โ โโโโโโโ (Ollama API) โโโโโโโ โ |
| โ โ โโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโ |
| โ โโโโโโถโโโโโโโโโโโโโโโโโโ |
| โ โ โ history_ โ |
| โโโโโโโโโโโโโโโโ โ manager.py โ |
| โ (JSON store) โ |
| โโโโโโโโโโโโโโโโโโ |
| ``` |
|
|
| --- |
|
|
| ## ๐ Project Structure |
|
|
| ``` |
| research-draft/ |
| โโโ app.py # Gradio Blocks UI โ entry point |
| โโโ pdf_utils.py # PDF text extraction and cleaning |
| โโโ llm_client.py # Ollama API client |
| โโโ history_manager.py # JSON-based history persistence |
| โโโ abstract_service.py # Orchestration (PDF โ LLM โ history) |
| โโโ requirements.txt # Python dependencies |
| โโโ sample_modelfile.txt # Ollama Modelfile template |
| โโโ data/ |
| โ โโโ history.json # Persistent generation history |
| โโโ README.md # This file |
| ``` |
|
|
| --- |
|
|
| ## ๐ Setup Instructions |
|
|
| ### Prerequisites |
|
|
| - **Python 3.10+** |
| - **Ollama** installed and running โ [Install Ollama](https://ollama.com/download) |
| - A GGUF model file (e.g., LFM2.5-1.2B-Instruct, Qwen2.5-1.5B-Instruct, or Phi-3-mini) |
|
|
| ### Step 1 โ Clone or download the project |
|
|
| ```bash |
| git clone https://huggingface.co/Arunvarma2565/research-draft |
| cd research-draft |
| ``` |
|
|
| ### Step 2 โ Install Python dependencies |
|
|
| ```bash |
| pip install -r requirements.txt |
| ``` |
|
|
| Or install manually: |
|
|
| ```bash |
| pip install gradio PyMuPDF requests |
| ``` |
|
|
| ### Step 3 โ Set up the Ollama model |
|
|
| 1. **Download a GGUF model** (e.g., from Hugging Face). Place the `.gguf` file in the project directory or note its path. |
|
|
| 2. **Edit `sample_modelfile.txt`** โ update the `FROM` line to point at your `.gguf` file: |
| ``` |
| FROM /path/to/your/model.gguf |
| ``` |
| |
| 3. **Create the model in Ollama:** |
| ```bash |
| ollama create researchdraft -f sample_modelfile.txt |
| ``` |
| |
| 4. **Verify it works:** |
| ```bash |
| ollama list # should show "researchdraft" |
| ollama run researchdraft "Hello" # quick sanity check |
| ``` |
|
|
| ### Step 4 โ Start the Ollama server |
|
|
| If Ollama is not already running: |
|
|
| ```bash |
| ollama serve |
| ``` |
|
|
| Leave this terminal open. |
|
|
| ### Step 5 โ Launch Research Draft |
|
|
| In a **new terminal**: |
|
|
| ```bash |
| cd research-draft |
| python app.py |
| ``` |
|
|
| Open your browser at **http://localhost:7860**. |
|
|
| --- |
|
|
| ## ๐ How to Use |
|
|
| 1. **Select your role** โ *Student* or *Researcher* โ from the dropdown. |
| 2. **Upload a PDF** of a research paper. |
| 3. Click **๐ Generate Abstract**. |
| 4. The generated abstract appears on the right. Use the copy button to grab it. |
| 5. *(Researcher only)* Use the tools below to view history, export results, or clear history. |
|
|
| --- |
|
|
| ## โ๏ธ Configuration |
|
|
| | Setting | Location | Default | |
| |---|---|---| |
| | Ollama URL | `llm_client.py` โ `OLLAMA_BASE_URL` | `http://localhost:11434` | |
| | Model name | `llm_client.py` โ `MODEL_NAME` | `researchdraft` | |
| | Temperature | `llm_client.py` โ `generate_abstract()` | `0.3` | |
| | Max text chars | `pdf_utils.py` โ `MAX_TEXT_CHARS` | `12 000` | |
| | History file | `history_manager.py` โ `HISTORY_FILE` | `data/history.json` | |
| | Server port | `app.py` โ `demo.launch()` | `7860` | |
|
|
| --- |
|
|
| ## ๐งฉ Tech Stack |
|
|
| | Component | Library / Tool | |
| |---|---| |
| | UI | Gradio (Blocks API) | |
| | PDF parsing | PyMuPDF (fitz) | |
| | LLM runtime | Ollama (local) | |
| | HTTP client | requests | |
| | History storage | JSON file | |
| | Language | Python 3.10+ | |
|
|
| --- |
|
|
| ## ๐ Sample Models That Work Well |
|
|
| | Model | Size | Notes | |
| |---|---|---| |
| | LFM2.5-1.2B-Instruct | ~1.2 B | Lightweight, good for CPU | |
| | Qwen2.5-1.5B-Instruct | ~1.5 B | Strong instruction following | |
| | Phi-3-mini-4k-instruct | ~3.8 B | Higher quality, needs more RAM | |
| | Llama-3.2-3B-Instruct | ~3.2 B | Good balance of speed and quality | |
|
|
| All models should be in **GGUF** format (Q4_K_M or Q5_K_M quantisation recommended). |
|
|
| --- |
|
|
| ## ๐ฎ Future Improvements |
|
|
| - **Multi-PDF batch processing** โ upload several papers and generate abstracts in bulk. |
| - **Abstract comparison** โ compare generated vs. original abstract side-by-side. |
| - **Keyword extraction** โ automatically extract key terms from the paper. |
| - **Citation-aware chunking** โ smarter text splitting that preserves section boundaries. |
| - **SQLite backend** โ replace JSON history with SQLite for better querying. |
| - **User authentication** โ simple login to separate Student/Researcher sessions. |
| - **PDF preview** โ render the first page of the uploaded PDF in the UI. |
| - **Streaming output** โ show the abstract being generated token by token. |
| - **Fine-tuned model** โ fine-tune a small model on abstract-generation pairs for better quality. |
| - **Evaluation metrics** โ add ROUGE / BERTScore comparison against original abstracts. |
|
|
| --- |
|
|
| ## ๐ License |
|
|
| This project is for educational purposes (B.Tech final-year project). Use it freely for learning and research. |
|
|
| --- |
|
|
| ## ๐ Acknowledgements |
|
|
| - [Ollama](https://ollama.com/) โ local LLM serving |
| - [Gradio](https://www.gradio.app/) โ web UI framework |
| - [PyMuPDF](https://pymupdf.readthedocs.io/) โ PDF text extraction |
| - [Hugging Face](https://huggingface.co/) โ model hub and community |
|
|