Add README.md with full documentation

f7c9709 verified about 1 month ago

7.41 kB

	# 📄 Research Draft

	AI-powered academic abstract generation — 100 % local and private.

	Research Draft is a lightweight tool that generates high-quality research paper abstracts from uploaded PDFs. It runs entirely on your local machine using a small instruction-tuned language model served through [Ollama](https://ollama.com/), with a clean [Gradio](https://www.gradio.app/) web interface.

	Built as a B.Tech / Data Science final-year project.

	---

	## ✨ Features

	\| Feature \| Student \| Researcher \|
	\|---\|:---:\|:---:\|
	\| Upload PDF \| ✅ \| ✅ \|
	\| Generate abstract \| ✅ \| ✅ \|
	\| Copy abstract \| ✅ \| ✅ \|
	\| View generation history \| — \| ✅ \|
	\| Export latest result (.txt) \| — \| ✅ \|
	\| Export full history (.txt) \| — \| ✅ \|
	\| Clear history \| — \| ✅ \|

	---

	## 🏗️ Architecture

	```
	┌─────────────┐ ┌──────────────┐ ┌────────────────┐ ┌───────────┐
	│ Gradio UI │────▶│ abstract_ │────▶│ pdf_utils.py │ │ Ollama │
	│ (app.py) │ │ service.py │ │ (extract/ │ │ Server │
	│ │◀────│ │────▶│ clean PDF) │ │ (local) │
	└─────────────┘ │ │────▶│ │ │ │
	│ │ └────────────────┘ │ │
	│ │────▶┌────────────────┐ │ │
	│ │ │ llm_client.py │────▶│ /api/chat │
	│ │◀────│ (Ollama API) │◀────│ │
	│ │ └────────────────┘ └───────────┘
	│ │────▶┌────────────────┐
	│ │ │ history_ │
	└──────────────┘ │ manager.py │
	│ (JSON store) │
	└────────────────┘
	```

	---

	## 📂 Project Structure

	```
	research-draft/
	├── app.py # Gradio Blocks UI — entry point
	├── pdf_utils.py # PDF text extraction and cleaning
	├── llm_client.py # Ollama API client
	├── history_manager.py # JSON-based history persistence
	├── abstract_service.py # Orchestration (PDF → LLM → history)
	├── requirements.txt # Python dependencies
	├── sample_modelfile.txt # Ollama Modelfile template
	├── data/
	│ └── history.json # Persistent generation history
	└── README.md # This file
	```

	---

	## 🚀 Setup Instructions

	### Prerequisites

	- Python 3.10+
	- Ollama installed and running — [Install Ollama](https://ollama.com/download)
	- A GGUF model file (e.g., LFM2.5-1.2B-Instruct, Qwen2.5-1.5B-Instruct, or Phi-3-mini)

	### Step 1 — Clone or download the project

	```bash
	git clone https://huggingface.co/Arunvarma2565/research-draft
	cd research-draft
	```

	### Step 2 — Install Python dependencies

	```bash
	pip install -r requirements.txt
	```

	Or install manually:

	```bash
	pip install gradio PyMuPDF requests
	```

	### Step 3 — Set up the Ollama model

	1. Download a GGUF model (e.g., from Hugging Face). Place the `.gguf` file in the project directory or note its path.

	2. Edit `sample_modelfile.txt` — update the `FROM` line to point at your `.gguf` file:
	```
	FROM /path/to/your/model.gguf
	```

	3. Create the model in Ollama:
	```bash
	ollama create researchdraft -f sample_modelfile.txt
	```

	4. Verify it works:
	```bash
	ollama list # should show "researchdraft"
	ollama run researchdraft "Hello" # quick sanity check
	```

	### Step 4 — Start the Ollama server

	If Ollama is not already running:

	```bash
	ollama serve
	```

	Leave this terminal open.

	### Step 5 — Launch Research Draft

	In a new terminal:

	```bash
	cd research-draft
	python app.py
	```

	Open your browser at http://localhost:7860.

	---

	## 🎓 How to Use

	1. Select your role — Student or Researcher — from the dropdown.
	2. Upload a PDF of a research paper.
	3. Click 🔍 Generate Abstract.
	4. The generated abstract appears on the right. Use the copy button to grab it.
	5. (Researcher only) Use the tools below to view history, export results, or clear history.

	---

	## ⚙️ Configuration

	\| Setting \| Location \| Default \|
	\|---\|---\|---\|
	\| Ollama URL \| `llm_client.py` → `OLLAMA_BASE_URL` \| `http://localhost:11434` \|
	\| Model name \| `llm_client.py` → `MODEL_NAME` \| `researchdraft` \|
	\| Temperature \| `llm_client.py` → `generate_abstract()` \| `0.3` \|
	\| Max text chars \| `pdf_utils.py` → `MAX_TEXT_CHARS` \| `12 000` \|
	\| History file \| `history_manager.py` → `HISTORY_FILE` \| `data/history.json` \|
	\| Server port \| `app.py` → `demo.launch()` \| `7860` \|

	---

	## 🧩 Tech Stack

	\| Component \| Library / Tool \|
	\|---\|---\|
	\| UI \| Gradio (Blocks API) \|
	\| PDF parsing \| PyMuPDF (fitz) \|
	\| LLM runtime \| Ollama (local) \|
	\| HTTP client \| requests \|
	\| History storage \| JSON file \|
	\| Language \| Python 3.10+ \|

	---

	## 📝 Sample Models That Work Well

	\| Model \| Size \| Notes \|
	\|---\|---\|---\|
	\| LFM2.5-1.2B-Instruct \| ~1.2 B \| Lightweight, good for CPU \|
	\| Qwen2.5-1.5B-Instruct \| ~1.5 B \| Strong instruction following \|
	\| Phi-3-mini-4k-instruct \| ~3.8 B \| Higher quality, needs more RAM \|
	\| Llama-3.2-3B-Instruct \| ~3.2 B \| Good balance of speed and quality \|

	All models should be in GGUF format (Q4_K_M or Q5_K_M quantisation recommended).

	---

	## 🔮 Future Improvements

	- Multi-PDF batch processing — upload several papers and generate abstracts in bulk.
	- Abstract comparison — compare generated vs. original abstract side-by-side.
	- Keyword extraction — automatically extract key terms from the paper.
	- Citation-aware chunking — smarter text splitting that preserves section boundaries.
	- SQLite backend — replace JSON history with SQLite for better querying.
	- User authentication — simple login to separate Student/Researcher sessions.
	- PDF preview — render the first page of the uploaded PDF in the UI.
	- Streaming output — show the abstract being generated token by token.
	- Fine-tuned model — fine-tune a small model on abstract-generation pairs for better quality.
	- Evaluation metrics — add ROUGE / BERTScore comparison against original abstracts.

	---

	## 📄 License

	This project is for educational purposes (B.Tech final-year project). Use it freely for learning and research.

	---

	## 🙏 Acknowledgements

	- [Ollama](https://ollama.com/) — local LLM serving
	- [Gradio](https://www.gradio.app/) — web UI framework
	- [PyMuPDF](https://pymupdf.readthedocs.io/) — PDF text extraction
	- [Hugging Face](https://huggingface.co/) — model hub and community