Laura Wagner

	# Quick Start: Running Qwen-2.5-32B Locally

	This guide gets you started with free local LLM inference on your A100 GPU, using Ollama to run Qwen-2.5-32B-Instruct.

	## Why Local?

	- ✅ $0 cost - No API fees
	- ✅ Privacy - Data stays on your machine
	- ✅ Quality - 32B parameter model with strong performance

	## Setup (One-time)

	### 1. Pull the Model (~10-30 minutes)

	```bash
	# Pull Qwen-2.5-32B-Instruct
	ollama pull qwen2.5:32b-instruct

	# Wait for download to complete (~20GB)
	# Model will be cached at: ~/.ollama/models/
	```

	### 2. Verify Model is Ready

	```bash
	# List installed models
	ollama list

	# Should show: qwen2.5:32b-instruct

	# Test it
	ollama run qwen2.5:32b-instruct "Hello, who are you?"
	```

	If you see a response, you're ready! ✅
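
The same check can be scripted, for example at the top of a notebook cell. The sketch below is only one way to do it and is not taken from the notebook: `parse_ollama_list` and `model_is_ready` are hypothetical helper names, and the parser assumes `ollama list` prints a header row followed by one model per line with the model name in the first column.

```python
import shutil
import subprocess

MODEL = "qwen2.5:32b-instruct"


def parse_ollama_list(output: str) -> list[str]:
    """Return model names from `ollama list` output (first column, header skipped)."""
    names = []
    for line in output.strip().splitlines()[1:]:
        parts = line.split()
        if parts:
            names.append(parts[0])
    return names


def model_is_ready(model: str = MODEL) -> bool:
    """True if the ollama CLI is on PATH and `model` appears in `ollama list`."""
    if shutil.which("ollama") is None:
        return False
    result = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=False
    )
    return result.returncode == 0 and model in parse_ollama_list(result.stdout)
```

Calling `model_is_ready()` before a long run gives an early, readable failure instead of an error partway through annotation.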

	## Running the Notebook

	### Open the Notebook

	```bash
	cd jupyter_notebooks
	jupyter notebook Section_2-3-4_Figure_8_deepfake_adapters.ipynb
	```

	### Run the Cells

	1. Cell 5: NER & Name Cleaning (processes names)
	2. Cell 7: Country/Nationality Mapping
	3. Cell 20: Qwen-2.5-32B Local Annotation 👈 This is the new one!

	### Configure Cell 20

	```python
	# Step 1: start in test mode on a small sample
	TEST_MODE = True
	TEST_SIZE = 10

	# Step 2: once the test output looks right, switch to the full dataset
	# TEST_MODE = False
	# MAX_ROWS = 20000 # or None for all
	```

	### Run Cell 20

	Just click "Run" or press Shift+Enter. The cell will:
	1. Check if Ollama is installed ✅
	2. Check if model is available ✅
	3. Start annotating
	4. Save progress every 10 rows
	5. Show completion stats
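
The steps above follow a common checkpointing pattern. As a rough illustration only (the real Cell 20 may differ), the sketch below annotates rows in order and rewrites the CSV every 10 rows, so an interrupted run loses little work. `annotate_with_checkpoints`, the `annotate` callable, and the `name` column are hypothetical; `profession_llm` is the column used in the comparison snippet later in this guide. Plain dicts keep the sketch dependency-free, and the same loop works over DataFrame rows.

```python
import csv
import time
from typing import Callable


def annotate_with_checkpoints(
    rows: list[dict],
    annotate: Callable[[str], str],
    out_path: str,
    text_col: str = "name",            # hypothetical input column
    out_col: str = "profession_llm",
    save_every: int = 10,
) -> list[dict]:
    """Annotate rows in order, rewriting the CSV every `save_every` rows."""
    done_rows: list[dict] = []
    start = time.time()
    for i, row in enumerate(rows, start=1):
        # Copy the row so the caller's data is left untouched.
        done_rows.append({**row, out_col: annotate(row[text_col])})
        if i % save_every == 0 or i == len(rows):
            with open(out_path, "w", newline="", encoding="utf-8") as f:
                writer = csv.DictWriter(f, fieldnames=list(done_rows[0]))
                writer.writeheader()
                writer.writerows(done_rows)
            hours = max(time.time() - start, 1e-9) / 3600
            print(f"Saved after {i} rows (~{i / hours:.1f} samples/hour)")
    return done_rows
```

Here `annotate` would wrap the call to the local model; passing it in as a parameter makes the loop easy to try with a stub function before starting a multi-hour run.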

	### Monitor Progress

	```
	Qwen Local: 100%|██████████| 10/10 [02:30<00:00, 15.0s/it]
	✅ Saved after 10 rows (~240.0 samples/hour)

	✅ Done! Results: data/CSV/qwen_local_annotated_POI_test.csv
	Total time: 2.5 minutes
	Average speed: 240.0 samples/hour
	```

	## Performance

	On your A100 80GB:
	- Speed: ~5-10 tokens/second
	- Throughput: ~100-200 samples/hour
	- Memory: ~22-25GB VRAM
	- Cost: $0

	### Time Estimates

	| Dataset Size | Time |
	|-------------|------|
	| 10 samples (test) | ~2-3 minutes |
	| 100 samples | ~20-30 minutes |
	| 1,000 samples | ~5-10 hours |
	| 10,000 samples | ~50-100 hours |

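	Note: The two short rows correspond to roughly 12-18 seconds per sample, as seen in the test run, while the two long rows use the more conservative ~100-200 samples/hour. Treat the long-run figures as upper bounds.

	Tip: Run overnight or over the weekend for large datasets!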
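
The estimates above are simple arithmetic on the per-sample latency. A small helper (the names `estimate_hours` and `samples_per_hour` are hypothetical) makes it easy to redo them for your own dataset once a test run has reported a seconds-per-sample figure, such as the 15.0s/it in the sample log.

```python
def samples_per_hour(seconds_per_sample: float) -> float:
    """Throughput implied by a constant per-sample latency."""
    return 3600 / seconds_per_sample


def estimate_hours(n_samples: int, seconds_per_sample: float) -> float:
    """Estimated wall-clock hours for n_samples at a constant per-sample latency."""
    return n_samples * seconds_per_sample / 3600


print(f"{samples_per_hour(15.0):.0f} samples/hour")   # 240 samples/hour
print(f"{estimate_hours(1_000, 15.0):.1f} hours")     # 4.2 hours
print(f"{estimate_hours(10_000, 36.0):.0f} hours")    # 100 hours (36 s/sample = 100 samples/hour)
```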

	## Troubleshooting

	### "Model not found"

	```bash
	ollama pull qwen2.5:32b-instruct
	```

	### "Ollama not running"

	```bash
	ollama serve
	```

	### Out of Memory

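	Your A100 has 80GB of VRAM, so this should NOT happen with the 32B model (~25GB VRAM). If it does, first check with `nvidia-smi` whether another process is holding GPU memory.

	You can also try the `q4_0` build, though the saving is likely small: a ~20GB download for a 32B model suggests the default tag is already 4-bit quantized.
	```bash
	ollama pull qwen2.5:32b-instruct-q4_0 # 4-bit weights, roughly 18-20GB of VRAM
	```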

	## Output

	Results saved to:
	- Test: `data/CSV/qwen_local_annotated_POI_test.csv`
	- Full: `data/CSV/qwen_local_annotated_POI.csv`

	Same format as API results - easy to compare!

	## Custom Model Cache Location

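	To store models in `data/models/`, set `OLLAMA_MODELS`. The Ollama server process reads this variable, so set it before starting `ollama serve` (restart the server if it is already running):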

	```bash
	export OLLAMA_MODELS="/home/lauhp/000_PHD/000_010_PUBLICATION/CODE/pm-paper/data/models"
	ollama pull qwen2.5:32b-instruct
	```

	## Comparing API vs Local

	After running both:

	```python
	import pandas as pd

	qwen_api = pd.read_csv('data/CSV/qwen_annotated_POI_test.csv')
	qwen_local = pd.read_csv('data/CSV/qwen_local_annotated_POI_test.csv')

	# Check agreement
	agreement = (qwen_api['profession_llm'] == qwen_local['profession_llm']).mean()
	print(f"Agreement: {agreement*100:.1f}%")
	```
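
The element-wise comparison above assumes both files hold the same rows in the same order, and it counts two missing labels as a disagreement because NaN never equals NaN. A slightly more defensive variant is sketched below. `agreement_rate` is a hypothetical helper, and stripping and lower-casing labels before comparing is an assumption about what should count as a match.

```python
import math


def _normalize(label):
    """Map None/NaN to None; otherwise strip whitespace and lower-case the label."""
    if label is None or (isinstance(label, float) and math.isnan(label)):
        return None
    return str(label).strip().lower()


def agreement_rate(labels_a, labels_b) -> float:
    """Fraction of positions where both label sequences agree after normalization."""
    a, b = list(labels_a), list(labels_b)
    if len(a) != len(b):
        raise ValueError(f"Row counts differ: {len(a)} vs {len(b)}")
    if not a:
        return float("nan")
    matches = sum(_normalize(x) == _normalize(y) for x, y in zip(a, b))
    return matches / len(a)


# Usage with the DataFrames loaded above:
# agreement_rate(qwen_api["profession_llm"], qwen_local["profession_llm"])
```

Raising on a row-count mismatch is deliberate: a silent comparison of misaligned rows would produce an agreement number that looks plausible but means nothing.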

	## Full Documentation

	For more details, see:
	- `QWEN_LOCAL_SETUP.md` - Complete setup guide
	- `LLM_MODELS_COMPARISON.md` - All 6 LLM options compared

	## Summary

	- ✅ Ollama already installed
	- ✅ A100 80GB GPU - perfect for Qwen-2.5-32B
	- ✅ FREE inference - no API costs
	- ✅ Privacy - data stays local

	Next step: Run Cell 20 in the notebook! 🚀