# Quick Start: Running Qwen-2.5-32B Locally
This is a quick guide to get you started with FREE local LLM inference using your A100 GPU.
## Why Local?

- ✅ **$0 cost** - No API fees
- ✅ **Privacy** - Data stays on your machine
- ✅ **Quality** - 32B parameter model with strong performance
## Setup (One-time)

### 1. Pull the Model (~10-30 minutes)

```bash
# Pull Qwen-2.5-32B-Instruct
ollama pull qwen2.5:32b-instruct

# Wait for the download to complete (~20GB)
# The model will be cached at: ~/.ollama/models/
```
### 2. Verify the Model is Ready

```bash
# List installed models
ollama list
# Should show: qwen2.5:32b-instruct

# Test it
ollama run qwen2.5:32b-instruct "Hello, who are you?"
```

If you see a response, you're ready! ✅
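Beyond the CLI, Ollama also exposes a local HTTP API whose `GET /api/tags` endpoint lists installed models. A minimal sketch of checking that response for the model (the `model_is_ready` helper and the sample payload are illustrative, not part of the notebook; in practice the JSON would come from `http://localhost:11434/api/tags`):

```python
import json

def model_is_ready(tags_json, name="qwen2.5:32b-instruct"):
    """Check a GET /api/tags response body for an installed model."""
    tags = json.loads(tags_json)
    return any(m["name"].startswith(name) for m in tags.get("models", []))

# Example payload in the shape /api/tags returns
sample = '{"models": [{"name": "qwen2.5:32b-instruct"}]}'
print(model_is_ready(sample))  # True
```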
## Running the Notebook

### Open the Notebook

```bash
cd jupyter_notebooks
jupyter notebook Section_2-3-4_Figure_8_deepfake_adapters.ipynb
```
### Run the Cells

- **Cell 5**: NER & Name Cleaning (processes names)
- **Cell 7**: Country/Nationality Mapping
- **Cell 20**: Qwen-2.5-32B Local Annotation - this is the new one!
### Configure Cell 20

```python
# Start with test mode
TEST_MODE = True
TEST_SIZE = 10

# Then run the full dataset
TEST_MODE = False
MAX_ROWS = 20000  # or None for all
```
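The two flags simply bound how many rows the annotation loop sees. A hypothetical sketch of that gating logic (the `select_rows` helper is illustrative, not the notebook's actual code):

```python
def select_rows(rows, test_mode, test_size=10, max_rows=None):
    """Return the slice of the dataset the annotation loop will process."""
    if test_mode:
        return rows[:test_size]      # quick smoke test
    if max_rows is not None:
        return rows[:max_rows]       # capped full run
    return rows                      # everything

data = list(range(25))
print(len(select_rows(data, test_mode=True)))                # 10
print(len(select_rows(data, test_mode=False, max_rows=20)))  # 20
```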
### Run Cell 20

Just click "Run" or press Shift+Enter. The cell will:

- Check if Ollama is installed ✅
- Check if the model is available ✅
- Start annotating
- Save progress every 10 rows
- Show completion stats
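The save-every-10-rows behavior is a standard checkpointing pattern. A minimal sketch of the idea (the helper names are hypothetical; the notebook's cell does the equivalent with a CSV writer):

```python
SAVE_EVERY = 10

def annotate_with_checkpoints(rows, annotate, save):
    """Annotate each row, persisting partial results every SAVE_EVERY rows."""
    results = []
    for i, row in enumerate(rows, start=1):
        results.append(annotate(row))
        if i % SAVE_EVERY == 0:
            save(results)  # progress survives a crash or interrupt
    save(results)          # final save covers the trailing partial batch
    return results

# Dummy run: 25 rows, recording when saves happen
saves = []
annotate_with_checkpoints(range(25), lambda r: r * 2, lambda res: saves.append(len(res)))
print(saves)  # [10, 20, 25]
```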
### Monitor Progress

```
Qwen Local: 100%|██████████| 10/10 [02:30<00:00, 15.0s/it]
✅ Saved after 10 rows (~240.0 samples/hour)
✅ Done! Results: data/CSV/qwen_local_annotated_POI_test.csv

Total time: 2.5 minutes
Average speed: 240.0 samples/hour
```
## Performance

On your A100 80GB:

- **Speed**: ~5-10 tokens/second
- **Throughput**: ~100-200 samples/hour
- **Memory**: ~22-25GB VRAM
- **Cost**: $0
### Time Estimates
| Dataset Size | Time |
|---|---|
| 10 samples (test) | ~2-3 minutes |
| 100 samples | ~20-30 minutes |
| 1,000 samples | ~5-10 hours |
| 10,000 samples | ~50-100 hours |
Tip: Run overnight or over the weekend for large datasets!
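The table follows directly from the throughput figures above: hours = samples / (samples per hour). A quick check:

```python
def eta_hours(n_samples, samples_per_hour):
    """Estimated wall-clock hours at a given throughput."""
    return n_samples / samples_per_hour

# 1,000 samples spans 5-10 hours across the ~100-200 samples/hour range
print(eta_hours(1000, 100), eta_hours(1000, 200))  # 10.0 5.0
```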
## Troubleshooting

### "Model not found"

```bash
ollama pull qwen2.5:32b-instruct
```

### "Ollama not running"

```bash
ollama serve
```

### Out of Memory

Your A100 has 80GB VRAM, so this should NOT happen with the 32B model (~25GB VRAM). If it does, try the quantized version:

```bash
ollama pull qwen2.5:32b-instruct-q4_0  # Only ~12GB VRAM
```
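As a back-of-envelope check on the memory figures above: weight memory scales with bits per weight. The sketch below assumes weights dominate, ~5 effective bits/weight for a 4-bit quant (quantization formats carry scale metadata on top of the nominal 4 bits), and a flat ~2GB overhead for KV cache and activations - rough assumptions, not measurements:

```python
def approx_vram_gb(n_params_billion, bits_per_weight, overhead_gb=2.0):
    """Very rough VRAM estimate: weights only, plus a flat overhead."""
    return n_params_billion * bits_per_weight / 8 + overhead_gb

print(approx_vram_gb(32, 5))   # 22.0 - in line with the ~22-25GB figure above
print(approx_vram_gb(32, 16))  # 66.0 - why unquantized fp16 would be tight even on an A100
```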
## Output

Results are saved to:

- Test: `data/CSV/qwen_local_annotated_POI_test.csv`
- Full: `data/CSV/qwen_local_annotated_POI.csv`

Same format as the API results - easy to compare!
## Custom Model Cache Location

To store models in `data/models/` instead of the default `~/.ollama/models/`:

```bash
export OLLAMA_MODELS="/home/lauhp/000_PHD/000_010_PUBLICATION/CODE/pm-paper/data/models"
ollama pull qwen2.5:32b-instruct
```
## Comparing API vs Local

After running both:

```python
import pandas as pd

qwen_api = pd.read_csv('data/CSV/qwen_annotated_POI_test.csv')
qwen_local = pd.read_csv('data/CSV/qwen_local_annotated_POI_test.csv')

# Check agreement
agreement = (qwen_api['profession_llm'] == qwen_local['profession_llm']).mean()
print(f"Agreement: {agreement*100:.1f}%")
```
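The element-wise comparison above assumes both CSVs are in the same row order. If that is not guaranteed, align on a shared key first (the `name` key and the `agreement_on_key` helper are assumptions about the file schema, not the notebook's code):

```python
import pandas as pd

def agreement_on_key(a, b, key="name", col="profession_llm"):
    """Merge two annotation frames on `key`, then compare `col` row by row."""
    merged = a.merge(b, on=key, suffixes=("_api", "_local"))
    return (merged[f"{col}_api"] == merged[f"{col}_local"]).mean()

# Tiny illustration: same labels, different row order -> full agreement
a = pd.DataFrame({"name": ["x", "y"], "profession_llm": ["doctor", "engineer"]})
b = pd.DataFrame({"name": ["y", "x"], "profession_llm": ["engineer", "doctor"]})
print(agreement_on_key(a, b))  # 1.0
```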
## Full Documentation

For more details, see:

- `QWEN_LOCAL_SETUP.md` - Complete setup guide
- `LLM_MODELS_COMPARISON.md` - All 6 LLM options compared
## Summary

- ✅ Ollama already installed
- ✅ A100 80GB GPU - perfect for Qwen-2.5-32B
- ✅ FREE inference - no API costs
- ✅ Privacy - data stays local

**Next step**: Run Cell 20 in the notebook!