# Quick Start: Running Qwen-2.5-32B Locally

This is a quick guide to get you started with FREE local LLM inference using your A100 GPU.

## Why Local?

✅ **$0 cost** - No API fees
✅ **Privacy** - Data stays on your machine
✅ **Quality** - 32B-parameter model with strong performance

## Setup (One-time)

### 1. Pull the Model (~10-30 minutes)

```bash
# Pull Qwen-2.5-32B-Instruct
ollama pull qwen2.5:32b-instruct

# Wait for the download to complete (~20GB)
# The model will be cached at: ~/.ollama/models/
```

### 2. Verify the Model is Ready

```bash
# List installed models
ollama list
# Should show: qwen2.5:32b-instruct

# Test it
ollama run qwen2.5:32b-instruct "Hello, who are you?"
```

If you see a response, you're ready! ✅

## Running the Notebook

### Open the Notebook

```bash
cd jupyter_notebooks
jupyter notebook Section_2-3-4_Figure_8_deepfake_adapters.ipynb
```

### Run the Cells

1. **Cell 5**: NER & Name Cleaning (processes names)
2. **Cell 7**: Country/Nationality Mapping
3. **Cell 20**: Qwen-2.5-32B Local Annotation 👈 **This is the new one!**

### Configure Cell 20

```python
# Start with test mode
TEST_MODE = True
TEST_SIZE = 10

# Then run the full dataset
TEST_MODE = False
MAX_ROWS = 20000  # or None for all
```

### Run Cell 20

Just click "Run" or press Shift+Enter. The cell will:

1. Check that Ollama is installed ✅
2. Check that the model is available ✅
3. Start annotating
4. Save progress every 10 rows
5. Show completion stats

### Monitor Progress

```
Qwen Local: 100%|██████████| 10/10 [02:30<00:00, 15.0s/it]
✅ Saved after 10 rows (~24.0 samples/hour)
✅ Done!
Results: data/CSV/qwen_local_annotated_POI_test.csv
Total time: 2.5 minutes
Average speed: 240.0 samples/hour
```

## Performance

On your A100 80GB:

- **Speed**: ~5-10 tokens/second
- **Throughput**: ~100-200 samples/hour
- **Memory**: ~22-25GB VRAM
- **Cost**: $0

### Time Estimates

| Dataset Size | Time |
|--------------|------|
| 10 samples (test) | ~2-3 minutes |
| 100 samples | ~20-30 minutes |
| 1,000 samples | ~5-10 hours |
| 10,000 samples | ~50-100 hours |

**Tip**: Run overnight or over the weekend for large datasets!

## Troubleshooting

### "Model not found"

```bash
ollama pull qwen2.5:32b-instruct
```

### "Ollama not running"

```bash
ollama serve
```

### Out of Memory

Your A100 has 80GB VRAM, so this should NOT happen with the 32B model (~25GB VRAM). If it does, try the quantized version:

```bash
ollama pull qwen2.5:32b-instruct-q4_0  # Only ~12GB VRAM
```

## Output

Results are saved to:

- Test: `data/CSV/qwen_local_annotated_POI_test.csv`
- Full: `data/CSV/qwen_local_annotated_POI.csv`

Same format as the API results, so the two are easy to compare.

## Custom Model Cache Location

To store models in `data/models/`:

```bash
export OLLAMA_MODELS="/home/lauhp/000_PHD/000_010_PUBLICATION/CODE/pm-paper/data/models"
ollama pull qwen2.5:32b-instruct
```

## Comparing API vs Local

After running both:

```python
import pandas as pd

qwen_api = pd.read_csv('data/CSV/qwen_annotated_POI_test.csv')
qwen_local = pd.read_csv('data/CSV/qwen_local_annotated_POI_test.csv')

# Check agreement
agreement = (qwen_api['profession_llm'] == qwen_local['profession_llm']).mean()
print(f"Agreement: {agreement*100:.1f}%")
```

## Full Documentation

For more details, see:

- `QWEN_LOCAL_SETUP.md` - Complete setup guide
- `LLM_MODELS_COMPARISON.md` - All 6 LLM options compared

## Summary

✅ Ollama already installed
✅ A100 80GB GPU - perfect for Qwen-2.5-32B
✅ FREE inference - no API costs
✅ Privacy - data stays local

**Next step**: Run Cell 20 in the notebook! 🚀
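As a final note, the time estimates above are just the throughput numbers divided into the dataset size. A minimal sketch of that arithmetic, if you want to plug in your own numbers (the 150 samples/hour default is an assumption, the midpoint of the ~100-200 samples/hour range quoted above):

```python
def estimate_hours(n_samples: int, samples_per_hour: float = 150.0) -> float:
    """Rough wall-clock estimate for annotating n_samples rows locally."""
    return n_samples / samples_per_hour

# Reproduce the Time Estimates table at the assumed midpoint throughput
for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} samples: ~{estimate_hours(n):.1f} hours")
```

If your test run reports a different average speed (e.g. the 240 samples/hour in the sample output above), pass it as `samples_per_hour` to get a tighter estimate before committing to an overnight run.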