Laura Wagner

Quick Start: Running Qwen-2.5-32B Locally

This is a quick guide to get you started with FREE local LLM inference using your A100 GPU.

Why Local?

✅ $0 cost - no API fees
✅ Privacy - data stays on your machine
✅ Quality - 32B parameter model with strong performance

Setup (One-time)

1. Pull the Model (~10-30 minutes)

# Pull Qwen-2.5-32B-Instruct
ollama pull qwen2.5:32b-instruct

# Wait for download to complete (~20GB)
# Model will be cached at: ~/.ollama/models/

2. Verify Model is Ready

# List installed models
ollama list

# Should show: qwen2.5:32b-instruct

# Test it
ollama run qwen2.5:32b-instruct "Hello, who are you?"

If you see a response, you're ready! ✅
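If you'd rather run this check from Python (e.g. at the top of the notebook), parsing the output of `ollama list` works. A minimal sketch; the column layout it assumes (model name first, one header line) matches current Ollama CLI output but is not guaranteed by the guide:

```python
import subprocess

def model_installed(name, listing=None):
    """Return True if `name` appears in `ollama list` output.

    `listing` can be injected (e.g. for testing); otherwise the CLI is invoked.
    Assumes the first whitespace-separated token of each row is the model name.
    """
    if listing is None:
        listing = subprocess.run(
            ["ollama", "list"], capture_output=True, text=True, check=True
        ).stdout
    # Skip the header line, then match on the leading model-name column.
    return any(
        line.split()[0].startswith(name)
        for line in listing.splitlines()[1:]
        if line.strip()
    )
```

Usage: `model_installed("qwen2.5:32b-instruct")` before starting a long annotation run, so the notebook can fail fast with a clear message instead of erroring mid-loop.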

Running the Notebook

Open the Notebook

cd jupyter_notebooks
jupyter notebook Section_2-3-4_Figure_8_deepfake_adapters.ipynb

Run the Cells

  1. Cell 5: NER & Name Cleaning (processes names)
  2. Cell 7: Country/Nationality Mapping
  3. Cell 20: Qwen-2.5-32B Local Annotation 👈 This is the new one!

Configure Cell 20

# Start with test mode
TEST_MODE = True
TEST_SIZE = 10

# Then run full dataset
TEST_MODE = False
MAX_ROWS = 20000  # or None for all
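These flags gate how many rows the annotation loop touches. A sketch of that selection logic (the flag names come from this guide; the helper itself is hypothetical, not the notebook's exact code):

```python
def select_row_count(n_total, test_mode, test_size=10, max_rows=None):
    """How many rows the annotation loop should process, given the Cell 20 flags.

    TEST_MODE caps the run at TEST_SIZE rows; otherwise MAX_ROWS applies,
    with None meaning "process the whole dataset".
    """
    if test_mode:
        return min(test_size, n_total)
    if max_rows is None:
        return n_total
    return min(max_rows, n_total)
```

So a 30,000-row dataset yields 10 rows in test mode, 20,000 with `MAX_ROWS = 20000`, and all 30,000 with `MAX_ROWS = None`.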

Run Cell 20

Just click "Run" or press Shift+Enter. The cell will:

  1. Check if Ollama is installed ✅
  2. Check if the model is available ✅
  3. Start annotating
  4. Save progress every 10 rows
  5. Show completion stats
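Steps 3-4 amount to an annotate-and-checkpoint loop: rewrite the output CSV every few rows so progress survives a crash or interruption. A runnable sketch with a stand-in for the actual Qwen call (the helper and column name `profession_llm` follow this guide; the notebook's real code may differ):

```python
import csv

def annotate_with_checkpoints(rows, annotate, out_path, every=10):
    """Annotate `rows` (list of dicts) one at a time, rewriting `out_path`
    after every `every` rows and once more at the end.

    `annotate` stands in for the local Qwen-2.5-32B call.
    """
    done = []
    for i, row in enumerate(rows, start=1):
        row = dict(row)                      # don't mutate the caller's data
        row["profession_llm"] = annotate(row)
        done.append(row)
        if i % every == 0 or i == len(rows):
            with open(out_path, "w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=done[0].keys())
                writer.writeheader()
                writer.writerows(done)       # checkpoint: full rewrite
    return done
```

Rewriting the whole file each checkpoint is simple and safe for datasets of this size; resuming is then a matter of skipping rows already present in the output CSV.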

Monitor Progress

Qwen Local: 100%|██████████| 10/10 [02:30<00:00, 15.0s/it]
✅ Saved after 10 rows (~240.0 samples/hour)

✅ Done! Results: data/CSV/qwen_local_annotated_POI_test.csv
Total time: 2.5 minutes
Average speed: 240.0 samples/hour
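The speed figure above is plain elapsed-time arithmetic, easy to recompute for your own runs:

```python
def samples_per_hour(n_samples, elapsed_minutes):
    """Throughput from a timed run: 10 samples in 2.5 minutes -> 240.0/hour."""
    return n_samples / (elapsed_minutes / 60.0)

print(samples_per_hour(10, 2.5))  # 240.0
```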

Performance

On your A100 80GB:

  • Speed: ~5-10 tokens/second
  • Throughput: ~100-200 samples/hour
  • Memory: ~22-25GB VRAM
  • Cost: $0

Time Estimates

| Dataset Size | Time |
| --- | --- |
| 10 samples (test) | ~2-3 minutes |
| 100 samples | ~20-30 minutes |
| 1,000 samples | ~5-10 hours |
| 10,000 samples | ~50-100 hours |

Tip: Run overnight or over the weekend for large datasets!
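The table follows directly from the ~100-200 samples/hour throughput; a quick way to recompute an estimate for any dataset size:

```python
def estimated_hours(n_samples, throughput_per_hour):
    """Wall-clock estimate at a given throughput (samples/hour)."""
    return n_samples / throughput_per_hour

# Reproducing the table's range for 1,000 samples at 100-200 samples/hour:
print(estimated_hours(1_000, 200), "to", estimated_hours(1_000, 100), "hours")  # 5.0 to 10.0 hours
```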

Troubleshooting

"Model not found"

ollama pull qwen2.5:32b-instruct

"Ollama not running"

ollama serve

Out of Memory

Your A100 has 80GB of VRAM, so this should not happen with the 32B model (~22-25GB VRAM).

If it does, try the quantized version:

ollama pull qwen2.5:32b-instruct-q4_0  # Only ~12GB VRAM

Output

Results saved to:

  • Test: data/CSV/qwen_local_annotated_POI_test.csv
  • Full: data/CSV/qwen_local_annotated_POI.csv

Same format as API results - easy to compare!

Custom Model Cache Location

To store models in data/models/:

export OLLAMA_MODELS="/home/lauhp/000_PHD/000_010_PUBLICATION/CODE/pm-paper/data/models"
ollama pull qwen2.5:32b-instruct

Comparing API vs Local

After running both:

import pandas as pd

qwen_api = pd.read_csv('data/CSV/qwen_annotated_POI_test.csv')
qwen_local = pd.read_csv('data/CSV/qwen_local_annotated_POI_test.csv')

# Check agreement
agreement = (qwen_api['profession_llm'] == qwen_local['profession_llm']).mean()
print(f"Agreement: {agreement*100:.1f}%")
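The element-wise comparison above assumes both CSVs have identical row order. If that is not guaranteed (e.g. one run was resumed), merging on a shared key is safer. A sketch with toy frames standing in for the two CSVs; the `name` join column is an assumption, not confirmed by the guide:

```python
import pandas as pd

# Toy stand-ins for qwen_annotated_POI_test.csv and qwen_local_annotated_POI_test.csv,
# deliberately in different row orders.
qwen_api = pd.DataFrame(
    {"name": ["a", "b", "c"], "profession_llm": ["actor", "singer", "actor"]}
)
qwen_local = pd.DataFrame(
    {"name": ["c", "a", "b"], "profession_llm": ["actor", "actor", "singer"]}
)

# Align on the key column instead of relying on row order.
merged = qwen_api.merge(qwen_local, on="name", suffixes=("_api", "_local"))
agreement = (merged["profession_llm_api"] == merged["profession_llm_local"]).mean()
print(f"Agreement: {agreement*100:.1f}%")  # Agreement: 100.0%
```

`suffixes` keeps both annotation columns apart after the merge; rows missing from either file simply drop out of the (default inner) join rather than silently misaligning the comparison.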

Full Documentation

For more details, see:

  • QWEN_LOCAL_SETUP.md - Complete setup guide
  • LLM_MODELS_COMPARISON.md - All 6 LLM options compared

Summary

✅ Ollama already installed
✅ A100 80GB GPU - perfect for Qwen-2.5-32B
✅ FREE inference - no API costs
✅ Privacy - data stays local

Next step: Run Cell 20 in the notebook! 🚀