# Quick Start: Running Qwen-2.5-32B Locally
This is a quick guide to get you started with FREE local LLM inference using your A100 GPU.
## Why Local?

- ✅ **$0 cost** - No API fees
- ✅ **Privacy** - Data stays on your machine
- ✅ **Quality** - 32B parameter model with strong performance
## Setup (One-time)

### 1. Pull the Model (~10-30 minutes)

```bash
# Pull Qwen-2.5-32B-Instruct
ollama pull qwen2.5:32b-instruct

# Wait for the download to complete (~20GB)
# The model will be cached at: ~/.ollama/models/
```
### 2. Verify the Model is Ready

```bash
# List installed models
ollama list
# Should show: qwen2.5:32b-instruct

# Test it
ollama run qwen2.5:32b-instruct "Hello, who are you?"
```

If you see a response, you're ready! ✅
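Beyond the CLI, Ollama also exposes a local HTTP API whose `GET /api/tags` endpoint lists installed models. A minimal sketch of checking that response for the model (the `model_is_ready` helper and the sample payload are illustrative, not part of the notebook; in practice the JSON would come from `http://localhost:11434/api/tags`):

```python
import json

def model_is_ready(tags_json, name="qwen2.5:32b-instruct"):
    """Check a GET /api/tags response body for an installed model."""
    tags = json.loads(tags_json)
    return any(m["name"].startswith(name) for m in tags.get("models", []))

# Example payload in the shape /api/tags returns
sample = '{"models": [{"name": "qwen2.5:32b-instruct"}]}'
print(model_is_ready(sample))  # True
```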
## Running the Notebook

### Open the Notebook

```bash
cd jupyter_notebooks
jupyter notebook Section_2-3-4_Figure_8_deepfake_adapters.ipynb
```
### Run the Cells

- **Cell 5**: NER & Name Cleaning (processes names)
- **Cell 7**: Country/Nationality Mapping
- **Cell 20**: Qwen-2.5-32B Local Annotation - this is the new one!
### Configure Cell 20

```python
# Start with test mode
TEST_MODE = True
TEST_SIZE = 10

# Then run the full dataset
TEST_MODE = False
MAX_ROWS = 20000  # or None for all
```
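The two flags simply bound how many rows the annotation loop sees. A hypothetical sketch of that gating logic (the `select_rows` helper is illustrative, not the notebook's actual code):

```python
def select_rows(rows, test_mode, test_size=10, max_rows=None):
    """Return the slice of the dataset the annotation loop will process."""
    if test_mode:
        return rows[:test_size]      # quick smoke test
    if max_rows is not None:
        return rows[:max_rows]       # capped full run
    return rows                      # everything

data = list(range(25))
print(len(select_rows(data, test_mode=True)))                # 10
print(len(select_rows(data, test_mode=False, max_rows=20)))  # 20
```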
### Run Cell 20

Just click "Run" or press Shift+Enter. The cell will:

- Check if Ollama is installed ✅
- Check if the model is available ✅
- Start annotating
- Save progress every 10 rows
- Show completion stats
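The save-every-10-rows behavior is a standard checkpointing pattern. A minimal sketch of the idea (the helper names are hypothetical; the notebook's cell does the equivalent with a CSV writer):

```python
SAVE_EVERY = 10

def annotate_with_checkpoints(rows, annotate, save):
    """Annotate each row, persisting partial results every SAVE_EVERY rows."""
    results = []
    for i, row in enumerate(rows, start=1):
        results.append(annotate(row))
        if i % SAVE_EVERY == 0:
            save(results)  # progress survives a crash or interrupt
    save(results)          # final save covers the trailing partial batch
    return results

# Dummy run: 25 rows, recording when saves happen
saves = []
annotate_with_checkpoints(range(25), lambda r: r * 2, lambda res: saves.append(len(res)))
print(saves)  # [10, 20, 25]
```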
### Monitor Progress

```
Qwen Local: 100%|██████████| 10/10 [02:30<00:00, 15.0s/it]
✅ Saved after 10 rows (~240.0 samples/hour)
✅ Done! Results: data/CSV/qwen_local_annotated_POI_test.csv

Total time: 2.5 minutes
Average speed: 240.0 samples/hour
```
## Performance

On your A100 80GB:

- **Speed**: ~5-10 tokens/second
- **Throughput**: ~100-200 samples/hour
- **Memory**: ~22-25GB VRAM
- **Cost**: $0
### Time Estimates
| Dataset Size | Time |
|---|---|
| 10 samples (test) | ~2-3 minutes |
| 100 samples | ~20-30 minutes |
| 1,000 samples | ~5-10 hours |
| 10,000 samples | ~50-100 hours |
Tip: Run overnight or over the weekend for large datasets!
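The table follows directly from the throughput figures above: hours = samples / (samples per hour). A quick check:

```python
def eta_hours(n_samples, samples_per_hour):
    """Estimated wall-clock hours at a given throughput."""
    return n_samples / samples_per_hour

# 1,000 samples spans 5-10 hours across the ~100-200 samples/hour range
print(eta_hours(1000, 100), eta_hours(1000, 200))  # 10.0 5.0
```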
## Troubleshooting

### "Model not found"

```bash
ollama pull qwen2.5:32b-instruct
```

### "Ollama not running"

```bash
ollama serve
```

### Out of Memory

Your A100 has 80GB VRAM, so this should NOT happen with the 32B model (~25GB VRAM). If it does, try the quantized version:

```bash
ollama pull qwen2.5:32b-instruct-q4_0  # Only ~12GB VRAM
```
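As a back-of-envelope check on the memory figures above: weight memory scales with bits per weight. The sketch below assumes weights dominate, ~5 effective bits/weight for a 4-bit quant (quantization formats carry scale metadata on top of the nominal 4 bits), and a flat ~2GB overhead for KV cache and activations - rough assumptions, not measurements:

```python
def approx_vram_gb(n_params_billion, bits_per_weight, overhead_gb=2.0):
    """Very rough VRAM estimate: weights only, plus a flat overhead."""
    return n_params_billion * bits_per_weight / 8 + overhead_gb

print(approx_vram_gb(32, 5))   # 22.0 - in line with the ~22-25GB figure above
print(approx_vram_gb(32, 16))  # 66.0 - why unquantized fp16 would be tight even on an A100
```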
## Output

Results are saved to:

- Test: `data/CSV/qwen_local_annotated_POI_test.csv`
- Full: `data/CSV/qwen_local_annotated_POI.csv`

Same format as the API results - easy to compare!
## Custom Model Cache Location

To store models in `data/models/` instead of the default `~/.ollama/models/`:

```bash
export OLLAMA_MODELS="/home/lauhp/000_PHD/000_010_PUBLICATION/CODE/pm-paper/data/models"
ollama pull qwen2.5:32b-instruct
```
## Comparing API vs Local

After running both:

```python
import pandas as pd

qwen_api = pd.read_csv('data/CSV/qwen_annotated_POI_test.csv')
qwen_local = pd.read_csv('data/CSV/qwen_local_annotated_POI_test.csv')

# Check agreement
agreement = (qwen_api['profession_llm'] == qwen_local['profession_llm']).mean()
print(f"Agreement: {agreement*100:.1f}%")
```
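The element-wise comparison above assumes both CSVs are in the same row order. If that is not guaranteed, align on a shared key first (the `name` key and the `agreement_on_key` helper are assumptions about the file schema, not the notebook's code):

```python
import pandas as pd

def agreement_on_key(a, b, key="name", col="profession_llm"):
    """Merge two annotation frames on `key`, then compare `col` row by row."""
    merged = a.merge(b, on=key, suffixes=("_api", "_local"))
    return (merged[f"{col}_api"] == merged[f"{col}_local"]).mean()

# Tiny illustration: same labels, different row order -> full agreement
a = pd.DataFrame({"name": ["x", "y"], "profession_llm": ["doctor", "engineer"]})
b = pd.DataFrame({"name": ["y", "x"], "profession_llm": ["engineer", "doctor"]})
print(agreement_on_key(a, b))  # 1.0
```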
## Full Documentation

For more details, see:

- `QWEN_LOCAL_SETUP.md` - Complete setup guide
- `LLM_MODELS_COMPARISON.md` - All 6 LLM options compared
## Summary

- ✅ Ollama already installed
- ✅ A100 80GB GPU - perfect for Qwen-2.5-32B
- ✅ FREE inference - no API costs
- ✅ Privacy - data stays local

**Next step**: Run Cell 20 in the notebook!