# Quick Start: Running Qwen-2.5-32B Locally

This is a quick guide to get you started with FREE local LLM inference using your A100 GPU.

## Why Local?

✅ **$0 cost** - No API fees
✅ **Privacy** - Data stays on your machine
✅ **Quality** - 32B-parameter model with strong performance

## Setup (One-time)

### 1. Pull the Model (~10-30 minutes)

```bash
# Pull Qwen-2.5-32B-Instruct
ollama pull qwen2.5:32b-instruct

# Wait for the download to complete (~20GB)
# The model will be cached at: ~/.ollama/models/
```

### 2. Verify the Model is Ready

```bash
# List installed models
ollama list
# Should show: qwen2.5:32b-instruct

# Test it
ollama run qwen2.5:32b-instruct "Hello, who are you?"
```

If you see a response, you're ready! ✅

## Running the Notebook

### Open the Notebook

```bash
cd jupyter_notebooks
jupyter notebook Section_2-3-4_Figure_8_deepfake_adapters.ipynb
```

### Run the Cells

1. **Cell 5**: NER & Name Cleaning (processes names)
2. **Cell 7**: Country/Nationality Mapping
3. **Cell 20**: Qwen-2.5-32B Local Annotation 👈 **This is the new one!**

### Configure Cell 20

```python
# Start with test mode
TEST_MODE = True
TEST_SIZE = 10

# Then run the full dataset
TEST_MODE = False
MAX_ROWS = 20000  # or None for all
```

### Run Cell 20

Just click "Run" or press Shift+Enter. The cell will:

1. Check that Ollama is installed ✅
2. Check that the model is available ✅
3. Start annotating
4. Save progress every 10 rows
5. Show completion stats

### Monitor Progress

```
Qwen Local: 100%|██████████| 10/10 [02:30<00:00, 15.0s/it]
✅ Saved after 10 rows (~24.0 samples/hour)
✅ Done!
Results: data/CSV/qwen_local_annotated_POI_test.csv
Total time: 2.5 minutes
Average speed: 240.0 samples/hour
```

## Performance

On your A100 80GB:

- **Speed**: ~5-10 tokens/second
- **Throughput**: ~100-200 samples/hour
- **Memory**: ~22-25GB VRAM
- **Cost**: $0

### Time Estimates

| Dataset Size | Time |
|--------------|------|
| 10 samples (test) | ~2-3 minutes |
| 100 samples | ~20-30 minutes |
| 1,000 samples | ~5-10 hours |
| 10,000 samples | ~50-100 hours |

**Tip**: Run overnight or over the weekend for large datasets!

## Troubleshooting

### "Model not found"

```bash
ollama pull qwen2.5:32b-instruct
```

### "Ollama not running"

```bash
ollama serve
```

### Out of Memory

Your A100 has 80GB VRAM, so this should NOT happen with the 32B model (~25GB VRAM). If it does, try the quantized version:

```bash
ollama pull qwen2.5:32b-instruct-q4_0  # Only ~12GB VRAM
```

## Output

Results are saved to:

- Test: `data/CSV/qwen_local_annotated_POI_test.csv`
- Full: `data/CSV/qwen_local_annotated_POI.csv`

Same format as the API results, so the two are easy to compare.

## Custom Model Cache Location

To store models in `data/models/`:

```bash
export OLLAMA_MODELS="/home/lauhp/000_PHD/000_010_PUBLICATION/CODE/pm-paper/data/models"
ollama pull qwen2.5:32b-instruct
```

## Comparing API vs Local

After running both:

```python
import pandas as pd

qwen_api = pd.read_csv('data/CSV/qwen_annotated_POI_test.csv')
qwen_local = pd.read_csv('data/CSV/qwen_local_annotated_POI_test.csv')

# Check agreement
agreement = (qwen_api['profession_llm'] == qwen_local['profession_llm']).mean()
print(f"Agreement: {agreement*100:.1f}%")
```

## Full Documentation

For more details, see:

- `QWEN_LOCAL_SETUP.md` - Complete setup guide
- `LLM_MODELS_COMPARISON.md` - All 6 LLM options compared

## Summary

✅ Ollama already installed
✅ A100 80GB GPU - perfect for Qwen-2.5-32B
✅ FREE inference - no API costs
✅ Privacy - data stays local

**Next step**: Run Cell 20 in the notebook! 🚀
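As a final note, the time estimates above are just the throughput numbers divided into the dataset size. A minimal sketch of that arithmetic, if you want to plug in your own numbers (the 150 samples/hour default is an assumption, the midpoint of the ~100-200 samples/hour range quoted above):

```python
def estimate_hours(n_samples: int, samples_per_hour: float = 150.0) -> float:
    """Rough wall-clock estimate for annotating n_samples rows locally."""
    return n_samples / samples_per_hour

# Reproduce the Time Estimates table at the assumed midpoint throughput
for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} samples: ~{estimate_hours(n):.1f} hours")
```

If your test run reports a different average speed (e.g. the 240 samples/hour in the sample output above), pass it as `samples_per_hour` to get a tighter estimate before committing to an overnight run.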