LLM Models for Deepfake Annotation
Overview
The pipeline now includes 6 LLM options in individual cells for easy comparison:
- Deepseek - Testing (use first!)
- Qwen (API) - Chinese (Alibaba Cloud)
- Llama - American (Meta)
- Mixtral - French (Mistral AI)
- Gemma - American Open Source (Google)
- Qwen-2.5-32B Local - FREE local inference (NEW!)
The 6 LLMs
1. Deepseek (Testing)
Cell 10
- Model: deepseek-chat
- Provider: DeepSeek
- API: https://platform.deepseek.com/
- Cost: $0.14-0.28 per 1M tokens ($1-2 for 10k entries)
- Use case: Test this first! Cheapest option to verify the pipeline works
- API Key: misc/credentials/deepseek_api_key.txt
2. Qwen API (Chinese)
Cells 11-12
- Model: qwen-max (automatically uses Qwen3-Max)
- Provider: Alibaba Cloud DashScope
- API: https://dashscope.aliyun.com/
- Cost: Variable (check Alibaba pricing)
- Use case: Chinese company, strong multilingual support
- API Key: misc/credentials/qwen_api_key.txt
- Note: Uses the latest Qwen3-Max when you specify qwen-max
6. Qwen-2.5-32B Local (FREE!)
Cells 19-20 (NEW!)
- Model: qwen2.5:32b-instruct
- Provider: Ollama (local inference)
- Setup: https://ollama.com/
- Cost: $0 (FREE - no API costs!)
- Requirements:
  - A100 80GB GPU (or similar)
  - ~25GB VRAM during inference
  - ~20GB storage for model download
  - Ollama installed
- Speed: 5-10 tokens/sec on A100 (~100-200 samples/hour)
- Use case:
  - ✅ Large datasets (>1000 samples) where cost matters
  - ✅ Privacy-sensitive research data
  - ✅ Offline processing
  - ✅ Strong multilingual support
- Setup guide: See QWEN_LOCAL_SETUP.md
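A minimal sketch of what a single local call might look like, assuming Ollama is serving its default HTTP endpoint at http://localhost:11434 and the model tag above has already been pulled. The helper names `build_request` and `annotate` are hypothetical; the notebook's own cells may be organized differently.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "qwen2.5:32b-instruct"                      # model tag listed above


def build_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": MODEL, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def annotate(prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Setting `stream` to `False` asks for one complete JSON reply instead of a token stream, which keeps the parsing to a single `json.loads` call.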
3. Llama (American)
Cells 13-14
- Model: meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
- Provider: Together AI (hosting Meta's model)
- Developer: Meta (American)
- API: https://www.together.ai/
- Cost: $0.90 per 1M tokens ($5-10 for 10k entries)
- Use case: Open-source American model, good quality
- API Key: misc/credentials/together_api_key.txt
4. Mixtral (French)
Cells 15-16
- Model: open-mixtral-8x22b
- Provider: Mistral AI
- Developer: Mistral AI (French)
- API: https://mistral.ai/
- Cost: $2 per 1M tokens ($10-20 for 10k entries)
- Use case: European alternative, Mixture-of-Experts architecture
- API Key: misc/credentials/mistral_api_key.txt
- Note: Using open-mixtral-8x22b (cheaper than mistral-large)
5. Gemma (American Open Source)
Cells 17-18
- Model: google/gemma-2-27b-it
- Provider: Together AI (hosting Google's model)
- Developer: Google (American)
- API: https://www.together.ai/ (same as Llama)
- Cost: $0.80 per 1M tokens ($4-8 for 10k entries)
- Use case: American open-source alternative, competitive quality
- API Key: misc/credentials/together_api_key.txt (same as Llama)
- Note: Fully open-source, can be self-hosted
Cost Comparison (10,000 entries)
| Model | Provider | Cost | Time | Origin |
|---|---|---|---|---|
| Qwen-2.5-32B Local | Ollama (local) | $0 | ~50-100 hrs | 🇨🇳 Chinese |
| Deepseek | DeepSeek | ~$1-2 | ~5-10 hrs | 🇨🇳 Chinese |
| Gemma 2 | Together AI | ~$4-8 | ~5-10 hrs | 🇺🇸 American (open) |
| Llama 3.1 | Together AI | ~$5-10 | ~5-10 hrs | 🇺🇸 American (open) |
| Mixtral | Mistral AI | ~$10-20 | ~5-10 hrs | 🇫🇷 French (open) |
| Qwen API | Alibaba | Variable | ~5-10 hrs | 🇨🇳 Chinese |
Note: Local inference is FREE but slower. Good for large datasets where cost matters more than time.
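The figures in the table can be roughly reproduced with simple arithmetic. The sketch below assumes about 700 tokens per entry (prompt plus response). That number is not stated anywhere in this guide; it is backed out of the Deepseek price and cost range, so treat it as an assumption to replace with your own measurement. The function names are illustrative.

```python
def api_cost_usd(entries: int, tokens_per_entry: int, usd_per_million: float) -> float:
    """Estimated API cost: total tokens times the per-million-token price."""
    return entries * tokens_per_entry * usd_per_million / 1_000_000


def local_hours(entries: int, samples_per_hour: float) -> float:
    """Estimated wall-clock hours for local inference at a given throughput."""
    return entries / samples_per_hour


TOKENS_PER_ENTRY = 700  # assumption backed out of the Deepseek figures, not measured

# Deepseek at the low and high end of its listed price range
print(round(api_cost_usd(10_000, TOKENS_PER_ENTRY, 0.14), 2))
print(round(api_cost_usd(10_000, TOKENS_PER_ENTRY, 0.28), 2))

# Local Qwen at 200 and 100 samples/hour
print(local_hours(10_000, 200), local_hours(10_000, 100))
```

Under the same assumption, the listed Llama, Mixtral, and Gemma prices work out to about $6.30, $14.00, and $5.60 for 10k entries, each inside the range given in the table.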
Recommended Testing Order
1. Start with Deepseek
# Cell 10
TEST_MODE = True
TEST_SIZE = 10
- Why: Cheapest, verify pipeline works
- Cost: Pennies for 10 samples
2. Compare on Small Sample
Pick 2-3 models and run them on the same 100 samples:
# In each cell:
TEST_MODE = True
TEST_SIZE = 100
Good combinations:
- Budget: Deepseek + Gemma
- Quality: Llama + Mixtral
- Geographic: Qwen + Llama + Mixtral
3. Production Run
Choose the best model from testing and run it on the full dataset:
TEST_MODE = False
MAX_ROWS = None # or 20000
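How the three settings interact is not spelled out here. One plausible reading, sketched below, is that TEST_MODE takes priority and MAX_ROWS only caps production runs. The helper `select_rows` is hypothetical, and taking the first N rows (rather than a random sample) is an assumption chosen so that every model sees the same samples.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def select_rows(rows: Sequence[T], test_mode: bool, test_size: int,
                max_rows: Optional[int] = None) -> Sequence[T]:
    """Return the slice of rows a run would process under the given settings."""
    if test_mode:
        return rows[:test_size]   # small sample for a cheap trial run
    if max_rows is not None:
        return rows[:max_rows]    # cap the production run
    return rows                   # full dataset


rows = list(range(25_000))
print(len(select_rows(rows, test_mode=True, test_size=100)))
print(len(select_rows(rows, test_mode=False, test_size=100, max_rows=20_000)))
print(len(select_rows(rows, test_mode=False, test_size=100)))
```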
API Key Setup
For Deepseek & Qwen (separate keys):
echo "your-deepseek-key" > misc/credentials/deepseek_api_key.txt
echo "your-qwen-key" > misc/credentials/qwen_api_key.txt
For Llama & Gemma (same Together AI key):
echo "your-together-key" > misc/credentials/together_api_key.txt
Both Llama and Gemma use the same Together AI key!
For Mixtral:
echo "your-mistral-key" > misc/credentials/mistral_api_key.txt
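Because `echo` appends a trailing newline to each file, the key should be stripped when it is read back. A minimal sketch of such a loader follows; the function name `load_api_key` is an assumption, not necessarily what the notebook uses.

```python
from pathlib import Path


def load_api_key(path: str) -> str:
    """Read an API key from a text file, dropping surrounding whitespace."""
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"API key file is empty: {path}")
    return key
```

For example, `load_api_key("misc/credentials/together_api_key.txt")` would serve both the Llama and Gemma cells.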
Output Files
Each LLM saves to a separate file:
data/CSV/
├── deepseek_annotated_POI_test.csv      # Deepseek test
├── deepseek_annotated_POI.csv           # Deepseek full
├── qwen_annotated_POI_test.csv          # Qwen API test
├── qwen_annotated_POI.csv               # Qwen API full
├── qwen_local_annotated_POI_test.csv    # Qwen Local test (NEW!)
├── qwen_local_annotated_POI.csv         # Qwen Local full (NEW!)
├── llama_annotated_POI_test.csv         # Llama test
├── llama_annotated_POI.csv              # Llama full
├── mixtral_annotated_POI_test.csv       # Mixtral test
├── mixtral_annotated_POI.csv            # Mixtral full
├── gemma_annotated_POI_test.csv         # Gemma test
└── gemma_annotated_POI.csv              # Gemma full
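The naming pattern in the listing above can be captured in a small helper. This is a sketch only: `output_path` is a hypothetical name, and the model prefixes are taken from the file names shown.

```python
from pathlib import Path

CSV_DIR = Path("data/CSV")


def output_path(model: str, test_mode: bool) -> Path:
    """Build the CSV path for a model prefix such as 'deepseek' or 'qwen_local'."""
    suffix = "_test" if test_mode else ""
    return CSV_DIR / f"{model}_annotated_POI{suffix}.csv"


for model in ["deepseek", "qwen", "qwen_local", "llama", "mixtral", "gemma"]:
    print(output_path(model, test_mode=True).as_posix())
```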
Comparing Results
After running multiple LLMs, compare results:
import pandas as pd
# Load results from different models
deepseek_df = pd.read_csv('data/CSV/deepseek_annotated_POI_test.csv')
qwen_df = pd.read_csv('data/CSV/qwen_annotated_POI_test.csv')
qwen_local_df = pd.read_csv('data/CSV/qwen_local_annotated_POI_test.csv') # NEW!
llama_df = pd.read_csv('data/CSV/llama_annotated_POI_test.csv')
mixtral_df = pd.read_csv('data/CSV/mixtral_annotated_POI_test.csv')
gemma_df = pd.read_csv('data/CSV/gemma_annotated_POI_test.csv')
# Compare profession distributions
print("Deepseek professions:", deepseek_df['profession_llm'].value_counts().head())
print("Qwen API professions:", qwen_df['profession_llm'].value_counts().head())
print("Qwen Local professions:", qwen_local_df['profession_llm'].value_counts().head()) # NEW!
print("Llama professions:", llama_df['profession_llm'].value_counts().head())
print("Mixtral professions:", mixtral_df['profession_llm'].value_counts().head())
print("Gemma professions:", gemma_df['profession_llm'].value_counts().head())
# Compare specific cases
print("\nIrene identification:")
print("Deepseek:", deepseek_df[deepseek_df['real_name'] == 'Irene']['full_name'].values)
print("Qwen API:", qwen_df[qwen_df['real_name'] == 'Irene']['full_name'].values)
print("Qwen Local:", qwen_local_df[qwen_local_df['real_name'] == 'Irene']['full_name'].values)
print("Llama:", llama_df[llama_df['real_name'] == 'Irene']['full_name'].values)
print("Mixtral:", mixtral_df[mixtral_df['real_name'] == 'Irene']['full_name'].values)
print("Gemma:", gemma_df[gemma_df['real_name'] == 'Irene']['full_name'].values)
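Beyond eyeballing distributions, a pairwise agreement rate gives a single number per model pair. The sketch below works on plain lists of labels and assumes every model annotated the same rows in the same order; with the DataFrames above, the lists could come from `deepseek_df['profession_llm'].tolist()` and so on. The function name and the toy labels are made up for illustration.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def pairwise_agreement(labels: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """Fraction of rows on which each pair of models gives the same label."""
    result = {}
    for a, b in combinations(sorted(labels), 2):
        if len(labels[a]) != len(labels[b]):
            raise ValueError(f"{a} and {b} cover different numbers of rows")
        if not labels[a]:
            raise ValueError("no rows to compare")
        matches = sum(x == y for x, y in zip(labels[a], labels[b]))
        result[(a, b)] = matches / len(labels[a])
    return result


# Toy labels, not real pipeline output
toy = {
    "deepseek": ["actor", "singer", "athlete", "actor"],
    "llama": ["actor", "singer", "politician", "actor"],
    "gemma": ["actor", "actor", "athlete", "actor"],
}
for pair, rate in pairwise_agreement(toy).items():
    print(pair, rate)
```

Rows where the models disagree are usually the most informative ones to inspect by hand.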
Model Characteristics
Deepseek
- ✅ Very cheap
- ✅ Good for testing
- ⚠️ Less documentation
- 🇨🇳 Chinese company
Qwen (Qwen3-Max)
- ✅ Latest version automatically used
- ✅ Strong multilingual
- ✅ Good Asian name recognition
- 💰 Variable cost
- 🇨🇳 Chinese company (Alibaba)
Llama 3.1 70B
- ✅ Open-source
- ✅ Strong overall performance
- ✅ Well-documented
- ✅ American (Meta)
- 💰 Mid-range cost
Mixtral 8x22B
- ✅ Open-source
- ✅ MoE architecture (efficient)
- ✅ European alternative
- 💰 Mid-range cost
- 🇫🇷 French company
Gemma 2 27B
- ✅ Fully open-source
- ✅ Can self-host
- ✅ American (Google)
- ✅ Cheap via API
- ✅ Good quality for size
Qwen-2.5-32B Local (NEW!)
- ✅ FREE - $0 cost (no API fees)
- ✅ PRIVATE - Data never leaves your machine
- ✅ OFFLINE - Works without internet
- ✅ HIGH QUALITY - 32B parameter model
- ✅ Strong multilingual support
- ⚠️ Slower than the APIs - 5-10 tokens/sec on A100 (~100-200 samples/hour)
- ⚠️ Requires: A100 80GB GPU, ~25GB VRAM, Ollama installed
- 🇨🇳 Chinese company (Alibaba)
- 📦 Model size: ~20GB download
Decision Matrix
If you prioritize...
FREE / Zero Cost: Use Qwen-2.5-32B Local (no API fees!)
Cost (with API): Use Deepseek or Gemma
Quality: Use Qwen-2.5-32B Local, Llama, or Mixtral
Privacy: Use Qwen-2.5-32B Local (data stays on your machine)
American/Open Source: Use Gemma or Llama
Asian Names: Use Qwen (API or Local - strong multilingual)
European Provider: Use Mixtral
Testing: Use Deepseek first, always!
Running Multiple Models
You can run all 6 models in sequence:
# 1. Run Cell 10 (Deepseek) - verify works (~$1-2 for 10k)
# 2. Run Cell 12 (Qwen API) - Chinese perspective (~variable cost)
# 3. Run Cell 14 (Llama) - American perspective (~$5-10 for 10k)
# 4. Run Cell 16 (Mixtral) - European perspective (~$10-20 for 10k)
# 5. Run Cell 18 (Gemma) - Open source perspective (~$4-8 for 10k)
# 6. Run Cell 20 (Qwen-2.5-32B Local) - FREE local inference ($0!)
Each saves to its own file, so you can compare results!
Notes
- Llama and Gemma use the same API key (Together AI)
- All models use the same 9 profession categories
- All models have automatic retries with exponential backoff
- All models save progress every 10 rows
- All models are resumable if interrupted
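The retry, checkpoint, and resume behavior listed above could look roughly like the sketch below. It is an illustration under assumptions, not the notebook's actual code: the function names, the jitter, the five-attempt limit, and the dict-based checkpoint are all hypothetical. Only the "every 10 rows" interval comes from the notes.

```python
import random
import time
from typing import Callable, Dict, List


def with_retries(call: Callable[[], str], max_attempts: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Call `call`, retrying on any exception with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # 1x, 2x, 4x, ... the base delay, plus a little jitter
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
    raise RuntimeError("max_attempts must be at least 1")


def annotate_resumable(rows: List[str], done: Dict[int, str],
                       annotate: Callable[[str], str],
                       save: Callable[[Dict[int, str]], None],
                       save_every: int = 10) -> Dict[int, str]:
    """Annotate rows not already in `done`, saving after every `save_every` new rows."""
    new_rows = 0
    for i, row in enumerate(rows):
        if i in done:
            continue  # finished in an earlier run
        done[i] = with_retries(lambda: annotate(row))
        new_rows += 1
        if new_rows % save_every == 0:
            save(done)
    save(done)  # final checkpoint
    return done
```

Passing the previously saved results as `done` is what makes a run resumable: rows already present are skipped, so an interrupted job loses at most the rows annotated since the last save.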
Summary
You now have 6 LLM options to choose from:
- 🧪 Deepseek - Test first (cheapest API)
- 🇨🇳 Qwen3-Max API - Chinese, strong multilingual
- 🇺🇸 Llama 3.1 70B - American, open-source
- 🇫🇷 Mixtral 8x22B - French, open-source MoE
- 🇺🇸 Gemma 2 27B - American open-source (Google)
- 💰 Qwen-2.5-32B Local - FREE local inference (NEW!)
Each in its own cell, easy to run and compare! 🚀
Recommended workflow:
- Test with Deepseek (Cell 10) - verify pipeline works
- For small datasets (<1000): Use API (Deepseek/Gemma/Llama)
- For large datasets (>1000): Use Qwen-2.5-32B Local (Cell 20) - FREE!