---
language:
- en
license: agpl-3.0
library_name: transformers
tags:
- grpo
- rlhf
- fine-tuning
- marxism
- political-theory
- lora
- deepseek
- qwen
datasets:
- prolewiki/qa-corpus
base_model: unsloth/DeepSeek-R1-0528-Qwen3-8B
pipeline_tag: text-generation
---
# prolewiki-llm
GRPO fine-tuning infrastructure for training Marxist-Leninist language models.
## Overview
This repository contains the AI training infrastructure for fine-tuning language models on Marxist-Leninist theory using GRPO (Group Relative Policy Optimization). It includes:
- **Multi-Layer Reward System**: 17+ reward functions that prevent reward hacking (NLI coherence, self-consistency, structural analysis, topic relevance, depth scoring)
- **Headless Training**: Docker container for automated RunPod deployment with auto-shutoff
- **Jupyter Notebook**: Production-ready notebook optimized for A40/A100 GPUs
- **Comprehensive Tests**: Unit and integration tests for all components
## Quick Start
### RunPod Deployment (Recommended)
```bash
# 1. Build Docker image
docker build -t marxist-grpo:latest -f docker/Dockerfile .
# 2. Push to registry and deploy on RunPod
# Use A40 (48GB, $0.35/hr) for best cost/performance
# 3. Set environment variables on pod:
# - HF_TOKEN
# - WANDB_API_KEY
# - HF_REPO (optional, for model upload)
```
### Local Development
```bash
# Install dependencies
uv sync --group dev
# Download spaCy model (required for rewards)
python -m spacy download en_core_web_sm
# Run tests
uv run pytest -m "not slow and not gpu"
```
## Repository Structure
```
prolewiki-llm/
├── src/prolewiki_llm/
│   ├── grpo_rewards.py          # Multi-layer reward functions
│   ├── train_headless.py        # Headless training script
│   ├── export_grpo_dataset.py   # Dataset conversion
│   └── wandb_logging.py         # W&B integration
├── docker/
│   ├── Dockerfile               # Training container
│   ├── start.sh                 # Entrypoint with auto-shutoff
│   └── .env.example             # Environment reference
├── notebooks/
│   └── Marxist_GRPO_RunPod_Optimized.ipynb
├── tests/
│   ├── unit/                    # Unit tests
│   ├── integration/             # Shell script tests
│   └── fixtures/                # Mock commands
└── training_data/
    └── grpo_dataset.jsonl       # Training data
```
## Reward Functions
The reward system stacks multiple layers of checks so that responses must score well on all of them, which makes reward hacking harder:
| Layer | Function | Purpose |
|-------|----------|---------|
| 1 | `match_format_exactly` | Validate `<think>...</think>` tags |
| 2 | `nli_coherence_reward` | Response entails ground truth (BART-MNLI) |
| 3 | `self_consistency_reward` | No internal contradictions |
| 4 | `structural_coherence_reward` | Terms in proper syntactic roles (spaCy) |
| 5 | `topic_relevance_reward` | Answer addresses the question |
| 6 | `interconnection_depth_reward` | Rewards analysis, penalizes buzzword salad |
Use `full_coherence_reward()` for the complete 6-layer check, or `robust_coherence_reward()` for a faster 3-layer version.
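For illustration, a layer-1 format check can be sketched as below. This is a minimal sketch, not the repository's actual `match_format_exactly`: the function name, the completions-in/scores-out signature (TRL-style GRPO rewards typically take a batch of completions and return one score per completion), and the scoring rule are all assumptions.

```python
import re

# Hypothetical layer-1 format reward: score 1.0 when the completion wraps
# its reasoning in exactly one <think>...</think> block followed by a
# visible answer, 0.0 otherwise. Illustrative only, not the repo's API.
THINK_RE = re.compile(r"^\s*<think>.*?</think>\s*\S", re.DOTALL)

def match_format_reward(completions: list[str]) -> list[float]:
    scores = []
    for text in completions:
        one_block = text.count("<think>") == 1 and text.count("</think>") == 1
        scores.append(1.0 if one_block and THINK_RE.match(text) else 0.0)
    return scores
```

A binary format gate like this is typically combined with the graded layers (NLI, relevance, depth), so a well-formatted but vacuous answer still scores low overall.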
## Training Configuration
Key environment variables for `train_headless.py`:
| Variable | Default | Description |
|----------|---------|-------------|
| `MODEL_NAME` | `unsloth/DeepSeek-R1-0528-Qwen3-8B` | Base model |
| `MAX_STEPS` | `500` | Training steps |
| `BATCH_SIZE` | `2` | Per-device batch size |
| `LEARNING_RATE` | `5e-6` | Learning rate |
| `REWARD_MODE` | `FULL` | `FULL`, `ROBUST`, or `LEGACY` |
| `HF_REPO` | `prolewiki/marxist-grpo-lora` | Upload destination |
## GPU Requirements
| GPU | VRAM | Price | Recommendation |
|-----|------|-------|----------------|
| **A40** | 48GB | $0.35/hr | Best value for 8B models |
| A100 | 80GB | $1.19/hr | Overkill for this use case |
| RTX 4090 | 24GB | $0.34/hr | Too small for 16-bit GRPO |
## Critical Notes
1. **torch.compile must be disabled** on RunPod/Jupyter (causes hangs)
2. **load_in_4bit=False** is required for GRPO (16-bit LoRA adapters)
3. **use_gradient_checkpointing=True** (not `"unsloth"`) for stability
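Taken together, the notes translate into settings along these lines. This is a hedged sketch: `FastLanguageModel.from_pretrained` and `get_peft_model` are Unsloth's loaders, but the exact keyword set this repo passes is an assumption.

```python
# Settings implied by the critical notes above. Treat as a reference for
# the flags, not as the repository's actual training script.
load_kwargs = {
    "model_name": "unsloth/DeepSeek-R1-0528-Qwen3-8B",
    "load_in_4bit": False,  # note 2: GRPO needs 16-bit LoRA adapters
}
peft_kwargs = {
    "use_gradient_checkpointing": True,  # note 3: boolean True, not "unsloth"
}
compile_model = False  # note 1: torch.compile hangs on RunPod/Jupyter

# On a GPU pod this would be used roughly as:
# model, tokenizer = FastLanguageModel.from_pretrained(**load_kwargs)
# model = FastLanguageModel.get_peft_model(model, **peft_kwargs)
```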
## Related Projects
- [ProleWiki](https://en.prolewiki.org/) - The Marxist-Leninist encyclopedia
- [pw-mcp](https://github.com/prolewiki/pw-mcp) - MCP server for ProleWiki semantic search
## License
AGPL-3.0-only