percyraskova/llm-training

@@ -1,35 +1,30 @@
-*.7z filter=lfs diff=lfs merge=lfs -text
-*.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
-*.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
-*.ftz filter=lfs diff=lfs merge=lfs -text
-*.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
-*.joblib filter=lfs diff=lfs merge=lfs -text
-*.lfs.* filter=lfs diff=lfs merge=lfs -text
-*.mlmodel filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
-*.msgpack filter=lfs diff=lfs merge=lfs -text
-*.npy filter=lfs diff=lfs merge=lfs -text
-*.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pickle filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
-*.pt filter=lfs diff=lfs merge=lfs -text
-*.pth filter=lfs diff=lfs merge=lfs -text
-*.rar filter=lfs diff=lfs merge=lfs -text
-*.safetensors filter=lfs diff=lfs merge=lfs -text
-saved_model/**/* filter=lfs diff=lfs merge=lfs -text
-*.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tgz filter=lfs diff=lfs merge=lfs -text
-*.wasm filter=lfs diff=lfs merge=lfs -text
-*.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
-*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text

+# Git LFS configuration for HuggingFace Hub
+# Auto-generated for ML repository
+# Model weights
+*.safetensors filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+# Quantized models
+*.gguf filter=lfs diff=lfs merge=lfs -text
+*.ggml filter=lfs diff=lfs merge=lfs -text
+# ONNX models
 *.onnx filter=lfs diff=lfs merge=lfs -text
+# Tokenizer files (large vocab)
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
+vocab.json filter=lfs diff=lfs merge=lfs -text
+# Large data files
 *.parquet filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+# Archives
+*.tar.gz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text

.gitignore

@@ -1,4 +1,6 @@
 # Python
 __pycache__/
 *.py[cod]
 *$py.class
@@ -20,44 +22,111 @@ wheels/
 .installed.cfg
 *.egg
-# Virtual environments
 .venv/
 venv/
 ENV/
-# IDEs
 .idea/
 .vscode/
 *.swp
 *.swo
 # Jupyter
 .ipynb_checkpoints/
-# Testing
 .pytest_cache/
 .coverage
 htmlcov/
 .tox/
 .nox/
-# mypy
 .mypy_cache/
 # Archives
 *.tar.gz
 *.zip
-# Model artifacts (large files)
 *.safetensors
 *.bin
 *.gguf
-# Training outputs (generated)
 outputs/
 checkpoints/
 lora-output/
-# OS
 .DS_Store
 Thumbs.db

+# =============================================================================
 # Python
+# =============================================================================
 __pycache__/
 *.py[cod]
 *$py.class
 .installed.cfg
 *.egg
+# =============================================================================
+# Virtual Environments
+# =============================================================================
 .venv/
 venv/
 ENV/
+env/
+# =============================================================================
+# IDEs and Editors
+# =============================================================================
 .idea/
 .vscode/
 *.swp
 *.swo
+*~
+.spyderproject
+.spyproject
+# =============================================================================
 # Jupyter
+# =============================================================================
 .ipynb_checkpoints/
+# =============================================================================
+# Testing and Coverage
+# =============================================================================
 .pytest_cache/
 .coverage
+.coverage.*
 htmlcov/
 .tox/
 .nox/
+.cache/
+nosetests.xml
+coverage.xml
+*.cover
+# =============================================================================
+# Type Checking and Linting
+# =============================================================================
 .mypy_cache/
+.ruff_cache/
+.dmypy.json
+dmypy.json
+# =============================================================================
 # Archives
+# =============================================================================
 *.tar.gz
 *.zip
+*.rar
+*.7z
+# =============================================================================
+# Model Artifacts (Large Files - use Git LFS if needed)
+# =============================================================================
 *.safetensors
 *.bin
 *.gguf
+*.pt
+*.pth
+*.onnx
+*.h5
+*.pb
+# =============================================================================
+# Training Outputs (Generated)
+# =============================================================================
 outputs/
 checkpoints/
 lora-output/
+runs/
+wandb/
+lightning_logs/
+# =============================================================================
+# OS Files
+# =============================================================================
 .DS_Store
+.DS_Store?
+._*
+.Spotlight-V100
+.Trashes
+ehthumbs.db
 Thumbs.db
+# =============================================================================
+# Project-Specific
+# =============================================================================
+# Claude Code cache
+.claude/
+# UV lock file (regenerated from pyproject.toml)
+uv.lock
+# Pre-commit cache
+.pre-commit-config.yaml
+# Local config files
+.env
+.env.local
+*.local
+# Temporary files
+*.tmp
+*.temp
+*.bak

README.md

@@ -1,77 +1,136 @@
 # prolewiki-llm
-GRPO fine-tuning and reward functions for training Marxist-Leninist language models.
 ## Overview
-This repository contains the AI training infrastructure for fine-tuning language models on Marxist-Leninist theory. It includes:
-- **Reward Functions**: Multi-layer reward system for GRPO training that prevents reward hacking
-- **Training Data**: Curated Q&A pairs and synthetic datasets for ideological consistency
-- **Training Scripts**: Ready-to-run notebooks for RunPod/cloud GPU training
-- **W&B Integration**: Weights & Biases logging for training observability
-## Related Projects
-- [pw-mcp](https://github.com/prolewiki/pw-mcp) - MCP server and ChromaDB pipeline for ProleWiki semantic search
-## Installation
 ```bash
-# Basic installation
-uv sync
-# Download spacy model (required for topic/coherence rewards)
-python -m spacy download en_core_web_sm
-# With training dependencies (for GPU training)
-uv sync --group training
-# Development
-uv sync --group dev
 ```
-## Usage
-### Reward Functions
-```python
-from prolewiki_llm import full_coherence_reward, format_reward
-# Combined 5-layer coherence check (recommended for training)
-reward = full_coherence_reward(
-    prompts=["What is imperialism?"],
-    completions=["<think>...</think>\n\nImperialism is..."],
-    answer="Lenin defined imperialism as..."
-)
-# Individual reward components
-format_score = format_reward(completions=["<think>...</think>\n\nAnswer..."])
 ```
-### Training
-See `notebooks/Marxist_GRPO_Training.ipynb` for a complete training example.
-## Project Structure
 ```
 prolewiki-llm/
 ├── src/prolewiki_llm/
-│   ├── grpo_rewards.py      # 17+ reward functions
-│   ├── wandb_logging.py     # W&B integration
-│   └── transform_to_grpo.py # Dataset conversion
-├── training_data/
-│   ├── synthetic_*.jsonl    # Training datasets
-│   ├── entity_whitelist.json # Anti-hallucination data
-│   └── MODEL_CARD.yaml      # Dataset documentation
 ├── notebooks/
-│   └── Marxist_GRPO_Training.ipynb
 ├── tests/
-│   └── unit/
-└── ai-docs/                 # AI-consumable documentation
 ```
 ## License
 AGPL-3.0-only

+---
+language:
+  - en
+license: agpl-3.0
+library_name: transformers
+tags:
+  - grpo
+  - rlhf
+  - fine-tuning
+  - marxism
+  - political-theory
+  - lora
+  - deepseek
+  - qwen
+datasets:
+  - prolewiki/qa-corpus
+base_model: unsloth/DeepSeek-R1-0528-Qwen3-8B
+pipeline_tag: text-generation
+---
 # prolewiki-llm
+GRPO fine-tuning infrastructure for training Marxist-Leninist language models.
 ## Overview
+This repository contains the AI training infrastructure for fine-tuning language models on Marxist-Leninist theory using GRPO (Group Relative Policy Optimization). It includes:
+- **Multi-Layer Reward System**: 17+ reward functions that prevent reward hacking (NLI coherence, self-consistency, structural analysis, topic relevance, depth scoring)
+- **Headless Training**: Docker container for automated RunPod deployment with auto-shutoff
+- **Jupyter Notebook**: Production-ready notebook optimized for A40/A100 GPUs
+- **Comprehensive Tests**: Unit and integration tests for all components
+## Quick Start
+### RunPod Deployment (Recommended)
 ```bash
+# 1. Build Docker image
+docker build -t marxist-grpo:latest -f docker/Dockerfile .
+# 2. Push to registry and deploy on RunPod
+# Use A40 (48GB, $0.35/hr) for best cost/performance
+# 3. Set environment variables on pod:
+#    - HF_TOKEN
+#    - WANDB_API_KEY
+#    - HF_REPO (optional, for model upload)
 ```
+### Local Development
+```bash
+# Install dependencies
+uv sync --group dev
+# Download spaCy model (required for rewards)
+python -m spacy download en_core_web_sm
+# Run tests
+uv run pytest -m "not slow and not gpu"
 ```
+## Repository Structure
 ```
 prolewiki-llm/
 ├── src/prolewiki_llm/
+│   ├── grpo_rewards.py       # Multi-layer reward functions
+│   ├── train_headless.py     # Headless training script
+│   ├── export_grpo_dataset.py # Dataset conversion
+│   └── wandb_logging.py      # W&B integration
+├── docker/
+│   ├── Dockerfile            # Training container
+│   ├── start.sh              # Entrypoint with auto-shutoff
+│   └── .env.example          # Environment reference
 ├── notebooks/
+│   └── Marxist_GRPO_RunPod_Optimized.ipynb
 ├── tests/
+│   ├── unit/                 # Unit tests
+│   ├── integration/          # Shell script tests
+│   └── fixtures/             # Mock commands
+└── training_data/
+    └── grpo_dataset.jsonl    # Training data
 ```
+## Reward Functions
+The reward system uses multiple layers to ensure quality responses:
+| Layer | Function | Purpose |
+|-------|----------|---------|
+| 1 | `match_format_exactly` | Validate `<think>...</think>` tags |
+| 2 | `nli_coherence_reward` | Response entails ground truth (BART-MNLI) |
+| 3 | `self_consistency_reward` | No internal contradictions |
+| 4 | `structural_coherence_reward` | Terms in proper syntactic roles (spaCy) |
+| 5 | `topic_relevance_reward` | Answer addresses the question |
+| 6 | `interconnection_depth_reward` | Rewards analysis, penalizes buzzword salad |
+Use `full_coherence_reward()` for the complete 6-layer check, or `robust_coherence_reward()` for a faster 3-layer version.
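
The layering idea in the table can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_format_layer`, `toy_topic_layer`, and `gated_reward` are hypothetical stand-ins, not the functions in `grpo_rewards.py`, and the gating rule (a failed format check zeroes the whole reward) is one plausible way to combine layers, not necessarily the one this repository uses.

```python
import re
from typing import Callable, Sequence

RewardFn = Callable[[Sequence[str]], list[float]]

# Hypothetical format check: a non-empty <think> block followed by answer text.
# The real `match_format_exactly` may use a different pattern and score scale.
THINK_PATTERN = re.compile(r"^<think>.+?</think>\s*\S", re.DOTALL)


def toy_format_layer(completions: Sequence[str]) -> list[float]:
    """Return 1.0 for completions that match THINK_PATTERN, else 0.0."""
    return [1.0 if THINK_PATTERN.match(c) else 0.0 for c in completions]


def toy_topic_layer(completions: Sequence[str]) -> list[float]:
    """Placeholder content layer: 1.0 if the completion mentions the topic word."""
    return [1.0 if "imperialism" in c.lower() else 0.0 for c in completions]


def gated_reward(
    completions: Sequence[str],
    gate: RewardFn,
    layers: Sequence[tuple[RewardFn, float]],
) -> list[float]:
    """Weighted sum of layer scores, forced to 0.0 when the gate layer fails."""
    gate_scores = gate(completions)
    layer_scores = [(fn(completions), weight) for fn, weight in layers]
    totals = []
    for i, gate_score in enumerate(gate_scores):
        if gate_score == 0.0:
            totals.append(0.0)  # malformed output earns nothing from later layers
        else:
            totals.append(sum(weight * scores[i] for scores, weight in layer_scores))
    return totals


if __name__ == "__main__":
    samples = [
        "<think>reasoning</think>\n\nImperialism is the monopoly stage of capitalism.",
        "Imperialism is bad.",  # on topic, but missing the <think> block
    ]
    print(gated_reward(samples, toy_format_layer, [(toy_topic_layer, 0.5)]))  # [0.5, 0.0]
```

Gating on the format layer means a completion cannot collect content reward while skipping the required structure, which is one simple guard against reward hacking.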
+## Training Configuration
+Key environment variables for `train_headless.py`:
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `MODEL_NAME` | `unsloth/DeepSeek-R1-0528-Qwen3-8B` | Base model |
+| `MAX_STEPS` | `500` | Training steps |
+| `BATCH_SIZE` | `2` | Per-device batch size |
+| `LEARNING_RATE` | `5e-6` | Learning rate |
+| `REWARD_MODE` | `FULL` | `FULL`, `ROBUST`, or `LEGACY` |
+| `HF_REPO` | `prolewiki/marxist-grpo-lora` | Upload destination |
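
As a sketch of how these variables might be read, the snippet below parses them with the defaults from the table. The `TrainConfig` dataclass and `load_config` helper are hypothetical names introduced here; `train_headless.py` may parse its environment differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping

VALID_REWARD_MODES = {"FULL", "ROBUST", "LEGACY"}


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the variables listed in the table."""
    model_name: str
    max_steps: int
    batch_size: int
    learning_rate: float
    reward_mode: str
    hf_repo: str


def load_config(env: Mapping[str, str] = os.environ) -> TrainConfig:
    """Read settings from `env`, falling back to the documented defaults."""
    reward_mode = env.get("REWARD_MODE", "FULL").upper()
    if reward_mode not in VALID_REWARD_MODES:
        raise ValueError(f"REWARD_MODE must be one of {sorted(VALID_REWARD_MODES)}")
    return TrainConfig(
        model_name=env.get("MODEL_NAME", "unsloth/DeepSeek-R1-0528-Qwen3-8B"),
        max_steps=int(env.get("MAX_STEPS", "500")),
        batch_size=int(env.get("BATCH_SIZE", "2")),
        learning_rate=float(env.get("LEARNING_RATE", "5e-6")),
        reward_mode=reward_mode,
        hf_repo=env.get("HF_REPO", "prolewiki/marxist-grpo-lora"),
    )


if __name__ == "__main__":
    # Override two values and keep the rest at their defaults.
    print(load_config({"MAX_STEPS": "10", "REWARD_MODE": "robust"}))
```

Passing a plain dict instead of `os.environ` keeps the parser easy to test without touching the real environment.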
+## GPU Requirements
+| GPU | VRAM | Price | Recommendation |
+|-----|------|-------|----------------|
+| **A40** | 48GB | $0.35/hr | Best value for 8B models |
+| A100 | 80GB | $1.19/hr | Overkill for this use case |
+| RTX 4090 | 24GB | $0.34/hr | Too small for 16-bit GRPO |
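
A rough back-of-envelope calculation helps explain the recommendations. The sketch below assumes a nominal 8 billion parameters stored at 2 bytes each (16-bit) and counts only the weights; activations, LoRA optimizer state, and generation buffers come on top and are not estimated here. The helper names are hypothetical.

```python
# VRAM figures copied from the table above, treated as decimal gigabytes.
GPU_VRAM_GB = {"A40": 48, "A100": 80, "RTX 4090": 24}


def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory needed for the model weights alone, in decimal GB."""
    return n_params * bytes_per_param / 1e9


def headroom_gb(n_params: float, bytes_per_param: int = 2) -> dict[str, float]:
    """VRAM left on each GPU after loading the weights."""
    weights = weight_memory_gb(n_params, bytes_per_param)
    return {gpu: vram - weights for gpu, vram in GPU_VRAM_GB.items()}


if __name__ == "__main__":
    print(weight_memory_gb(8e9, 2))  # 16.0
    print(headroom_gb(8e9))
```

Under these assumptions the weights alone take 16 GB, which leaves a 24GB card only 8 GB for everything else a GRPO step needs.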
+## Critical Notes
+1. **torch.compile must be disabled** on RunPod/Jupyter (causes hangs)
+2. **load_in_4bit=False** is required for GRPO (16-bit LoRA adapters)
+3. **use_gradient_checkpointing=True** (not `"unsloth"`) for stability
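
Notes 2 and 3 can be turned into a small pre-flight check. This is only a sketch: `check_grpo_settings` is a hypothetical helper, and it assumes the two dictionaries hold the keyword arguments later passed to Unsloth's `FastLanguageModel.from_pretrained` and `FastLanguageModel.get_peft_model`. Note 1 is not covered, because the source does not say how `torch.compile` is disabled.

```python
def check_grpo_settings(load_kwargs: dict, peft_kwargs: dict) -> list[str]:
    """Return a list of problems; an empty list means both notes are satisfied."""
    problems: list[str] = []
    # Require an explicit False so the result never depends on a library default.
    if load_kwargs.get("load_in_4bit") is not False:
        problems.append("load_in_4bit must be set to False explicitly")
    # The string "unsloth" is rejected on purpose; only the boolean True passes.
    if peft_kwargs.get("use_gradient_checkpointing") is not True:
        problems.append('use_gradient_checkpointing must be True, not "unsloth"')
    return problems


if __name__ == "__main__":
    load_kwargs = {"load_in_4bit": False}
    peft_kwargs = {"use_gradient_checkpointing": "unsloth"}
    for problem in check_grpo_settings(load_kwargs, peft_kwargs):
        print("config problem:", problem)
```

Running a check like this before model loading surfaces a misconfiguration in seconds instead of after a long download on a paid pod.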
+## Related Projects
+- [ProleWiki](https://en.prolewiki.org/) - The Marxist-Leninist encyclopedia
+- [pw-mcp](https://github.com/prolewiki/pw-mcp) - MCP server for ProleWiki semantic search
 ## License
 AGPL-3.0-only