---
language:
- en
license: agpl-3.0
library_name: transformers
tags:
- grpo
- rlhf
- fine-tuning
- marxism
- political-theory
- lora
- deepseek
- qwen
datasets:
- prolewiki/qa-corpus
base_model: unsloth/DeepSeek-R1-0528-Qwen3-8B
pipeline_tag: text-generation
---

# prolewiki-llm

GRPO fine-tuning infrastructure for training Marxist-Leninist language models.

## Overview

This repository contains the AI training infrastructure for fine-tuning language models on Marxist-Leninist theory using GRPO (Group Relative Policy Optimization). It includes:

- **Multi-Layer Reward System**: 17+ reward functions that prevent reward hacking (NLI coherence, self-consistency, structural analysis, topic relevance, depth scoring)
- **Headless Training**: Docker container for automated RunPod deployment with auto-shutoff
- **Jupyter Notebook**: Production-ready notebook optimized for A40/A100 GPUs
- **Comprehensive Tests**: Unit and integration tests for all components

## Quick Start

### RunPod Deployment (Recommended)

```bash
# 1. Build the Docker image
docker build -t marxist-grpo:latest -f docker/Dockerfile .

# 2. Push the image to a registry and deploy it on RunPod
#    Use an A40 (48GB, $0.35/hr) for the best cost/performance

# 3. Set environment variables on the pod:
#    - HF_TOKEN
#    - WANDB_API_KEY
#    - HF_REPO (optional, for model upload)
```

### Local Development

```bash
# Install dependencies
uv sync --group dev

# Download the spaCy model (required by the reward functions)
python -m spacy download en_core_web_sm

# Run tests
uv run pytest -m "not slow and not gpu"
```

## Repository Structure

```
prolewiki-llm/
├── src/prolewiki_llm/
│   ├── grpo_rewards.py         # Multi-layer reward functions
│   ├── train_headless.py       # Headless training script
│   ├── export_grpo_dataset.py  # Dataset conversion
│   └── wandb_logging.py        # W&B integration
├── docker/
│   ├── Dockerfile              # Training container
│   ├── start.sh                # Entrypoint with auto-shutoff
│   └── .env.example            # Environment reference
├── notebooks/
│   └── Marxist_GRPO_RunPod_Optimized.ipynb
├── tests/
│   ├── unit/                   # Unit tests
│   ├── integration/            # Shell script tests
│   └── fixtures/               # Mock commands
└── training_data/
    └── grpo_dataset.jsonl      # Training data
```

## Reward Functions

The reward system uses multiple layers to ensure quality responses:

| Layer | Function | Purpose |
|-------|----------|---------|
| 1 | `match_format_exactly` | Validate `<think>...</think>` tags |
| 2 | `nli_coherence_reward` | Response entails ground truth (BART-MNLI) |
| 3 | `self_consistency_reward` | No internal contradictions |
| 4 | `structural_coherence_reward` | Terms in proper syntactic roles (spaCy) |
| 5 | `topic_relevance_reward` | Answer addresses the question |
| 6 | `interconnection_depth_reward` | Rewards analysis, penalizes buzzword salad |

Use `full_coherence_reward()` for the complete 6-layer check, or `robust_coherence_reward()` for a faster 3-layer version.
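As a rough illustration of the layer-1 check, here is a hypothetical re-implementation of a `<think>`-tag format reward. This is a sketch, not the actual `match_format_exactly` from `src/prolewiki_llm/grpo_rewards.py`, which may score partial matches differently:

```python
import re

# Hypothetical format check: a completion must open with a <think>...</think>
# reasoning block to earn the reward. (Sketch only; the repository's
# match_format_exactly may differ.)
THINK_RE = re.compile(r"^<think>.*?</think>", re.DOTALL)

def format_reward(completions):
    """Return 1.0 for completions that open with a <think>...</think> block, else 0.0."""
    return [1.0 if THINK_RE.match(c.strip()) else 0.0 for c in completions]

print(format_reward([
    "<think>Trace the class character of the state.</think> The state is...",
    "The state is an instrument of class rule.",  # no reasoning block
]))  # [1.0, 0.0]
```

Binary format rewards like this are cheap to compute, which is why structural checks usually run before the heavier NLI and spaCy layers.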

## Training Configuration

Key environment variables for `train_headless.py`:

| Variable | Default | Description |
|----------|---------|-------------|
| `MODEL_NAME` | `unsloth/DeepSeek-R1-0528-Qwen3-8B` | Base model |
| `MAX_STEPS` | `500` | Training steps |
| `BATCH_SIZE` | `2` | Per-device batch size |
| `LEARNING_RATE` | `5e-6` | Learning rate |
| `REWARD_MODE` | `FULL` | `FULL`, `ROBUST`, or `LEGACY` |
| `HF_REPO` | `prolewiki/marxist-grpo-lora` | Upload destination |
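The table above can be read as an env-var-with-fallback pattern. The sketch below is illustrative (not the script's actual code); the variable names and defaults mirror the table:

```python
import os

# Illustrative sketch of how train_headless.py could read its configuration.
# Defaults are the documented ones; set the environment variable to override.
config = {
    "model_name": os.environ.get("MODEL_NAME", "unsloth/DeepSeek-R1-0528-Qwen3-8B"),
    "max_steps": int(os.environ.get("MAX_STEPS", "500")),
    "batch_size": int(os.environ.get("BATCH_SIZE", "2")),
    "learning_rate": float(os.environ.get("LEARNING_RATE", "5e-6")),
    "reward_mode": os.environ.get("REWARD_MODE", "FULL"),
    "hf_repo": os.environ.get("HF_REPO", "prolewiki/marxist-grpo-lora"),
}
```

Numeric variables are parsed explicitly since environment values always arrive as strings.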

## GPU Requirements

| GPU | VRAM | Price | Recommendation |
|-----|------|-------|----------------|
| **A40** | 48GB | $0.35/hr | Best value for 8B models |
| A100 | 80GB | $1.19/hr | Overkill for this use case |
| RTX 4090 | 24GB | $0.34/hr | Too small for 16-bit GRPO |

## Critical Notes

1. **torch.compile must be disabled** on RunPod/Jupyter (it causes hangs)
2. **load_in_4bit=False** is required for GRPO (16-bit LoRA adapters)
3. **use_gradient_checkpointing=True** (not `"unsloth"`) for stability
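A minimal sketch of how these three constraints might translate into an Unsloth model setup. This is an assumption, not the repository's actual code: `max_seq_length` and the LoRA rank are placeholder values, and `train_headless.py` is authoritative for the real arguments:

```python
import os

# Note 1: disable torch.compile before torch is imported (sketch; placeholder approach)
os.environ["TORCHDYNAMO_DISABLE"] = "1"

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-0528-Qwen3-8B",
    max_seq_length=2048,   # placeholder value
    load_in_4bit=False,    # Note 2: 16-bit weights are required for GRPO
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                              # placeholder LoRA rank
    use_gradient_checkpointing=True,   # Note 3: plain True, not "unsloth"
)
```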

## Related Projects

- [ProleWiki](https://en.prolewiki.org/) - The Marxist-Leninist encyclopedia
- [pw-mcp](https://github.com/prolewiki/pw-mcp) - MCP server for ProleWiki semantic search

## License

AGPL-3.0-only
|