prolewiki-llm
GRPO fine-tuning infrastructure for training Marxist-Leninist language models.
Overview
This repository contains the AI training infrastructure for fine-tuning language models on Marxist-Leninist theory using GRPO (Group Relative Policy Optimization). It includes:
- Multi-Layer Reward System: 17+ reward functions designed to resist reward hacking (NLI coherence, self-consistency, structural analysis, topic relevance, depth scoring)
- Headless Training: Docker container for automated RunPod deployment with auto-shutoff
- Jupyter Notebook: Production-ready notebook optimized for A40/A100 GPUs
- Comprehensive Tests: Unit and integration tests for all components
Quick Start
RunPod Deployment (Recommended)
# 1. Build Docker image
docker build -t marxist-grpo:latest -f docker/Dockerfile .
# 2. Push to registry and deploy on RunPod
# Use A40 (48GB, $0.35/hr) for best cost/performance
# 3. Set environment variables on pod:
# - HF_TOKEN
# - WANDB_API_KEY
# - HF_REPO (optional, for model upload)
Local Development
# Install dependencies
uv sync --group dev
# Download spaCy model (required for rewards)
python -m spacy download en_core_web_sm
# Run tests
uv run pytest -m "not slow and not gpu"
Repository Structure
prolewiki-llm/
├── src/prolewiki_llm/
│   ├── grpo_rewards.py          # Multi-layer reward functions
│   ├── train_headless.py        # Headless training script
│   ├── export_grpo_dataset.py   # Dataset conversion
│   └── wandb_logging.py         # W&B integration
├── docker/
│   ├── Dockerfile               # Training container
│   ├── start.sh                 # Entrypoint with auto-shutoff
│   └── .env.example             # Environment reference
├── notebooks/
│   └── Marxist_GRPO_RunPod_Optimized.ipynb
├── tests/
│   ├── unit/                    # Unit tests
│   ├── integration/             # Shell script tests
│   └── fixtures/                # Mock commands
└── training_data/
    └── grpo_dataset.jsonl       # Training data
Reward Functions
The reward system uses multiple layers to ensure quality responses:
| Layer | Function | Purpose |
|---|---|---|
| 1 | `match_format_exactly` | Validate `<think>...</think>` tags |
| 2 | `nli_coherence_reward` | Response entails ground truth (BART-MNLI) |
| 3 | `self_consistency_reward` | No internal contradictions |
| 4 | `structural_coherence_reward` | Terms in proper syntactic roles (spaCy) |
| 5 | `topic_relevance_reward` | Answer addresses the question |
| 6 | `interconnection_depth_reward` | Rewards analysis, penalizes buzzword salad |
Use full_coherence_reward() for the complete 6-layer check, or robust_coherence_reward() for a faster 3-layer version.
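As a rough illustration of how a format layer and a layered combination might look, here is a minimal sketch. It is not the code in grpo_rewards.py: the names `format_reward` and `layered_reward`, the regular expression, and the "weakest layer wins" rule are all assumptions made for this example, and completions are assumed to be plain strings.

```python
import re
from typing import Callable, Sequence

# Hypothetical pattern: one non-empty <think>...</think> block, then a non-empty answer.
THINK_PATTERN = re.compile(r"\A\s*<think>(.+?)</think>\s*(\S.*)\Z", re.DOTALL)


def format_reward(completions: Sequence[str], **kwargs) -> list[float]:
    """Score 1.0 for a well-formed completion and 0.0 otherwise."""
    scores = []
    for text in completions:
        well_formed = (
            THINK_PATTERN.match(text) is not None
            # Reject repeated or nested tags, which the regex alone would accept.
            and text.count("<think>") == 1
            and text.count("</think>") == 1
        )
        scores.append(1.0 if well_formed else 0.0)
    return scores


RewardFn = Callable[..., list[float]]


def layered_reward(completions: Sequence[str], layers: Sequence[RewardFn]) -> list[float]:
    """Combine layers so a completion scores only as well as its weakest layer.

    Taking the minimum is an assumption of this sketch; it means a response
    cannot offset a failed layer by doing well on another one.
    """
    if not layers:
        raise ValueError("at least one reward layer is required")
    per_layer = [layer(completions) for layer in layers]
    return [min(scores) for scores in zip(*per_layer)]
```

Each function takes a batch of completions and returns one float per completion, which is the shape GRPO trainers generally expect from a reward function. A stricter layer such as an NLI check could be appended to `layers` without changing the combination logic.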
Training Configuration
Key environment variables for train_headless.py:
| Variable | Default | Description |
|---|---|---|
| `MODEL_NAME` | `unsloth/DeepSeek-R1-0528-Qwen3-8B` | Base model |
| `MAX_STEPS` | `500` | Training steps |
| `BATCH_SIZE` | `2` | Per-device batch size |
| `LEARNING_RATE` | `5e-6` | Learning rate |
| `REWARD_MODE` | `FULL` | `FULL`, `ROBUST`, or `LEGACY` |
| `HF_REPO` | `prolewiki/marxist-grpo-lora` | Upload destination |
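The variables above could be read along the following lines. This is a sketch, not the parsing code in train_headless.py: the `TrainConfig` dataclass, the `load_config` helper, and the case-insensitive handling of `REWARD_MODE` are assumptions; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_REWARD_MODES = {"FULL", "ROBUST", "LEGACY"}


@dataclass(frozen=True)
class TrainConfig:
    model_name: str
    max_steps: int
    batch_size: int
    learning_rate: float
    reward_mode: str
    hf_repo: str


def load_config(env: Optional[Mapping[str, str]] = None) -> TrainConfig:
    """Build a config from environment variables, using the documented defaults."""
    env = os.environ if env is None else env
    # Accepting lowercase values is a convenience added by this sketch.
    reward_mode = env.get("REWARD_MODE", "FULL").upper()
    if reward_mode not in VALID_REWARD_MODES:
        raise ValueError(f"REWARD_MODE must be one of {sorted(VALID_REWARD_MODES)}")
    return TrainConfig(
        model_name=env.get("MODEL_NAME", "unsloth/DeepSeek-R1-0528-Qwen3-8B"),
        max_steps=int(env.get("MAX_STEPS", "500")),
        batch_size=int(env.get("BATCH_SIZE", "2")),
        learning_rate=float(env.get("LEARNING_RATE", "5e-6")),
        reward_mode=reward_mode,
        hf_repo=env.get("HF_REPO", "prolewiki/marxist-grpo-lora"),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check in a unit test, for example `load_config({"MAX_STEPS": "100"})`.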
GPU Requirements
| GPU | VRAM | Price | Recommendation |
|---|---|---|---|
| A40 | 48GB | $0.35/hr | Best value for 8B models |
| A100 | 80GB | $1.19/hr | Overkill for this use case |
| RTX 4090 | 24GB | $0.34/hr | Too small for 16-bit GRPO |
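The arithmetic behind these recommendations can be sketched as follows. The VRAM and price figures are taken from the table; the parameter count of exactly 8 billion, the 2 bytes per parameter for 16-bit weights, and the helper names are assumptions, and the estimate covers weights only (no activations, LoRA gradients, optimizer state, or generation cache).

```python
# (VRAM in GB, price in USD per hour), copied from the table above.
GPUS = {
    "A40": (48, 0.35),
    "A100": (80, 1.19),
    "RTX 4090": (24, 0.34),
}


def weights_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return params_billion * bytes_per_param


def headroom_gb(gpu: str, params_billion: float) -> float:
    """VRAM left after loading 16-bit weights; everything else must fit here."""
    vram_gb, _ = GPUS[gpu]
    return vram_gb - weights_gb(params_billion)


def run_cost(gpu: str, hours: float) -> float:
    """Cost in USD of renting the GPU for the given number of hours."""
    _, price_per_hour = GPUS[gpu]
    return price_per_hour * hours


for name in GPUS:
    print(f"{name}: {headroom_gb(name, 8):.0f} GB headroom, ${run_cost(name, 10):.2f} per 10 h")
```

Under these assumptions the 16-bit weights of an 8B model take about 16 GB, which leaves roughly 8 GB on an RTX 4090 against 32 GB on an A40 at nearly the same hourly price.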
Critical Notes
- torch.compile must be disabled on RunPod/Jupyter (causes hangs)
- load_in_4bit=False is required for GRPO (16-bit LoRA adapters)
- use_gradient_checkpointing=True (not "unsloth") for stability
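A pre-flight check for these three settings might look like the sketch below. The dictionaries stand in for keyword arguments passed to the model-loading and LoRA-setup calls, and `check_critical_settings` is a hypothetical helper. The notes do not say how torch.compile is disabled; setting `TORCHDYNAMO_DISABLE=1` before PyTorch is imported is one option and is used here as an assumption.

```python
import os

# Assumption: disable torch.compile through PyTorch's TORCHDYNAMO_DISABLE switch.
# It has to be set before torch is imported to take effect.
os.environ["TORCHDYNAMO_DISABLE"] = "1"

# Stand-ins for the keyword arguments used when loading the model and adding LoRA.
LOAD_KWARGS = {
    "model_name": "unsloth/DeepSeek-R1-0528-Qwen3-8B",
    "load_in_4bit": False,  # 16-bit LoRA adapters are required for GRPO
}
PEFT_KWARGS = {
    "use_gradient_checkpointing": True,  # plain True, not the string "unsloth"
}


def check_critical_settings(load_kwargs: dict, peft_kwargs: dict) -> list[str]:
    """Return a list of problems; an empty list means all three notes are satisfied."""
    problems = []
    # A missing key counts as a problem, since the loader's default may be 4-bit.
    if load_kwargs.get("load_in_4bit") is not False:
        problems.append("load_in_4bit must be set to False")
    if peft_kwargs.get("use_gradient_checkpointing") is not True:
        problems.append("use_gradient_checkpointing must be True")
    if os.environ.get("TORCHDYNAMO_DISABLE") != "1":
        problems.append("torch.compile is not disabled")
    return problems
```

Running the check at the top of a training script, before any GPU memory is allocated, turns a silent hang or instability into an immediate, readable error.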
Related Projects
License
AGPL-3.0-only
Model tree for percyraskova/llm-training
- Base model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
- Finetuned: unsloth/DeepSeek-R1-0528-Qwen3-8B