percyraskova committed
Commit 1bc8867 · verified · 1 Parent(s): 81b3473

Upload folder using huggingface_hub

Files changed (3):
  1. .gitattributes +24 -29
  2. .gitignore +76 -7
  3. README.md +106 -47
.gitattributes CHANGED
@@ -1,35 +1,30 @@
- *.7z filter=lfs diff=lfs merge=lfs -text
- *.arrow filter=lfs diff=lfs merge=lfs -text
  *.bin filter=lfs diff=lfs merge=lfs -text
- *.bz2 filter=lfs diff=lfs merge=lfs -text
  *.ckpt filter=lfs diff=lfs merge=lfs -text
- *.ftz filter=lfs diff=lfs merge=lfs -text
- *.gz filter=lfs diff=lfs merge=lfs -text
  *.h5 filter=lfs diff=lfs merge=lfs -text
- *.joblib filter=lfs diff=lfs merge=lfs -text
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
- *.model filter=lfs diff=lfs merge=lfs -text
- *.msgpack filter=lfs diff=lfs merge=lfs -text
- *.npy filter=lfs diff=lfs merge=lfs -text
- *.npz filter=lfs diff=lfs merge=lfs -text
  *.onnx filter=lfs diff=lfs merge=lfs -text
- *.ot filter=lfs diff=lfs merge=lfs -text
  *.parquet filter=lfs diff=lfs merge=lfs -text
- *.pb filter=lfs diff=lfs merge=lfs -text
- *.pickle filter=lfs diff=lfs merge=lfs -text
- *.pkl filter=lfs diff=lfs merge=lfs -text
- *.pt filter=lfs diff=lfs merge=lfs -text
- *.pth filter=lfs diff=lfs merge=lfs -text
- *.rar filter=lfs diff=lfs merge=lfs -text
- *.safetensors filter=lfs diff=lfs merge=lfs -text
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
- *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
- *.tflite filter=lfs diff=lfs merge=lfs -text
- *.tgz filter=lfs diff=lfs merge=lfs -text
- *.wasm filter=lfs diff=lfs merge=lfs -text
- *.xz filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
- *.zst filter=lfs diff=lfs merge=lfs -text
- *tfevents* filter=lfs diff=lfs merge=lfs -text

+ # Git LFS configuration for HuggingFace Hub
+ # Auto-generated for ML repository
+
+ # Model weights
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
  *.bin filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
  *.ckpt filter=lfs diff=lfs merge=lfs -text
  *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+
+ # Quantized models
+ *.gguf filter=lfs diff=lfs merge=lfs -text
+ *.ggml filter=lfs diff=lfs merge=lfs -text
+
+ # ONNX models
  *.onnx filter=lfs diff=lfs merge=lfs -text
+
+ # Tokenizer files (large vocab)
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ vocab.json filter=lfs diff=lfs merge=lfs -text
+
+ # Large data files
  *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+
+ # Archives
+ *.tar.gz filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
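The new .gitattributes routes model weights and large data files through Git LFS. As a quick sanity check, the basename globs can be tested with Python's `fnmatch` (a minimal sketch, not real gitattributes semantics — `git check-attr` is authoritative, and path patterns like the old `saved_model/**/*` need full path matching):

```python
from fnmatch import fnmatch

# Basename globs taken from the new .gitattributes (illustrative subset).
LFS_PATTERNS = [
    "*.safetensors", "*.bin", "*.pt", "*.pth", "*.ckpt", "*.h5", "*.pb",
    "*.gguf", "*.ggml", "*.onnx", "tokenizer.json", "vocab.json",
    "*.parquet", "*.arrow", "*.tar.gz", "*.zip",
]

def goes_to_lfs(path: str) -> bool:
    """Return True if the file's basename matches any LFS pattern."""
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch(name, pat) for pat in LFS_PATTERNS)

print(goes_to_lfs("adapter_model.safetensors"))  # True
print(goes_to_lfs("train.py"))                   # False
```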
.gitignore CHANGED
@@ -1,4 +1,6 @@
  # Python
  __pycache__/
  *.py[cod]
  *$py.class

+ # =============================================================================
  # Python
+ # =============================================================================
  __pycache__/
  *.py[cod]
  *$py.class

@@ -20,44 +22,111 @@ wheels/
  .installed.cfg
  *.egg

- # Virtual environments
  .venv/
  venv/
  ENV/

- # IDEs
  .idea/
  .vscode/
  *.swp
  *.swo

  # Jupyter
  .ipynb_checkpoints/

- # Testing
  .pytest_cache/
  .coverage
  htmlcov/
  .tox/
  .nox/

- # mypy
  .mypy_cache/

  # Archives
  *.tar.gz
  *.zip

- # Model artifacts (large files)
  *.safetensors
  *.bin
  *.gguf

- # Training outputs (generated)
  outputs/
  checkpoints/
  lora-output/

- # OS
  .DS_Store
  Thumbs.db

  .installed.cfg
  *.egg

+ # =============================================================================
+ # Virtual Environments
+ # =============================================================================
  .venv/
  venv/
  ENV/
+ env/

+ # =============================================================================
+ # IDEs and Editors
+ # =============================================================================
  .idea/
  .vscode/
  *.swp
  *.swo
+ *~
+ .spyderproject
+ .spyproject

+ # =============================================================================
  # Jupyter
+ # =============================================================================
  .ipynb_checkpoints/

+ # =============================================================================
+ # Testing and Coverage
+ # =============================================================================
  .pytest_cache/
  .coverage
+ .coverage.*
  htmlcov/
  .tox/
  .nox/
+ .cache/
+ nosetests.xml
+ coverage.xml
+ *.cover

+ # =============================================================================
+ # Type Checking and Linting
+ # =============================================================================
  .mypy_cache/
+ .ruff_cache/
+ .dmypy.json
+ dmypy.json

+ # =============================================================================
  # Archives
+ # =============================================================================
  *.tar.gz
  *.zip
+ *.rar
+ *.7z

+ # =============================================================================
+ # Model Artifacts (Large Files - use Git LFS if needed)
+ # =============================================================================
  *.safetensors
  *.bin
  *.gguf
+ *.pt
+ *.pth
+ *.onnx
+ *.h5
+ *.pb

+ # =============================================================================
+ # Training Outputs (Generated)
+ # =============================================================================
  outputs/
  checkpoints/
  lora-output/
+ runs/
+ wandb/
+ lightning_logs/

+ # =============================================================================
+ # OS Files
+ # =============================================================================
  .DS_Store
+ .DS_Store?
+ ._*
+ .Spotlight-V100
+ .Trashes
+ ehthumbs.db
  Thumbs.db
+
+ # =============================================================================
+ # Project-Specific
+ # =============================================================================
+ # Claude Code cache
+ .claude/
+
+ # UV lock file (regenerated from pyproject.toml)
+ uv.lock
+
+ # Pre-commit cache
+ .pre-commit-config.yaml
+
+ # Local config files
+ .env
+ .env.local
+ *.local
+
+ # Temporary files
+ *.tmp
+ *.temp
+ *.bak
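The expanded ignore rules now cover model artifacts, training outputs, and tool caches. A rough way to preview what they exclude is a simplified matcher (sketch under stated assumptions — real gitignore semantics include negation, anchoring, and `**`; `git check-ignore -v` is the authoritative check):

```python
from fnmatch import fnmatch

# Simplified subset of the new .gitignore rules.
DIR_RULES = {"outputs", "checkpoints", "wandb", "runs", ".claude", "__pycache__"}
FILE_RULES = ["*.safetensors", "*.bin", "*.gguf", "*.tmp", "*.bak", "uv.lock", ".env"]

def is_ignored(path: str) -> bool:
    """True if any path segment is an ignored directory, or the basename
    matches an ignored-file glob (a simplification of gitignore matching)."""
    parts = path.strip("/").split("/")
    if any(seg in DIR_RULES for seg in parts):
        return True
    return any(fnmatch(parts[-1], pat) for pat in FILE_RULES)

print(is_ignored("outputs/step_100/adapter_model.bin"))  # True
print(is_ignored("src/prolewiki_llm/grpo_rewards.py"))   # False
```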
README.md CHANGED
@@ -1,77 +1,136 @@
  # prolewiki-llm

- GRPO fine-tuning and reward functions for training Marxist-Leninist language models.

  ## Overview

- This repository contains the AI training infrastructure for fine-tuning language models on Marxist-Leninist theory. It includes:
-
- - **Reward Functions**: Multi-layer reward system for GRPO training that prevents reward hacking
- - **Training Data**: Curated Q&A pairs and synthetic datasets for ideological consistency
- - **Training Scripts**: Ready-to-run notebooks for RunPod/cloud GPU training
- - **W&B Integration**: Weights & Biases logging for training observability

- ## Related Projects

- - [pw-mcp](https://github.com/prolewiki/pw-mcp) - MCP server and ChromaDB pipeline for ProleWiki semantic search

- ## Installation

  ```bash
- # Basic installation
- uv sync

- # Download spacy model (required for topic/coherence rewards)
- python -m spacy download en_core_web_sm

- # With training dependencies (for GPU training)
- uv sync --group training
-
- # Development
- uv sync --group dev
  ```

- ## Usage
-
- ### Reward Functions

- ```python
- from prolewiki_llm import full_coherence_reward, format_reward

- # Combined 5-layer coherence check (recommended for training)
- reward = full_coherence_reward(
-     prompts=["What is imperialism?"],
-     completions=["<think>...</think>\n\nImperialism is..."],
-     answer="Lenin defined imperialism as..."
- )

- # Individual reward components
- format_score = format_reward(completions=["<think>...</think>\n\nAnswer..."])
  ```

- ### Training
-
- See `notebooks/Marxist_GRPO_Training.ipynb` for a complete training example.
-
- ## Project Structure

  ```
  prolewiki-llm/
  ├── src/prolewiki_llm/
- │   ├── grpo_rewards.py        # 17+ reward functions
- │   ├── wandb_logging.py       # W&B integration
- │   └── transform_to_grpo.py   # Dataset conversion
- ├── training_data/
- │   ├── synthetic_*.jsonl      # Training datasets
- │   ├── entity_whitelist.json  # Anti-hallucination data
- │   └── MODEL_CARD.yaml        # Dataset documentation
  ├── notebooks/
- │   └── Marxist_GRPO_Training.ipynb
  ├── tests/
- │   └── unit/
- └── ai-docs/                   # AI-consumable documentation
  ```

  ## License

  AGPL-3.0-only

+ ---
+ language:
+ - en
+ license: agpl-3.0
+ library_name: transformers
+ tags:
+ - grpo
+ - rlhf
+ - fine-tuning
+ - marxism
+ - political-theory
+ - lora
+ - deepseek
+ - qwen
+ datasets:
+ - prolewiki/qa-corpus
+ base_model: unsloth/DeepSeek-R1-0528-Qwen3-8B
+ pipeline_tag: text-generation
+ ---
+
  # prolewiki-llm

+ GRPO fine-tuning infrastructure for training Marxist-Leninist language models.

  ## Overview

+ This repository contains the AI training infrastructure for fine-tuning language models on Marxist-Leninist theory using GRPO (Group Relative Policy Optimization). It includes:

+ - **Multi-Layer Reward System**: 17+ reward functions that prevent reward hacking (NLI coherence, self-consistency, structural analysis, topic relevance, depth scoring)
+ - **Headless Training**: Docker container for automated RunPod deployment with auto-shutoff
+ - **Jupyter Notebook**: Production-ready notebook optimized for A40/A100 GPUs
+ - **Comprehensive Tests**: Unit and integration tests for all components

+ ## Quick Start

+ ### RunPod Deployment (Recommended)

  ```bash
+ # 1. Build Docker image
+ docker build -t marxist-grpo:latest -f docker/Dockerfile .

+ # 2. Push to registry and deploy on RunPod
+ # Use A40 (48GB, $0.35/hr) for best cost/performance

+ # 3. Set environment variables on pod:
+ # - HF_TOKEN
+ # - WANDB_API_KEY
+ # - HF_REPO (optional, for model upload)
  ```

+ ### Local Development

+ ```bash
+ # Install dependencies
+ uv sync --group dev

+ # Download spaCy model (required for rewards)
+ python -m spacy download en_core_web_sm

+ # Run tests
+ uv run pytest -m "not slow and not gpu"
  ```

+ ## Repository Structure

  ```
  prolewiki-llm/
  ├── src/prolewiki_llm/
+ │   ├── grpo_rewards.py          # Multi-layer reward functions
+ │   ├── train_headless.py        # Headless training script
+ │   ├── export_grpo_dataset.py   # Dataset conversion
+ │   └── wandb_logging.py         # W&B integration
+ ├── docker/
+ │   ├── Dockerfile               # Training container
+ │   ├── start.sh                 # Entrypoint with auto-shutoff
+ │   └── .env.example             # Environment reference
  ├── notebooks/
+ │   └── Marxist_GRPO_RunPod_Optimized.ipynb
  ├── tests/
+ │   ├── unit/                    # Unit tests
+ │   ├── integration/             # Shell script tests
+ │   └── fixtures/                # Mock commands
+ └── training_data/
+     └── grpo_dataset.jsonl       # Training data
  ```

+ ## Reward Functions
+
+ The reward system uses multiple layers to ensure quality responses:
+
+ | Layer | Function | Purpose |
+ |-------|----------|---------|
+ | 1 | `match_format_exactly` | Validate `<think>...</think>` tags |
+ | 2 | `nli_coherence_reward` | Response entails ground truth (BART-MNLI) |
+ | 3 | `self_consistency_reward` | No internal contradictions |
+ | 4 | `structural_coherence_reward` | Terms in proper syntactic roles (spaCy) |
+ | 5 | `topic_relevance_reward` | Answer addresses the question |
+ | 6 | `interconnection_depth_reward` | Rewards analysis, penalizes buzzword salad |
+
+ Use `full_coherence_reward()` for the complete 6-layer check, or `robust_coherence_reward()` for a faster 3-layer version.
+
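As a rough illustration of how such layered rewards compose (a hypothetical simplification — the real functions live in `src/prolewiki_llm/grpo_rewards.py`, and the regex, weights, and combination rule below are assumptions for the sketch), layer 1 plus a weighted combination might look like:

```python
import re

# Layer 1 (simplified): completion must open with a <think>...</think>
# block followed by a visible answer.
THINK_RE = re.compile(r"^<think>.*?</think>\s*\S", re.DOTALL)

def format_layer(completion: str) -> float:
    """Return 1.0 if the format constraint holds, else 0.0."""
    return 1.0 if THINK_RE.match(completion) else 0.0

def combine_layers(scores: dict, weights: dict) -> float:
    """Weighted sum of per-layer scores (hypothetical combination rule)."""
    return sum(weights[name] * scores.get(name, 0.0) for name in weights)

completion = "<think>Imperialism concentrates capital...</think>\n\nImperialism is..."
scores = {"format": format_layer(completion), "nli": 0.8, "topic": 0.9}
weights = {"format": 1.0, "nli": 2.0, "topic": 1.0}
print(combine_layers(scores, weights))  # 3.5
```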
+ ## Training Configuration
+
+ Key environment variables for `train_headless.py`:
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `MODEL_NAME` | `unsloth/DeepSeek-R1-0528-Qwen3-8B` | Base model |
+ | `MAX_STEPS` | `500` | Training steps |
+ | `BATCH_SIZE` | `2` | Per-device batch size |
+ | `LEARNING_RATE` | `5e-6` | Learning rate |
+ | `REWARD_MODE` | `FULL` | `FULL`, `ROBUST`, or `LEGACY` |
+ | `HF_REPO` | `prolewiki/marxist-grpo-lora` | Upload destination |
+
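A script consuming this table could resolve the variables with their documented defaults roughly as follows (a sketch only — the actual parsing in `train_headless.py` is not shown in this diff, so everything beyond the table values is an assumption):

```python
import os
from dataclasses import dataclass

@dataclass
class TrainConfig:
    model_name: str
    max_steps: int
    batch_size: int
    learning_rate: float
    reward_mode: str

def load_config(env=os.environ) -> TrainConfig:
    """Build a config from environment variables, using the documented defaults."""
    mode = env.get("REWARD_MODE", "FULL").upper()
    if mode not in {"FULL", "ROBUST", "LEGACY"}:
        raise ValueError(f"Unknown REWARD_MODE: {mode}")
    return TrainConfig(
        model_name=env.get("MODEL_NAME", "unsloth/DeepSeek-R1-0528-Qwen3-8B"),
        max_steps=int(env.get("MAX_STEPS", "500")),
        batch_size=int(env.get("BATCH_SIZE", "2")),
        learning_rate=float(env.get("LEARNING_RATE", "5e-6")),
        reward_mode=mode,
    )

cfg = load_config({})  # all defaults
print(cfg.max_steps, cfg.reward_mode)  # 500 FULL
```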
+ ## GPU Requirements
+
+ | GPU | VRAM | Price | Recommendation |
+ |-----|------|-------|----------------|
+ | **A40** | 48GB | $0.35/hr | Best value for 8B models |
+ | A100 | 80GB | $1.19/hr | Overkill for this use case |
+ | RTX 4090 | 24GB | $0.34/hr | Too small for 16-bit GRPO |
+
+ ## Critical Notes
+
+ 1. **torch.compile must be disabled** on RunPod/Jupyter (causes hangs)
+ 2. **load_in_4bit=False** is required for GRPO (16-bit LoRA adapters)
+ 3. **use_gradient_checkpointing=True** (not `"unsloth"`) for stability
+
+ ## Related Projects
+
+ - [ProleWiki](https://en.prolewiki.org/) - The Marxist-Leninist encyclopedia
+ - [pw-mcp](https://github.com/prolewiki/pw-mcp) - MCP server for ProleWiki semantic search
+
  ## License

  AGPL-3.0-only