QLoRA + OpenEnv Training Architecture (HF-Centric)
π§ SYSTEM OVERVIEW
You are building a QLoRA + reward-based training system fully inside Hugging Face infrastructure.
Core principles:
- GPU = compute only (ephemeral)
- Hub = source of truth (persistent)
- Training = reproducible + resumable
- Metrics = structured + queryable
π§± ARCHITECTURE
1. Code Repository (Control Plane)
Stored on Hugging Face Hub
Contents:
- train.py
- openenv_loop.py
- model_utils.py
- eval.py
- requirements.txt
- config.yaml
2. Training Runtime (Execution Plane)
Runs via Hugging Face Training Jobs
- GPU: A10G recommended
- Pulls code from Hub
- Uses temporary disk
- Must explicitly persist outputs
3. Model Storage
Stored on Hugging Face Hub
Contents:
- LoRA adapter weights
- Optional merged model
4. Metrics Storage
Stored as dataset on Hugging Face Hub
Example schema: { "run_id": "exp_001", "step": 120, "train_reward": 0.82, "eval_reward_base": 0.65, "eval_reward_ft": 0.78, "loss": 1.23 }
5. Optional Artifacts
- plots (.png)
- logs
- evaluation outputs
π TRAINING PROCESS FLOW
Step 1 β Initialization
- Load base model (4-bit)
- Attach LoRA adapters
- Load tokenizer
- Resume from checkpoint if exists
Step 2 β Training Loop
generate β evaluate β reward β update β log β checkpoint
Step 3 β Logging
- Buffer metrics locally
- Push every N steps to dataset
Step 4 β Evaluation
- Evaluate base vs fine-tuned
- Log results
Step 5 β Checkpointing
- Save locally
- Push to Hub
- Ensure resumability
Step 6 β Visualization
- Generate plots
- Save + push
π RESUME LOGIC
- Check latest checkpoint
- Resume if exists
- Else start fresh
π POST-TRAINING
- Load dataset
- Plot reward curves
- Compare models
βοΈ CONFIG
- batch_size
- learning_rate
- lora_rank
- eval_interval
- checkpoint_interval
- push_interval
- run_id
β οΈ FAILURE HANDLING
- Frequent checkpointing
- Frequent metric pushes
π§ FINAL MODEL
HF Hub = database
HF Jobs = compute
QLoRA = training layer
OpenEnv = reward
Datasets = tracking
β SUCCESS CRITERIA
- Resumable training
- Persistent metrics
- Comparable runs
- No data loss