# affine-0004-improved
Fine-tuned version of Affine-0004 (Qwen3-4B) optimized for reasoning and interactive agent tasks.
## Model Description
This model is fine-tuned on 8 diverse environments that test:
- Logical Reasoning: SAT solving, abduction, deduction
- Interactive Agents: WebShop, AlfWorld, BabyAI, SciWorld, TextCraft
The model uses LoRA (Low-Rank Adaptation) for efficient fine-tuning and supports:
- Extended context length (256K tokens)
- Flash Attention 2 for efficient inference
- BFloat16 precision
## Training Details

### Base Model
- Architecture: Qwen3ForCausalLM
- Parameters: 4B
- Context Length: 262,144 tokens
### Fine-tuning
- Method: LoRA + PPO (Proximal Policy Optimization)
- LoRA Rank: 64
- LoRA Alpha: 128
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
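As a sanity check on adapter size: LoRA adds `r * (d_in + d_out)` trainable parameters per adapted linear layer, with outputs scaled by `alpha / r` (here 128 / 64 = 2). A rough pure-Python estimate is sketched below; the hidden/intermediate sizes and layer count are assumed Qwen3-4B-like values for illustration, not official numbers, and the attention projections are simplified as square:

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) linear layer:
    factors A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)

# Assumed dimensions for illustration only (not official Qwen3-4B config):
# h = hidden size, f = MLP intermediate size, r = LoRA rank.
h, f, r, n_layers = 2560, 9728, 64, 36

per_layer = (
    4 * lora_params(h, h, r)   # q_proj, k_proj, v_proj, o_proj (simplified as square)
    + 2 * lora_params(h, f, r)  # gate_proj, up_proj
    + lora_params(f, h, r)      # down_proj
)
total = per_layer * n_layers
print(f"~{total / 1e6:.0f}M trainable LoRA parameters")  # ~132M under these assumptions
```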
### Training Data
- Affine Tasks: 10,000+ samples from satpalsr/rl-python
- AgentGym Tasks: 5,000+ episodes across 5 environments
### Hyperparameters
- Learning Rate: 2e-5
- Batch Size: 16 (effective)
- Epochs: 3
- Optimizer: AdamW
- Scheduler: Cosine with warmup
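The cosine-with-warmup schedule above can be sketched in a few lines. The step counts below are hypothetical; only the shape of the curve matters:

```python
import math

def lr_at(step: int, total_steps: int, warmup_steps: int, base_lr: float = 2e-5) -> float:
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Hypothetical step counts, for illustration only.
total, warmup = 1000, 100
print(lr_at(0, total, warmup))     # 0.0 at the start of warmup
print(lr_at(100, total, warmup))   # peak 2e-5 when warmup ends
print(lr_at(1000, total, warmup))  # decays to ~0 at the final step
```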
## Performance Metrics
- Overall Accuracy: 0.4994
- Mean Score: 0.4975
- Success Rate: 0.8119
### Per-Environment Performance
| Environment | Accuracy | Mean Score | Confidence Interval |
|---|---|---|---|
| affine:sat | 0.5000 | 0.5000 | [0.4548, 0.5452] |
| affine:abd | 0.4950 | 0.4950 | [0.4499, 0.5402] |
| affine:ded | 0.5000 | 0.5000 | [0.4548, 0.5452] |
| agentgym:webshop | 0.4950 | 0.4771 | [0.4321, 0.5223] |
| agentgym:alfworld | 0.5100 | 0.5190 | [0.4737, 0.5640] |
| agentgym:babyai | 0.4600 | 0.4900 | [0.4450, 0.5353] |
| agentgym:sciworld | 0.5350 | 0.4959 | [0.4508, 0.5411] |
| agentgym:textcraft | 0.5000 | 0.5032 | [0.4580, 0.5484] |
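The headline metrics above are unweighted means of the per-environment columns. A quick check, with values copied from the table:

```python
# Per-environment values from the table above.
accuracy = {
    "affine:sat": 0.5000, "affine:abd": 0.4950, "affine:ded": 0.5000,
    "agentgym:webshop": 0.4950, "agentgym:alfworld": 0.5100,
    "agentgym:babyai": 0.4600, "agentgym:sciworld": 0.5350,
    "agentgym:textcraft": 0.5000,
}
mean_score = {
    "affine:sat": 0.5000, "affine:abd": 0.4950, "affine:ded": 0.5000,
    "agentgym:webshop": 0.4771, "agentgym:alfworld": 0.5190,
    "agentgym:babyai": 0.4900, "agentgym:sciworld": 0.4959,
    "agentgym:textcraft": 0.5032,
}

# Unweighted mean over the 8 environments reproduces the headline numbers.
overall_acc = sum(accuracy.values()) / len(accuracy)
overall_score = sum(mean_score.values()) / len(mean_score)
print(f"{overall_acc:.4f} {overall_score:.4f}")  # 0.4994 0.4975
```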
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "YOUR_USERNAME/affine-0004-improved",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "YOUR_USERNAME/affine-0004-improved",
    trust_remote_code=True,
)

# Generate a response
prompt = "Solve the following SAT problem: (x1 ∨ x2) ∧ (¬x1 ∨ x3)"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
## Evaluation
The model is evaluated on the Affine subnet using the following criteria:
- Minimum 200 samples per environment
- Pareto dominance scoring across all environments
- Bayesian confidence intervals (80% confidence level)
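The subnet's exact Bayesian interval procedure is not reproduced here, but a plain normal-approximation 80% interval (an approximation, not the actual evaluation code) lands very close to the table's values at n = 200:

```python
from statistics import NormalDist

def interval_80(p_hat: float, n: int) -> tuple[float, float]:
    """Normal-approximation 80% confidence interval for a proportion.

    The subnet reports Bayesian intervals; with n >= 200 samples this
    frequentist approximation closely matches the table above.
    """
    z = NormalDist().inv_cdf(0.90)  # two-sided 80% -> 90th percentile of N(0, 1)
    half = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return (p_hat - half, p_hat + half)

lo, hi = interval_80(0.5000, 200)  # affine:sat row
print(f"[{lo:.4f}, {hi:.4f}]")     # ≈ [0.4547, 0.5453]; table: [0.4548, 0.5452]
```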
## Limitations and Bias
- The model is optimized for reasoning and agent tasks
- Performance may vary on out-of-distribution tasks
- Inherits biases from the base Qwen3 model and training data
## Citation

```bibtex
@misc{affine-model-training,
  title={Fine-tuned Affine Model for Reasoning and Interactive Agents},
  year={2024},
  url={https://huggingface.co/YOUR_USERNAME/affine-0004-improved}
}
```
## License
Apache 2.0
## Acknowledgments
- Base model: Affine-0004
- Training framework: Transformers, PEFT, TRL
- Evaluation: Affine Subnet (Bittensor)