affine-0004-improved

Fine-tuned version of Affine-0004 (Qwen3-4B) optimized for reasoning and interactive agent tasks.

Model Description

The model is fine-tuned on eight environments spanning two task families:

Logical Reasoning: SAT solving, abduction, deduction
Interactive Agents: WebShop, AlfWorld, BabyAI, SciWorld, TextCraft

The model uses LoRA (Low-Rank Adaptation) for efficient fine-tuning and supports:

Extended context length (256K tokens)
Flash Attention 2 for efficient inference
BFloat16 precision

Training Details

Base Model

Architecture: Qwen3ForCausalLM
Parameters: 4B
Context Length: 262,144 tokens

Fine-tuning

Method: LoRA + PPO (Proximal Policy Optimization)
LoRA Rank: 64
LoRA Alpha: 128
Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
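
As a rough illustration of what these settings imply, the sketch below collects the listed values in a plain dictionary and computes the standard LoRA scaling factor (alpha divided by rank). The `task_type` entry, the helper names, and the 1024 x 1024 example matrix are assumptions for illustration, not details taken from the training run.

```python
# LoRA settings as listed in this card, kept as a plain dict.
# With PEFT installed, these keys can be passed as LoraConfig(**lora_kwargs).
lora_kwargs = {
    "r": 64,
    "lora_alpha": 128,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}


def lora_scaling(r: int, lora_alpha: int) -> float:
    """Standard LoRA scaling: the low-rank update is multiplied by alpha / r."""
    return lora_alpha / r


def lora_params_per_matrix(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one weight matrix: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


scaling = lora_scaling(lora_kwargs["r"], lora_kwargs["lora_alpha"])
# Hypothetical 1024 x 1024 projection, used only to show the arithmetic.
example_params = lora_params_per_matrix(1024, 1024, lora_kwargs["r"])
print(scaling, example_params)
```

With rank 64 and alpha 128 the scaling factor is 2.0, so the adapter update is applied at twice its raw magnitude.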

Training Data

Affine Tasks: 10,000+ samples from satpalsr/rl-python
AgentGym Tasks: 5,000+ episodes across 5 environments

Hyperparameters

Learning Rate: 2e-5
Batch Size: 16 (effective)
Epochs: 3
Optimizer: AdamW
Scheduler: Cosine with warmup
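
The "cosine with warmup" schedule can be sketched in plain Python as follows. This is a minimal sketch assuming linear warmup from zero followed by a half-cosine decay to zero; the warmup length and total step count are not given in this card, so `warmup_steps` and `total_steps` below are hypothetical values.

```python
import math


def cosine_lr(step: int, total_steps: int, warmup_steps: int, peak_lr: float = 2e-5) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Hypothetical run length and warmup, chosen only for illustration.
total_steps, warmup_steps = 1000, 100
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr(step, total_steps, warmup_steps))
```

The rate peaks at 2e-5 when warmup ends and falls to half of that midway through the decay phase.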

Performance Metrics

Overall Accuracy: 0.4994 (unweighted mean over the 8 environments)
Mean Score: 0.4975 (unweighted mean over the 8 environments)
Success Rate: 0.8119

Per-Environment Performance

Environment	Accuracy	Mean Score	80% Confidence Interval
affine:sat	0.5000	0.5000	[0.4548, 0.5452]
affine:abd	0.4950	0.4950	[0.4499, 0.5402]
affine:ded	0.5000	0.5000	[0.4548, 0.5452]
agentgym:webshop	0.4950	0.4771	[0.4321, 0.5223]
agentgym:alfworld	0.5100	0.5190	[0.4737, 0.5640]
agentgym:babyai	0.4600	0.4900	[0.4450, 0.5353]
agentgym:sciworld	0.5350	0.4959	[0.4508, 0.5411]
agentgym:textcraft	0.5000	0.5032	[0.4580, 0.5484]
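
As a quick consistency check, the overall accuracy and mean score reported earlier can be recomputed from the per-environment rows. The sketch below assumes a simple unweighted average across environments; the variable names are illustrative.

```python
# (accuracy, mean score) per environment, copied from the table above.
per_env = {
    "affine:sat": (0.5000, 0.5000),
    "affine:abd": (0.4950, 0.4950),
    "affine:ded": (0.5000, 0.5000),
    "agentgym:webshop": (0.4950, 0.4771),
    "agentgym:alfworld": (0.5100, 0.5190),
    "agentgym:babyai": (0.4600, 0.4900),
    "agentgym:sciworld": (0.5350, 0.4959),
    "agentgym:textcraft": (0.5000, 0.5032),
}

# Unweighted (macro) averages across the eight environments.
overall_accuracy = sum(acc for acc, _ in per_env.values()) / len(per_env)
mean_score = sum(score for _, score in per_env.values()) / len(per_env)

print(round(overall_accuracy, 4), round(mean_score, 4))  # 0.4994 0.4975
```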

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "YOUR_USERNAME/affine-0004-improved",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "YOUR_USERNAME/affine-0004-improved",
    trust_remote_code=True
)

# Generate a response
prompt = "Solve the following SAT problem: (x1 ∨ x2) ∧ (¬x1 ∨ x3)"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response)

Evaluation

The model is evaluated on the Affine subnet using the following criteria:

Minimum 200 samples per environment
Pareto dominance scoring across all environments
Bayesian confidence intervals (80% confidence level)
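
The two scoring ideas above can be sketched as follows. This is an illustrative sketch, not the subnet's actual implementation: it assumes a uniform Beta(1, 1) prior on the success rate, approximates the Beta posterior with a normal distribution, and uses the textbook definition of Pareto dominance on point scores. The function names and example scores are hypothetical.

```python
from statistics import NormalDist


def credible_interval(successes: int, n: int, level: float = 0.80) -> tuple[float, float]:
    """Approximate central credible interval for a success rate.

    Assumes a uniform Beta(1, 1) prior, so the posterior is
    Beta(1 + successes, 1 + failures), approximated here by a normal.
    """
    a = 1 + successes
    b = 1 + (n - successes)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided quantile
    half_width = z * var ** 0.5
    return max(0.0, mean - half_width), min(1.0, mean + half_width)


def dominates(a: dict[str, float], b: dict[str, float]) -> bool:
    """True if model `a` Pareto-dominates model `b`.

    `a` must be at least as good in every environment and strictly
    better in at least one. Both dicts are assumed to share the same keys.
    """
    return all(a[e] >= b[e] for e in a) and any(a[e] > b[e] for e in a)


# 100 successes out of the minimum 200 samples, at the 80% level.
low, high = credible_interval(100, 200)
print(round(low, 3), round(high, 3))

# Hypothetical scores for two models on two environments.
model_a = {"affine:sat": 0.52, "agentgym:webshop": 0.50}
model_b = {"affine:sat": 0.50, "agentgym:webshop": 0.50}
print(dominates(model_a, model_b), dominates(model_b, model_a))
```

With 200 samples, an 80% interval around a rate near 0.5 spans roughly 0.09, which is comparable to the widths in the per-environment table.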

Limitations and Bias

The model is optimized for reasoning and agent tasks
Performance may vary on out-of-distribution tasks
Inherits biases from the base Qwen3 model and training data

Citation

@misc{affine-model-training,
  title={Fine-tuned Affine Model for Reasoning and Interactive Agents},
  year={2024},
  url={https://huggingface.co/YOUR_USERNAME/affine-0004-improved}
}

License

Apache 2.0

Acknowledgments

Base model: Affine-0004
Training frameworks: Transformers, PEFT, TRL
Evaluation: Affine Subnet (Bittensor)
