affine-0004-improved

A fine-tuned version of Affine-0004 (Qwen3-4B), optimized for logical reasoning and interactive agent tasks.

Model Description

This model is fine-tuned on eight diverse environments covering:

  • Logical Reasoning: SAT solving, abduction, deduction
  • Interactive Agents: WebShop, AlfWorld, BabyAI, SciWorld, TextCraft

The model uses LoRA (Low-Rank Adaptation) for efficient fine-tuning and supports:

  • Extended context length (256K tokens)
  • Flash Attention 2 for efficient inference
  • BFloat16 precision

Training Details

Base Model

  • Architecture: Qwen3ForCausalLM
  • Parameters: 4B
  • Context Length: 262,144 tokens

Fine-tuning

  • Method: LoRA + PPO (Proximal Policy Optimization)
  • LoRA Rank: 64
  • LoRA Alpha: 128
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
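As a rough sanity check on adapter size: LoRA replaces a full weight update with a low-rank product B @ A, so each adapted matrix adds r * (d_in + d_out) trainable parameters. The sketch below applies that formula for the rank listed above; the matrix dimension is an illustrative placeholder, not the actual Qwen3-4B hidden size.

```python
# LoRA factorizes the weight update as B @ A, with A: (r, d_in) and
# B: (d_out, r), so one adapter adds r * (d_in + d_out) trainable params.

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added by one LoRA adapter of rank r."""
    return r * (d_in + d_out)

# Illustrative square projection (placeholder dimension) at the rank
# used here (r=64):
print(lora_params(2560, 2560, 64))  # 327680
```

Summing this over the seven target modules per layer, times the layer count, gives the total adapter size, which is a small fraction of the 4B base parameters.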

Training Data

  • Affine Tasks: 10,000+ samples from satpalsr/rl-python
  • AgentGym Tasks: 5,000+ episodes across 5 environments

Hyperparameters

  • Learning Rate: 2e-5
  • Batch Size: 16 (effective)
  • Epochs: 3
  • Optimizer: AdamW
  • Scheduler: Cosine with warmup
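For reference, a "cosine with warmup" schedule ramps the learning rate linearly over the warmup steps and then decays it along a half-cosine to zero. A minimal sketch, with the stated peak LR of 2e-5; the warmup and total step counts are illustrative placeholders, not the actual training configuration:

```python
import math

def lr_at_step(step: int, max_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Placeholder step counts (100 warmup, 1000 total):
print(lr_at_step(0, 2e-5, 100, 1000))     # 0.0 at the start of warmup
print(lr_at_step(100, 2e-5, 100, 1000))   # peak 2e-5 at the end of warmup
print(lr_at_step(1000, 2e-5, 100, 1000))  # 0.0 at the end of training
```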

Performance Metrics

  • Overall Accuracy: 0.4994
  • Mean Score: 0.4975
  • Success Rate: 0.8119

Per-Environment Performance

Environment         Accuracy   Mean Score   Confidence Interval
affine:sat          0.5000     0.5000       [0.4548, 0.5452]
affine:abd          0.4950     0.4950       [0.4499, 0.5402]
affine:ded          0.5000     0.5000       [0.4548, 0.5452]
agentgym:webshop    0.4950     0.4771       [0.4321, 0.5223]
agentgym:alfworld   0.5100     0.5190       [0.4737, 0.5640]
agentgym:babyai     0.4600     0.4900       [0.4450, 0.5353]
agentgym:sciworld   0.5350     0.4959       [0.4508, 0.5411]
agentgym:textcraft  0.5000     0.5032       [0.4580, 0.5484]
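The headline numbers above appear to be unweighted means of the per-environment results. The check below reproduces them from the table, assuming equal weighting across the eight environments:

```python
# Per-environment results copied from the table above.
accuracy = [0.5000, 0.4950, 0.5000, 0.4950, 0.5100, 0.4600, 0.5350, 0.5000]
mean_score = [0.5000, 0.4950, 0.5000, 0.4771, 0.5190, 0.4900, 0.4959, 0.5032]

overall_accuracy = sum(accuracy) / len(accuracy)
overall_mean_score = sum(mean_score) / len(mean_score)

print(round(overall_accuracy, 4))    # 0.4994
print(round(overall_mean_score, 4))  # 0.4975
```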

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "YOUR_USERNAME/affine-0004-improved",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "YOUR_USERNAME/affine-0004-improved",
    trust_remote_code=True
)

# Generate response
prompt = "Solve the following SAT problem: (x1 ∨ x2) ∧ (¬x1 ∨ x3)"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Evaluation

The model is evaluated on the Affine subnet using the following criteria:

  • Minimum 200 samples per environment
  • Pareto dominance scoring across all environments
  • Bayesian confidence intervals (80% confidence level)
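As a plausibility check on the interval widths: the card reports Bayesian intervals, but a simple normal approximation (an assumption here, not the actual evaluation method) should be close at the 200-sample minimum:

```python
import math

def normal_ci(p: float, n: int, z: float = 1.2816) -> tuple[float, float]:
    """Normal-approximation interval for a proportion; z=1.2816 is the
    two-sided 80% critical value."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return (p - half_width, p + half_width)

# At p = 0.5 with n = 200, the 80% interval is roughly [0.455, 0.545],
# close to the [0.4548, 0.5452] rows in the table above.
lo, hi = normal_ci(0.5, 200)
print(round(lo, 4), round(hi, 4))
```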

Limitations and Bias

  • The model is narrowly specialized for reasoning and interactive agent tasks
  • Performance may degrade on out-of-distribution tasks
  • Inherits biases from the base Qwen3 model and training data

Citation

@misc{affine-model-training,
  title={Fine-tuned Affine Model for Reasoning and Interactive Agents},
  year={2024},
  url={https://huggingface.co/YOUR_USERNAME/affine-0004-improved}
}

License

Apache 2.0

Acknowledgments

  • Base model: Affine-0004
  • Training framework: Transformers, PEFT, TRL
  • Evaluation: Affine Subnet (Bittensor)