Fine-tuned version of Affine-0004 (Qwen3-4B) optimized for reasoning and interactive agent tasks.
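This model is fine-tuned on 8 environments: three Affine reasoning tasks (affine:sat, affine:abd, affine:ded) and five AgentGym interactive agent tasks (agentgym:webshop, agentgym:alfworld, agentgym:babyai, agentgym:sciworld, agentgym:textcraft).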
The model uses LoRA (Low-Rank Adaptation) for efficient fine-tuning: small low-rank adapter matrices are trained while the base model weights stay frozen.
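To make the efficiency of LoRA concrete, the sketch below counts trainable parameters for a single weight matrix under full fine-tuning versus a LoRA adapter, which expresses the update to a d × k matrix as the product of a d × r matrix and an r × k matrix. The layer shape and rank are illustrative assumptions, not the configuration used for this model.

```python
def full_param_count(d: int, k: int) -> int:
    """Trainable parameters when the whole d x k weight matrix is updated."""
    return d * k


def lora_param_count(d: int, k: int, r: int) -> int:
    """Trainable parameters for a rank-r LoRA adapter: B (d x r) plus A (r x k)."""
    return r * (d + k)


# Hypothetical layer shape and rank, chosen only for illustration.
d, k, r = 4096, 4096, 16
full = full_param_count(d, k)
lora = lora_param_count(d, k, r)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

With these assumed values the adapter trains under 1% of the parameters of the full matrix. The saving grows as the rank shrinks relative to the layer dimensions.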
| Environment | Accuracy | Mean Score | Confidence Interval |
|---|---|---|---|
| affine:sat | 0.5000 | 0.5000 | [0.4548, 0.5452] |
| affine:abd | 0.4950 | 0.4950 | [0.4499, 0.5402] |
| affine:ded | 0.5000 | 0.5000 | [0.4548, 0.5452] |
| agentgym:webshop | 0.4950 | 0.4771 | [0.4321, 0.5223] |
| agentgym:alfworld | 0.5100 | 0.5190 | [0.4737, 0.5640] |
| agentgym:babyai | 0.4600 | 0.4900 | [0.4450, 0.5353] |
| agentgym:sciworld | 0.5350 | 0.4959 | [0.4508, 0.5411] |
| agentgym:textcraft | 0.5000 | 0.5032 | [0.4580, 0.5484] |
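As a rough way to read the table, the following sketch groups the rows by suite (the prefix before the colon), takes unweighted means, and checks that each mean score falls inside its reported interval. The aggregates are not reported by the model card and are derived only from the values above. The function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Rows copied from the table: (environment, accuracy, mean score, CI low, CI high).
RESULTS = [
    ("affine:sat", 0.5000, 0.5000, 0.4548, 0.5452),
    ("affine:abd", 0.4950, 0.4950, 0.4499, 0.5402),
    ("affine:ded", 0.5000, 0.5000, 0.4548, 0.5452),
    ("agentgym:webshop", 0.4950, 0.4771, 0.4321, 0.5223),
    ("agentgym:alfworld", 0.5100, 0.5190, 0.4737, 0.5640),
    ("agentgym:babyai", 0.4600, 0.4900, 0.4450, 0.5353),
    ("agentgym:sciworld", 0.5350, 0.4959, 0.4508, 0.5411),
    ("agentgym:textcraft", 0.5000, 0.5032, 0.4580, 0.5484),
]


def summarize_by_suite(rows):
    """Unweighted mean accuracy and mean score per suite (prefix before ':')."""
    grouped = defaultdict(list)
    for env, acc, score, _low, _high in rows:
        suite = env.split(":", 1)[0]
        grouped[suite].append((acc, score))
    return {
        suite: {
            "accuracy": mean(a for a, _ in vals),
            "mean_score": mean(s for _, s in vals),
        }
        for suite, vals in grouped.items()
    }


def scores_inside_intervals(rows):
    """True if every reported mean score lies within its own interval."""
    return all(low <= score <= high for _, _, score, low, high in rows)


for suite, stats in summarize_by_suite(RESULTS).items():
    print(f"{suite}: accuracy={stats['accuracy']:.4f}, mean_score={stats['mean_score']:.4f}")
print("scores inside intervals:", scores_inside_intervals(RESULTS))
```

The intervals appear to bracket the Mean Score column rather than Accuracy. The card does not state the confidence level or the number of episodes per environment, so the sketch does not try to recompute the intervals.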
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "YOUR_USERNAME/affine-0004-improved",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "YOUR_USERNAME/affine-0004-improved",
    trust_remote_code=True,
)

# Generate response
prompt = "Solve the following SAT problem: (x1 ∨ x2) ∧ (¬x1 ∨ x3)"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
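Generated answers to SAT prompts are not guaranteed to be correct, so it can help to check a proposed assignment independently. The sketch below encodes the formula from the prompt and verifies a candidate assignment by brute force. The clause encoding, helper names, and the candidate assignment are assumptions for illustration and are not part of the model or of the Affine evaluation.

```python
from itertools import product

# The formula from the prompt, (x1 ∨ x2) ∧ (¬x1 ∨ x3), as a list of clauses.
# Each literal is a signed integer: 2 means x2, -1 means ¬x1 (DIMACS-style).
CLAUSES = [[1, 2], [-1, 3]]


def satisfies(clauses, assignment):
    """Return True if the {variable: bool} assignment satisfies every clause."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def all_solutions(clauses):
    """Brute-force every assignment; only practical for a handful of variables."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    solutions = []
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if satisfies(clauses, assignment):
            solutions.append(assignment)
    return solutions


# A hypothetical candidate, e.g. one read by hand from the model's response.
candidate = {1: True, 2: False, 3: True}
print(satisfies(CLAUSES, candidate))   # True
print(len(all_solutions(CLAUSES)))     # 4
```

Brute force is exponential in the number of variables, so for anything beyond toy formulas a dedicated SAT solver would be the more sensible checker.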
The model is evaluated on the Affine subnet.
```bibtex
@misc{affine-model-training,
  title={Fine-tuned Affine Model for Reasoning and Interactive Agents},
  year={2024},
  url={https://huggingface.co/YOUR_USERNAME/affine-0004-improved}
}
```
Apache 2.0