# T-Bench Qwen SFT Multi-Task NAT v8

## Model Description
This model is fine-tuned from Qwen3-8B using enhanced Negative-Aware Training (NAT) on multiple terminal bench tasks.
## Training Details
- Base Model: Qwen/Qwen3-8B
- Training Method: Enhanced Negative-Aware Training (NAT v8)
- Tasks: 5 tasks (fix-git, log-summary-date-ranges, pypi-server, regex-log, cancel-async-tasks)
- Epochs: 200
- Learning Rate: 5e-5
- Batch Size: 2
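The hyperparameters above can be collected into a plain config dict for quick reference. This is only a summary of the values listed in this card; the actual training framework and script are not specified here:

```python
# Summary of the training configuration listed in this card.
# Key names are illustrative; only the values come from the card.
config = {
    "base_model": "Qwen/Qwen3-8B",
    "method": "Enhanced Negative-Aware Training (NAT v8)",
    "num_tasks": 5,
    "epochs": 200,
    "learning_rate": 5e-5,
    "batch_size": 2,
}
```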
## Dataset Composition
- Total samples: 30 per epoch
- Positive examples: 20 (4 per task)
- Negative examples: 10 (2 per task, task-specific negatives)
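The per-epoch mix above (4 positives and 2 negatives for each of the 5 tasks, 30 samples total) can be sketched as a small builder. Function and sample-ID names are hypothetical, not from the training code:

```python
# Hypothetical sketch of assembling one epoch's training mix.
# Task names come from the card; everything else is illustrative.
TASKS = ["fix-git", "log-summary-date-ranges", "pypi-server",
         "regex-log", "cancel-async-tasks"]

def build_epoch(positives_per_task=4, negatives_per_task=2):
    """Return (sample_id, task, label) triples for one epoch."""
    epoch = []
    for task in TASKS:
        for i in range(positives_per_task):
            epoch.append((f"{task}-pos-{i}", task, "positive"))
        for i in range(negatives_per_task):
            epoch.append((f"{task}-neg-{i}", task, "negative"))
    return epoch

epoch = build_epoch()
print(len(epoch))  # 5 tasks * (4 + 2) = 30 samples
```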
## NAT v8 Enhancements
Negative examples teach task-specific anti-patterns:
- Hallucinated arguments: adding invented parameters such as `message_title` and `message_description`
- Looping behavior: repeating commands after the task is already complete
- Wrong command format: emitting an `id` instead of the actual command
- Task-specific failures: Customized negative patterns for each task
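One common way to integrate such negative examples into SFT is to keep them in the loss but down-weight them relative to positives. The sketch below is a minimal illustration of that idea; the weights and function name are assumptions, not the values or code used for this model:

```python
# Minimal sketch of a negative-aware loss: average per-token NLL with
# negative-example tokens down-weighted. Weights are assumed, not from the card.
POS_WEIGHT = 1.0   # assumed weight for tokens from positive trajectories
NEG_WEIGHT = 0.25  # assumed down-weight for tokens from negative trajectories

def nat_loss(token_nlls, labels):
    """Weighted average of per-token negative log-likelihoods.

    token_nlls: per-token NLL values
    labels:     parallel "positive"/"negative" tags
    """
    total, weight_sum = 0.0, 0.0
    for nll, label in zip(token_nlls, labels):
        w = POS_WEIGHT if label == "positive" else NEG_WEIGHT
        total += w * nll
        weight_sum += w
    return total / weight_sum
```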
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Aznaur/tbench-qwen-sft-multitask-nat-v8")
tokenizer = AutoTokenizer.from_pretrained("Aznaur/tbench-qwen-sft-multitask-nat-v8")
```
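A generation example might look like the following. The prompt, sampling settings, and the assumption that the tokenizer ships a chat template (as Qwen3 tokenizers generally do) are illustrative:

```python
# Illustrative inference sketch; the prompt and generation settings are
# assumptions, not recommendations from the model authors.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Aznaur/tbench-qwen-sft-multitask-nat-v8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "List the files in the current directory."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```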
## Performance

The model was trained for 200 epochs with enhanced NAT to improve tool usage and to avoid the task-specific failure patterns listed above.
## Paper Reference

Based on "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents" (arXiv:2402.11651).
## Model Checkpoint
- Epoch: 199
- Global Step: 7799
- Training completed successfully