Aznaur/terminal_agent_tbench_pro_v2_split


	# Terminal Agent - Multi-Task NAT v13

	## Model Description
	This model is fine-tuned from Qwen3-8B on multi-task terminal agent trajectories using Negative-Aware Training (NAT).

	### Key Features
	- 5 Tasks: fix-git, cancel-async-tasks, log-summary-date-ranges, regex-log, pypi-server
	- Fixed Tool Signatures: Corrected critical bug where `note_name` was incorrectly removed
	- Clean Tool Calls: Removed hallucinated parameters (`message_title`, `message_description`, `message_attachment`)
	- Negative Examples: Includes looping and wrong_command negative examples

	### Training Details
	- Base Model: Qwen/Qwen3-8B
	- Training Data: 40 samples (20 positive, 20 negative)
	- Epochs: 300
	- Learning Rate: 5e-5
	- Batch Size: 4
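The figures above imply a rough optimizer step count. The sketch below works it out under assumptions the card does not state: a single device, no gradient accumulation, and "Batch Size: 4" being the effective batch size. The variable names are illustrative only.

```python
import math

# Values taken from the Training Details list
num_samples = 40
batch_size = 4
epochs = 300

# Assumption: 4 is the effective batch size
# (single device, no gradient accumulation).
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(f"steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
```

With only 40 samples, each one would be seen 300 times under these assumptions, so the step count is driven almost entirely by the epoch setting.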

	### Tool Signatures (Corrected)
	- `shell_exec(id, command, block)`
	- `shell_write_content_to_file(content, file_path)`
	- `create_note(note_name, content)`
	- `append_note(note_name, content)`
	- `read_note(note_name)`
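As an illustration, the signatures above can be written as a lookup table and used to check a tool call before it is executed or added to training data. This is a minimal sketch, not the project's own validator: `TOOL_SIGNATURES` and `validate_tool_call` are hypothetical names, and it assumes every listed parameter is required, which the card does not state.

```python
# Parameter names copied from the Tool Signatures list.
TOOL_SIGNATURES = {
    "shell_exec": ("id", "command", "block"),
    "shell_write_content_to_file": ("content", "file_path"),
    "create_note": ("note_name", "content"),
    "append_note": ("note_name", "content"),
    "read_note": ("note_name",),
}


def validate_tool_call(name, arguments):
    """Return a list of problems; an empty list means the call matches."""
    if name not in TOOL_SIGNATURES:
        return [f"unknown tool: {name}"]

    # Assumption: all listed parameters are required.
    expected = set(TOOL_SIGNATURES[name])
    given = set(arguments)

    problems = []
    for missing in sorted(expected - given):
        problems.append(f"missing parameter: {missing}")
    for extra in sorted(given - expected):
        problems.append(f"unexpected parameter: {extra}")
    return problems


# A call with a missing note_name and a hallucinated parameter
print(validate_tool_call("create_note", {"content": "x", "message_title": "t"}))
```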

	### Usage
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# Repository ID as given in this card
	model_id = "camel-ai/terminal_agent_multitask_nat_v13"

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(model_id)
	```

	### V13 Fixes
	1. Keep `note_name` - Required by the runtime (it was incorrectly removed in v12)
	2. System prompt uses `note_name` - Matches runtime expectations
	3. Remove only hallucinated parameters - `message_title`, `message_description`, `message_attachment`
	4. Added tool call validation - Catches signature issues before training
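Fixes 1 and 3 amount to filtering tool-call arguments with a deny-list rather than stripping anything unfamiliar. The sketch below shows one way that could look; `HALLUCINATED_PARAMS` and `clean_arguments` are hypothetical names, and the actual preprocessing code is not shown in this card.

```python
# Parameter names the card lists as hallucinated.
HALLUCINATED_PARAMS = {"message_title", "message_description", "message_attachment"}


def clean_arguments(arguments):
    """Drop only the hallucinated parameters; keep everything else, including note_name."""
    return {key: value for key, value in arguments.items() if key not in HALLUCINATED_PARAMS}


# Example arguments (made up for illustration)
raw = {"note_name": "plan", "content": "step 1", "message_title": "Plan"}
cleaned = clean_arguments(raw)
print(cleaned)
```

Using an explicit deny-list keeps `note_name` intact by construction, which is the behavior fix 1 describes.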

	### Evaluation Results
	The model is expected to achieve a success rate above 80% on the 5 tasks listed under Key Features when evaluated on the same task set. This is an expectation, not a measured result.

	## License
	MIT License