# TOMAGPT
A Qwen3-4B-Instruct-2507 model fine-tuned with GRPO (Group Relative Policy Optimization) to classify legal hearsay by decomposing it into three sub-elements under the U.S. Federal Rules of Evidence.
## What It Does
TOMAGPT classifies whether a statement is hearsay by analyzing three sub-elements:
- Assertion -- Is the statement an assertion?
- Out-of-court -- Was the statement made out of court?
- TOMA -- Is the statement offered to prove the truth of the matter asserted?
Hearsay = YES only if all three sub-elements are YES.
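The decision rule above is a simple conjunction, which can be sketched as follows (illustrative Python; the model emits YES/NO fields, this just shows the logic):

```python
def classify_hearsay(an_assertion: bool, made_out_of_court: bool, is_for_toma: bool) -> bool:
    """Hearsay only when all three sub-elements hold."""
    return an_assertion and made_out_of_court and is_for_toma

# A bystander's out-of-court claim offered for its truth is hearsay:
print(classify_hearsay(True, True, True))   # True
# The same statement offered for a non-truth purpose (e.g. effect on listener) is not:
print(classify_hearsay(True, True, False))  # False
```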
## Results
Evaluated on the LegalBench hearsay test set (94 examples):
| Metric | Base Model | TOMAGPT | Delta |
|---|---|---|---|
| Overall accuracy | 71.3% | 77.7% | +6.4 pp |
| TOMA sub-element | 78.0% | 95.1% | +17.1 pp |
| Assertion sub-element | 90.2% | 95.1% | +4.9 pp |
| Non-verbal hearsay | 33.3% | 83.3% | +50.0 pp |
| Standard hearsay | 93.1% | 100.0% | +6.9 pp |
| Non-assertive conduct | 89.5% | 100.0% | +10.5 pp |
## Training Details
- Method: GRPO (Group Relative Policy Optimization)
- Platform: Prime Intellect Lab
- Environment: smolclaims/TOMAGPT (v0.3.0)
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Training data: DoodDood/HearsayGRPOTrainingData2 (3,140 examples)
- Steps: 500
- Learning rate: 1e-5
- Batch size: 128
- Rollouts per example: 16
### LoRA Configuration
- Rank (r): 16
- Alpha: 32
- Dropout: 0.0
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
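Expressed with the Hugging Face `peft` library (an assumption — the actual training harness is not shown here), the configuration above would look like:

```python
from peft import LoraConfig

# LoRA hyperparameters as listed above; task_type is assumed for a causal LM.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```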
## Reward Functions
| Function | Weight | Description |
|---|---|---|
| assertion_reward | 1.5 | +1/-1 on assertion accuracy |
| out_of_court_reward | 1.0 | +1/-1 on out-of-court accuracy |
| toma_reward | 2.0 | +1/-1 on TOMA accuracy |
| consistency_penalty | 1.0 | -0.5 for contradictory outputs |
| format_compliance | 1.0 | -0.25 per missing field |
| constraint_penalty | 1.0 | -0.5 for logical violations |
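The weights in the table suggest a composite reward along these lines. This is an illustrative sketch, not the training code: field names follow the output format from the Usage section, the exact aggregation is an assumption, and the consistency and constraint penalties are collapsed into one check here.

```python
def composite_reward(pred: dict, gold: dict) -> float:
    """Weighted sum of per-sub-element rewards, minus penalties (sketch)."""
    def pm1(field):
        # +1 if the predicted field matches gold, else -1
        return 1.0 if pred.get(field) == gold[field] else -1.0

    reward = (1.5 * pm1("an_assertion")
              + 1.0 * pm1("made_out_of_court")
              + 2.0 * pm1("is_for_toma"))

    # Consistency: is_hearsay must equal the conjunction of the sub-elements.
    sub_yes = all(pred.get(f) == "YES" for f in
                  ("an_assertion", "made_out_of_court", "is_for_toma"))
    if (pred.get("is_hearsay") == "YES") != sub_yes:
        reward -= 0.5

    # Format compliance: -0.25 per missing field.
    fields = ("is_hearsay", "an_assertion", "made_out_of_court", "is_for_toma")
    reward -= 0.25 * sum(1 for f in fields if f not in pred)
    return reward

gold = {"is_hearsay": "YES", "an_assertion": "YES",
        "made_out_of_court": "YES", "is_for_toma": "YES"}
print(composite_reward(gold, gold))  # 4.5 (all correct, no penalties)
```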
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "DoodDood/TOMAGPT", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("DoodDood/TOMAGPT")

system_prompt = (
    "You are a legal assistant identifying hearsay. Hearsay is defined as "
    "an out-of-court statement introduced to prove the truth of the matter "
    "asserted.\n\n"
    "Respond in EXACTLY this format (semicolon-separated):\n"
    "is_hearsay: YES/NO; an_assertion: YES/NO; made_out_of_court: YES/NO; "
    "is_for_toma: YES/NO"
)
scenario = (
    "At trial, the prosecution presents testimony from a police officer who "
    "states that a bystander at the scene told him, 'The defendant ran the "
    "red light.'"
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": scenario},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
response = tokenizer.decode(output[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(response)
# Expected: is_hearsay: YES; an_assertion: YES; made_out_of_court: YES; is_for_toma: YES
```
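The semicolon-separated response is easy to turn into a dict for downstream use (an illustrative helper, not part of the model's API):

```python
def parse_response(response: str) -> dict:
    """Parse 'is_hearsay: YES; an_assertion: YES; ...' into a field dict."""
    fields = {}
    for part in response.split(";"):
        if ":" in part:
            key, _, value = part.partition(":")
            fields[key.strip()] = value.strip()
    return fields

parsed = parse_response(
    "is_hearsay: YES; an_assertion: YES; made_out_of_court: YES; is_for_toma: YES")
print(parsed["is_hearsay"])  # YES
```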
## Links
- Training data: DoodDood/HearsayGRPOTrainingData2
- GRPO environment: smolclaims/TOMAGPT on Prime Intellect
- Eval benchmark: nguha/legalbench (hearsay subset)