# TOMAGPT
A Qwen3-4B-Instruct-2507 model fine-tuned with GRPO (Group Relative Policy Optimization) to classify legal hearsay by decomposing it into three sub-elements under the U.S. Federal Rules of Evidence.
## What It Does
TOMAGPT classifies whether a statement is hearsay by analyzing three sub-elements:
- Assertion -- Is the statement an assertion?
- Out-of-court -- Was the statement made out of court?
- TOMA -- Is the statement offered to prove the truth of the matter asserted?
Hearsay = YES only if all three sub-elements are YES.
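The decision rule above is a simple conjunction, which can be sketched as follows (illustrative Python; the model emits YES/NO fields, this just shows the logic):

```python
def classify_hearsay(an_assertion: bool, made_out_of_court: bool, is_for_toma: bool) -> bool:
    """Hearsay only when all three sub-elements hold."""
    return an_assertion and made_out_of_court and is_for_toma

# A bystander's out-of-court claim offered for its truth is hearsay:
print(classify_hearsay(True, True, True))   # True
# The same statement offered for a non-truth purpose (e.g. effect on listener) is not:
print(classify_hearsay(True, True, False))  # False
```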
## Results
Evaluated on the LegalBench hearsay test set (94 examples):
| Metric | Base Model | TOMAGPT | Delta |
|---|---|---|---|
| Overall accuracy | 71.3% | 77.7% | +6.4 pp |
| TOMA sub-element | 78.0% | 95.1% | +17.1 pp |
| Assertion sub-element | 90.2% | 95.1% | +4.9 pp |
| Non-verbal hearsay | 33.3% | 83.3% | +50.0 pp |
| Standard hearsay | 93.1% | 100.0% | +6.9 pp |
| Non-assertive conduct | 89.5% | 100.0% | +10.5 pp |
## Training Details
- Method: GRPO (Group Relative Policy Optimization)
- Platform: Prime Intellect Lab
- Environment: smolclaims/TOMAGPT (v0.3.0)
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Training data: DoodDood/HearsayGRPOTrainingData2 (3,140 examples)
- Steps: 500
- Learning rate: 1e-5
- Batch size: 128
- Rollouts per example: 16
### LoRA Configuration
- Rank (r): 16
- Alpha: 32
- Dropout: 0.0
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
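Expressed with the Hugging Face `peft` library (an assumption — the actual training harness is not shown here), the configuration above would look like:

```python
from peft import LoraConfig

# LoRA hyperparameters as listed above; task_type is assumed for a causal LM.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```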
## Reward Functions
| Function | Weight | Description |
|---|---|---|
| assertion_reward | 1.5 | +1/-1 on assertion accuracy |
| out_of_court_reward | 1.0 | +1/-1 on out-of-court accuracy |
| toma_reward | 2.0 | +1/-1 on TOMA accuracy |
| consistency_penalty | 1.0 | -0.5 for contradictory outputs |
| format_compliance | 1.0 | -0.25 per missing field |
| constraint_penalty | 1.0 | -0.5 for logical violations |
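The weights in the table suggest a composite reward along these lines. This is an illustrative sketch, not the training code: field names follow the output format from the Usage section, the exact aggregation is an assumption, and the consistency and constraint penalties are collapsed into one check here.

```python
def composite_reward(pred: dict, gold: dict) -> float:
    """Weighted sum of per-sub-element rewards, minus penalties (sketch)."""
    def pm1(field):
        # +1 if the predicted field matches gold, else -1
        return 1.0 if pred.get(field) == gold[field] else -1.0

    reward = (1.5 * pm1("an_assertion")
              + 1.0 * pm1("made_out_of_court")
              + 2.0 * pm1("is_for_toma"))

    # Consistency: is_hearsay must equal the conjunction of the sub-elements.
    sub_yes = all(pred.get(f) == "YES" for f in
                  ("an_assertion", "made_out_of_court", "is_for_toma"))
    if (pred.get("is_hearsay") == "YES") != sub_yes:
        reward -= 0.5

    # Format compliance: -0.25 per missing field.
    fields = ("is_hearsay", "an_assertion", "made_out_of_court", "is_for_toma")
    reward -= 0.25 * sum(1 for f in fields if f not in pred)
    return reward

gold = {"is_hearsay": "YES", "an_assertion": "YES",
        "made_out_of_court": "YES", "is_for_toma": "YES"}
print(composite_reward(gold, gold))  # 4.5 (all correct, no penalties)
```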
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "DoodDood/TOMAGPT", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("DoodDood/TOMAGPT")

system_prompt = (
    "You are a legal assistant identifying hearsay. Hearsay is defined as "
    "an out-of-court statement introduced to prove the truth of the matter "
    "asserted.\n\n"
    "Respond in EXACTLY this format (semicolon-separated):\n"
    "is_hearsay: YES/NO; an_assertion: YES/NO; made_out_of_court: YES/NO; "
    "is_for_toma: YES/NO"
)
scenario = (
    "At trial, the prosecution presents testimony from a police officer who "
    "states that a bystander at the scene told him, 'The defendant ran the "
    "red light.'"
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": scenario},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
response = tokenizer.decode(output[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(response)
# Expected: is_hearsay: YES; an_assertion: YES; made_out_of_court: YES; is_for_toma: YES
```
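The semicolon-separated response is easy to turn into a dict for downstream use (an illustrative helper, not part of the model's API):

```python
def parse_response(response: str) -> dict:
    """Parse 'is_hearsay: YES; an_assertion: YES; ...' into a field dict."""
    fields = {}
    for part in response.split(";"):
        if ":" in part:
            key, _, value = part.partition(":")
            fields[key.strip()] = value.strip()
    return fields

parsed = parse_response(
    "is_hearsay: YES; an_assertion: YES; made_out_of_court: YES; is_for_toma: YES")
print(parsed["is_hearsay"])  # YES
```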
## Links
- Training data: DoodDood/HearsayGRPOTrainingData2
- GRPO environment: smolclaims/TOMAGPT on Prime Intellect
- Eval benchmark: nguha/legalbench (hearsay subset)