---
license: apache-2.0
language:
  - en
pipeline_tag: text-generation
library_name: transformers
model_type: causal-lm
base_model: Qwen/Qwen3-4B
tags:
  - reasoning
  - tree-of-thoughts
  - gnn
  - self-improving
  - autonomous-training
  - multi-agent
  - variance-curriculum
  - reinforcement-learning
datasets:
  - gsm8k
  - mmlu
  - gpqa
  - arc-challenge
  - truthfulqa
metrics:
  - accuracy
inference: true
training: true
---

# TRIDENT

TRIDENT is a reasoning-focused 4B-parameter language model that improves its own reasoning capability through algorithmic self-improvement, rather than parameter scaling.

The model is built on Qwen3-4B and enhanced using the TRIDENT framework: a combination of GNN-guided Tree-of-Thoughts search, multi-agent reasoning policies, and variance-based self-training.


## Overview

Traditional large language model training depends on:

  • Human-written reasoning traces
  • Manually curated preference datasets
  • Static fine-tuning pipelines

TRIDENT removes these dependencies.

Instead, the model:

  1. Explores multiple reasoning paths
  2. Evaluates them using a learned GNN policy
  3. Selects high-uncertainty problems automatically
  4. Generates its own training supervision
  5. Distills improvements back into the model using LoRA
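The loop above can be sketched in miniature. All helper names here (`explore_paths`, `gnn_promise`, `reward`) are illustrative stubs, not the released API; the real components are a sampled Tree-of-Thoughts search, a learned GNN policy, and verifiable task rewards.

```python
import random
import statistics

# Illustrative stand-ins for the real components (names are hypothetical).
def explore_paths(problem, n=4):
    """Step 1: sample several candidate reasoning paths for a problem."""
    return [f"{problem}::path{i}" for i in range(n)]

def gnn_promise(path):
    """Step 2: a learned GNN policy would score each branch; stubbed here."""
    return random.random()

def reward(path):
    """Verifiable task reward (e.g. exact match on the final answer); stubbed."""
    return random.choice([0.0, 1.0])

def self_improvement_round(problems, k=2):
    """One round: explore, score, pick high-variance problems, emit supervision."""
    stats = {}
    for p in problems:
        paths = explore_paths(p)
        rewards = [reward(path) for path in paths]
        best = max(paths, key=gnn_promise)            # Step 2: GNN-guided selection
        stats[p] = (statistics.pvariance(rewards), best)
    # Step 3: keep the k problems where reward variance (inconsistency) is highest
    curriculum = sorted(stats, key=lambda p: stats[p][0], reverse=True)[:k]
    # Step 4: the best path on each selected problem becomes training supervision
    return [(p, stats[p][1]) for p in curriculum]
    # Step 5 (not shown): distill these (problem, path) pairs back via LoRA

random.seed(0)
batch = self_improvement_round(["q1", "q2", "q3"])
```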

Evaluation results (`model-index`):

```yaml
model-index:
  - name: TRIDENT
    results:
      - task:
          type: text-generation
        dataset:
          name: GSM8K
          type: gsm8k
          split: test
        metrics:
          - type: accuracy
            value: 86.58
      - task:
          type: text-generation
        dataset:
          name: MMLU
          type: mmlu
          split: test
        metrics:
          - type: accuracy
            value: 72.61
      - task:
          type: text-generation
        dataset:
          name: GPQA
          type: gpqa
          split: test
        metrics:
          - type: accuracy
            value: 42.42
      - task:
          type: text-generation
        dataset:
          name: ARC-Challenge
          type: arc-challenge
          split: test
        metrics:
          - type: accuracy
            value: 59.0
```

## Core Capabilities

### GNN-Guided Tree-of-Thoughts

Reasoning is represented as a directed graph of intermediate states.
A 3-layer Graph Convolutional Network predicts a promise score for each branch, guiding exploration and pruning.
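As a rough illustration (not the released architecture or weights), a 3-layer GCN promise scorer over a small reasoning graph can be sketched in plain NumPy, using the standard symmetric normalization of the adjacency matrix:

```python
import numpy as np

def normalize_adj(A):
    """Symmetric GCN normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_promise_scores(A, X, weights):
    """3-layer GCN: H <- relu(A_norm @ H @ W); last layer maps to one logit per node."""
    A_norm = normalize_adj(A)
    H = X
    for i, W in enumerate(weights):
        H = A_norm @ H @ W
        if i < len(weights) - 1:
            H = np.maximum(H, 0.0)        # ReLU on the two hidden layers
    # Sigmoid squashes each node's logit into a (0, 1) promise score
    return 1.0 / (1.0 + np.exp(-H[:, 0]))

rng = np.random.default_rng(0)
# A tiny reasoning tree: node 0 is the root, nodes 2 and 3 are leaves
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
X = rng.normal(size=(4, 8))               # per-node intermediate-state embeddings
weights = [rng.normal(size=(8, 16)),
           rng.normal(size=(16, 16)),
           rng.normal(size=(16, 1))]      # untrained weights, for shape only
scores = gcn_promise_scores(A, X, weights)  # one promise score per branch
```

In the actual framework these scores guide which branches of the thought tree are expanded and which are pruned.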

### Multi-Agent Reasoning

Four internal agents (Conservative, Exploratory, Balanced, Reflective) vote on reasoning actions to balance exploration and correctness.
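A minimal sketch of such a vote, assuming plurality voting over candidate actions. The agent preference functions and the action attributes (`risk`, `novelty`, `consistency`) are hypothetical stand-ins for the learned policies:

```python
from collections import Counter

# Illustrative agent policies (the real agents are learned; these are stubs).
AGENT_PREFERENCES = {
    "Conservative": lambda a: -a["risk"],             # prefers low-risk steps
    "Exploratory":  lambda a: a["novelty"],           # prefers novel branches
    "Balanced":     lambda a: a["novelty"] - a["risk"],
    "Reflective":   lambda a: a["consistency"],       # prefers self-consistent steps
}

def vote(actions):
    """Each agent votes for its top-ranked action; the plurality winner is taken."""
    ballots = Counter()
    for name, prefer in AGENT_PREFERENCES.items():
        choice = max(actions, key=prefer)
        ballots[choice["id"]] += 1
    winner, _ = ballots.most_common(1)[0]
    return winner

actions = [
    {"id": "expand-left",  "risk": 0.2, "novelty": 0.4, "consistency": 0.9},
    {"id": "expand-right", "risk": 0.9, "novelty": 0.9, "consistency": 0.3},
]
winner = vote(actions)  # three of four agents favor the safer, consistent branch
```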

### Variance-Based Curriculum

Problems are selected for training based on reward variance, targeting examples where the model is inconsistent and learning signal is highest.
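The selection rule can be sketched directly (the reward numbers below are hypothetical):

```python
import statistics

def select_curriculum(reward_history, k=2):
    """Pick the k problems with the highest reward variance across attempts.

    High variance means the model sometimes succeeds and sometimes fails --
    exactly where the learning signal is most informative.
    """
    variances = {pid: statistics.pvariance(rs) for pid, rs in reward_history.items()}
    return sorted(variances, key=variances.get, reverse=True)[:k]

# Rewards from repeated attempts per problem (illustrative numbers).
history = {
    "easy":   [1.0, 1.0, 1.0, 1.0],   # always solved: variance 0, nothing to learn
    "hard":   [0.0, 0.0, 0.0, 0.0],   # never solved: variance 0, no signal either
    "edge-1": [1.0, 0.0, 1.0, 0.0],   # inconsistent: maximal variance
    "edge-2": [1.0, 0.0, 0.0, 0.0],   # occasionally solved: high variance
}
picked = select_curriculum(history)   # the two inconsistent problems
```

Note that both the always-solved and the never-solved problems are skipped: zero variance means zero signal either way.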

### Self-Generative Reasoning Loop

No human-annotated reasoning traces are used.
The model autonomously generates, evaluates, and curates its own reasoning data.

### Stable Training

A multi-layer reward stabilization mechanism prevents:

  • Reward collapse
  • Loss explosions
  • Infinite failure loops

The architecture is compatible with future GRPO-style reinforcement learning.
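One way such guards can be layered is sketched below. The specific mechanisms (reward clipping, an EMA baseline, a consecutive-failure cap) and all thresholds are assumptions chosen to illustrate the three failure modes above, not the released implementation:

```python
class RewardStabilizer:
    """Illustrative guards against reward collapse, loss spikes, and failure loops.

    The clipping bound, EMA baseline, and failure cap are assumptions,
    not the released stabilization mechanism.
    """

    def __init__(self, clip=5.0, ema_decay=0.9, max_failures=3):
        self.clip = clip
        self.ema_decay = ema_decay
        self.baseline = 0.0
        self.failures = 0
        self.max_failures = max_failures

    def step(self, raw_reward):
        # Guard 1: clip extreme rewards so one outlier cannot explode the loss
        r = max(-self.clip, min(self.clip, raw_reward))
        # Guard 2: subtract an EMA baseline so advantages stay centered near zero
        advantage = r - self.baseline
        self.baseline = self.ema_decay * self.baseline + (1 - self.ema_decay) * r
        # Guard 3: cap consecutive failures to break infinite failure loops
        if r <= 0:
            self.failures += 1
            if self.failures >= self.max_failures:
                return None  # signal the trainer to skip/resample this problem
        else:
            self.failures = 0
        return advantage

stab = RewardStabilizer()
first = stab.step(100.0)   # clipped to 5.0 before the (zero) baseline is applied
```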




## Benchmark Results

Accuracy comparison against the base model:

| Benchmark | Qwen3-4B | TRIDENT |
|---|---:|---:|
| GSM8K (5-shot) | 74.14 | 86.58 |
| MMLU (5-shot) | 47.70 | 72.61 |
| ARC-C (25-shot) | 54.0 | 59.0 |
| GPQA (0-shot) | 28.28 | 42.42 |
| Winogrande (0-shot) | 59.6 | 67.08 |
| TruthfulQA (0-shot) | 54.9 | 54.7 |

**Highlight:** +14.14 percentage-point improvement on GPQA (0-shot).


## Intended Use

TRIDENT is suitable for:

  • Multi-step mathematical reasoning
  • Scientific and logical inference
  • Hard QA benchmarks
  • Planning and hypothesis exploration
  • Research on reasoning systems

## Limitations

  • Higher inference-time compute than single-pass models
  • Not optimized for low-latency chat
  • Best used where reasoning depth matters more than speed

## Ethical Considerations

  • No human-written reasoning traces used
  • No preference data collection
  • Training relies on verifiable task rewards
  • Like all LLMs, may hallucinate outside structured reasoning workflows

## Paper

https://www.shivik.in/shivik-labs/trident

## Citation

```bibtex
@article{puri2025trident,
  title={TRIDENT: Thought-based Reasoning and Improvement through Deep Exploration of Neuronal Trees},
  author={Puri, Shivansh and Khandelwal, Abhisek and Joshi, Vedant and Yadav, Akash},
  year={2025}
}
```