---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
model_type: causal-lm
base_model: Qwen/Qwen3-4B
tags:
- reasoning
- tree-of-thoughts
- gnn
- self-improving
- autonomous-training
- multi-agent
- variance-curriculum
- reinforcement-learning
datasets:
- gsm8k
- mmlu
- gpqa
- arc-challenge
- truthfulqa
metrics:
- accuracy
inference: true
training: true
model-index:
- name: TRIDENT
  results:
  - task:
      type: text-generation
    dataset:
      name: GSM8K
      type: gsm8k
      split: test
    metrics:
    - type: accuracy
      value: 86.58
  - task:
      type: text-generation
    dataset:
      name: MMLU
      type: mmlu
      split: test
    metrics:
    - type: accuracy
      value: 72.61
  - task:
      type: text-generation
    dataset:
      name: GPQA
      type: gpqa
      split: test
    metrics:
    - type: accuracy
      value: 42.42
  - task:
      type: text-generation
    dataset:
      name: ARC-Challenge
      type: arc-challenge
      split: test
    metrics:
    - type: accuracy
      value: 59.0
---
# TRIDENT
**TRIDENT** is a reasoning-focused 4B-parameter language model that improves its own reasoning capability through **algorithmic self-improvement**, rather than parameter scaling.
The model is built on **Qwen3-4B** and enhanced using the **TRIDENT framework**: a combination of GNN-guided Tree-of-Thoughts search, multi-agent reasoning policies, and variance-based self-training.
---
## Overview
Traditional large language model training depends on:
- Human-written reasoning traces
- Manually curated preference datasets
- Static fine-tuning pipelines
**TRIDENT removes these dependencies.**
Instead, the model:
1. Explores multiple reasoning paths
2. Evaluates them using a learned GNN policy
3. Selects high-uncertainty problems automatically
4. Generates its own training supervision
5. Distills improvements back into the model using LoRA
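The five steps above can be sketched as a minimal loop. This is illustrative pseudocode under assumed mechanics; every function name here (`explore_paths`, `promise_score`, `self_improve`, and so on) is a hypothetical stand-in, not the TRIDENT API, and the LoRA distillation step is elided:

```python
import random
import statistics

# Illustrative sketch of the TRIDENT self-improvement loop.
# All names are hypothetical stand-ins, not the actual TRIDENT API.

def explore_paths(problem, n=4):
    """Step 1: sample several candidate reasoning paths."""
    return [f"{problem}::path{i}" for i in range(n)]

def promise_score(path):
    """Step 2: stand-in for the learned GNN policy's promise score."""
    return random.random()

def reward_variance(problem):
    """Step 3: variance of rewards over sampled paths (high = inconsistent)."""
    rewards = [promise_score(p) for p in explore_paths(problem)]
    return statistics.pvariance(rewards)

def self_improve(problems, top_k=2):
    """Steps 3-5: pick the highest-uncertainty problems and keep the
    best path for each as self-generated supervision
    (LoRA distillation of this dataset is not shown)."""
    ranked = sorted(problems, key=reward_variance, reverse=True)
    dataset = []
    for problem in ranked[:top_k]:
        paths = explore_paths(problem)
        best = max(paths, key=promise_score)
        dataset.append((problem, best))
    return dataset

data = self_improve(["p1", "p2", "p3"])  # top_k curated (problem, path) pairs
```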
---
## Core Capabilities
### GNN-Guided Tree-of-Thoughts
Reasoning is represented as a directed graph of intermediate states.
A 3-layer Graph Convolutional Network predicts a **promise score** for each branch, guiding exploration and pruning.
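A toy version of promise scoring can be written in a few lines. This is a dependency-free sketch, assuming scalar node features and hand-picked layer weights; TRIDENT's actual GCN is learned and operates on richer state embeddings:

```python
# Minimal sketch of promise scoring over a reasoning tree
# (pure Python; the real 3-layer GCN is learned, not hand-weighted).

def gcn_layer(features, adj, weight):
    """One graph-convolution step: average each node with its
    neighbours, then apply a scalar weight and a ReLU."""
    out = []
    for i, x in enumerate(features):
        neigh = [features[j] for j in adj[i]] + [x]
        agg = sum(neigh) / len(neigh)
        out.append(max(0.0, weight * agg))
    return out

def promise_scores(features, adj, weights=(0.9, 0.8, 1.1)):
    """Three stacked layers, as in the model card, produce one
    promise score per reasoning state."""
    h = features
    for w in weights:
        h = gcn_layer(h, adj, w)
    return h

# Tiny reasoning tree: node 0 is the root, nodes 1 and 2 are branches.
adj = {0: [1, 2], 1: [0], 2: [0]}
scores = promise_scores([0.5, 0.2, 0.9], adj)
best_branch = max(range(1, 3), key=lambda i: scores[i])  # branch to expand
```

The search expands the branch with the highest score and prunes the rest.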
### Multi-Agent Reasoning
Four internal agents (Conservative, Exploratory, Balanced, Reflective) vote on reasoning actions to balance exploration and correctness.
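A simple plurality vote over the four agent roles might look like the following. The ranking criteria assigned to each agent here are assumptions for illustration, not TRIDENT's actual agent policies:

```python
from collections import Counter

# Hedged sketch of four agents voting on the next reasoning action.
# Each agent's ranking criterion below is an illustrative assumption.

def vote(candidates, scores):
    """Each agent picks its preferred action; the action with the
    most first-place votes wins the round."""
    agents = {
        "conservative": lambda a: scores[a],   # prefers the safest branch
        "exploratory": lambda a: -scores[a],   # prefers under-explored branches
        "balanced": lambda a: scores[a],
        "reflective": lambda a: scores[a],
    }
    ballots = [max(candidates, key=rank) for rank in agents.values()]
    return Counter(ballots).most_common(1)[0][0]

action = vote(["expand_left", "expand_right"],
              {"expand_left": 0.7, "expand_right": 0.4})
```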
### Variance-Based Curriculum
Problems are selected for training based on **reward variance**, targeting examples where the model is inconsistent and learning signal is highest.
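The selection rule can be sketched directly: sample several rewards per problem and keep the problems with the highest reward variance. The reward table below is made up for illustration:

```python
import statistics

# Sketch of variance-based curriculum selection: problems the model
# always solves or always fails carry little gradient signal; the
# inconsistent ones carry the most.

def select_curriculum(reward_samples, top_k):
    """reward_samples: problem -> rewards from repeated attempts.
    Returns the top_k problems where the model is most inconsistent."""
    variance = {p: statistics.pvariance(r) for p, r in reward_samples.items()}
    return sorted(variance, key=variance.get, reverse=True)[:top_k]

samples = {
    "easy":   [1.0, 1.0, 1.0],  # always solved: zero variance, no signal
    "hard":   [0.0, 0.0, 0.0],  # never solved: zero variance, no signal
    "medium": [1.0, 0.0, 1.0],  # inconsistent: highest variance
}
chosen = select_curriculum(samples, top_k=1)  # -> ["medium"]
```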
### Self-Generative Reasoning Loop
No human-annotated reasoning traces are used.
The model autonomously generates, evaluates, and curates its own reasoning data.
### Stable Training
A multi-layer reward stabilization mechanism prevents:
- Reward collapse
- Loss explosions
- Infinite failure loops
The architecture is compatible with future GRPO-style reinforcement learning.
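One plausible ingredient of such a mechanism is reward clipping plus collapse detection, sketched below. These are assumed mechanics for illustration only; the card does not specify TRIDENT's actual stabilization layers:

```python
# Illustrative reward-stabilization step (assumed, not TRIDENT's actual
# implementation): clip outlier rewards to bound the loss, and flag
# batches whose reward has collapsed so training can skip them instead
# of looping on guaranteed failures.

def stabilize(rewards, clip=5.0, collapse_threshold=1e-3):
    clipped = [max(-clip, min(clip, r)) for r in rewards]
    mean = sum(clipped) / len(clipped)
    collapsed = abs(mean) < collapse_threshold and max(clipped) == min(clipped)
    return clipped, collapsed

clipped, collapsed = stabilize([10.0, -20.0, 1.0])   # outliers clipped
_, dead = stabilize([0.0, 0.0, 0.0])                 # collapse flagged
```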
---
## Benchmark Results
Accuracy (%) compared against the base Qwen3-4B model; the better score in each row is bold:
| Benchmark | Qwen3-4B | TRIDENT |
|--------|--------|-----------|
| GSM8K (5-shot) | 74.14 | **86.58** |
| MMLU (5-shot) | 47.70 | **72.61** |
| ARC-C (25-shot) | 54.0 | **59.0** |
| GPQA (0-shot) | 28.28 | **42.42** |
| Winogrande (0-shot) | 59.6 | **67.08** |
| TruthfulQA (0-shot) | **54.9** | 54.7 |
**Highlight:**
+14.14 percentage point improvement on **GPQA (0-shot)**.
---
## Intended Use
TRIDENT is suitable for:
- Multi-step mathematical reasoning
- Scientific and logical inference
- Hard QA benchmarks
- Planning and hypothesis exploration
- Research on reasoning systems
---
## Limitations
- Higher inference-time compute than single-pass models
- Not optimized for low-latency chat
- Best used where reasoning depth matters more than speed
---
## Ethical Considerations
- No human-written reasoning traces used
- No preference data collection
- Training relies on verifiable task rewards
- Like all LLMs, may hallucinate outside structured reasoning workflows
---
## Paper
https://www.shivik.in/shivik-labs/trident
## Citation
```bibtex
@article{puri2025trident,
  title={TRIDENT: Thought-based Reasoning and Improvement through Deep Exploration of Neuronal Trees},
  author={Puri, Shivansh and Khandelwal, Abhisek and Joshi, Vedant and Yadav, Akash},
  year={2025}
}
```