---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
model_type: causal-lm
base_model: Qwen/Qwen3-4B
tags:
- reasoning
- tree-of-thoughts
- gnn
- self-improving
- autonomous-training
- multi-agent
- variance-curriculum
- reinforcement-learning
datasets:
- gsm8k
- mmlu
- gpqa
- arc-challenge
- truthfulqa
metrics:
- accuracy
inference: true
training: true
model-index:
- name: TRIDENT
  results:
  - task:
      type: text-generation
    dataset:
      name: GSM8K
      type: gsm8k
      split: test
    metrics:
    - type: accuracy
      value: 86.58
  - task:
      type: text-generation
    dataset:
      name: MMLU
      type: mmlu
      split: test
    metrics:
    - type: accuracy
      value: 72.61
  - task:
      type: text-generation
    dataset:
      name: GPQA
      type: gpqa
      split: test
    metrics:
    - type: accuracy
      value: 42.42
  - task:
      type: text-generation
    dataset:
      name: ARC-Challenge
      type: arc-challenge
      split: test
    metrics:
    - type: accuracy
      value: 59.0
---

# TRIDENT

**TRIDENT** is a reasoning-focused 4B-parameter language model that improves its own reasoning capability through **algorithmic self-improvement**, rather than parameter scaling.

The model is built on **Qwen3-4B** and enhanced using the **TRIDENT framework**: a combination of GNN-guided Tree-of-Thoughts search, multi-agent reasoning policies, and variance-based self-training.

---

## Overview

Traditional large language model training depends on:

- Human-written reasoning traces
- Manually curated preference datasets
- Static fine-tuning pipelines

**TRIDENT removes these dependencies.**

Instead, the model (see the sketch after this list):

1. Explores multiple reasoning paths
2. Evaluates them using a learned GNN policy
3. Selects high-uncertainty problems automatically
4. Generates its own training supervision
5. Distills improvements back into the model using LoRA
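
The five steps compose into a single training round. Below is a minimal Python sketch of that loop; every callable (`explore_paths`, `promise`, `select_by_variance`, `distill_lora`) is a hypothetical placeholder standing in for the corresponding TRIDENT component, not a released API.

```python
# Hypothetical sketch of one TRIDENT self-improvement round.
# All callables are placeholders for the framework's actual components.
from typing import Callable, List, Sequence, Tuple

def self_improvement_round(
    problems: Sequence[str],
    explore_paths: Callable[[str], List[str]],                 # step 1: sample reasoning paths
    promise: Callable[[str], float],                           # step 2: learned GNN scorer
    select_by_variance: Callable[[Sequence[str]], List[str]],  # step 3: curriculum
    distill_lora: Callable[[List[Tuple[str, str]]], None],     # step 5: LoRA update
) -> None:
    targets = select_by_variance(problems)    # keep the high-uncertainty problems
    supervision = []
    for prompt in targets:
        paths = explore_paths(prompt)         # explore multiple reasoning paths
        best = max(paths, key=promise)        # keep the most promising path
        supervision.append((prompt, best))    # step 4: self-generated supervision
    distill_lora(supervision)                 # fold the improvement back into the model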

---

## Core Capabilities

### GNN-Guided Tree-of-Thoughts

Reasoning is represented as a directed graph of intermediate states. A 3-layer Graph Convolutional Network predicts a **promise score** for each branch, guiding exploration and pruning.
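
A minimal sketch of such a scorer in plain PyTorch, assuming node features for each reasoning state and a dense adjacency matrix over the thought graph; the layer widths, activation, and symmetric normalization are illustrative choices, not the published architecture.

```python
import torch
import torch.nn as nn

class PromiseGCN(nn.Module):
    """3-layer GCN mapping each reasoning-state node to a scalar promise score."""

    def __init__(self, in_dim: int, hidden: int = 128):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Linear(in_dim, hidden),
            nn.Linear(hidden, hidden),
            nn.Linear(hidden, hidden),
        ])
        self.head = nn.Linear(hidden, 1)  # scalar promise score per node

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: [N, in_dim] node features; adj: [N, N] adjacency of the thought graph
        a_hat = adj + torch.eye(adj.size(0), device=adj.device)   # add self-loops
        d_inv_sqrt = a_hat.sum(-1).clamp(min=1e-6).rsqrt()
        norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]  # D^-1/2 (A+I) D^-1/2
        for layer in self.layers:
            x = torch.relu(layer(norm @ x))   # aggregate neighbors, then transform
        return torch.sigmoid(self.head(x)).squeeze(-1)  # promise in (0, 1)
```

Scores near 1 mark branches worth expanding first; low-scoring branches are candidates for pruning.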

### Multi-Agent Reasoning

Four internal agents (Conservative, Exploratory, Balanced, Reflective) vote on reasoning actions to balance exploration and correctness.
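
One plausible reading of that vote is sketched below; the four agent roles come from this card, while the one-agent-one-vote scheme and per-agent scoring functions are assumptions.

```python
# Illustrative majority vote over candidate reasoning actions.
# Only the four agent names come from the model card; the rest is assumed.
from collections import Counter
from typing import Callable, Dict, Sequence

AGENTS = ("Conservative", "Exploratory", "Balanced", "Reflective")

def vote(actions: Sequence[str],
         policies: Dict[str, Callable[[str], float]]) -> str:
    """Each agent scores every candidate action and votes for its favorite;
    the action with the most votes wins (ties broken by vote order)."""
    ballots = Counter()
    for agent in AGENTS:
        ballots[max(actions, key=policies[agent])] += 1
    return ballots.most_common(1)[0][0]
```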

### Variance-Based Curriculum

Problems are selected for training based on **reward variance**, targeting examples where the model is inconsistent and the learning signal is highest.
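
A hedged sketch of that selection rule; `sample_reward` is a hypothetical callable that runs the model once on a problem and returns a scalar reward.

```python
# Variance-based curriculum selection (sketch). Problems where repeated
# attempts give very different rewards carry the strongest learning signal.
import statistics
from typing import Callable, List, Sequence

def select_curriculum(problems: Sequence[str],
                      sample_reward: Callable[[str], float],
                      n_samples: int = 8,
                      top_k: int = 256) -> List[str]:
    scored = []
    for p in problems:
        rewards = [sample_reward(p) for _ in range(n_samples)]
        scored.append((statistics.variance(rewards), p))  # sample variance
    scored.sort(key=lambda t: t[0], reverse=True)         # most inconsistent first
    return [p for _, p in scored[:top_k]]
```

Problems the model always solves, or always fails, have near-zero variance and contribute little, so they are skipped in favor of the inconsistent middle ground.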

### Self-Generative Reasoning Loop

No human-annotated reasoning traces are used. The model autonomously generates, evaluates, and curates its own reasoning data.

### Stable Training

A multi-layer reward stabilization mechanism (sketched below) prevents:

- Reward collapse
- Loss explosions
- Infinite failure loops
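
The card does not spell the mechanism out, so the following is a loudly hedged illustration of one common stabilization recipe (running reward normalization, clipping, and a failure-loop breaker), not TRIDENT's actual implementation.

```python
# Assumed illustration of reward stabilization; not the published mechanism.
class RewardStabilizer:
    """Running-statistics normalization + clipping + failure-loop breaker."""

    def __init__(self, clip: float = 3.0, momentum: float = 0.99,
                 max_failures: int = 10):
        self.mean, self.var = 0.0, 1.0
        self.clip, self.momentum = clip, momentum
        self.failures, self.max_failures = 0, max_failures

    def stabilized(self, reward: float) -> float:
        # Exponential moving statistics keep rewards on a steady scale,
        # guarding against reward collapse.
        self.mean = self.momentum * self.mean + (1 - self.momentum) * reward
        self.var = self.momentum * self.var + (1 - self.momentum) * (reward - self.mean) ** 2
        z = (reward - self.mean) / (self.var ** 0.5 + 1e-8)
        # Clipping bounds the update magnitude, guarding against loss explosions.
        return max(-self.clip, min(self.clip, z))

    def stuck(self, reward: float) -> bool:
        # A long run of consecutive non-positive rewards signals an infinite
        # failure loop; the caller should resample problems when this fires.
        self.failures = self.failures + 1 if reward <= 0 else 0
        return self.failures >= self.max_failures
```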

The architecture is compatible with future GRPO-style reinforcement learning.

---

## Benchmark Results

Accuracy comparison against the base model:

| Benchmark | Qwen3-4B | TRIDENT |
|-----------|----------|---------|
| GSM8K (5-shot) | 74.14 | **86.58** |
| MMLU (5-shot) | 47.70 | **72.61** |
| ARC-C (25-shot) | 54.0 | **59.0** |
| GPQA (0-shot) | 28.28 | **42.42** |
| Winogrande (0-shot) | 59.6 | **67.08** |
| TruthfulQA (0-shot) | 54.9 | 54.7 |

**Highlight:** +14.14 percentage point improvement on **GPQA (0-shot)**. TruthfulQA is essentially flat (−0.2 points); every other benchmark improves.

---

## Intended Use

TRIDENT is suitable for:

- Multi-step mathematical reasoning
- Scientific and logical inference
- Hard QA benchmarks
- Planning and hypothesis exploration
- Research on reasoning systems
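
For these uses the model loads through the standard `transformers` API. The repo id below is hypothetical; substitute the actual one for this checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shivik/TRIDENT"  # hypothetical repo id; replace with the real one
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```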

---

## Limitations

- Higher inference-time compute than single-pass models
- Not optimized for low-latency chat
- Best used where reasoning depth matters more than speed

---

## Ethical Considerations

- No human-written reasoning traces used
- No preference data collection
- Training relies on verifiable task rewards
- Like all LLMs, may hallucinate outside structured reasoning workflows

---

## Paper

https://www.shivik.in/shivik-labs/trident

## Citation

```bibtex
@article{puri2025trident,
  title={TRIDENT: Thought-based Reasoning and Improvement through Deep Exploration of Neuronal Trees},
  author={Puri, Shivansh and Khandelwal, Abhisek and Joshi, Vedant and Yadav, Akash},
  year={2025}
}
```