jonathanhe123/SemCoT-Sheared-LLaMA-1.3B-multiarith


	---
	base_model:
	- princeton-nlp/Sheared-LLaMA-1.3B
	datasets:
	- ChilleD/MultiArith
	license: llama2
	pipeline_tag: text-generation
	library_name: pytorch
	arxiv: 2510.24940
	tags:
	- model_hub_mixin
	- pytorch_model_hub_mixin
	- chain-of-thought
	- implicit-reasoning
	---

	# SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens

	## 🚀 Overview
	SemCoT is a framework that makes Chain-of-Thought (CoT) reasoning more efficient by encoding reasoning steps in hidden representations ("implicit tokens") instead of generating long textual explanations. Because fewer tokens have to be decoded, inference is faster, while the framework aims to preserve reasoning performance.
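
To make the efficiency argument concrete, the sketch below counts sequential decoding steps for explicit and implicit CoT. It is a simplified illustration, not part of the SemCoT code: the function names are hypothetical, the token counts are invented for the example, and the model ignores prompt processing and any difference in per-step cost.

```python
def explicit_cot_steps(reasoning_tokens: int, answer_tokens: int) -> int:
    """Explicit CoT decodes every reasoning token as text, then the answer."""
    return reasoning_tokens + answer_tokens


def implicit_cot_steps(implicit_tokens: int, answer_tokens: int) -> int:
    """Implicit CoT replaces the textual reasoning with a few implicit tokens."""
    return implicit_tokens + answer_tokens


def step_reduction(reasoning_tokens: int, implicit_tokens: int, answer_tokens: int) -> float:
    """Ratio of explicit to implicit decoding steps for the same answer length."""
    explicit = explicit_cot_steps(reasoning_tokens, answer_tokens)
    implicit = implicit_cot_steps(implicit_tokens, answer_tokens)
    return explicit / implicit


# Hypothetical token counts, chosen only for illustration (not measured on this model).
ratio = step_reduction(reasoning_tokens=60, implicit_tokens=4, answer_tokens=5)
print(f"{ratio:.1f}x fewer decoding steps")
```

The ratio grows with the length of the textual reasoning that the implicit tokens replace, which is why the benefit is largest on problems that need long explanations.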

	This checkpoint is `princeton-nlp/Sheared-LLaMA-1.3B` fine-tuned with the SemCoT framework on the MultiArith dataset (`ChilleD/MultiArith`).

	- Paper: [SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens](https://huggingface.co/papers/2510.24940)
	- Code: [Official GitHub Repository](https://github.com/YinhanHe123/SemCoT)

	## 🎯 Key Features
	- 🗣️ Semantic Alignment: Uses a contrastively trained sentence transformer to ensure that implicit reasoning tokens remain semantically consistent with human-readable CoT explanations.
	- ⚡ Efficiency Optimization: Introduces a lightweight implicit reasoning generator, fine-tuned via knowledge distillation, to reduce the time spent generating each implicit token.
	- 🧩 Joint Optimization: According to the paper, SemCoT is the first approach to jointly optimize token-level generation speed and semantic alignment with ground-truth reasoning.
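
The semantic alignment idea can be sketched with a generic contrastive objective. The example below uses an InfoNCE-style loss over cosine similarities, which is a common choice for contrastive training; it is an assumption made for illustration, and the loss actually used by SemCoT may differ. All function names and the toy 2-dimensional embeddings are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_z for v in logits]


def contrastive_alignment_loss(
    implicit: Sequence[Vector],
    explicit: Sequence[Vector],
    temperature: float = 0.1,
) -> float:
    """InfoNCE-style loss (assumed, not taken from the SemCoT code).

    implicit[i] is the embedding of the implicit reasoning for example i and
    explicit[i] the embedding of its human-readable CoT. The loss is low when
    each implicit embedding is closer to its own CoT than to the others.
    """
    total = 0.0
    for i, z in enumerate(implicit):
        sims = [cosine_similarity(z, e) / temperature for e in explicit]
        total -= log_softmax(sims)[i]
    return total / len(implicit)


# Toy embeddings, invented for illustration only.
cot_embeddings = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_alignment_loss([[0.9, 0.1], [0.1, 0.9]], cot_embeddings)
mismatched = contrastive_alignment_loss([[0.1, 0.9], [0.9, 0.1]], cot_embeddings)
print(aligned < mismatched)
```

In this toy setup the aligned pairing yields the smaller loss, which is the behavior a contrastively trained encoder is meant to reward.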

	## 🛠️ Usage
	This model requires the SemCoT framework to handle the implicit reasoning tokens correctly. To load and run it, follow the [official implementation on GitHub](https://github.com/YinhanHe123/SemCoT/).
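
As a starting point, the checkpoint files can be fetched locally before they are handed to the SemCoT code. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function; the helper name `download_checkpoint` and the default folder name are hypothetical, and the loading step itself is left to the official repository because this card does not document it.

```python
from pathlib import Path

# Repository ID of this model card on the Hugging Face Hub.
REPO_ID = "jonathanhe123/SemCoT-Sheared-LLaMA-1.3B-multiarith"


def download_checkpoint(local_dir: str = "semcot-multiarith") -> Path:
    """Download every file in the model repository and return the local folder."""
    # Imported lazily so this sketch can be read or imported without the dependency.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))
```

Calling `download_checkpoint()` requires network access. The `model_hub_mixin` and `pytorch_model_hub_mixin` tags suggest the weights were uploaded through `PyTorchModelHubMixin`, so the matching model class has to come from the SemCoT codebase.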

	## Citation
	```bibtex
	@inproceedings{he2025semcot,
	  title={SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens},
	  author={He, Yinhan and Zheng, Wendy and Zhu, Yaochen and Zheng, Zaiyi and Su, Lin and Vasudevan, Sriram and Guo, Qi and Hong, Liangjie and Li, Jundong},
	  booktitle={39th Conference on Neural Information Processing Systems (NeurIPS 2025)},
	  year={2025}
	}
	```