Model weights are not stored in this repository. Please visit the Xiaothink-T6-0.08B-Instruct-Preview model repository on ModelScope.
⚠️ Usage Notes
Due to its unusual architecture, this model does not yet support ModelScope pipeline invocation. Please install our dedicated Python library instead: `pip install xiaothink`
🌟 Model Overview
July 26, 2025: the model was renamed from Xiaothink-T6-mini-Preview-Base to Xiaothink-T6-0.08B-Instruct-Preview.
Xiaothink-T6-0.08B-Instruct is an innovative Small Language Model (SLM) featuring a groundbreaking MoF (Mixed of Framework) hybrid architecture. The model deeply integrates the strengths of both the Transformer and RNN architectures, requiring only 4 days for full training on a single consumer-grade CPU (Intel Core i7 2.8GHz) with just 0.25GB of training data (0.1GB pre-training + 0.15GB fine-tuning). The model is specially optimized for Chinese language processing, delivering strong performance in low-resource environments.
🚀 Quick Start
Installation & Usage
```python
import xiaothink as xt
import xiaothink.llm.inference.test_formal as tf
import xiaothink.llm.inference.test as test

# Initialize the model configuration
model_config = {
    'ckpt_dir': '/path/to/your/checkpoint',  # replace with your actual checkpoint path
    'MT': 't6_beta_dense',                   # model version identifier
    'vocab': '/path/to/your/vocab.txt'       # replace with your actual vocabulary path
}

# ===== Interactive chat mode =====
chat_model = tf.QianyanModel(**model_config)
print('[Chat mode started] (enter [CLEAN] to clear the context, [EXIT] to quit)')
while True:
    user_input = input('Q: ')
    if user_input == '[EXIT]':
        break  # leave the chat loop so the batch test below can run
    if user_input == '[CLEAN]':
        print('[System]: context cleared\n')
        chat_model.clean_his()
        continue
    response = chat_model.chat(user_input, temp=0.32)
    print('\nA:', response, '\n')

# ===== Batch test mode =====
test_model, test_vocab = test.load(**model_config)

test_cases = {
    1: '解释量子纠缠的基本原理',               # explain the basics of quantum entanglement
    2: '写一首关于秋天的五言绝句',             # write a five-character quatrain about autumn
    3: '用Python实现快速排序算法',             # implement quicksort in Python
    4: '全球变暖对极地生态系统的影响有哪些?'   # effects of global warming on polar ecosystems
}

# Run each test case through the instruction template
for case_id, prompt in test_cases.items():
    test_instruction = f'{{"instruction": "{prompt}", "input": "", "output": "'
    print(f'\n=== Test case {case_id} ===')
    print(f'Input: {prompt}')
    response = test.generate_texts_loop(
        test_model,
        test_vocab,
        test_instruction,
        num_generate=100,   # number of tokens to generate
        temperature=0.32,
        window=2048
    )
    print(f'Output: {response}')
```
🧠 Core Innovation
MoF Hybrid Architecture
```mermaid
graph LR
    A[Input text] --> B{MoF routing mechanism}
    B --> C[Narrow-deep Transformer]
    B --> D[Wide-shallow RNN]
    C --> E[Complex short contexts]
    D --> F[Knowledge-heavy long contexts]
    E & F --> G[Fused output]
```
- **Narrow-Deep Transformer Expert**: Focuses on high-complexity tasks within the recent context window (128 tokens), using linear attention to reduce computational overhead
- **Wide-Shallow RNN Expert**: Handles knowledge-intensive long-context tasks (2048 tokens) with an 8-layer GRU network that captures long-term dependencies
- **Intelligent Routing Mechanism**: Dynamically allocates tasks to the most suitable expert network to maximize computational efficiency
- **Thought Space**: Introduces a global-context reasoning module into the Transformer to enhance semantic understanding
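As a toy illustration of the division of labor described above, the two experts could see different views of the same input. This is a sketch only: the window sizes come from the description, while the actual slicing inside the `xiaothink` library is not shown in this document.

```python
import numpy as np

# Illustrative only: window sizes taken from the architecture description;
# the real routing/slicing inside xiaothink may differ.
TRANSFORMER_WINDOW = 128  # recent window for the narrow-deep Transformer expert
RNN_CONTEXT = 2048        # long context for the wide-shallow RNN expert

def split_for_experts(token_ids):
    """Return the view of the input each expert would attend to."""
    token_ids = np.asarray(token_ids)
    rnn_view = token_ids[-RNN_CONTEXT:]                 # knowledge-heavy long context
    transformer_view = token_ids[-TRANSFORMER_WINDOW:]  # complex recent context
    return transformer_view, rnn_view

short, long_ctx = split_for_experts(list(range(3000)))
print(len(short), len(long_ctx))  # prints: 128 2048
```

Both views end at the same (most recent) token; they differ only in how far back they reach.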
⚙️ Architecture Details
Key Components
1. Expert Networks
```python
# RNN expert (CLModel_t6)
class CLModel_t6(layers.Layer):
    def __init__(self, vocab_size, embedding_dim, window, units, n=8, ...):
        # GRU stack + LayerNormalization
        self.context_encoders = [GRU(units) for _ in range(n)]

# Transformer expert (MemoryEnhancedTransformer_dense)
class MemoryEnhancedTransformer_dense(Model):
    def __init__(self, vocab_size, embedding_dim, units, ...):
        # linear-attention Transformer blocks + positional encoding
        self.transformer_blocks = [
            LinearAttentionTransformerBlock_dense() for _ in range(num_layers)
        ]
        self.position_embedding = PositionEmbedding_dense(...)
```
2. MoE Routing Mechanism
```python
class MoEModel_t6(Model):
    def __init__(self, experts, vocab_size, num_experts, router_units):
        # GRU router network + weight normalization
        self.router_gru = layers.GRU(router_units)
        self.router_dense = layers.Dense(num_experts, activation='softmax')

    def call(self, inputs):
        # dynamically compute expert weights
        weights = self.router_dense(self.router_gru(inputs))  # [B, E]
        # weighted fusion of expert outputs
        combined = tf.reduce_sum(stacked * weights, axis=-1)
```
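The weighted-fusion step can be sketched in plain NumPy, as a minimal stand-in for the Keras fragment above. The softmax router and the two dummy expert outputs here are illustration-only assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_combine(expert_outputs, router_logits):
    """Weighted fusion of expert outputs, mirroring
    tf.reduce_sum(stacked * weights, axis=-1).

    expert_outputs: list of E arrays, each [batch, dim]
    router_logits:  [batch, E]
    """
    weights = softmax(router_logits)                     # [B, E], rows sum to 1
    stacked = np.stack(expert_outputs, axis=-1)          # [B, dim, E]
    return (stacked * weights[:, None, :]).sum(axis=-1)  # [B, dim]

rnn_out = np.ones((1, 4))           # dummy RNN expert output
transformer_out = np.zeros((1, 4))  # dummy Transformer expert output
out = moe_combine([rnn_out, transformer_out], np.array([[0.0, 0.0]]))
print(out)  # equal logits -> plain average: [[0.5 0.5 0.5 0.5]]
```

With unequal logits, the router smoothly shifts the mixture toward whichever expert it favors, which is the "dynamic allocation" described above.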
3. Thought Space Module
```python
class LinearAttentionTransformerBlock_dense(layers.Layer):
    def __init__(self, use_thought_space=True, ...):
        if use_thought_space:
            # global-context extraction + thought-vector generation
            self.context_extractor = GlobalAveragePooling1D()
            self.thought_processor = Dense(embed_dim, activation='gelu')

    def call(self, inputs):
        # thought-space reasoning
        context = self.context_extractor(out1)
        thought_vector = self.thought_processor(context)
        # adaptive fusion
        out1 = out1 + self.alpha * thought_vector
```
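The same three steps (pool, project, residual-add) can be sketched in NumPy. The shapes, the tanh GELU approximation, and the default `alpha` value are assumptions for illustration; the real module is a Keras layer with learned weights:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def thought_space_step(hidden, W, b, alpha=0.1):
    """hidden: [seq_len, embed_dim]; W: [embed_dim, embed_dim]; b: [embed_dim]."""
    context = hidden.mean(axis=0)    # GlobalAveragePooling1D over the sequence
    thought = gelu(context @ W + b)  # Dense(embed_dim, activation='gelu')
    return hidden + alpha * thought  # adaptive residual fusion (broadcast over seq)

rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))
out = thought_space_step(h, rng.normal(size=(8, 8)), np.zeros(8))
print(out.shape)  # prints: (5, 8)
```

Note that the same thought vector is added to every position, injecting a summary of the whole context into each token's representation.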
📊 Training Configuration
| Parameter | Value | Notes |
|---|---|---|
| Hardware | Intel Core i7 2.8GHz | consumer-grade CPU |
| Training time | 4 days | trained from scratch |
| Pre-training data | 0.1GB | high-quality Chinese text |
| Fine-tuning data | 0.15GB | instruction-tuning dataset |
| Context window | 4096 tokens | maximum supported length |
| Number of experts | 2 | RNN + Transformer |
🧩 Model Parameters
| Category | Parameter | Value | Notes |
|---|---|---|---|
| Embedding | dimension | 512 | word-vector dimension |
| Transformer | layers | 23 | stacked layers |
| | dff factor | 1 | feed-forward expansion factor |
| | attention heads | 8 | multi-head attention |
| | window size | 130 | local attention span |
| | max window | 4096 | positional-encoding length |
| GRU | RNN units | 1400 | hidden dimension |
| Context | training length | 220 | context length during training |
| | max length | 4096 | longest supported context |
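As a rough sanity check against the 0.08B name, the GRU stack alone can be sized from the table using the standard Keras GRU parameter formula (`reset_after=True`). Treating every one of the 8 GRU encoders as taking a 512-dim embedding input is a simplifying assumption; the actual wiring is not specified in this document:

```python
def gru_params(input_dim, units):
    # Keras GRU with reset_after=True: 3 gates, each with an input kernel,
    # a recurrent kernel, and a double bias
    return 3 * (input_dim * units + units * units + 2 * units)

embedding_dim, units, n_layers = 512, 1400, 8
per_layer = gru_params(embedding_dim, units)
print(per_layer)             # prints: 8038800
print(per_layer * n_layers)  # ~64M for the 8 GRU encoders alone
```

This puts the RNN expert on the order of tens of millions of parameters, consistent with an ~0.08B total once the Transformer expert and embeddings are included.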
📮 Contact
Xiaothink Research - Advancing Edge AI
Email: xiaothink@foxmail.com
GitHub: github.com/Ericsjq
License: Apache 2.0
Model Version: Preview (4000-batch)
Last Updated: 2025-07-26