Models are not stored in this repository. Please visit the ModelScope repository:

Xiaothink-T6-0.08B-Instruct-Preview model repository

⚠️ Usage Notes
Due to its unconventional architecture, this model does not yet support the ModelScope pipeline API. Please install our dedicated Python library instead:

```shell
pip install xiaothink
```

🌟 Model Overview

July 26, 2025: the model was renamed from Xiaothink-T6-mini-Preview-Base to Xiaothink-T6-0.08B-Instruct-Preview.

Xiaothink-T6-0.08B-Instruct is an innovative Small Language Model (SLM) featuring a groundbreaking MoF (Mixed of Framework) hybrid architecture. The model deeply integrates the advantages of both Transformer and RNN architectures, requiring only 4 days for full training on a single consumer-grade CPU (Intel Core i7 2.8GHz) with just 0.25GB training data (0.1GB pre-training + 0.15GB fine-tuning). The model is specially optimized for Chinese language processing, delivering exceptional performance in low-resource environments.

🚀 Quick Start

Installation & Usage

```python
import xiaothink as xt
import xiaothink.llm.inference.test_formal as tf
import xiaothink.llm.inference.test as test

# Initialize the model configuration
model_config = {
    'ckpt_dir': '/path/to/your/checkpoint',  # replace with your actual checkpoint path
    'MT': 't6_beta_dense',                   # model variant identifier
    'vocab': '/path/to/your/vocab.txt'       # replace with your actual vocab path
}

# ===== Interactive chat mode =====
chat_model = tf.QianyanModel(**model_config)

print("[Chat mode started] (type [CLEAN] to clear the context)")
while True:
    user_input = input('[Q]: ')

    if user_input == '[CLEAN]':
        print('[System]: context cleared\n')
        chat_model.clean_his()
        continue

    response = chat_model.chat(user_input, temp=0.32)
    print('\n[A]:', response, '\n')

# ===== Batch test mode =====
# (The chat loop above runs until interrupted; run this part separately.)
test_model, test_vocab = test.load(**model_config)

test_cases = {
    1: '解释量子纠缠的基本原理',              # explain the basics of quantum entanglement
    2: '写一首关于秋天的五言绝句',            # write a five-character quatrain about autumn
    3: '用Python实现快速排序算法',            # implement quicksort in Python
    4: '全球变暖对极地生态系统的影响有哪些?'  # effects of global warming on polar ecosystems
}

# Run the test cases
for case_id, prompt in test_cases.items():
    test_instruction = f'{{"instruction": "{prompt}", "input": "", "output": "'

    print(f"\n=== Test case {case_id} ===")
    print(f"Input: {prompt}")

    response = test.generate_texts_loop(
        test_model,
        test_vocab,
        test_instruction,
        num_generate=100,
        temperature=0.32,
        window=2048
    )
    print(f"Output: {response}")
```
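The instruction template in the loop above is built with plain f-string interpolation, which breaks if the prompt itself contains double quotes. A small helper sketch that produces the same template but escapes the prompt via `json.dumps` (the helper name `build_instruction` is ours, not part of the xiaothink API):

```python
import json

def build_instruction(prompt: str) -> str:
    # json.dumps escapes embedded quotes and backslashes; ensure_ascii=False
    # keeps Chinese characters readable instead of \uXXXX escapes.
    return '{"instruction": %s, "input": "", "output": "' % json.dumps(
        prompt, ensure_ascii=False
    )

safe = build_instruction('He said "hi"')
# The embedded quotes come out escaped, so the template stays well-formed.
```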

🧠 Core Innovation

MoF Hybrid Architecture

```mermaid
graph LR
A[Input text] --> B{MoF routing}
B --> C[Narrow-deep Transformer]
B --> D[Wide-shallow RNN]
C --> E[Complex short-context tasks]
D --> F[Knowledge-heavy long-context tasks]
E & F --> G[Fused output]
```

  • Narrow-Deep Transformer Expert: Focuses on high-complexity tasks in the recent context window (128 tokens) using linear attention to reduce computational overhead

  • Wide-Shallow RNN Expert: Handles knowledge-intensive long-context tasks (2048 tokens) with 8-layer GRU networks capturing long-term dependencies

  • Intelligent Routing Mechanism: Dynamically allocates tasks to the most suitable expert network to maximize computational efficiency

  • Thought Space: Introduces a global-context reasoning module into the Transformer to enhance semantic understanding
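The routing idea above can be sketched numerically: the router produces one softmax weight per expert, and the final output is the weighted sum of the expert outputs. A minimal NumPy illustration, with made-up stand-in values for the expert outputs and router logits:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Two "experts": stand-ins for the Transformer and RNN branches.
transformer_out = np.array([0.2, 0.8, 0.1])  # hypothetical output vector
rnn_out         = np.array([0.5, 0.1, 0.6])

# The router emits one logit per expert; softmax normalizes them to weights.
router_logits = np.array([1.0, 0.0])
w = softmax(router_logits)                   # weights sum to 1

# Fused output: convex combination of the expert outputs.
fused = w[0] * transformer_out + w[1] * rnn_out
```

Because the weights are a softmax, the fusion is always a convex combination, so the output stays in the span of what the experts produce.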

⚙️ Architecture Details

Key Components

1. Expert Networks

```python
# RNN expert (CLModel_t6)
class CLModel_t6(layers.Layer):
    def __init__(self, vocab_size, embedding_dim, window, units, n=8, ...):
        # Stack of n GRU context encoders + LayerNormalization
        self.context_encoders = [GRU(units) for _ in range(n)]

# Transformer expert (MemoryEnhancedTransformer_dense)
class MemoryEnhancedTransformer_dense(Model):
    def __init__(self, vocab_size, embedding_dim, units, ...):
        # Linear-attention Transformer blocks + positional encoding
        self.transformer_blocks = [LinearAttentionTransformerBlock_dense() for _ in range(num_layers)]
        self.position_embedding = PositionEmbedding_dense(...)
```
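The card does not specify which linear-attention variant the blocks use. One standard formulation replaces the softmax with a positive kernel feature map φ, so attention can be computed as φ(Q)(φ(K)ᵀV) with cost linear in sequence length instead of quadratic. A hedged NumPy sketch of that general technique, not necessarily the exact variant used here:

```python
import numpy as np

def elu_feature_map(x):
    # φ(x) = elu(x) + 1 keeps all features positive.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    # Q, K: [n, d], V: [n, d_v]
    Qf, Kf = elu_feature_map(Q), elu_feature_map(K)
    KV = Kf.T @ V                    # [d, d_v], computed once: linear in n
    Z = Qf @ Kf.sum(axis=0) + eps    # [n] per-query normalizer
    return (Qf @ KV) / Z[:, None]    # [n, d_v]

rng = np.random.default_rng(0)
n, d = 6, 4
out = linear_attention(rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)))
```

The key point is that `Kf.T @ V` is independent of the query, so the n×n attention matrix is never materialized.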

2. MoE Routing Mechanism

```python
class MoEModel_t6(Model):
    def __init__(self, experts, vocab_size, num_experts, router_units):
        # GRU router network + softmax weight normalization
        self.router_gru = layers.GRU(router_units)
        self.router_dense = layers.Dense(num_experts, activation='softmax')

    def call(self, inputs):
        # Dynamically compute one weight per expert
        expert_weights = self.router(inputs)  # [B, E]
        # Weighted fusion of the stacked expert outputs
        combined = tf.reduce_sum(stacked * expert_weights, axis=-1)
```

3. Thought Space Module

```python
class LinearAttentionTransformerBlock_dense(layers.Layer):
    def __init__(self, use_thought_space=True, ...):
        if use_thought_space:
            # Global context extraction + thought-vector generation
            self.context_extractor = GlobalAveragePooling1D()
            self.thought_processor = Dense(embed_dim, activation='gelu')

    def call(self, inputs):
        # Thought-space reasoning
        context = self.context_extractor(out1)
        thought_vector = self.thought_processor(context)
        # Adaptive residual fusion
        out1 = out1 + self.alpha * thought_vector
```
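Read literally, the thought-space step pools the block's activations into one global context vector, passes it through a dense "thought" projection with GELU, and adds it back to every position with a learned scale α. A minimal NumPy sketch of that residual fusion; the shapes, random weights, and α value are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, embed_dim = 5, 8
out1 = rng.normal(size=(seq_len, embed_dim))  # block activations [n, d]

# Global context extraction: average over the sequence axis.
context = out1.mean(axis=0)                   # [d]

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# "Thought" projection (stand-in random weights for the Dense layer).
W = rng.normal(size=(embed_dim, embed_dim)) * 0.1
thought_vector = gelu(context @ W)            # [d]

# Adaptive residual fusion, broadcast across all positions.
alpha = 0.1
fused = out1 + alpha * thought_vector         # [n, d]
```

Because the same thought vector is added to every position, each token sees a summary of the whole window at a cost independent of sequence length.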

📊 Training Configuration

| Parameter | Value | Notes |
|---|---|---|
| Hardware | Intel Core i7 2.8GHz | consumer-grade CPU |
| Training time | 4 days | trained from scratch |
| Pre-training data | 0.1GB | high-quality Chinese text |
| Fine-tuning data | 0.15GB | instruction-tuning dataset |
| Context window | 4096 tokens | maximum supported length |
| Number of experts | 2 | RNN + Transformer |

🧩 Model Parameters

| Category | Parameter | Value | Notes |
|---|---|---|---|
| Embedding | dimension | 512 | word-vector size |
| Transformer | layers | 23 | stacked layers |
| Transformer | dff factor | 1 | feed-forward expansion factor |
| Transformer | attention heads | 8 | multi-head attention |
| Transformer | window size | 130 | local attention span |
| Transformer | max window | 4096 | positional-encoding length |
| GRU | RNN units | 1400 | hidden dimension |
| Context | training length | 220 | context length during training |
| Context | max length | 4096 | longest supported context |

📮 Contact

Xiaothink Research - Advancing Edge AI
Email: xiaothink@foxmail.com
GitHub: github.com/Ericsjq


License: Apache 2.0
Model version: Preview (4000-batch) · Last updated: 2025-07-26
