Llama-3.1-8B-Thinking-Distill-R1

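Model Overview

This model is a fine-tune of Meta-Llama-3.1-8B. Training on a dataset distilled from DeepSeek-R1 strengthens its Chinese comprehension and its step-by-step reasoning. The model was trained with Unsloth and the Hugging Face TRL library, with Unsloth providing a roughly 2x training speedup.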

Model Details

Developer: suyu-io
Model ID: suyu-io/Llama-3.1-8B-Thinking-Distill-R1
Base model: meta-llama/Llama-3.1-8B
Quantized version: unsloth/meta-llama-3.1-8b-instruct-bnb-4bit
License: Apache-2.0
Language: Chinese
Model type: Causal Language Model

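Key Features

✨ Deep thinking mode: the model performs chain-of-thought reasoning in the style of DeepSeek-R1 and can show its reasoning in detail

🇨🇳 Stronger Chinese: trained on a high-quality Chinese dataset of 110k samples to improve Chinese comprehension and generation

⚡ Efficient training: the Unsloth acceleration framework makes training about 2x more efficient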

Training Dataset

This model was trained on the Congliu/Chinese-DeepSeek-R1-Distill-data-110k dataset, a high-quality Chinese dataset distilled from the full-size DeepSeek-R1.
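The card does not say how dataset records were formatted for training. As a rough sketch only, the snippet below shows one way a distilled record could be turned into a chat-style sample. The field names `input`, `reasoning_content` and `content`, the `<think>` wrapper, and the helper `to_chat_sample` are all assumptions made for illustration, not details confirmed by this card.

```python
def to_chat_sample(record: dict) -> dict:
    """Convert one distilled record into a chat-style training sample."""
    # Assumed field names: "input", "reasoning_content", "content".
    # Assumed target layout: reasoning inside <think> tags, then the final answer.
    target = (
        f"<think>\n{record['reasoning_content']}\n</think>\n\n"
        f"{record['content']}"
    )
    return {
        "messages": [
            {"role": "user", "content": record["input"]},
            {"role": "assistant", "content": target},
        ]
    }


# Made-up record for illustration; it is not taken from the dataset.
example = {
    "input": "1 + 1 = ?",
    "reasoning_content": "Adding one and one gives two.",
    "content": "2",
}

sample = to_chat_sample(example)
print(sample["messages"][1]["content"])
```

Keeping the reasoning and the answer in a single assistant turn is what would let a fine-tuned model emit its reasoning before the answer at inference time.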

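Data Distribution

Category	Samples	Description
Math	36,568	Mathematical reasoning and calculation problems
Exam	2,432	Exam questions of various kinds
STEM	12,648	Science, technology, engineering, and mathematics
General	58,352	Varied scenarios: Ruozhiba, logical reasoning, Xiaohongshu, Zhihu, chat, and more
Total	110,000	-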
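To make the mix easier to read, the counts in the table can be converted to percentage shares. The short sketch below uses only the numbers from the table; the variable names are illustrative.

```python
# Sample counts copied from the data-distribution table.
counts = {
    "Math": 36_568,
    "Exam": 2_432,
    "STEM": 12_648,
    "General": 58_352,
}

total = sum(counts.values())

# Share of each category as a percentage of all samples.
shares = {name: 100 * n / total for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name:<8}{counts[name]:>8,}  {pct:5.1f}%")
print(f"{'Total':<8}{total:>8,}")
```

General data makes up a little over half of the samples and Math about a third, so the dataset leans toward everyday and open-ended prompts rather than exam-style questions.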

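Model Advantages

Compared with the original meta-llama/Llama-3.1-8B model:

New deep thinking mode: shows the full reasoning process, which makes answers easier to interpret
Improved Chinese: optimized specifically for Chinese use, covering math, logic, everyday conversation, and other domains
Multi-scenario support: suited to academic, exam, social media, and other applications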
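Because the model shows its reasoning, downstream code often needs to separate that reasoning from the final answer. The sketch below assumes the reasoning is wrapped in `<think>...</think>` tags, as in DeepSeek-R1 output; this card does not confirm the exact output format, and `split_reasoning` is a hypothetical helper.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>, as in DeepSeek-R1 output.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" when no think block is found."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


# Made-up model output for illustration.
demo_output = "<think>\n2 + 2 is 4.\n</think>\n\nThe answer is 4."
reasoning, answer = split_reasoning(demo_output)
print("reasoning:", reasoning)
print("answer:", answer)
```

If the tags turn out to differ, only the regular expression needs to change.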

Usage

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model_name = "suyu-io/Llama-3.1-8B-Thinking-Distill-R1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

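# Inference example
# Prompt: "Please explain the principle of quantum entanglement"
prompt = "请解释一下量子纠缠的原理"
messages = [
    {"role": "user", "content": prompt}
]

# add_generation_prompt=True appends the assistant header so the model answers
# rather than continuing the user turn; return_dict=True also returns the attention mask
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    **inputs,  # unpack input_ids and attention_mask
    max_new_tokens=2048,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

# The decoded text contains the prompt followed by the model's reply
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)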

Faster Inference with Unsloth

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "suyu-io/Llama-3.1-8B-Thinking-Distill-R1",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

FastLanguageModel.for_inference(model)

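# Prompt: "Please explain the theory of relativity in detail"
inputs = tokenizer(
    "请详细解释一下相对论",
    return_tensors="pt"
).to("cuda")

outputs = model.generate(
    **inputs,  # unpack input_ids and attention_mask
    max_new_tokens = 2048,
    use_cache = True,
    do_sample = True,  # sample explicitly; temperature and min_p are ignored under greedy decoding
    temperature = 0.2,
    min_p = 0.2,
    repetition_penalty = 1.1
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))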

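Limitations

The model may still hallucinate; use it with caution for critical decisions
Chain-of-thought reasoning increases output length and inference time
Run a targeted evaluation on your own task before use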
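A targeted evaluation can start very small. The sketch below scores a model on a handful of question and answer pairs by checking whether the expected answer appears in the final answer, ignoring the reasoning. It assumes reasoning ends with a `</think>` marker, which this card does not confirm; `answer_accuracy` and `fake_generate` are hypothetical names, and `fake_generate` stands in for a real model call so the example runs without a GPU.

```python
from typing import Callable, Iterable


def answer_accuracy(
    generate_fn: Callable[[str], str],
    cases: Iterable[tuple[str, str]],
) -> float:
    """Fraction of cases whose expected answer appears in the model's final answer."""
    cases = list(cases)
    if not cases:
        raise ValueError("no evaluation cases given")
    hits = 0
    for question, expected in cases:
        output = generate_fn(question)
        # Score only the text after the reasoning block (assumed </think> marker),
        # so an answer mentioned mid-reasoning does not count as correct.
        final = output.rsplit("</think>", 1)[-1]
        hits += expected in final
    return hits / len(cases)


def fake_generate(question: str) -> str:
    """Stand-in for a real model call, returning canned outputs."""
    canned = {
        "1+1=?": "<think>one plus one</think>2",
        "2+3=?": "<think>maybe 5?</think>6",
    }
    return canned[question]


accuracy = answer_accuracy(fake_generate, [("1+1=?", "2"), ("2+3=?", "5")])
print(f"accuracy: {accuracy:.2f}")
```

Substring matching is a crude metric; for math or exam questions, a stricter comparison of the extracted final answer is usually a better fit.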

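Training Frameworks

This model was trained with the following open-source projects:

Unsloth - 2x faster training
Hugging Face TRL - fine-tuning with the Transformer Reinforcement Learning library
Transformers - core model framework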

Acknowledgements

Thanks to Congliu for the high-quality Chinese distillation dataset, and to Meta, Unsloth, and the Hugging Face community for their support.

Citation

If you use this model, please cite:

@misc{llama-3.1-8b-thinking-distill-r1,
  author = {suyu-io},
  title = {Llama-3.1-8B-Thinking-Distill-R1},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/suyu-io/Llama-3.1-8B-Thinking-Distill-R1}}
}

Model size: 8B params
Tensor type: BF16
Format: Safetensors
