---
license: apache-2.0
language:
  - zh
  - en
pipeline_tag: text-generation
tags:
  - spark
  - iflytek
  - chat
  - pytorch
  - causal-lm
---

# OpenSpark-13B-Chat


> ⚠️ **Note:** This is a relatively early version of the iFlytek Spark model (released in 2024). We converted it to Hugging Face format primarily for research purposes — to help the community study early LLM architectures, compare with modern models, and understand how the field has evolved.

This is a community-converted, Hugging Face-compatible version of the iFlytek Spark 13B model. The original weights were converted from the official Megatron-DeepSpeed format so the model works with the `transformers` ecosystem.

## Source

## Requirements

```bash
pip install torch transformers sentencepiece
```

## Usage

You can load this model with the `transformers` library. Pass `trust_remote_code=True` so that the custom model and tokenizer code shipped with the checkpoint can be loaded.

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "freedomking/OpenSpark-13B-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Prompt: "Hello, please introduce yourself."
prompt = "<User> 你好,请自我介绍一下。<end><Bot>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Using `apply_chat_template` (Recommended)

For multi-turn conversations, use the built-in chat template:

```python
messages = [
    # "Hello, please introduce yourself."
    {"role": "user", "content": "你好,请自我介绍一下。"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=8192,
    temperature=0.7,
    top_k=1,
    do_sample=True,
    repetition_penalty=1.02,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Multi-turn Conversation

```python
messages = [
    # "What is artificial intelligence?"
    {"role": "user", "content": "什么是人工智能?"},
    # "Artificial intelligence is a technology that simulates human intelligence..."
    {"role": "assistant", "content": "人工智能是一种模拟人类智能的技术..."},
    # "What are its application scenarios?"
    {"role": "user", "content": "它有哪些应用场景?"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Model Details

| Parameter | Value |
|---|---|
| Architecture | Transformer Decoder (Spark) |
| Parameters | ~13B |
| Hidden Size | 5120 |
| Layers | 40 |
| Attention Heads | 40 |
| Vocab Size | 60,000 |
| Context Length | 32K |
| RoPE Base (Theta) | 1,000,000 |
| Activation | Fast GeLU |
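As a sanity check on the table, a back-of-the-envelope estimate from these figures lands near the stated ~13B. The sketch below assumes a standard decoder layer with four h×h attention projections, a 4× MLP expansion, and an untied LM head — assumptions not confirmed by the model card:

```python
hidden = 5120
layers = 40
vocab = 60_000

# Token embedding plus an (assumed) untied LM head.
embed = 2 * vocab * hidden

# Per decoder layer, roughly: Q/K/V/O projections = 4*h^2,
# MLP up + down with an (assumed) 4x expansion = 8*h^2.
per_layer = 12 * hidden ** 2

total = embed + layers * per_layer
print(f"~{total / 1e9:.1f}B parameters")  # ~13.2B parameters
```

Biases, layer norms, and any deviation from the 4× MLP ratio shift this estimate slightly, but it confirms the table's figures are mutually consistent.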

## Generation Parameters

| Parameter | Recommended Value |
|---|---|
| `max_new_tokens` | 8192 |
| `temperature` | 0.7 |
| `top_k` | 1 |
| `do_sample` | True |
| `repetition_penalty` | 1.02 |
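These values can be collected into one kwargs dict and passed straight to `generate`. Note that with `top_k=1` sampling always picks the single most likely token, so decoding is effectively greedy despite `do_sample=True`; raise `top_k` if you want visible variation between runs:

```python
# Recommended decoding settings from the table above.
generation_kwargs = dict(
    max_new_tokens=8192,
    temperature=0.7,
    top_k=1,                  # top_k=1 makes sampling effectively greedy
    do_sample=True,
    repetition_penalty=1.02,
)

# Usage: outputs = model.generate(inputs, **generation_kwargs)
```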

## Why This Conversion?

This project serves several purposes for the research community:

1. **Historical Reference**: Study the architecture of early Chinese LLMs
2. **Benchmark Comparison**: Compare performance against modern models (Qwen, DeepSeek, etc.)
3. **Educational Value**: Understand the evolution of LLM design choices
4. **Ecosystem Compatibility**: Run the model using standard Hugging Face APIs

## Features

- **Chat Template**: Supports `apply_chat_template` for multi-turn dialogues (`<User>...<end><Bot>...` format).
- **Standardized Naming**: Weight names are consistent with mainstream models like Qwen and Llama.
- **Custom Tokenizer**: Handles Chinese punctuation, tab formatting, and special tokens (`<ret>`, `<end>`).
- **BFloat16 Support**: Optimized for modern GPUs with BF16 precision.
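For reference, the `<User>...<end><Bot>` format can be rendered by hand when `apply_chat_template` is not available. The sketch below is inferred from the prompt shown in Basic Usage; the tokenizer's bundled template is authoritative, and the assistant-turn separator used here is an assumption:

```python
def build_spark_prompt(messages):
    """Render chat messages into the <User>...<end><Bot> prompt format."""
    parts = []
    for msg in messages:
        tag = "<User>" if msg["role"] == "user" else "<Bot>"
        parts.append(f"{tag} {msg['content']}<end>")
    parts.append("<Bot>")  # generation prompt: the model continues as Bot
    return "".join(parts)

print(build_spark_prompt([{"role": "user", "content": "Hello"}]))
# -> <User> Hello<end><Bot>
```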

## License

This project is licensed under the Apache 2.0 License.