OpenSpark-13B-Chat
⚠️ Note: This is a relatively early version of the iFlytek Spark model (released in 2024). We converted it to Hugging Face format primarily for research purposes: to help the community study early LLM architectures, compare them with modern models, and understand how the field has evolved.
This is a community-converted Hugging Face compatible version of the iFlytek Spark 13B model. The original weights were converted from the official Megatron-DeepSpeed format to work seamlessly with the transformers ecosystem.
Source
- Original Weights: iFlytek Spark-13B on Gitee
- Training Framework: Megatron-DeepSpeed
- Release Date: 2024
Requirements
```bash
pip install torch transformers sentencepiece
```
Usage
You can load this model with the transformers library. Set trust_remote_code=True when loading both the model and the tokenizer, since they rely on custom code shipped with the repository.
Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "freedomking/OpenSpark-13B-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

prompt = "<User> 你好,请自我介绍一下。<end><Bot>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Using `apply_chat_template` (Recommended)
For multi-turn conversations, use the built-in chat template:
```python
messages = [
    {"role": "user", "content": "你好,请自我介绍一下。"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=8192,
    temperature=0.7,
    top_k=1,
    do_sample=True,
    repetition_penalty=1.02,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Multi-turn Conversation
```python
messages = [
    {"role": "user", "content": "什么是人工智能?"},
    {"role": "assistant", "content": "人工智能是一种模拟人类智能的技术..."},
    {"role": "user", "content": "它有哪些应用场景?"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
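For reference, the chat format can also be assembled by hand. This is a minimal sketch: the layout of assistant turns is an assumption extrapolated from the single-turn `<User> ...<end><Bot>` example above (the actual template lives in the repository's tokenizer code, so prefer `apply_chat_template` in practice).

```python
# Hand-built version of the assumed <User>/<Bot> chat format.
# Assistant-turn placement is inferred, not confirmed by the model card.
def build_prompt(messages):
    prompt = ""
    for m in messages:
        if m["role"] == "user":
            prompt += f"<User> {m['content']}<end>"
        elif m["role"] == "assistant":
            prompt += f"<Bot> {m['content']}<end>"
    # Trailing <Bot> asks the model to generate the next assistant turn,
    # mirroring add_generation_prompt=True.
    return prompt + "<Bot>"

messages = [
    {"role": "user", "content": "什么是人工智能?"},
    {"role": "assistant", "content": "人工智能是一种模拟人类智能的技术..."},
    {"role": "user", "content": "它有哪些应用场景?"},
]
print(build_prompt(messages))
```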
Model Details
| Parameter | Value |
|---|---|
| Architecture | Transformer Decoder (Spark) |
| Parameters | ~13B |
| Hidden Size | 5120 |
| Layers | 40 |
| Attention Heads | 40 |
| Vocab Size | 60,000 |
| Context Length | 32K |
| RoPE Base (Theta) | 1,000,000 |
| Activation | Fast GeLU |
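As a sanity check, the shape parameters in the table are consistent with the stated ~13B total. The back-of-envelope estimate below assumes a non-gated 4×-hidden MLP (in line with the GeLU activation) and untied input/output embeddings; both are assumptions, not details confirmed by the card.

```python
# Rough parameter count from the Model Details table.
hidden = 5120
layers = 40
vocab = 60_000
intermediate = 4 * hidden  # assumed non-gated 4x MLP

embed = vocab * hidden                      # input embeddings
lm_head = vocab * hidden                    # output projection (assumed untied)
attn_per_layer = 4 * hidden * hidden        # Q, K, V, O projections
mlp_per_layer = 2 * hidden * intermediate   # up + down projections

total = embed + lm_head + layers * (attn_per_layer + mlp_per_layer)
print(f"~{total / 1e9:.1f}B parameters")  # → ~13.2B parameters
```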
Generation Parameters
| Parameter | Recommended Value |
|---|---|
| `max_new_tokens` | 8192 |
| `temperature` | 0.7 |
| `top_k` | 1 |
| `do_sample` | True |
| `repetition_penalty` | 1.02 |
Why This Conversion?
This project serves several purposes for the research community:
- Historical Reference: Study the architecture of early Chinese LLMs
- Benchmark Comparison: Compare performance against modern models (Qwen, DeepSeek, etc.)
- Educational Value: Understand the evolution of LLM design choices
- Ecosystem Compatibility: Run the model using standard Hugging Face APIs
Features
- Chat Template: Supports `apply_chat_template` for multi-turn dialogues (`<User>...<end><Bot>...` format).
- Standardized Naming: Consistent with mainstream models like Qwen and Llama.
- Custom Tokenizer: Handles Chinese punctuation, tab formatting, and special tokens (`<ret>`, `<end>`).
- BFloat16 Support: Optimized for modern GPUs with BF16 precision.
License
This project is licensed under the Apache 2.0 License.