---
license: apache-2.0
language:
- zh
- en
pipeline_tag: text-generation
tags:
- spark
- iflytek
- chat
- pytorch
- causal-lm
---
# OpenSpark-13B-Chat
[**中文**](./README_zh.md) | **English**
> ⚠️ **Note**: This is a relatively early version of the iFlytek Spark model (released in 2024). We converted it to Hugging Face format primarily for **research purposes** — to help the community study early LLM architectures, compare with modern models, and understand how the field has evolved.
This is a community-converted, Hugging Face-compatible version of the iFlytek Spark 13B model. The original weights were converted from the official Megatron-DeepSpeed format so the model works seamlessly with the `transformers` ecosystem.
## Source
- **Original Weights**: [iFlytek Spark-13B on Gitee](https://gitee.com/iflytekopensource/iFlytekSpark-13B)
- **Training Framework**: Megatron-DeepSpeed
- **Release Date**: 2024
## Requirements
```bash
pip install torch transformers sentencepiece
```
## Usage
You can load this model with the `transformers` library. Make sure to pass `trust_remote_code=True`, since the model and tokenizer rely on custom code shipped with this repository.
### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_path = "freedomking/OpenSpark-13B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
model_path,
torch_dtype=torch.bfloat16,
device_map="auto",
trust_remote_code=True
)
prompt = "<User> 你好,请自我介绍一下。<end><Bot>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Using `apply_chat_template` (Recommended)
For multi-turn conversations, use the built-in chat template:
```python
messages = [
{"role": "user", "content": "你好,请自我介绍一下。"}
]
inputs = tokenizer.apply_chat_template(
messages,
tokenize=True,
return_tensors="pt",
add_generation_prompt=True
).to(model.device)
outputs = model.generate(
inputs,
max_new_tokens=8192,
temperature=0.7,
top_k=1,
do_sample=True,
repetition_penalty=1.02,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Multi-turn Conversation
```python
messages = [
{"role": "user", "content": "什么是人工智能?"},
{"role": "assistant", "content": "人工智能是一种模拟人类智能的技术..."},
{"role": "user", "content": "它有哪些应用场景?"}
]
inputs = tokenizer.apply_chat_template(
messages,
tokenize=True,
return_tensors="pt",
add_generation_prompt=True
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Model Details
| Parameter | Value |
|---|---|
| Architecture | Transformer Decoder (Spark) |
| Parameters | ~13B |
| Hidden Size | 5120 |
| Layers | 40 |
| Attention Heads | 40 |
| Vocab Size | 60,000 |
| Context Length | 32K |
| RoPE Base (Theta) | 1,000,000 |
| Activation | Fast GeLU |
## Generation Parameters
| Parameter | Recommended Value |
|---|---|
| `max_new_tokens` | 8192 |
| `temperature` | 0.7 |
| `top_k` | 1 |
| `do_sample` | True |
| `repetition_penalty` | 1.02 |
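The recommended values above can be collected into a single kwargs dict and reused across `generate` calls. A minimal sketch (the `GENERATION_KWARGS` name is illustrative, not part of the repository):

```python
# Recommended generation settings from the table above,
# gathered in one place for reuse.
GENERATION_KWARGS = {
    "max_new_tokens": 8192,
    "temperature": 0.7,
    "top_k": 1,
    "do_sample": True,
    "repetition_penalty": 1.02,
}

# Usage, assuming `model` and `inputs` from the examples above:
#   outputs = model.generate(inputs, **GENERATION_KWARGS)
```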
## Why This Conversion?
This project serves several purposes for the research community:
1. **Historical Reference**: Study the architecture of early Chinese LLMs
2. **Benchmark Comparison**: Compare performance against modern models (Qwen, DeepSeek, etc.)
3. **Educational Value**: Understand the evolution of LLM design choices
4. **Ecosystem Compatibility**: Run the model using standard Hugging Face APIs
## Features
- **Chat Template**: Supports `apply_chat_template` for multi-turn dialogues (`<User>...<end><Bot>...` format).
- **Standardized Naming**: Consistent with mainstream models like Qwen and Llama.
- **Custom Tokenizer**: Handles Chinese punctuation, tab formatting, and special tokens (`<ret>`, `<end>`).
- **BFloat16 Support**: Optimized for modern GPUs with BF16 precision.
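If you prefer not to use `apply_chat_template`, the raw prompt can be assembled by hand. A minimal sketch, assuming the single-turn `<User> ...<end><Bot>` format shown in Basic Usage extends to multiple turns by simple concatenation (verify against the chat template bundled with the tokenizer before relying on this):

```python
def build_spark_prompt(messages):
    """Assemble a raw prompt string in the <User>...<end><Bot> format.

    `messages` is a list of {"role": ..., "content": ...} dicts, as in
    the apply_chat_template examples above. The multi-turn layout here
    is an assumption extrapolated from the single-turn prompt format.
    """
    parts = []
    for m in messages:
        if m["role"] == "user":
            parts.append(f"<User> {m['content']}<end>")
        elif m["role"] == "assistant":
            parts.append(f"<Bot>{m['content']}<end>")
    parts.append("<Bot>")  # generation prompt: model continues as Bot
    return "".join(parts)
```

For a single user turn this reproduces the prompt string used in the Basic Usage example.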
## License
This project is licensed under the [Apache 2.0 License](https://gitee.com/iflytekopensource/iFlytekSpark-13B/blob/master/LICENSE).