---
license: apache-2.0
language:
- zh
- en
pipeline_tag: text-generation
tags:
- spark
- iflytek
- chat
- pytorch
- causal-lm
---

# OpenSpark-13B-Chat

[**中文**](./README_zh.md) | **English**

> ⚠️ **Note**: This is a relatively early version of the iFlytek Spark model (released in 2024). We converted it to Hugging Face format primarily for **research purposes** — to help the community study early LLM architectures, compare with modern models, and understand how the field has evolved.

This is a community-converted, Hugging Face-compatible version of the iFlytek Spark-13B model. The original weights were converted from the official Megatron-DeepSpeed checkpoint format to work seamlessly with the `transformers` ecosystem.

## Source

- **Original Weights**: [iFlytek Spark-13B on Gitee](https://gitee.com/iflytekopensource/iFlytekSpark-13B)
- **Training Framework**: Megatron-DeepSpeed
- **Release Date**: 2024

## Requirements

```bash
pip install torch transformers sentencepiece
```

## Usage

You can load this model with the `transformers` library. Pass `trust_remote_code=True` to both `from_pretrained` calls so the custom model and tokenizer code bundled with the repository is loaded.

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "freedomking/OpenSpark-13B-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path, 
    torch_dtype=torch.bfloat16, 
    device_map="auto", 
    trust_remote_code=True
)

prompt = "<User> 你好,请自我介绍一下。<end><Bot>"  # "Hello, please introduce yourself."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Using `apply_chat_template` (Recommended)

For multi-turn conversations, use the built-in chat template:

```python
messages = [
    {"role": "user", "content": "你好,请自我介绍一下。"}  # "Hello, please introduce yourself."
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=8192,
    temperature=0.7,
    top_k=1,
    do_sample=True,
    repetition_penalty=1.02,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Multi-turn Conversation

```python
messages = [
    {"role": "user", "content": "什么是人工智能?"},  # "What is artificial intelligence?"
    {"role": "assistant", "content": "人工智能是一种模拟人类智能的技术..."},  # "AI is a technology that simulates human intelligence..."
    {"role": "user", "content": "它有哪些应用场景?"}  # "What are its application scenarios?"
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Model Details

| Parameter | Value |
|---|---|
| Architecture | Transformer Decoder (Spark) |
| Parameters | ~13B |
| Hidden Size | 5120 |
| Layers | 40 |
| Attention Heads | 40 |
| Vocab Size | 60,000 |
| Context Length | 32K |
| RoPE Base (Theta) | 1,000,000 |
| Activation | Fast GeLU |

## Generation Parameters

| Parameter | Recommended Value |
|---|---|
| `max_new_tokens` | 8192 |
| `temperature` | 0.7 |
| `top_k` | 1 |
| `do_sample` | True |
| `repetition_penalty` | 1.02 |

## Why This Conversion?

This project serves several purposes for the research community:

1. **Historical Reference**: Study the architecture of early Chinese LLMs
2. **Benchmark Comparison**: Compare performance against modern models (Qwen, DeepSeek, etc.)
3. **Educational Value**: Understand the evolution of LLM design choices
4. **Ecosystem Compatibility**: Run the model using standard Hugging Face APIs

## Features

- **Chat Template**: Supports `apply_chat_template` for multi-turn dialogues (`<User>...<end><Bot>...` format).
- **Standardized Naming**: Consistent with mainstream models like Qwen and Llama.
- **Custom Tokenizer**: Handles Chinese punctuation, tab formatting, and special tokens (`<ret>`, `<end>`).
- **BFloat16 Support**: Optimized for modern GPUs with BF16 precision.
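
For reference, the prompt format can be approximated in plain Python. The renderer below is an illustration inferred from the Basic Usage prompt (`<User> ...<end><Bot>`), not the repository's actual chat template; prefer `tokenizer.apply_chat_template` in real use:

```python
# Illustrative renderer for the <User>/<Bot> prompt format.
# Inferred from the Basic Usage example; the repository's real chat template
# (via tokenizer.apply_chat_template) is authoritative.
def render_prompt(messages, add_generation_prompt=True):
    tag = {"user": "<User>", "assistant": "<Bot>"}
    text = "".join(f"{tag[m['role']]} {m['content']}<end>" for m in messages)
    if add_generation_prompt:
        text += "<Bot>"  # cue the model to start its reply
    return text

print(render_prompt([{"role": "user", "content": "你好,请自我介绍一下。"}]))
# -> <User> 你好,请自我介绍一下。<end><Bot>
```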

## License

This project is licensed under the [Apache 2.0 License](https://gitee.com/iflytekopensource/iFlytekSpark-13B/blob/master/LICENSE).