---
license: apache-2.0
language:
- zh
- en
pipeline_tag: text-generation
tags:
- spark
- iflytek
- chat
- pytorch
- causal-lm
---

# OpenSpark-13B-Chat

[**中文**](./README_zh.md) | **English**

> ⚠️ **Note**: This is a relatively early version of the iFlytek Spark model (released in 2024). We converted it to Hugging Face format primarily for **research purposes** — to help the community study early LLM architectures, compare them with modern models, and understand how the field has evolved.

This is a community-converted, Hugging Face-compatible version of the iFlytek Spark 13B model. The original weights were converted from the official Megatron-DeepSpeed format to work seamlessly with the `transformers` ecosystem.

## Source

- **Original Weights**: [iFlytek Spark-13B on Gitee](https://gitee.com/iflytekopensource/iFlytekSpark-13B)
- **Training Framework**: Megatron-DeepSpeed
- **Release Date**: 2024

## Requirements

```bash
pip install torch transformers sentencepiece
```
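
As a rough sizing note (an estimate, not an official requirement): holding ~13B parameters in bfloat16 takes about 24 GiB for the weights alone, before activations or the KV cache.

```python
# Back-of-envelope estimate of GPU memory for the bf16 weights.
# Assumes ~13e9 parameters; ignores activations, KV cache, and framework overhead.
n_params = 13e9
bytes_per_param = 2  # bfloat16 = 2 bytes per parameter
weight_gib = n_params * bytes_per_param / 1024**3
print(f"~{weight_gib:.0f} GiB for weights alone")  # prints "~24 GiB for weights alone"
```

With `device_map="auto"`, `transformers` will spread or offload layers if a single GPU cannot hold them.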

## Usage

You can load this model with the `transformers` library. Pass `trust_remote_code=True` so the custom model and tokenizer code shipped with the checkpoint can be loaded.

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "freedomking/OpenSpark-13B-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

prompt = "<User> 你好,请自我介绍一下。<end><Bot>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Using `apply_chat_template` (Recommended)

For multi-turn conversations, use the built-in chat template:

```python
messages = [
    {"role": "user", "content": "你好,请自我介绍一下。"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=8192,
    temperature=0.7,
    top_k=1,
    do_sample=True,
    repetition_penalty=1.02,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
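
Note that decoding `outputs[0]` returns the prompt together with the reply. To print only the newly generated text, slice off the first `inputs.shape[-1]` token ids before decoding. The idea, sketched with made-up token ids so it runs without the model:

```python
# Illustrative token ids only; real ids come from the tokenizer.
prompt_ids = [5, 17, 42]                # what apply_chat_template produced
output_ids = prompt_ids + [88, 99]      # what model.generate returned
new_ids = output_ids[len(prompt_ids):]  # keep only the generated tail
print(new_ids)  # [88, 99]
```

With real tensors the same slice is `tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)`.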

### Multi-turn Conversation

```python
messages = [
    {"role": "user", "content": "什么是人工智能?"},
    {"role": "assistant", "content": "人工智能是一种模拟人类智能的技术..."},
    {"role": "user", "content": "它有哪些应用场景?"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Model Details

| Parameter | Value |
|---|---|
| Architecture | Transformer Decoder (Spark) |
| Parameters | ~13B |
| Hidden Size | 5120 |
| Layers | 40 |
| Attention Heads | 40 |
| Vocab Size | 60,000 |
| Context Length | 32K |
| RoPE Base (Theta) | 1,000,000 |
| Activation | Fast GeLU |
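
The ~13B figure can be sanity-checked from the table above. The sketch below assumes a standard (non-gated) MLP with intermediate size 4 × hidden and ignores biases, layer norms, and any untied output embedding; these are assumptions for illustration, not confirmed specs:

```python
hidden, layers, vocab = 5120, 40, 60_000

attn = 4 * hidden * hidden       # Q, K, V, O projections
mlp = 2 * hidden * (4 * hidden)  # up + down projections (assumed ffn = 4*hidden)
embed = vocab * hidden           # token embedding table

total = layers * (attn + mlp) + embed
print(f"~{total / 1e9:.1f}B parameters")  # ~12.9B, consistent with the quoted ~13B
```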

## Generation Parameters

| Parameter | Recommended Value |
|---|---|
| `max_new_tokens` | 8192 |
| `temperature` | 0.7 |
| `top_k` | 1 |
| `do_sample` | True |
| `repetition_penalty` | 1.02 |

## Why This Conversion?

This project serves several purposes for the research community:

1. **Historical Reference**: Study the architecture of early Chinese LLMs
2. **Benchmark Comparison**: Compare performance against modern models (Qwen, DeepSeek, etc.)
3. **Educational Value**: Understand the evolution of LLM design choices
4. **Ecosystem Compatibility**: Run the model using standard Hugging Face APIs

## Features

- **Chat Template**: Supports `apply_chat_template` for multi-turn dialogues (`<User>...<end><Bot>...` format).
- **Standardized Naming**: Weight names are consistent with mainstream models like Qwen and Llama.
- **Custom Tokenizer**: Handles Chinese punctuation, tab formatting, and special tokens (`<ret>`, `<end>`).
- **BFloat16 Support**: Optimized for modern GPUs with BF16 precision.
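
If you prefer not to use `apply_chat_template`, the same prompt format can be built by hand. The multi-turn concatenation below is extrapolated from the single-turn prompt shown in Basic Usage; it is an assumption, so verify it against the tokenizer's `chat_template` before relying on it:

```python
def build_prompt(messages):
    """Render messages into the assumed <User>...<end><Bot>...<end> format."""
    parts = []
    for m in messages:
        if m["role"] == "user":
            parts.append(f"<User> {m['content']}<end>")
        else:  # assistant turn
            parts.append(f"<Bot> {m['content']}<end>")
    return "".join(parts) + "<Bot>"  # trailing <Bot> cues the model to reply

prompt = build_prompt([{"role": "user", "content": "你好,请自我介绍一下。"}])
# prompt == "<User> 你好,请自我介绍一下。<end><Bot>", matching the Basic Usage example
```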

## License

This project is licensed under the [Apache 2.0 License](https://gitee.com/iflytekopensource/iFlytekSpark-13B/blob/master/LICENSE).