---
license: apache-2.0
language:
- zh
- en
pipeline_tag: text-generation
tags:
- spark
- iflytek
- chat
- pytorch
- causal-lm
---

# OpenSpark-13B-Chat

[**中文**](./README_zh.md) | **English**

> ⚠️ **Note**: This is a relatively early version of the iFlytek Spark model (released in 2024). We converted it to Hugging Face format primarily for **research purposes**: to help the community study early LLM architectures, compare them with modern models, and understand how the field has evolved.

This is a community-converted, Hugging Face-compatible version of the iFlytek Spark 13B model. The original weights were converted from the official Megatron-DeepSpeed format so the model works seamlessly with the `transformers` ecosystem.

## Source

- **Original Weights**: [iFlytekSpark-13B on Gitee](https://gitee.com/iflytekopensource/iFlytekSpark-13B)
- **Training Framework**: Megatron-DeepSpeed
- **Release Date**: 2024

## Requirements

```bash
pip install torch transformers sentencepiece
```

## Usage

You can load this model with the `transformers` library. Make sure to pass `trust_remote_code=True` so the custom model and tokenizer code can be loaded.
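As a rough guide to hardware requirements, the memory needed for the weights alone can be estimated from the parameter count (a back-of-the-envelope sketch; real usage is higher once activations and the KV cache are included):

```python
# Approximate GPU memory for the weights alone, loaded in bfloat16.
# Activations and the KV cache add to this, so treat it as a lower bound.
n_params = 13e9          # ~13B parameters
bytes_per_param = 2      # bfloat16 stores each parameter in 2 bytes
weights_gib = n_params * bytes_per_param / 1024**3
print(f"~{weights_gib:.1f} GiB for weights alone")  # roughly 24 GiB
```

With `device_map="auto"` (used in the examples below), `transformers` will shard the weights across available GPUs, and spill to CPU, if a single device cannot hold them.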
### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "freedomking/OpenSpark-13B-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

prompt = "你好,请自我介绍一下。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Using `apply_chat_template` (Recommended)

For multi-turn conversations, use the built-in chat template:

```python
messages = [
    {"role": "user", "content": "你好,请自我介绍一下。"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=8192,
    temperature=0.7,
    top_k=1,
    do_sample=True,
    repetition_penalty=1.02,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Multi-turn Conversation

```python
messages = [
    {"role": "user", "content": "什么是人工智能?"},
    {"role": "assistant", "content": "人工智能是一种模拟人类智能的技术..."},
    {"role": "user", "content": "它有哪些应用场景?"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Model Details

| Parameter | Value |
|---|---|
| Architecture | Transformer Decoder (Spark) |
| Parameters | ~13B |
| Hidden Size | 5120 |
| Layers | 40 |
| Attention Heads | 40 |
| Vocab Size | 60,000 |
| Context Length | 32K |
| RoPE Base (Theta) | 1,000,000 |
| Activation | Fast GeLU |

## Generation Parameters

| Parameter | Recommended Value |
|---|---|
| `max_new_tokens` | 8192 |
| `temperature` | 0.7 |
| `top_k` | 1 |
| `do_sample` | True |
| `repetition_penalty` | 1.02 |

## Why This Conversion?

This project serves several purposes for the research community:

1. **Historical Reference**: Study the architecture of early Chinese LLMs
2. **Benchmark Comparison**: Compare performance against modern models (Qwen, DeepSeek, etc.)
3. **Educational Value**: Understand the evolution of LLM design choices
4. **Ecosystem Compatibility**: Run the model using standard Hugging Face APIs

## Features

- **Chat Template**: Supports `apply_chat_template` for multi-turn dialogues.
- **Standardized Naming**: Weight and config naming consistent with mainstream models such as Qwen and Llama.
- **Custom Tokenizer**: Handles Chinese punctuation, tab formatting, and the model's special tokens.
- **BFloat16 Support**: Optimized for modern GPUs with BF16 precision.

## License

This project is licensed under the [Apache 2.0 License](https://gitee.com/iflytekopensource/iFlytekSpark-13B/blob/master/LICENSE).
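If you reuse the recommended generation parameters in several places, one option is to keep them in a single dict and unpack it into `generate` (a small convenience sketch; the values mirror the Generation Parameters table in this card):

```python
# Recommended decoding settings from this card, bundled for reuse:
#   outputs = model.generate(**inputs, **gen_kwargs)
gen_kwargs = dict(
    max_new_tokens=8192,
    temperature=0.7,
    top_k=1,          # with top_k=1, sampling is effectively greedy
    do_sample=True,
    repetition_penalty=1.02,
)
print(gen_kwargs["max_new_tokens"])  # 8192
```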