---
license: apache-2.0
language:
  - zh
  - en
pipeline_tag: text-generation
tags:
  - spark
  - iflytek
  - chat
  - pytorch
  - causal-lm
---

# OpenSpark-13B-Chat


> ⚠️ **Note:** This is a relatively early version of the iFlytek Spark model (released in 2024). We converted it to Hugging Face format primarily for research purposes — to help the community study early LLM architectures, compare with modern models, and understand how the field has evolved.

This is a community-converted, Hugging Face-compatible version of the iFlytek Spark 13B model. The original weights were converted from the official Megatron-DeepSpeed format so the model works with the `transformers` ecosystem.

## Source

## Requirements

```bash
pip install torch transformers sentencepiece
```

## Usage

You can load this model with the `transformers` library. Pass `trust_remote_code=True` so that the custom model and tokenizer code shipped with the checkpoint can be loaded.

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "freedomking/OpenSpark-13B-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Prompt: "Hello, please introduce yourself."
prompt = "<User> 你好,请自我介绍一下。<end><Bot>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Using `apply_chat_template` (Recommended)

For multi-turn conversations, use the built-in chat template:

```python
messages = [
    # "Hello, please introduce yourself."
    {"role": "user", "content": "你好,请自我介绍一下。"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=8192,
    temperature=0.7,
    top_k=1,
    do_sample=True,
    repetition_penalty=1.02,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Multi-turn Conversation

```python
messages = [
    # "What is artificial intelligence?"
    {"role": "user", "content": "什么是人工智能?"},
    # "Artificial intelligence is a technology that simulates human intelligence..."
    {"role": "assistant", "content": "人工智能是一种模拟人类智能的技术..."},
    # "What are its application scenarios?"
    {"role": "user", "content": "它有哪些应用场景?"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Model Details

| Parameter | Value |
|---|---|
| Architecture | Transformer Decoder (Spark) |
| Parameters | ~13B |
| Hidden Size | 5120 |
| Layers | 40 |
| Attention Heads | 40 |
| Vocab Size | 60,000 |
| Context Length | 32K |
| RoPE Base (Theta) | 1,000,000 |
| Activation | Fast GeLU |
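As a sanity check on the table, a back-of-the-envelope estimate from these figures lands near the stated ~13B. The sketch below assumes a standard decoder layer with four h×h attention projections, a 4× MLP expansion, and an untied LM head — assumptions not confirmed by the model card:

```python
hidden = 5120
layers = 40
vocab = 60_000

# Token embedding plus an (assumed) untied LM head.
embed = 2 * vocab * hidden

# Per decoder layer, roughly: Q/K/V/O projections = 4*h^2,
# MLP up + down with an (assumed) 4x expansion = 8*h^2.
per_layer = 12 * hidden ** 2

total = embed + layers * per_layer
print(f"~{total / 1e9:.1f}B parameters")  # ~13.2B parameters
```

Biases, layer norms, and any deviation from the 4× MLP ratio shift this estimate slightly, but it confirms the table's figures are mutually consistent.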

## Generation Parameters

| Parameter | Recommended Value |
|---|---|
| `max_new_tokens` | 8192 |
| `temperature` | 0.7 |
| `top_k` | 1 |
| `do_sample` | True |
| `repetition_penalty` | 1.02 |
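These values can be collected into one kwargs dict and passed straight to `generate`. Note that with `top_k=1` sampling always picks the single most likely token, so decoding is effectively greedy despite `do_sample=True`; raise `top_k` if you want visible variation between runs:

```python
# Recommended decoding settings from the table above.
generation_kwargs = dict(
    max_new_tokens=8192,
    temperature=0.7,
    top_k=1,                  # top_k=1 makes sampling effectively greedy
    do_sample=True,
    repetition_penalty=1.02,
)

# Usage: outputs = model.generate(inputs, **generation_kwargs)
```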

## Why This Conversion?

This project serves several purposes for the research community:

1. **Historical Reference**: Study the architecture of early Chinese LLMs
2. **Benchmark Comparison**: Compare performance against modern models (Qwen, DeepSeek, etc.)
3. **Educational Value**: Understand the evolution of LLM design choices
4. **Ecosystem Compatibility**: Run the model using standard Hugging Face APIs

## Features

- **Chat Template**: Supports `apply_chat_template` for multi-turn dialogues (`<User>...<end><Bot>...` format).
- **Standardized Naming**: Weight names are consistent with mainstream models like Qwen and Llama.
- **Custom Tokenizer**: Handles Chinese punctuation, tab formatting, and special tokens (`<ret>`, `<end>`).
- **BFloat16 Support**: Optimized for modern GPUs with BF16 precision.
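For reference, the `<User>...<end><Bot>` format can be rendered by hand when `apply_chat_template` is not available. The sketch below is inferred from the prompt shown in Basic Usage; the tokenizer's bundled template is authoritative, and the assistant-turn separator used here is an assumption:

```python
def build_spark_prompt(messages):
    """Render chat messages into the <User>...<end><Bot> prompt format."""
    parts = []
    for msg in messages:
        tag = "<User>" if msg["role"] == "user" else "<Bot>"
        parts.append(f"{tag} {msg['content']}<end>")
    parts.append("<Bot>")  # generation prompt: the model continues as Bot
    return "".join(parts)

print(build_spark_prompt([{"role": "user", "content": "Hello"}]))
# -> <User> Hello<end><Bot>
```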

## License

This project is licensed under the Apache 2.0 License.