---
license: apache-2.0
language:
- zh
- en
pipeline_tag: text-generation
tags:
- spark
- iflytek
- chat
- pytorch
- causal-lm
---

# OpenSpark-13B-Chat

[**中文**](./README_zh.md) | **English**

> ⚠️ **Note**: This is a relatively early version of the iFlytek Spark model (released in 2024). We converted it to Hugging Face format primarily for **research purposes** — to help the community study early LLM architectures, compare them with modern models, and understand how the field has evolved.

This is a community-converted, Hugging Face-compatible version of the iFlytek Spark 13B model. The original weights were converted from the official Megatron-DeepSpeed format to work seamlessly with the `transformers` ecosystem.

## Source

- **Original Weights**: [iFlytek Spark-13B on Gitee](https://gitee.com/iflytekopensource/iFlytekSpark-13B)
- **Training Framework**: Megatron-DeepSpeed
- **Release Date**: 2024

## Requirements

```bash
pip install torch transformers sentencepiece
```
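
As a rough sizing note (an estimate, not an official requirement): holding ~13B parameters in bfloat16 takes about 24 GiB for the weights alone, before activations or the KV cache.

```python
# Back-of-envelope estimate of GPU memory for the bf16 weights.
# Assumes ~13e9 parameters; ignores activations, KV cache, and framework overhead.
n_params = 13e9
bytes_per_param = 2  # bfloat16 = 2 bytes per parameter
weight_gib = n_params * bytes_per_param / 1024**3
print(f"~{weight_gib:.0f} GiB for weights alone")  # prints "~24 GiB for weights alone"
```

With `device_map="auto"`, `transformers` will spread or offload layers if a single GPU cannot hold them.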

## Usage

You can load this model with the `transformers` library. Pass `trust_remote_code=True` so the custom model and tokenizer code shipped with the checkpoint can be loaded.

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "freedomking/OpenSpark-13B-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

prompt = "<User> 你好,请自我介绍一下。<end><Bot>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Using `apply_chat_template` (Recommended)

For multi-turn conversations, use the built-in chat template:

```python
messages = [
    {"role": "user", "content": "你好,请自我介绍一下。"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=8192,
    temperature=0.7,
    top_k=1,
    do_sample=True,
    repetition_penalty=1.02,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
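
Note that decoding `outputs[0]` returns the prompt together with the reply. To print only the newly generated text, slice off the first `inputs.shape[-1]` token ids before decoding. The idea, sketched with made-up token ids so it runs without the model:

```python
# Illustrative token ids only; real ids come from the tokenizer.
prompt_ids = [5, 17, 42]                # what apply_chat_template produced
output_ids = prompt_ids + [88, 99]      # what model.generate returned
new_ids = output_ids[len(prompt_ids):]  # keep only the generated tail
print(new_ids)  # [88, 99]
```

With real tensors the same slice is `tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)`.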

### Multi-turn Conversation

```python
messages = [
    {"role": "user", "content": "什么是人工智能?"},
    {"role": "assistant", "content": "人工智能是一种模拟人类智能的技术..."},
    {"role": "user", "content": "它有哪些应用场景?"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Model Details

| Parameter | Value |
|---|---|
| Architecture | Transformer Decoder (Spark) |
| Parameters | ~13B |
| Hidden Size | 5120 |
| Layers | 40 |
| Attention Heads | 40 |
| Vocab Size | 60,000 |
| Context Length | 32K |
| RoPE Base (Theta) | 1,000,000 |
| Activation | Fast GeLU |
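
The ~13B figure can be sanity-checked from the table above. The sketch below assumes a standard (non-gated) MLP with intermediate size 4 × hidden and ignores biases, layer norms, and any untied output embedding; these are assumptions for illustration, not confirmed specs:

```python
hidden, layers, vocab = 5120, 40, 60_000

attn = 4 * hidden * hidden       # Q, K, V, O projections
mlp = 2 * hidden * (4 * hidden)  # up + down projections (assumed ffn = 4*hidden)
embed = vocab * hidden           # token embedding table

total = layers * (attn + mlp) + embed
print(f"~{total / 1e9:.1f}B parameters")  # ~12.9B, consistent with the quoted ~13B
```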

## Generation Parameters

| Parameter | Recommended Value |
|---|---|
| `max_new_tokens` | 8192 |
| `temperature` | 0.7 |
| `top_k` | 1 |
| `do_sample` | True |
| `repetition_penalty` | 1.02 |

## Why This Conversion?

This project serves several purposes for the research community:

1. **Historical Reference**: Study the architecture of early Chinese LLMs
2. **Benchmark Comparison**: Compare performance against modern models (Qwen, DeepSeek, etc.)
3. **Educational Value**: Understand the evolution of LLM design choices
4. **Ecosystem Compatibility**: Run the model using standard Hugging Face APIs

## Features

- **Chat Template**: Supports `apply_chat_template` for multi-turn dialogues (`<User>...<end><Bot>...` format).
- **Standardized Naming**: Weight names are consistent with mainstream models like Qwen and Llama.
- **Custom Tokenizer**: Handles Chinese punctuation, tab formatting, and special tokens (`<ret>`, `<end>`).
- **BFloat16 Support**: Optimized for modern GPUs with BF16 precision.
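
If you prefer not to use `apply_chat_template`, the same prompt format can be built by hand. The multi-turn concatenation below is extrapolated from the single-turn prompt shown in Basic Usage; it is an assumption, so verify it against the tokenizer's `chat_template` before relying on it:

```python
def build_prompt(messages):
    """Render messages into the assumed <User>...<end><Bot>...<end> format."""
    parts = []
    for m in messages:
        if m["role"] == "user":
            parts.append(f"<User> {m['content']}<end>")
        else:  # assistant turn
            parts.append(f"<Bot> {m['content']}<end>")
    return "".join(parts) + "<Bot>"  # trailing <Bot> cues the model to reply

prompt = build_prompt([{"role": "user", "content": "你好,请自我介绍一下。"}])
# prompt == "<User> 你好,请自我介绍一下。<end><Bot>", matching the Basic Usage example
```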

## License

This project is licensed under the [Apache 2.0 License](https://gitee.com/iflytekopensource/iFlytekSpark-13B/blob/master/LICENSE).