Instructions to use Tele-AI/telechat-7B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Tele-AI/telechat-7B with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="Tele-AI/telechat-7B", trust_remote_code=True)# Load model directly from transformers import AutoModelForCausalLM model = AutoModelForCausalLM.from_pretrained("Tele-AI/telechat-7B", trust_remote_code=True, dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Tele-AI/telechat-7B with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "Tele-AI/telechat-7B" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Tele-AI/telechat-7B", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/Tele-AI/telechat-7B
- SGLang
How to use Tele-AI/telechat-7B with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "Tele-AI/telechat-7B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Tele-AI/telechat-7B", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "Tele-AI/telechat-7B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Tele-AI/telechat-7B", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use Tele-AI/telechat-7B with Docker Model Runner:
docker model run hf.co/Tele-AI/telechat-7B
update readme
Browse files
README.md
CHANGED
|
@@ -1,3 +1,27 @@
|
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
---
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
---
|
| 4 |
+
# 模型介绍
|
| 5 |
+
### TeleChat星辰语义大模型
|
| 6 |
+
- TeleChat星辰语义大模型是由中电信人工智能科技有限公司(北京)研发训练的大语言模型,采用1.5万亿 Tokens中英文高质量语料进行训练。
|
| 7 |
+
- 本次开源了对话模型**TeleChat-7B-bot**,以及其`huggingface`格式的权重文件。此外,我们还开源了7B模型的int8和int4量化版本。
|
| 8 |
+
|
| 9 |
+
### 模型结构
|
| 10 |
+
|
| 11 |
+
我们采用标准的 `Decoder-only` 结构设计了 **TeleChat** 模型,并在模型维度做了如下的一些改进:
|
| 12 |
+
|
| 13 |
+
- **位置编码**:我们使用 [Rotary Embedding](https://arxiv.org/pdf/2104.09864.pdf) 的位置编码方法,该方法将相对位置信息依赖集成到 self-attention 中,并且具有较好的位置外推性。Rotary Embedding还可以较好地与Flash-Attention v2 配合使用,将模型的训练速度提升约20%。
|
| 14 |
+
- **激活函数**:我们使用 [SwiGLU](https://arxiv.org/pdf/2002.05202.pdf) 激活函数来替代GELU激活函数 , 为了减少计算量,将`ffn_hidden_size`设置为小于原始SwiGLU中的4倍隐藏层大小。
|
| 15 |
+
- **层标准化**: 基于 [RMSNorm](https://arxiv.org/abs/1910.07467) 的 Pre-Normalization。
|
| 16 |
+
|
| 17 |
+
|
| 18 |
+
| | layer_num | hidden_size | ffn_hidden_size | head_num | 是否使用embed-layernorm |
|
| 19 |
+
|-----| --------- | ----------- | --------------- | -------- | ----------------------- |
|
| 20 |
+
| 7B | 30 | 4096 | 12288 | 32 | 否
|
| 21 |
+
---
|
| 22 |
+
|
| 23 |
+
我们开源的TeleChat模型:
|
| 24 |
+
- 支持deepspeed微调,开源了基于deepspeed的训练代码,支持Zero并行显存优化,同时集成了FlashAttention2
|
| 25 |
+
- 多轮能力支持。开源了多轮数据构建方式,针对多轮模型训练集成了针对多轮的mask loss训练方式,更好的聚焦多轮答案,提升问答效果。
|
| 26 |
+
- 外推能力提升。开源了8K训练版本模型,采用 NTK-aware + LogN 外推方式,可以外推到32K。
|
| 27 |
+
- 具备较好的长文生成能力。在工作总结、工作计划、PPT大纲、申论、招标书、邮件、方案、周报、JD写作等长文写作任务重具有较好的表现。
|