Original model: https://huggingface.co/TehVenom/Pygmalion-13b-Merged
LoRA: https://huggingface.co/ziqingyang/chinese-alpaca-lora-13b
Pygmalion-13b was merged with chinese-alpaca-lora-13b to improve the model's Chinese-language ability, although the output can still read like translated text.
Projects used: https://github.com/ymcui/Chinese-LLaMA-Alpaca
https://github.com/qwopqwop200/GPTQ-for-LLaMa
Text-generation-webui all-in-one package: https://www.bilibili.com/read/cv23495183
Compatible with AutoGPTQ and GPTQ-for-LLaMa.
If you load the model with GPTQ-for-LLaMa, set Wbits=4, groupsize=128, model_type=llama.
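
For the AutoGPTQ route, the same settings (4 bits, group size 128) can be passed explicitly. The following is a minimal sketch, not a tested recipe for this repository. It assumes a CUDA device named "cuda:0", a checkpoint stored as safetensors, and desc_act=False, none of which the model card states. The names MODEL_ID, QUANT_SETTINGS, load_model, and generate are illustrative.

```python
# Sketch only: load the 4-bit checkpoint through AutoGPTQ.
# Assumptions not stated in the model card: device "cuda:0",
# safetensors weights, and desc_act=False.

MODEL_ID = "coyude/Chinese-Pygmalion-13b-GPTQ"

# Quantization settings taken from the model card (Wbits=4, groupsize=128).
QUANT_SETTINGS = {"bits": 4, "group_size": 128}


def load_model(device="cuda:0", use_safetensors=True, desc_act=False):
    """Return (tokenizer, model) for the quantized checkpoint."""
    # Imported lazily so this file can be imported without the GPU stack.
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
    from transformers import AutoTokenizer

    quantize_config = BaseQuantizeConfig(desc_act=desc_act, **QUANT_SETTINGS)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoGPTQForCausalLM.from_quantized(
        MODEL_ID,
        device=device,
        use_safetensors=use_safetensors,
        quantize_config=quantize_config,
    )
    return tokenizer, model


def generate(tokenizer, model, prompt, device="cuda:0", max_new_tokens=128):
    """Generate a continuation for `prompt` and return it as text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Typical use would be `tokenizer, model = load_model()` followed by `generate(tokenizer, model, "你好")`. If the repository stores its weights in a different format or under a non-default file name, `use_safetensors=False` or AutoGPTQ's `model_basename` argument may be needed.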