Dataset Instruction
datasets:
- mxz/alpaca_en_zh_ruozhiba_gpt4data
language:
- zh
- en
metrics:
- Average
- perplexity
pipeline_tag:
- text-generation
tags:
- SFT
- finetune
- alignment
- LoRA
- Llama-3
About mxz-llama-3-8B-sft
This model was trained with supervised fine-tuning (SFT).
It handles coding, reasoning, and Chinese question answering, among other tasks.
You can test this model with [Colab].
I published the mixed-instruction, Alpaca-style dataset '[mxz/alpaca_en_zh_ruozhiba_gpt4data]'.
Evaluation
Results:
| Model | MMLU | C-EVAL | C-MMLU |
|---|---|---|---|
| Llama-3-8B | 55.5 | 47.0 | 48.0 |
| Llama-3-8B-Instruct | 60.1 | 49.7 | 49.3 |
| Llama-3-8B-sft | 61.8 | 49.1 | 49.4 |
- The Llama-3-8B evaluation results are taken from ymcui/Chinese-LLaMA-Alpaca-3.
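To make the table easier to compare at a glance, the short sketch below computes a per-model average and the per-benchmark difference between models. The scores are copied from the table; the unweighted mean and the helper names `average` and `delta` are assumptions made for illustration, not the author's evaluation method.

```python
# Scores copied from the evaluation table above.
scores = {
    "Llama-3-8B": {"MMLU": 55.5, "C-EVAL": 47.0, "C-MMLU": 48.0},
    "Llama-3-8B-Instruct": {"MMLU": 60.1, "C-EVAL": 49.7, "C-MMLU": 49.3},
    "Llama-3-8B-sft": {"MMLU": 61.8, "C-EVAL": 49.1, "C-MMLU": 49.4},
}


def average(model):
    """Unweighted mean over the three benchmarks (an assumed definition)."""
    values = scores[model].values()
    return round(sum(values) / len(values), 1)


def delta(model, baseline):
    """Per-benchmark score difference: model minus baseline."""
    return {
        bench: round(scores[model][bench] - scores[baseline][bench], 1)
        for bench in scores[model]
    }


for name in scores:
    print(name, average(name))
print(delta("Llama-3-8B-sft", "Llama-3-8B"))
print(delta("Llama-3-8B-sft", "Llama-3-8B-Instruct"))
```

Under this definition the averages are 50.2, 53.0, and 53.4. The SFT model is ahead of Llama-3-8B-Instruct on MMLU and C-MMLU and 0.6 points behind it on C-EVAL.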
Test
Generate text as follows:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

device = "cuda:0"
model_name = "mxz/llama3-8b-sft"

# Load the model in 4-bit NF4 precision (requires the bitsandbytes package).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4"),
    device_map={"": device},
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

SYSTEM_PROMPT = "You are MA-RLHF Chatbot, you should friendly answer the question"


def format_prompt(prompt):
    return f"###System: {SYSTEM_PROMPT}\n###Question: {prompt}\n###Answer: "


def generate(prompt, max_new_tokens=256):
    # Stop on any of the end-of-sequence tokens.
    terminators = [
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|eot_id|>"),
        tokenizer.convert_tokens_to_ids("<|end_of_text|>"),
    ]
    inputs = tokenizer(format_prompt(prompt), return_tensors="pt").to(device)
    # Greedy decoding; passing the full encoding also supplies the attention mask.
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        eos_token_id=terminators,
    )
    # Set `skip_special_tokens=False` to debug.
    return tokenizer.decode(output[0], skip_special_tokens=True)
```
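Test:
```python
# English: "Do cobras (literally 'glasses snakes' in Chinese) wear glasses because they are nearsighted?"
prompt = '眼镜蛇是因为近视才带眼镜的吗?'
response = generate(prompt)
print(response)
```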
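Result:
```text
###System: You are MA-RLHF Chatbot, you should friendly answer the question
###Question: 眼镜蛇是因为近视才带眼镜的吗?
###Answer: 不能。"眼镜蛇"是指一种蛇,它的名字来源于它的眼睛形状,而不是因为它需要佩戴眼镜。
```
In English, the answer reads: No. "Cobra" (眼镜蛇, literally "glasses snake") refers to a kind of snake; its name comes from the shape of its eyes, not from any need to wear glasses.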
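Because the decoded output echoes the whole prompt, it can be handy to keep only the text after the `###Answer:` marker. The helper below is a minimal sketch of one way to do that; the name `extract_answer` is hypothetical and is not part of the original script.

```python
ANSWER_MARKER = "###Answer:"


def extract_answer(decoded):
    """Return the text after the first '###Answer:' marker.

    If the marker is absent, the whole string is returned unchanged
    apart from surrounding whitespace.
    """
    _, marker, answer = decoded.partition(ANSWER_MARKER)
    return answer.strip() if marker else decoded.strip()


example = (
    "###System: You are MA-RLHF Chatbot, you should friendly answer the question\n"
    "###Question: What is 1 + 1?\n"
    "###Answer: 2"
)
print(extract_answer(example))
```

An alternative that avoids string matching is to decode only the newly generated tokens, that is, the part of the output sequence that follows the input tokens.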
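Example: math and code
```text
###System: You are MA-RLHF Chatbot, you should friendly answer the question
###Question: 简述梯度下降算法,并补充对应的latex公式和python代码实现。
###Answer: 1. 梯度下降算法简述: 梯度下降算法是一种常用的优化算法,用于寻找函数的最小值。
它通过不断迭代,沿着函数的梯度方向移动来实现。
梯度下降算法的基本思想是:沿着函数的负梯度方向移动,直到到达局部最小值。
2. 梯度下降算法的数学公式: 假设我们要优化的函数为f(x),其中x为一个n维向量。
梯度下降算法的更新公式为: x_new = x_old - lr * ∇f(x_old)
其中,x_new为新迭代的x值,x_old为上一迭代的x值,lr为学习率,
∇f(x_old)为函数f(x_old)的梯度。
3. 梯度下降算法的Python代码实现: 下面是一个简单的梯度下降算法的Python代码实现示例:
import numpy as np
def gradient_descent(f, x0, lr=0.01, max_iter=100):
    x = x0
```
In English, the question asks: "Briefly describe the gradient descent algorithm, and add the corresponding LaTeX formula and a Python implementation." The answer translates as follows.
1. Overview: gradient descent is a commonly used optimization algorithm for finding the minimum of a function. It works by iterating repeatedly and moving along the gradient direction of the function. The basic idea is to move along the negative gradient direction until a local minimum is reached.
2. Formula: suppose the function to optimize is f(x), where x is an n-dimensional vector. The update rule is x_new = x_old - lr * ∇f(x_old), where x_new is the value of x at the new iteration, x_old is the value at the previous iteration, lr is the learning rate, and ∇f(x_old) is the gradient of f at x_old.
3. Python implementation: the answer introduces "a simple example implementation of gradient descent in Python" and then gives the code shown in the output.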
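For reference, the update rule quoted in the answer, x_new = x_old - lr * ∇f(x_old), can be written out as a complete loop. This is an illustrative sketch and not the model's output: it assumes the caller supplies the gradient function directly (here called `grad_f`), and the `tol` stopping threshold and the quadratic test function are additions made for the example.

```python
def gradient_descent(grad_f, x0, lr=0.01, max_iter=100, tol=1e-8):
    """Minimize a function given its gradient, using a fixed learning rate.

    grad_f maps a list of floats to the gradient at that point.
    Iteration stops early once the step length drops below tol.
    """
    x = list(x0)
    for _ in range(max_iter):
        step = [lr * g for g in grad_f(x)]
        # x_new = x_old - lr * grad_f(x_old)
        x = [xi - si for xi, si in zip(x, step)]
        if sum(s * s for s in step) ** 0.5 < tol:
            break
    return x


def grad_quadratic(x):
    """Gradient of f(x) = sum((x_i - 3)^2), whose minimum is at x_i = 3."""
    return [2.0 * (xi - 3.0) for xi in x]


x_min = gradient_descent(grad_quadratic, x0=[0.0, 10.0], lr=0.1, max_iter=200)
print(x_min)
```

With this learning rate the distance to the minimum shrinks by a factor of 0.8 per iteration, so both coordinates end up very close to 3.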