flageval_judgemodel Card
Model Details
flageval_judgemodel is a judge LLM (also called a GenRM, or generative reward model) developed by the FlagEval team (https://flageval.baai.ac.cn/#/home).
- Developed by: FlagEval, BAAI
- Model type: An auto-regressive language model based on the transformer architecture.
- License: Non-commercial license
- Finetuned from model: Vicuna.
Uses
flageval_judgemodel is designed to evaluate the performance of large language models on the CLCC dataset (https://huggingface.co/datasets/eyuansu71/CLCC_v1), the Chinese Linguistics & Cognition Challenge dataset. It aims to automate this evaluation, potentially replacing human judgment in assessing the models' outputs.
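As a rough illustration of how per-sample judgments could be turned into a single metric, the sketch below assumes each sample receives a binary label (1 when the judge accepts the model answer, 0 otherwise) and reports the mean as accuracy. The `accuracy` helper, the `judge_fn` callable, and the `prompt`/`pred`/`gold` field names are hypothetical; they are not part of the released code or the CLCC schema.

```python
from typing import Callable, Iterable, Mapping


def accuracy(samples: Iterable[Mapping[str, str]],
             judge_fn: Callable[[str, str, str], int]) -> float:
    """Average the binary labels returned by a judge over a set of samples.

    Each sample is assumed (hypothetically) to carry "prompt", "pred" and
    "gold" fields; judge_fn returns 1 for an accepted prediction, else 0.
    """
    labels = [judge_fn(s["prompt"], s["pred"], s["gold"]) for s in samples]
    if not labels:
        raise ValueError("no samples to evaluate")
    return sum(labels) / len(labels)


# Stand-in judge for demonstration only: exact string match.
# In practice this would wrap a call to flageval_judgemodel.
def exact_match_judge(prompt: str, pred: str, gold: str) -> int:
    return int(pred.strip() == gold.strip())


demo = [
    {"prompt": "q1", "pred": "不一样", "gold": "不一样"},
    {"prompt": "q2", "pred": "一样", "gold": "不一样"},
]
print(accuracy(demo, exact_match_judge))
```

Passing the judge as a callable keeps the aggregation independent of how the model is served, so the same loop would work with a local model or a remote endpoint.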
Quickstart
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def promptify(prompt, pred, gold):
    # The reference (gold) answer is presented as Assistant 1 and the
    # model prediction (pred) as Assistant 2.
    system_prompt = "We would like to request your feedback on the performance of two AI assistants in response to the user question displayed above.\nPlease rate the helpfulness, relevance, accuracy, level of details of their responses. Each assistant receives an overall score on a scale of 1 to 10, where a higher score indicates better overall performance.\nPlease first output a single line containing only two values indicating the scores for Assistant 1 and 2, respectively. The two scores are separated by a space. In the subsequent line, please provide a comprehensive explanation of your evaluation, avoiding any potential bias and ensuring that the order in which the responses were presented does not affect your judgment."
    # Note: the trailing "10" after "### Response:" is part of the original template.
    prompt_template = f"You are a helpful and precise assistant for checking the quality of the answer.\n[Question]\n{prompt}\n\n[The Start of Assistant 1's Answer]\n{gold}\n\n[The End of Assistant 1's Answer]\n\n[The Start of Assistant 2's Answer]\n{pred}\n\n[The End of Assistant 2's Answer]\n\n[System]\n{system_prompt}\n\n### Response:10"
    return prompt_template


# flash_attention_2 requires the flash-attn package and a supported GPU.
model = AutoModelForCausalLM.from_pretrained("FlagEval/flageval_judgemodel", torch_dtype=torch.bfloat16, low_cpu_mem_usage=True, attn_implementation="flash_attention_2").cuda()
tokenizer = AutoTokenizer.from_pretrained("FlagEval/flageval_judgemodel")

prompt, pred, gold = '1、约翰喜欢看电影,玛丽也喜欢。\n2、约翰也喜欢看足球比赛。\n请问以上两句话是否是一个意思?', "不一样", "不一样"

with torch.no_grad():
    data_sample = promptify(prompt, pred, gold)
    input_ids = tokenizer(data_sample, return_tensors="pt").input_ids.cuda()
    output_ids = model.generate(input_ids, max_new_tokens=128)

# Decode only the newly generated tokens. Slicing the decoded string by
# len(data_sample) can be off when decoding does not reproduce the prompt exactly.
new_tokens = output_ids[0][input_ids.shape[1]:]
ans = tokenizer.decode(new_tokens, skip_special_tokens=True, clean_up_tokenization_spaces=True).strip()
# A score of 10 for Assistant 2 (the prediction) is counted as correct.
pred_label = 1 if int(ans) == 10 else 0
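The Quickstart converts the generated text with `int(ans)`, which raises `ValueError` if the model emits anything besides a bare integer (for example, a score followed by an explanation). A more defensive parse might look like the sketch below. `parse_score` and `to_label` are hypothetical helper names, and treating an unparseable reply as incorrect is an assumption rather than documented behavior.

```python
import re
from typing import Optional


def parse_score(ans: str) -> Optional[int]:
    """Return the first integer found in the judge's reply, or None."""
    match = re.search(r"\d+", ans)
    return int(match.group()) if match else None


def to_label(ans: str, full_score: int = 10) -> int:
    """Map a reply to 1 when the parsed score equals full_score, else 0.

    Replies without any integer are treated as incorrect (an assumption).
    """
    return 1 if parse_score(ans) == full_score else 0


print(to_label("10"))
print(to_label("3\nAssistant 2 contradicts the reference."))
print(to_label("no score given"))
```

With this in place, the last line of the Quickstart could become `pred_label = to_label(ans)`.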