Instructions for using AALF/gemma-2-27b-it-SimPO-37K with libraries and local apps.
- Libraries
- Transformers
How to use AALF/gemma-2-27b-it-SimPO-37K with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AALF/gemma-2-27b-it-SimPO-37K")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AALF/gemma-2-27b-it-SimPO-37K")
model = AutoModelForCausalLM.from_pretrained("AALF/gemma-2-27b-it-SimPO-37K")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
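The `apply_chat_template` call above renders the conversation into Gemma's turn-based prompt format before tokenization. As an illustration only, the hypothetical helper `format_gemma_prompt` below approximates that template in plain Python, assuming this fine-tune ships the same chat template as google/gemma-2-27b-it (alternating user/model turns, no system role). In real code, keep using `tokenizer.apply_chat_template`, which reads the actual template from the model repository.

```python
# Hypothetical helper: a plain-Python approximation of the Gemma-2 chat template,
# assuming this fine-tune reuses the template shipped with google/gemma-2-27b-it.
def format_gemma_prompt(messages, add_generation_prompt=True, bos="<bos>"):
    """Render chat messages in Gemma's <start_of_turn>/<end_of_turn> format."""
    if messages and messages[0]["role"] == "system":
        # The Gemma-2 template rejects a system role.
        raise ValueError("System role not supported")
    parts = [bos]
    for i, message in enumerate(messages):
        # Turns must alternate user/assistant, starting with the user.
        expected = "user" if i % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError("Conversation roles must alternate user/assistant")
        # Gemma calls the assistant turn "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so generation continues as the assistant.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


prompt = format_gemma_prompt([{"role": "user", "content": "Who are you?"}])
print(repr(prompt))
```

The slice `outputs[0][inputs["input_ids"].shape[-1]:]` in the Transformers example does the matching bookkeeping on the output side: `generate` returns the prompt tokens followed by the newly generated tokens, so dropping the first `input_ids.shape[-1]` positions leaves only the model's reply.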
- Local Apps
- vLLM
How to use AALF/gemma-2-27b-it-SimPO-37K with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "AALF/gemma-2-27b-it-SimPO-37K"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AALF/gemma-2-27b-it-SimPO-37K",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
docker model run hf.co/AALF/gemma-2-27b-it-SimPO-37K
- SGLang
How to use AALF/gemma-2-27b-it-SimPO-37K with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AALF/gemma-2-27b-it-SimPO-37K" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AALF/gemma-2-27b-it-SimPO-37K",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AALF/gemma-2-27b-it-SimPO-37K" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AALF/gemma-2-27b-it-SimPO-37K",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use AALF/gemma-2-27b-it-SimPO-37K with Docker Model Runner:
docker model run hf.co/AALF/gemma-2-27b-it-SimPO-37K
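Both the vLLM and SGLang servers above expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the curl calls can also be made from Python. The sketch below uses only the standard library (`urllib.request`, `json`); the helper names `build_chat_request`, `extract_reply`, and `chat` are hypothetical, and it assumes the standard OpenAI response shape (`choices[0].message.content`).

```python
import json
import urllib.request

MODEL_ID = "AALF/gemma-2-27b-it-SimPO-37K"


def build_chat_request(base_url, content, model=MODEL_ID, max_tokens=None):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    if max_tokens is not None:
        # Optional cap on generated tokens (standard OpenAI-style field).
        payload["max_tokens"] = max_tokens
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_json):
    """Return the assistant text from an OpenAI-style chat completion response."""
    return response_json["choices"][0]["message"]["content"]


def chat(base_url, content, timeout=120):
    """Send one user turn and return the model's reply (needs a running server)."""
    request = build_chat_request(base_url, content)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.loads(response.read().decode("utf-8")))
```

For example, `chat("http://localhost:8000", "What is the capital of France?")` targets the vLLM server, while `chat("http://localhost:30000", "What is the capital of France?")` targets the SGLang server started on port 30000.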
gemma-2-27b-it-SimPO-37K Model Card
Implementation Details
We first followed the SimPO framework to perform on-policy preference data generation on the HuggingFaceH4/ultrafeedback_binarized dataset with the google/gemma-2-27b-it model, using RLHFlow/ArmoRM-Llama3-8B-v0.1 as the reward model to score the responses. We then kept only the prompts whose chosen response scored at least 0.01 higher than the rejected response, which left 37,040 training examples.
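The reward-margin filter described above takes only a few lines. This is a minimal sketch, not the authors' script: it assumes each preference pair is a dict whose ArmoRM scores are stored under the hypothetical keys `chosen_reward` and `rejected_reward`, and the example pairs are made-up placeholders.

```python
# Minimum reward gap between chosen and rejected responses, as stated in the card.
MIN_MARGIN = 0.01


def filter_by_margin(pairs, min_margin=MIN_MARGIN):
    """Keep pairs whose chosen reward beats the rejected reward by at least min_margin.

    Assumes hypothetical keys "chosen_reward" and "rejected_reward" on each pair.
    """
    return [p for p in pairs if p["chosen_reward"] - p["rejected_reward"] >= min_margin]


# Placeholder pairs for illustration only (not real ArmoRM scores).
example_pairs = [
    {"prompt": "a", "chosen_reward": 0.90, "rejected_reward": 0.50},   # clear gap: kept
    {"prompt": "b", "chosen_reward": 0.505, "rejected_reward": 0.50},  # gap 0.005: dropped
    {"prompt": "c", "chosen_reward": 0.40, "rejected_reward": 0.60},   # reversed: dropped
]
kept = filter_by_margin(example_pairs)
print([p["prompt"] for p in kept])
```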
Model training was conducted on 8× A800 GPUs (80 GB each) using the SimPO and alignment-handbook libraries, with DeepSpeed ZeRO stage 3 (deepspeed_zero_stage3) and optimizer states offloaded to the CPU. The training configurations were as follows:
```yaml
# SimPOTrainer arguments
bf16: true
beta: 10
gamma_beta_ratio: 0.5
gradient_accumulation_steps: 8
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
hub_model_id: simpo-exps
learning_rate: 8.0e-7
log_level: info
logging_steps: 1
lr_scheduler_type: cosine
max_length: 2048
max_prompt_length: 1800
num_train_epochs: 1
optim: adamw_torch
output_dir: outputs/gemma-2-27b-it-SimPO
run_name: gemma-2-27b-it-SimPO
per_device_train_batch_size: 2
push_to_hub: false
save_strategy: "steps"
save_steps: 100
save_total_limit: 20
seed: 42
warmup_ratio: 0.1
save_only_model: true
```
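The `beta` and `gamma_beta_ratio` entries parameterize the SimPO objective of Meng et al. (2024), which compares the length-normalized log-likelihoods of the chosen and rejected responses against a target margin γ. Reading `gamma_beta_ratio` as γ/β gives γ = 0.5 × 10 = 5 for this run. The function below is a minimal per-example sketch of that loss using the standard `math` module; it is not the SimPOTrainer implementation and ignores batching and any label smoothing.

```python
import math


def simpo_loss(chosen_logp_sum, chosen_len, rejected_logp_sum, rejected_len,
               beta=10.0, gamma_beta_ratio=0.5):
    """Per-example SimPO loss.

    loss = -log sigmoid(beta * (avg_logp_chosen - avg_logp_rejected - gamma_beta_ratio)),
    i.e. -log sigmoid(beta/|y_w| * log p(y_w|x) - beta/|y_l| * log p(y_l|x) - gamma)
    with gamma = gamma_beta_ratio * beta.
    """
    # Length-normalized (average per-token) log-likelihoods act as the implicit reward.
    chosen_avg = chosen_logp_sum / chosen_len
    rejected_avg = rejected_logp_sum / rejected_len
    logits = beta * (chosen_avg - rejected_avg - gamma_beta_ratio)
    # -log(sigmoid(x)) = log(1 + exp(-x)), split by sign to avoid overflow in exp.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the average log-likelihood gap equals γ/β = 0.5 (for example, `simpo_loss(-50.0, 100, -100.0, 100)`), the logit is zero and the loss is log 2 ≈ 0.693; larger gaps push the loss toward zero, and a zero gap costs slightly more than γ = 5.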
```yaml
# deepspeed_zero3_offload_optimizer.yaml
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  deepspeed_multinode_launcher: standard
  offload_optimizer_device: cpu
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
main_process_port: 2390
mixed_precision: bf16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
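Taken together, the two configs determine the global batch size and the length of the run. A back-of-the-envelope check, assuming all eight processes (`num_processes: 8`) train on the 37,040 filtered pairs with no sequence packing:

```python
# Values copied from the SimPOTrainer and accelerate/DeepSpeed configs above.
num_processes = 8                  # one process per GPU
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
num_train_epochs = 1
num_examples = 37_040              # preference pairs after reward-margin filtering

# Sequences contributing to each optimizer update across all GPUs.
effective_batch_size = (
    num_processes * per_device_train_batch_size * gradient_accumulation_steps
)

# Optimizer updates for the run: floor if the last partial batch is dropped,
# ceiling if it still triggers an update (depends on how the trainer rounds).
steps_low = (num_examples // effective_batch_size) * num_train_epochs
steps_high = -(-num_examples // effective_batch_size) * num_train_epochs

print(f"effective batch size: {effective_batch_size}")
print(f"optimizer steps per run: {steps_low}-{steps_high}")
```

With `save_steps: 100`, a run of roughly 290 updates would therefore produce only a few intermediate checkpoints.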
Citation
Gemma model:
@article{gemma_2024,
title={Gemma},
url={https://www.kaggle.com/m/3301},
DOI={10.34740/KAGGLE/M/3301},
publisher={Kaggle},
author={Gemma Team},
year={2024}
}
SimPO paper:
@article{meng2024simpo,
title={{SimPO}: Simple preference optimization with a reference-free reward},
author={Meng, Yu and Xia, Mengzhou and Chen, Danqi},
journal={arXiv preprint arXiv:2405.14734},
year={2024}
}
UltraFeedback paper:
@article{cui2023ultrafeedback,
title={{UltraFeedback}: Boosting language models with high-quality feedback},
author={Cui, Ganqu and Yuan, Lifan and Ding, Ning and Yao, Guanming and Zhu, Wei and Ni, Yuan and Xie, Guotong and Liu, Zhiyuan and Sun, Maosong},
journal={arXiv preprint arXiv:2310.01377},
year={2023}
}
Model tree for AALF/gemma-2-27b-it-SimPO-37K
Base model
google/gemma-2-27b