How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf QuantFactory/gemma-2-Ifable-9B-GGUF:
# Run inference directly in the terminal:
llama-cli -hf QuantFactory/gemma-2-Ifable-9B-GGUF:
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf QuantFactory/gemma-2-Ifable-9B-GGUF:
# Run inference directly in the terminal:
llama-cli -hf QuantFactory/gemma-2-Ifable-9B-GGUF:
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf QuantFactory/gemma-2-Ifable-9B-GGUF:
# Run inference directly in the terminal:
./llama-cli -hf QuantFactory/gemma-2-Ifable-9B-GGUF:
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf QuantFactory/gemma-2-Ifable-9B-GGUF:
# Run inference directly in the terminal:
./build/bin/llama-cli -hf QuantFactory/gemma-2-Ifable-9B-GGUF:
Use Docker
docker model run hf.co/QuantFactory/gemma-2-Ifable-9B-GGUF:
Quick Links

QuantFactory Banner

QuantFactory/gemma-2-Ifable-9B-GGUF

This is quantized version of ifable/gemma-2-Ifable-9B created using llama.cpp

Original Model Card

ifable/gemma-2-Ifable-9B

Training and evaluation data

Training procedure

Training method: SimPO (GitHub - princeton-nlp/SimPO: SimPO: Simple Preference Optimization with a Reference-Free Reward)

It achieves the following results on the evaluation set:

  • Loss: 1.0163
  • Rewards/chosen: -21.6822
  • Rewards/rejected: -47.8754
  • Rewards/accuracies: 0.9167
  • Rewards/margins: 26.1931
  • Logps/rejected: -4.7875
  • Logps/chosen: -2.1682
  • Logits/rejected: -17.0475
  • Logits/chosen: -12.0041

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-07
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen Sft Loss
1.4444 0.9807 35 1.0163 -21.6822 -47.8754 0.9167 26.1931 -4.7875 -2.1682 -17.0475 -12.0041 0.0184

Framework versions

  • Transformers 4.43.4
  • Pytorch 2.3.0a0+ebedce2
  • Datasets 2.20.0
  • Tokenizers 0.19.1

We are looking for product manager and operations managers to build applications through our model, and also open for business cooperation, and also AI engineer to join us, contact with : contact@ifable.ai

Downloads last month
273
GGUF
Model size
9B params
Architecture
gemma2
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support