How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf sangwon1472/Gemma-MLX-Studio:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf sangwon1472/Gemma-MLX-Studio:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf sangwon1472/Gemma-MLX-Studio:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf sangwon1472/Gemma-MLX-Studio:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf sangwon1472/Gemma-MLX-Studio:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf sangwon1472/Gemma-MLX-Studio:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf sangwon1472/Gemma-MLX-Studio:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf sangwon1472/Gemma-MLX-Studio:Q4_K_M
Use Docker
docker model run hf.co/sangwon1472/Gemma-MLX-Studio:Q4_K_M
Quick Links

Gemma MLX Studio

GGUF exports for the best_4b_v2_stronger run produced with Gemma MLX Studio.

Files

  • best_4b_v2_stronger-Q4_K_M.gguf
    • recommended for practical LM Studio / llama.cpp use
  • best_4b_v2_stronger-f16.gguf
    • high precision archive / conversion source

Notes

  • Base family: Gemma 4 E4B IT
  • Fine-tuning workflow: local MLX + LoRA, then fused export
  • Primary language: Korean

Suggested LM Studio settings

  • Temperature: 0.2
  • Top P: 0.9
  • Top K: 40
  • Repetition Penalty: 1.1

Prompt style

This model tends to work best with:

  • short and structured instructions
  • explicit output length constraints
  • proper noun preservation

Example:

First Fire Horizon와 Nightglass Relay의 차이를 3문장으로 설명해줘.
First Fire Horizon에는 튜토리얼, 기본 보급, 첫 항로 개방만 연결하고,
Nightglass Relay에는 신호, 기록, 중계만 연결해.
고유명사는 번역하지 마.
Downloads last month
66
GGUF
Model size
8B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

4-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for sangwon1472/Gemma-MLX-Studio

Quantized
(194)
this model