eval(fingerprint): Global MMLU Lite EN / lemer-mlx-bf16 / 1-round
Per-question full-output fingerprint on CohereForAI/Global-MMLU-Lite config en
test split (400 questions). Single round, mlx_lm greedy, max_tokens=2048.
Full model output preserved per row in parquet column full_model_output.
Scores (n=400):
- strict letter regex: 260/400 = 65.0%
- content-aware: 274/400 = 68.5%
- no-answer: 10/400 = 2.5%
Cultural sensitivity stratification:
- CS (200q) strict 65.5% content 69.0%
- CA (200q) strict 64.5% content 68.0%
- Cultural fairness (1-|CS-CA|) = 0.990
NOT 8-PAC consensus — this is fingerprint-purpose disclosure for alignment
auditing. Readers can inspect which questions the model disagrees with gold
and the full reasoning output per case. 8-PAC statistical consensus (8 rounds
paired vs base Gemma 4 E2B IT) is the follow-up.
Paper reference: §16 (revise benchmark toward model).
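
The audit the message describes (inspecting which questions the model disagrees with gold, and reading the full output for each) might look roughly like the sketch below. It assumes each parquet row is available as a plain dict; the column names `agrees_with_gold_strict`, `agrees_with_gold_content` and `full_model_output` come from this commit, while `question_id` and the sample rows are hypothetical stand-ins.

```python
from typing import Iterable

def disagreements(rows: Iterable[dict],
                  column: str = "agrees_with_gold_strict") -> list[dict]:
    """Return the rows whose parsed answer does not match the gold label."""
    return [row for row in rows if not row[column]]

# Hypothetical rows for illustration only. With pandas installed, real rows
# could be obtained with pandas.read_parquet(path).to_dict("records").
rows = [
    {"question_id": "q1", "agrees_with_gold_strict": True,
     "agrees_with_gold_content": True, "full_model_output": "Reasoning. B"},
    {"question_id": "q2", "agrees_with_gold_strict": False,
     "agrees_with_gold_content": True, "full_model_output": "It is Paris."},
    {"question_id": "q3", "agrees_with_gold_strict": False,
     "agrees_with_gold_content": False, "full_model_output": "Reasoning. D"},
]

for row in disagreements(rows):
    # Print the identifier and the start of the preserved model output.
    print(row["question_id"], row["full_model_output"][:80])
```

Passing `column="agrees_with_gold_content"` gives the narrower set of cases that still disagree after the content-match fallback.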
@@ -0,0 +1,31 @@
# Global MMLU Lite EN — lemer-mlx-bf16

> **1-round fingerprint — not 8-PAC consensus.** Per-question full model output preserved for alignment-signature auditing. 8-round statistical consensus is the follow-up run.

Per-question fingerprint. Full model output preserved per row in the parquet.

## Scores (n=400)

| Metric | Value |
|---|---|
| Strict letter regex | 260/400 = 65.0% |
| Content-aware fallback | 274/400 = 68.5% |
| No-answer | 10/400 = 2.5% |

## Cultural sensitivity stratification

| | n | Strict | Content |
|---|---|---|---|
| CS | 200 | 131/200 = 65.5% | 138/200 = 69.0% |
| CA | 200 | 129/200 = 64.5% | 136/200 = 68.0% |
| **Cultural fairness** (1−\|CS−CA\|) | — | 0.99 | 0.99 |
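
The figures in the two tables above can be re-derived from the raw counts. The sketch below is one way to do that; it assumes the fairness formula 1−|CS−CA| is applied to accuracies expressed as fractions in [0, 1] (the reading that reproduces the reported 0.99), and the helper names are illustrative rather than taken from the evaluation code.

```python
def pct(correct: int, n: int) -> float:
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100.0 * correct / n, 1)

def cultural_fairness(cs_pct: float, ca_pct: float) -> float:
    """1 - |CS - CA|, with both accuracies converted from percent to fractions."""
    return round(1.0 - abs(cs_pct - ca_pct) / 100.0, 3)

# Counts as reported in the tables above.
cs_strict, ca_strict = pct(131, 200), pct(129, 200)    # 65.5, 64.5
cs_content, ca_content = pct(138, 200), pct(136, 200)  # 69.0, 68.0

print("strict overall:", pct(131 + 129, 400))    # 65.0
print("content overall:", pct(138 + 136, 400))   # 68.5
print("fairness strict:", cultural_fairness(cs_strict, ca_strict))
print("fairness content:", cultural_fairness(cs_content, ca_content))
```

The CS and CA counts sum to the overall counts (131 + 129 = 260 and 138 + 136 = 274), so the stratified and overall rows are mutually consistent.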

## Notes

- No coercion, no retry. Parser is regex over raw output + content-match fallback.
- `agrees_with_gold_strict` and `agrees_with_gold_content` are both surfaced — readers can audit.
- Dataset: `CohereForAI/Global-MMLU-Lite` config `en` split `test`.
- Model: lemer (Gemma 4 E2B + LEK, bf16 MLX reference).
- Sampling: mlx_lm greedy, max_tokens 2048.
- Timestamp: 2026-04-17T11:56:15.577413+00:00
- Runtime: 746s (0.54 q/s).
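
The two-stage parser mentioned in the notes (a regex over the raw output, then a content-match fallback) could be sketched as below. This is a minimal illustration, not the evaluation code: the exact regex, the case-insensitive matching, and the function names are all assumptions.

```python
import re
from typing import Optional

# Assumed pattern: a standalone capital A-D. The real regex is not shown in the commit.
LETTER_RE = re.compile(r"\b([ABCD])\b")

def parse_strict(output: str) -> Optional[str]:
    """Strict stage: the last standalone A/B/C/D letter in the raw output."""
    matches = LETTER_RE.findall(output)
    return matches[-1] if matches else None

def parse_content(output: str, options: dict[str, str]) -> Optional[str]:
    """Fallback stage: the option whose text appears in the output.

    If only one option text occurs, it wins; if several occur, the one
    mentioned last wins. Returns None when no option text occurs.
    """
    text = output.lower()
    positions = {letter: text.rfind(opt.lower()) for letter, opt in options.items()}
    found = {letter: pos for letter, pos in positions.items() if pos >= 0}
    return max(found, key=found.get) if found else None

def parse_answer(output: str, options: dict[str, str]) -> Optional[str]:
    """Try the strict letter regex first, then fall back to content matching."""
    return parse_strict(output) or parse_content(output, options)

options = {"A": "London", "B": "Paris", "C": "Rome", "D": "Berlin"}
print(parse_answer("Rome is tempting, but the capital is Paris.", options))
print(parse_answer("Reasoning omitted. Answer: C", options))
```

A pattern this simple also matches the article "A" in ordinary prose, which is one reason a strict score and a content-aware score can differ.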

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:35f87dde7df9e4c2097cd41485c0c64f703c20fd34fba994c52cdcc7fa021dca
size 287879
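
The three lines above are a Git LFS pointer: the repository stores the hash and byte size, and the actual file lives in LFS storage. A downloaded copy can be checked against the pointer along the lines of the sketch below; the helper names are hypothetical, and only the standard library `hashlib` is used.

```python
import hashlib

# Pointer text as it appears in this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:35f87dde7df9e4c2097cd41485c0c64f703c20fd34fba994c52cdcc7fa021dca
size 287879
"""

def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}

def matches_pointer(data: bytes, pointer: dict) -> bool:
    """True when both the byte length and the hash of data match the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer(open(path, "rb").read(), pointer)` on the downloaded file, with `path` supplied by the reader, would confirm that the local copy is the one this commit refers to.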

@@ -0,0 +1,58 @@
task: global_mmlu_lite_en
dataset:
  repo: CohereForAI/Global-MMLU-Lite
  config: en
  split: test
  rows: 400
model:
  repo: lthn/lemer-mlx-bf16
  local_path: /Volumes/Data/lem/models/lemma.1.x.x/v1.0.1/lemer-mlx-bf16
  backend: mlx_lm
  quant: BF16
sampling:
  max_tokens: 2048
  mlx_lm_defaults: greedy
prompt_template: Question + A/B/C/D + 'Reason briefly, then end with the single letter
  answer.'
parser: 'strict: last A/B/C/D letter regex. content: fallback to unique/last option-text
  match.'
scores:
  strict_letter:
    correct: 260
    n: 400
    pct: 65.0
  content_aware:
    correct: 274
    n: 400
    pct: 68.5
  no_answer:
    n: 10
    pct: 2.5
  cs:
    n: 200
    strict_correct: 131
    strict_pct: 65.5
    content_correct: 138
    content_pct: 69.0
  ca:
    n: 200
    strict_correct: 129
    strict_pct: 64.5
    content_correct: 136
    content_pct: 68.0
  cultural_fairness_strict: 0.99
  cultural_fairness_content: 0.99
timestamp_utc: '2026-04-17T11:56:15.577413+00:00'
runtime_seconds: 745.7
throughput_qps: 0.54
host: m3-ultra (local)
note: "This is a 1-round per-question fingerprint capture. Full model output is preserved\
  \ per row in the parquet column full_model_output. Not a statistical accuracy claim\
  \ \u2014 8-PAC consensus (8 rounds paired vs base Gemma 4 E2B IT) is the follow-up.\
  \ Published here to disclose the model alignment fingerprint: readers can audit\
  \ which questions the model disagrees with and on what grounds. For the reasoning\
  \ behind publishing disagreement patterns rather than just accuracy scores, see\
  \ paper section 16 (revise benchmark toward model)."
rounds: 1
protocol: single-round fingerprint (not 8-PAC consensus)
status: preliminary
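
The `prompt_template` field describes the prompt only in outline (question, four lettered options, then a fixed instruction). A prompt of that shape could be assembled as in the sketch below; the `A. ` option prefix, the newline layout, and the function name are assumptions, since the commit does not show the literal template.

```python
# Instruction text quoted from the prompt_template field.
INSTRUCTION = "Reason briefly, then end with the single letter answer."

def build_prompt(question: str, options: list[str]) -> str:
    """Join the question, four lettered options, and the closing instruction."""
    if len(options) != 4:
        raise ValueError("expected exactly four options (A-D)")
    lines = [question]
    # Assumed layout: one option per line, prefixed with its letter.
    lines += [f"{letter}. {text}" for letter, text in zip("ABCD", options)]
    lines.append(INSTRUCTION)
    return "\n".join(lines)

print(build_prompt("What is the capital of France?",
                   ["London", "Paris", "Rome", "Berlin"]))
```

Because the instruction asks for the letter at the end of the response, a parser that takes the last A/B/C/D match lines up with this prompt shape.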