Upload report/README.md with huggingface_hub
report/README.md CHANGED (+12 -8)
@@ -14,9 +14,11 @@ datasets:
 - Maelstrome/lora-wave-session-dataset
 ---

-#
+# `report/` - Training & evaluation write-up

-
+The full training/eval documentation for the **rank-32 / 1-epoch A100** WAVE fine-tune of Gemma 4 E2B Instruct, plus the head-to-head comparison against its rank-16 sibling.
+
+Originally a standalone repo (`Maelstrome/lora-wave-session-r32-report`); now lives as a subdirectory of the consolidated [`Maelstrome/lora-wave-session-r32`](https://huggingface.co/Maelstrome/lora-wave-session-r32) repo alongside the adapter weights, GGUF, and MediaPipe artifacts.

 ## Documents

@@ -26,13 +28,15 @@ Documentation-only repo. Contains the full training/eval write-up for the **rank
 | [`COMPARISON.md`](./COMPARISON.md) | Head-to-head vs the rank-16 / 3-epoch sibling run (`lora-wave-session`). Same dataset, same seed, same test split. r32 wins on every probability metric. |
 | [`MORNING_REPORT.md`](./MORNING_REPORT.md) | First-pass overnight summary written immediately after training completed. Preserved for history; superseded by `REPORT.md`. |

-##
+## Sibling artifacts in this repo
+
+- **PEFT adapter** - at the [repo root](../) (~194 MB). Pairs with `unsloth/gemma-4-E2B-it`.
+- **GGUF Q4_K_M (5-shard split)** - [`gguf/`](../gguf) (~3.2 GB). For llama.cpp / Ollama / [wllama](https://github.com/ngxson/wllama) browser runtime.
+- **MediaPipe LiteRT bundle** - [`mediapipe/`](../mediapipe) (~4.95 GB). For MediaPipe LLM Inference (Android / iOS / web).
+- **Dataset:** [`Maelstrome/lora-wave-session-dataset`](https://huggingface.co/datasets/Maelstrome/lora-wave-session-dataset) - 4,277 examples, frozen splits (seed `7`).
+- **Sibling rank-16 run:** [`Maelstrome/lora-wave-session`](https://huggingface.co/Maelstrome/lora-wave-session) - same dataset, different recipe (rank-16 / 3-epoch RTX 5080). Has the same subdir layout.

-
-- **Merged bf16:** [`Maelstrome/lora-wave-session-r32-merged`](https://huggingface.co/Maelstrome/lora-wave-session-r32-merged) - drop-in for `transformers`/vLLM (~10 GB)
-- **GGUF Q4_K_M:** [`Maelstrome/lora-wave-session-r32-gguf`](https://huggingface.co/Maelstrome/lora-wave-session-r32-gguf) - llama.cpp / Ollama / wllama (~4 GB)
-- **Dataset:** [`Maelstrome/lora-wave-session-dataset`](https://huggingface.co/datasets/Maelstrome/lora-wave-session-dataset) - 4,277 examples, frozen splits (seed `7`)
-- **Sibling run (rank-16):** [`Maelstrome/lora-wave-session`](https://huggingface.co/Maelstrome/lora-wave-session) - same dataset, different recipe
+> The `-merged` and `-gguf` sibling repos referenced in older versions of this doc were consolidated and deleted. The current `gguf/` subdir is a fresh build from a PEFT re-merge (the original unsloth-merged base produced corrupt all-`<pad>` output and is no longer published).

 ## Headline numbers

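The updated README describes the GGUF artifact as a 5-shard split under `gguf/`. As a rough illustration of what that implies for anyone scripting a download or integrity check, the sketch below enumerates the expected shard filenames. It assumes the shards follow the llama.cpp split naming pattern (`<prefix>-NNNNN-of-NNNNN.gguf`) and that the prefix is `gguf/gemma-4-e2b-it-peft.Q4_K_M`; both are assumptions to verify against the actual file listing, and `gguf_shard_names` is a hypothetical helper, not part of any library.

```python
def gguf_shard_names(prefix: str, total: int) -> list[str]:
    """Return the expected shard filenames for a split GGUF model.

    Assumes the llama.cpp split naming convention:
    <prefix>-<index, 5 digits>-of-<total, 5 digits>.gguf, with a 1-based index.
    """
    if total < 1:
        raise ValueError("total must be at least 1")
    return [f"{prefix}-{i:05d}-of-{total:05d}.gguf" for i in range(1, total + 1)]


# Assumed prefix and shard count (5 shards, per the README's "5-shard split").
shards = gguf_shard_names("gguf/gemma-4-e2b-it-peft.Q4_K_M", 5)
for name in shards:
    print(name)
```

Tools that load split GGUF files are typically pointed at the first shard and locate the rest by name, so a list like this is mainly useful for pre-fetching or for checking that no shard is missing.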