--- license: gemma tags: - tenaos - gemma - gguf - llama.cpp - clinical language: - en base_model: google/gemma-4-E4B-it pipeline_tag: text-generation --- # TenaOS — Gemma 4 E4B Instruct (BF16 GGUF) `llama.cpp`-ready BF16 conversion of [`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it), plus the audio `mmproj` projector. Used by [TenaOS](https://github.com/brookyale0512/TenaOS) for on-device clinical inference (text + voice, multimodal in a single pass). ## Contents | File | Size | Purpose | | --- | --- | --- | | `gemma-4-E4B-it-BF16.gguf` | ~15 GB | Full-precision GGUF for generation | | `mmproj-gemma-4-E4B-it-bf16.gguf` | ~946 MB | Multimodal projector for audio input | We standardize on **BF16 full precision**. No quantization in the production path. ## Usage ```bash hf download beza4588/TenaOS --local-dir ./models # launch llama-server (CUDA build): llama-server \ -m ./models/gemma-4-E4B-it-BF16.gguf \ --mmproj ./models/mmproj-gemma-4-E4B-it-bf16.gguf \ --host 0.0.0.0 --port 8000 -ngl 99 --jinja --alias gemma-4 ``` In TenaOS the docker image bind-mounts this directory at `/models`; see [`scripts/fetch-models.sh`](https://github.com/brookyale0512/TenaOS/blob/main/scripts/fetch-models.sh). ## License Inherits the [Gemma Terms of Use](https://ai.google.dev/gemma/terms). TenaOS packaging is Apache 2.0.