---
license: gemma
tags:
  - tenaos
  - gemma
  - gguf
  - llama.cpp
  - clinical
language:
  - en
base_model: google/gemma-4-E4B-it
pipeline_tag: text-generation
---

# TenaOS — Gemma 4 E4B Instruct (BF16 GGUF)

`llama.cpp`-ready BF16 conversion of
[`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it),
plus the audio `mmproj` projector. Used by
[TenaOS](https://github.com/brookyale0512/TenaOS) for on-device clinical
inference (text + voice, multimodal in a single pass).

## Contents

| File | Size | Purpose |
| --- | --- | --- |
| `gemma-4-E4B-it-BF16.gguf`        | ~15 GB  | Full-precision GGUF for generation |
| `mmproj-gemma-4-E4B-it-bf16.gguf` | ~946 MB | Multimodal projector for audio input |

We standardize on **BF16 full precision**. No quantization in the
production path.

## Usage

```bash
hf download beza4588/TenaOS --local-dir ./models
# launch llama-server (CUDA build):
llama-server \
    -m ./models/gemma-4-E4B-it-BF16.gguf \
    --mmproj ./models/mmproj-gemma-4-E4B-it-bf16.gguf \
    --host 0.0.0.0 --port 8000 -ngl 99 --jinja --alias gemma-4
```

In TenaOS the docker image bind-mounts this directory at `/models`; see
[`scripts/fetch-models.sh`](https://github.com/brookyale0512/TenaOS/blob/main/scripts/fetch-models.sh).

## License

Inherits the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
TenaOS packaging is Apache 2.0.