ihebig1 committed
Commit f5b264a · verified · 1 Parent(s): bb6ff2f

Create README.md

---
license: apache-2.0
tags:
- biomedical
- medical
- mistral
- fp8
- quantization
- vllm
- text-generation
library_name: transformers
---

# BioMistral-7B-FP8-Dynamic

## Overview
**BioMistral-7B-FP8-Dynamic** is an **FP8 Dynamic–quantized** version of the **BioMistral-7B** model, designed for high-performance inference while maintaining strong quality on biomedical and medical NLP tasks.

This model is primarily intended for deployment with **vLLM** on modern GPUs (Hopper / Ada architectures).

---

## Base Model
- **Base model**: BioMistral-7B
- **Architecture**: Mistral-style decoder-only Transformer
- **Domain**: Biomedical / Medical Natural Language Processing

---

## Quantization
- **Method**: FP8 Dynamic
- **Scope**: Linear layers
- **Objective**: Reduce VRAM usage and improve inference throughput

### Notes
- The weights are **already quantized**.
- Do **not** apply additional runtime quantization.
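
The VRAM saving from FP8 weights can be estimated with a quick back-of-the-envelope calculation (weights only; the KV cache and activations add to this, so treat the numbers as a rough lower bound):

```python
# Approximate weight-memory footprint of a 7B-parameter model.
# FP16/BF16 weights take 2 bytes per parameter; FP8 takes 1 byte,
# so quantizing the linear layers roughly halves weight memory.
params = 7e9

fp16_gb = params * 2 / 1e9  # ~14 GB of weights at FP16
fp8_gb = params * 1 / 1e9   # ~7 GB of weights at FP8

print(f"FP16 weights: ~{fp16_gb:.0f} GB")
print(f"FP8 weights:  ~{fp8_gb:.0f} GB")
```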

---

## Intended Use
- Biomedical and medical text generation
- Medical writing assistance
- Summarization and analysis of scientific literature
- Medical RAG pipelines (clinical notes, research papers)

---

## Deployment (vLLM)

### Recommended
```bash
vllm serve ig1/BioMistral-7B-FP8-Dynamic \
  --served-model-name biomistral-7b-fp8 \
  --dtype auto
```
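
Once the server is up, it exposes vLLM's OpenAI-compatible API. A minimal client sketch using only the standard library (the `localhost:8000` endpoint is vLLM's default; adjust host, port, and the prompt to your deployment):

```python
import json
import urllib.request

# Build a chat request for the OpenAI-compatible endpoint served by vLLM.
# The model name matches --served-model-name from the serve command above.
payload = {
    "model": "biomistral-7b-fp8",
    "messages": [
        {"role": "user", "content": "Summarize the mechanism of action of metformin."}
    ],
    "max_tokens": 256,
    "temperature": 0.2,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```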