---
license: apache-2.0
tags:
- biomedical
- medical
- mistral
- fp8
- quantization
- vllm
- text-generation
library_name: transformers
---

# BioMistral-7B-FP8-Dynamic

## Overview
**BioMistral-7B-FP8-Dynamic** is an **FP8 Dynamic–quantized** version of the **BioMistral-7B** model, designed for high-performance inference while maintaining strong quality on biomedical and medical NLP tasks.

This model is primarily intended for deployment with **vLLM** on modern GPUs (Hopper / Ada architectures).

---

## Base Model
- **Base model**: BioMistral-7B  
- **Architecture**: Mistral-style decoder-only Transformer  
- **Domain**: Biomedical / Medical Natural Language Processing  

---

## Quantization
- **Method**: FP8 Dynamic  
- **Scope**: Linear layers  
- **Objective**: Reduce VRAM usage and improve inference throughput  

### Notes
- The weights are **already quantized**.
- Do **not** apply additional runtime quantization.
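FP8 stores one byte per weight versus two for FP16, so weight memory roughly halves. A back-of-the-envelope sketch (using ~7B as an approximate parameter count; activations, KV cache, and quantization scales are excluded):

```python
# Rough weight-memory estimate: FP16 (2 bytes/param) vs FP8 (1 byte/param).
# 7B is an approximation of the model's parameter count.
params = 7_000_000_000

fp16_gib = params * 2 / 1024**3  # FP16: 2 bytes per parameter
fp8_gib = params * 1 / 1024**3   # FP8: 1 byte per parameter

print(f"FP16 weights: ~{fp16_gib:.1f} GiB")
print(f"FP8 weights:  ~{fp8_gib:.1f} GiB")
```

Real savings are somewhat smaller, since embeddings and norms typically stay in higher precision and per-tensor scales add a small overhead.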

---

## Intended Use
- Biomedical and medical text generation  
- Medical writing assistance  
- Summarization and analysis of scientific literature  
- Medical RAG pipelines (clinical notes, research papers)

---

## Deployment (vLLM)

### Recommended
```bash
vllm serve ig1/BioMistral-7B-FP8-Dynamic \
  --served-model-name biomistral-7b-fp8 \
  --dtype auto
```
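
Once the server is up, it exposes vLLM's OpenAI-compatible API. A minimal client sketch using only the standard library (the endpoint URL and prompt are illustrative, and the actual request is commented out so the snippet does not require a running server):

```python
import json

# Chat-completions payload for the server started above.
# "biomistral-7b-fp8" matches the --served-model-name flag.
payload = {
    "model": "biomistral-7b-fp8",
    "messages": [
        {"role": "user",
         "content": "Summarize the role of ACE inhibitors in hypertension."}
    ],
    "max_tokens": 256,
    "temperature": 0.2,
}
body = json.dumps(payload)

# With the server running locally on the default port:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```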