---
license: apache-2.0
tags:
- biomedical
- medical
- mistral
- fp8
- quantization
- vllm
- text-generation
library_name: transformers
---
# BioMistral-7B-FP8-Dynamic
## Overview
**BioMistral-7B-FP8-Dynamic** is an **FP8 Dynamic–quantized** version of the **BioMistral-7B** model, designed for high-performance inference while maintaining strong quality on biomedical and medical NLP tasks.
This model is primarily intended for deployment with **vLLM** on modern GPUs (Hopper / Ada architectures).
---
## Base Model
- **Base model**: BioMistral-7B
- **Architecture**: Mistral-style decoder-only Transformer
- **Domain**: Biomedical / Medical Natural Language Processing
---
## Quantization
- **Method**: FP8 Dynamic
- **Scope**: Linear layers
- **Objective**: Reduce VRAM usage and improve inference throughput
### Notes
- The weights are **already quantized**.
- Do **not** apply additional runtime quantization.
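To illustrate what "dynamic" means here: the quantization scale for activations is derived at runtime from the current tensor's range, rather than from an offline calibration pass. A minimal conceptual sketch (pure Python, not the actual kernel code; `FP8_E4M3_MAX = 448` is the largest finite value of the e4m3 format):

```python
# Conceptual sketch of per-tensor dynamic FP8 scaling.
# The scale is recomputed from each tensor's max magnitude at runtime.
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 e4m3


def dynamic_fp8_scale(values):
    """Per-tensor scale mapping the observed range into FP8 e4m3."""
    amax = max(abs(v) for v in values)
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0


def quantize(values):
    """Scale and clip values the way an FP8 quantizer would
    (rounding to the nearest e4m3 code is omitted for brevity)."""
    scale = dynamic_fp8_scale(values)
    q = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in values]
    return q, scale


acts = [0.5, -3.2, 44.8, -0.01]
q, scale = quantize(acts)
print(scale)  # amax / 448, recomputed for every batch of activations
```

The weight scales, by contrast, are fixed at quantization time and shipped with this checkpoint, which is why no further runtime quantization should be applied.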
---
## Intended Use
- Biomedical and medical text generation
- Medical writing assistance
- Summarization and analysis of scientific literature
- Medical RAG pipelines (clinical notes, research papers)
---
## Deployment (vLLM)
### Recommended
```bash
vllm serve ig1/BioMistral-7B-FP8-Dynamic \
  --served-model-name biomistral-7b-fp8 \
  --dtype auto
```
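Once serving, vLLM exposes an OpenAI-compatible HTTP API (by default on port 8000). A minimal client sketch, assuming the default port and the `--served-model-name` from the command above; the prompt is only an example:

```python
import json
from urllib.request import Request, urlopen

# OpenAI-compatible chat completion request for the served model name
# configured above ("biomistral-7b-fp8").
payload = {
    "model": "biomistral-7b-fp8",
    "messages": [
        {
            "role": "user",
            "content": "Summarize the main risk factors for type 2 diabetes.",
        }
    ],
    "max_tokens": 256,
    "temperature": 0.2,
}

# Default local vLLM endpoint; adjust host/port to your deployment.
req = Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(json.dumps(payload, indent=2))
```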