---
license: apache-2.0
tags:
- biomedical
- medical
- mistral
- fp8
- quantization
- vllm
- text-generation
library_name: transformers
---

# BioMistral-7B-FP8-Dynamic

## Overview

**BioMistral-7B-FP8-Dynamic** is an **FP8 Dynamic-quantized** version of the **BioMistral-7B** model, designed for high-performance inference while maintaining strong quality on biomedical and medical NLP tasks.

This model is primarily intended for deployment with **vLLM** on modern GPUs (Hopper / Ada architectures).

---

## Base Model

- **Base model**: BioMistral-7B
- **Architecture**: Mistral-style decoder-only Transformer
- **Domain**: Biomedical / medical natural language processing

---

## Quantization

- **Method**: FP8 Dynamic (weights are quantized ahead of time; activation scales are computed on the fly at inference time)
- **Scope**: Linear layers
- **Objective**: Reduce VRAM usage and improve inference throughput

### Notes

- The weights are **already quantized**.
- Do **not** apply additional runtime quantization on top of this checkpoint.
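The idea behind dynamic activation scaling can be sketched in a few lines. This is an illustrative NumPy toy, not the model's actual FP8 kernel: it computes a per-tensor runtime scale that maps the largest activation magnitude to the FP8 E4M3 maximum (448.0), and it omits the mantissa rounding a real FP8 cast performs.

```python
# Toy illustration of FP8 "dynamic" quantization (E4M3-style), for
# intuition only: a per-tensor scale is derived at runtime from the
# activation tensor itself, so no calibration data is needed.
import numpy as np

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def fp8_dynamic_quant(x: np.ndarray):
    """Derive a runtime scale and map values into the FP8 range."""
    scale = np.abs(x).max() / E4M3_MAX  # per-tensor dynamic scale
    q = np.clip(x / scale, -E4M3_MAX, E4M3_MAX)  # fits in FP8 range
    return q, scale


def fp8_dequant(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover the (approximate) original values."""
    return q * scale


x = np.array([0.1, -2.5, 7.0, -0.03])
q, scale = fp8_dynamic_quant(x)
x_hat = fp8_dequant(q, scale)  # close to x; exact here, since the
                               # sketch skips FP8 mantissa rounding
```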

---

## Intended Use

- Biomedical and medical text generation
- Medical writing assistance
- Summarization and analysis of scientific literature
- Medical RAG pipelines (clinical notes, research papers)
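For the RAG use case, one minimal sketch of prompt assembly is shown below. The helper name and template are hypothetical, not a format this model requires: retrieved passages are numbered and placed ahead of the question so the model can ground its answer in them.

```python
# Hypothetical prompt assembly for a medical RAG pipeline: retrieved
# passages (clinical notes, paper excerpts) become numbered context.
def build_rag_prompt(question: str, passages: list[str]) -> str:
    context = "\n".join(
        f"[{i}] {p}" for i, p in enumerate(passages, start=1)
    )
    return (
        "You are a biomedical assistant. Answer using only the context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_rag_prompt(
    "What is the first-line treatment?",
    ["Guideline excerpt A.", "Trial summary B."],
)
```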

---

## Deployment (vLLM)

### Recommended

```bash
vllm serve ig1/BioMistral-7B-FP8-Dynamic \
  --served-model-name biomistral-7b-fp8 \
  --dtype auto
```
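
Once serving, vLLM exposes an OpenAI-compatible API (by default on port 8000, at `/v1/chat/completions`). The snippet below only constructs the request body, since sending it requires a running server; the model name matches the `--served-model-name` value above, and the prompt text is a placeholder.

```python
import json

# Build a chat-completions request body for the vLLM OpenAI-compatible
# server. "biomistral-7b-fp8" matches --served-model-name above.
payload = {
    "model": "biomistral-7b-fp8",
    "messages": [
        {"role": "user",
         "content": "Summarize the key findings of this abstract: ..."}
    ],
    "max_tokens": 256,
    "temperature": 0.2,
}
body = json.dumps(payload)
# POST this body to http://localhost:8000/v1/chat/completions
# with header Content-Type: application/json
```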