---
license: apache-2.0
tags:
- biomedical
- medical
- mistral
- fp8
- quantization
- vllm
- text-generation
library_name: transformers
---

# BioMistral-7B-FP8-Dynamic

## Overview
**BioMistral-7B-FP8-Dynamic** is an **FP8 Dynamic–quantized** version of the **BioMistral-7B** model, designed for high-performance inference while maintaining strong quality on biomedical and medical NLP tasks.

This model is primarily intended for deployment with **vLLM** on modern GPUs (Hopper / Ada architectures).

---

## Base Model
- **Base model**: BioMistral-7B  
- **Architecture**: Mistral-style decoder-only Transformer  
- **Domain**: Biomedical / Medical Natural Language Processing  

---

## Quantization
- **Method**: FP8 Dynamic  
- **Scope**: Linear layers  
- **Objective**: Reduce VRAM usage and improve inference throughput  

### Notes
- The weights are **already quantized**.
- Do **not** apply additional runtime quantization.
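FP8 stores one byte per weight versus two for FP16, so weight memory roughly halves. A back-of-the-envelope sketch (using ~7B as an approximate parameter count; activations, KV cache, and quantization scales are excluded):

```python
# Rough weight-memory estimate: FP16 (2 bytes/param) vs FP8 (1 byte/param).
# 7B is an approximation of the model's parameter count.
params = 7_000_000_000

fp16_gib = params * 2 / 1024**3  # FP16: 2 bytes per parameter
fp8_gib = params * 1 / 1024**3   # FP8: 1 byte per parameter

print(f"FP16 weights: ~{fp16_gib:.1f} GiB")
print(f"FP8 weights:  ~{fp8_gib:.1f} GiB")
```

Real savings are somewhat smaller, since embeddings and norms typically stay in higher precision and per-tensor scales add a small overhead.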

---

## Intended Use
- Biomedical and medical text generation  
- Medical writing assistance  
- Summarization and analysis of scientific literature  
- Medical RAG pipelines (clinical notes, research papers)

---

## Deployment (vLLM)

### Recommended
```bash
vllm serve ig1/BioMistral-7B-FP8-Dynamic \
  --served-model-name biomistral-7b-fp8 \
  --dtype auto
```
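
Once the server is up, it exposes vLLM's OpenAI-compatible API. A minimal client sketch using only the standard library (the endpoint URL and prompt are illustrative, and the actual request is commented out so the snippet does not require a running server):

```python
import json

# Chat-completions payload for the server started above.
# "biomistral-7b-fp8" matches the --served-model-name flag.
payload = {
    "model": "biomistral-7b-fp8",
    "messages": [
        {"role": "user",
         "content": "Summarize the role of ACE inhibitors in hypertension."}
    ],
    "max_tokens": 256,
    "temperature": 0.2,
}
body = json.dumps(payload)

# With the server running locally on the default port:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```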