ihebig1 committed
Commit f5b264a · verified · 1 Parent(s): bb6ff2f

Create README.md

---
license: apache-2.0
tags:
- biomedical
- medical
- mistral
- fp8
- quantization
- vllm
- text-generation
library_name: transformers
---

# BioMistral-7B-FP8-Dynamic

## Overview
**BioMistral-7B-FP8-Dynamic** is an **FP8 Dynamic–quantized** version of the **BioMistral-7B** model, designed for high-performance inference while maintaining strong quality on biomedical and medical NLP tasks.

This model is primarily intended for deployment with **vLLM** on modern GPUs (Hopper / Ada architectures).

---

## Base Model
- **Base model**: BioMistral-7B
- **Architecture**: Mistral-style decoder-only Transformer
- **Domain**: Biomedical / Medical Natural Language Processing

---

## Quantization
- **Method**: FP8 Dynamic
- **Scope**: Linear layers
- **Objective**: Reduce VRAM usage and improve inference throughput

### Notes
- The weights are **already quantized**.
- Do **not** apply additional runtime quantization.
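
The VRAM saving from FP8 weights can be estimated with a quick back-of-the-envelope calculation (weights only; the KV cache and activations add to this, so treat the numbers as a rough lower bound):

```python
# Approximate weight-memory footprint of a 7B-parameter model.
# FP16/BF16 weights take 2 bytes per parameter; FP8 takes 1 byte,
# so quantizing the linear layers roughly halves weight memory.
params = 7e9

fp16_gb = params * 2 / 1e9  # ~14 GB of weights at FP16
fp8_gb = params * 1 / 1e9   # ~7 GB of weights at FP8

print(f"FP16 weights: ~{fp16_gb:.0f} GB")
print(f"FP8 weights:  ~{fp8_gb:.0f} GB")
```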

---

## Intended Use
- Biomedical and medical text generation
- Medical writing assistance
- Summarization and analysis of scientific literature
- Medical RAG pipelines (clinical notes, research papers)

---

## Deployment (vLLM)

### Recommended
```bash
vllm serve ig1/BioMistral-7B-FP8-Dynamic \
  --served-model-name biomistral-7b-fp8 \
  --dtype auto
```
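
Once the server is up, it exposes vLLM's OpenAI-compatible API. A minimal client sketch using only the standard library (the `localhost:8000` endpoint is vLLM's default; adjust host, port, and the prompt to your deployment):

```python
import json
import urllib.request

# Build a chat request for the OpenAI-compatible endpoint served by vLLM.
# The model name matches --served-model-name from the serve command above.
payload = {
    "model": "biomistral-7b-fp8",
    "messages": [
        {"role": "user", "content": "Summarize the mechanism of action of metformin."}
    ],
    "max_tokens": 256,
    "temperature": 0.2,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```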