---
license: apache-2.0
tags:
- biomedical
- medical
- mistral
- fp8
- quantization
- vllm
- text-generation
library_name: transformers
---

# BioMistral-7B-FP8-Dynamic

## Overview

**BioMistral-7B-FP8-Dynamic** is an **FP8 Dynamic-quantized** version of the **BioMistral-7B** model, designed for high-performance inference while maintaining strong quality on biomedical and medical NLP tasks.

This model is primarily intended for deployment with **vLLM** on modern GPUs (Hopper / Ada architectures).

---

## Base Model

- **Base model**: BioMistral-7B
- **Architecture**: Mistral-style decoder-only Transformer
- **Domain**: Biomedical / medical natural language processing

---

## Quantization

- **Method**: FP8 Dynamic (weights are quantized ahead of time; activation scales are computed on the fly at inference time)
- **Scope**: Linear layers
- **Objective**: Reduce VRAM usage and improve inference throughput

### Notes

- The weights are **already quantized**.
- Do **not** apply additional runtime quantization on top of this checkpoint.
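The idea behind dynamic activation scaling can be sketched in a few lines. This is an illustrative NumPy toy, not the model's actual FP8 kernel: it computes a per-tensor runtime scale that maps the largest activation magnitude to the FP8 E4M3 maximum (448.0), and it omits the mantissa rounding a real FP8 cast performs.

```python
# Toy illustration of FP8 "dynamic" quantization (E4M3-style), for
# intuition only: a per-tensor scale is derived at runtime from the
# activation tensor itself, so no calibration data is needed.
import numpy as np

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def fp8_dynamic_quant(x: np.ndarray):
    """Derive a runtime scale and map values into the FP8 range."""
    scale = np.abs(x).max() / E4M3_MAX  # per-tensor dynamic scale
    q = np.clip(x / scale, -E4M3_MAX, E4M3_MAX)  # fits in FP8 range
    return q, scale


def fp8_dequant(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover the (approximate) original values."""
    return q * scale


x = np.array([0.1, -2.5, 7.0, -0.03])
q, scale = fp8_dynamic_quant(x)
x_hat = fp8_dequant(q, scale)  # close to x; exact here, since the
                               # sketch skips FP8 mantissa rounding
```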

---

## Intended Use

- Biomedical and medical text generation
- Medical writing assistance
- Summarization and analysis of scientific literature
- Medical RAG pipelines (clinical notes, research papers)
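For the RAG use case, one minimal sketch of prompt assembly is shown below. The helper name and template are hypothetical, not a format this model requires: retrieved passages are numbered and placed ahead of the question so the model can ground its answer in them.

```python
# Hypothetical prompt assembly for a medical RAG pipeline: retrieved
# passages (clinical notes, paper excerpts) become numbered context.
def build_rag_prompt(question: str, passages: list[str]) -> str:
    context = "\n".join(
        f"[{i}] {p}" for i, p in enumerate(passages, start=1)
    )
    return (
        "You are a biomedical assistant. Answer using only the context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_rag_prompt(
    "What is the first-line treatment?",
    ["Guideline excerpt A.", "Trial summary B."],
)
```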

---

## Deployment (vLLM)

### Recommended

```bash
vllm serve ig1/BioMistral-7B-FP8-Dynamic \
  --served-model-name biomistral-7b-fp8 \
  --dtype auto
```
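
Once serving, vLLM exposes an OpenAI-compatible API (by default on port 8000, at `/v1/chat/completions`). The snippet below only constructs the request body, since sending it requires a running server; the model name matches the `--served-model-name` value above, and the prompt text is a placeholder.

```python
import json

# Build a chat-completions request body for the vLLM OpenAI-compatible
# server. "biomistral-7b-fp8" matches --served-model-name above.
payload = {
    "model": "biomistral-7b-fp8",
    "messages": [
        {"role": "user",
         "content": "Summarize the key findings of this abstract: ..."}
    ],
    "max_tokens": 256,
    "temperature": 0.2,
}
body = json.dumps(payload)
# POST this body to http://localhost:8000/v1/chat/completions
# with header Content-Type: application/json
```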