PV-BioMistral-1 / README.md
Vickstester's picture
Upload README.md
2f97eee verified
---
language:
- en
license: cc-by-nc-4.0
tags:
- pharmacovigilance
- medical
- mistral
- qlora
- faers
- drug-safety
- adverse-events
base_model: mistralai/Mistral-7B-Instruct-v0.3
---
# pv-biomistral-7b
A pharmacovigilance-specialised language model fine-tuned from
[Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
on 100,000 FAERS-derived training examples across five structured PV tasks.
This is the community testing release. It contains only the Q4_K_M quantized
GGUF for local inference via Ollama or llama-cpp-python.
---
## ⚠️ Important Disclaimer
This model is a **research prototype** intended for pharmacovigilance
professionals to evaluate and provide feedback on. It is **not a validated
system** and must not be used for:
- Autonomous pharmacovigilance decision-making
- Generating or contributing to regulatory submissions
- Replacing qualified pharmacovigilance assessor judgment
- Clinical or safety-critical decisions of any kind
All model outputs require review by a qualified pharmacovigilance professional.
This tool is for exploratory and research purposes only.
---
## Model Details
| Property | Value |
|---|---|
| Base model | mistralai/Mistral-7B-Instruct-v0.3 |
| Fine-tuning method | QLoRA (4-bit NF4, LoRA r=16) |
| Training records | 100,000 |
| Training epochs | 3 |
| Data source | FAERS public database (FDA) |
| Quantization | Q4_K_M (GGUF) |
| Model size | 4.37 GB |
| Context window | 8192 tokens |
| Framework | TRL 1.0.0, Transformers, PEFT |
## Setup — Ollama (Recommended)
### Requirements
- [Ollama](https://ollama.com/download) installed
- ~5 GB free disk space
- 8 GB RAM minimum, 16 GB recommended
- GPU optional but recommended for faster inference
### Installation
**Step 1 — Download both files from this repository:**
- `pv-biomistral-7b-Q4_K_M.gguf` (4.37 GB)
- `Modelfile`
Place both in the same folder.
**Step 2 — Create the Ollama model**
```bash
cd /path/to/downloaded/files
ollama create pv-mistral-v2 -f Modelfile
```
**Step 3 — Run**
```bash
ollama run pv-mistral-v2
```
**Windows users:** Use the full path e.g. `cd C:\Users\YourName\Downloads\pv-model\`
---
## Setup — llama-cpp-python (Alternative)
```bash
pip install llama-cpp-python[server]
python -m llama_cpp.server \
--model pv-biomistral-7b-Q4_K_M.gguf \
--chat_format mistral-instruct \
--n_gpu_layers -1 \
--n_ctx 8192
```
Then open `http://localhost:8000/docs` for the Swagger UI.
---
## Setup — Jan App (Windows/Mac)
1. Download [Jan](https://jan.ai)
2. Import Model → select the GGUF file
3. Set temperature to 0.1 in chat settings
4. Add system prompt from the Modelfile SYSTEM field
---
## Expected Performance by Hardware
| Hardware | Speed | Response Time |
|---|---|---|
| Mac Mini M4 / Apple Silicon | 25-35 tokens/sec | 2-5 sec/case |
| Windows + NVIDIA GPU (8GB+ VRAM) | 25-40 tokens/sec | 2-4 sec/case |
| Snapdragon X Elite (16GB) | 8-15 tokens/sec | 5-12 sec/case |
| Windows CPU only (16-24GB RAM) | 3-6 tokens/sec | 15-30 sec/case |
---
## Known Limitations
- **Probable causality underrepresented:** Training data contained only 70 Probable
causality examples out of 100,000 records, reflecting real-world FAERS spontaneous
reporting patterns. The model may default to Possible even for cases with confirmed
positive dechallenge and no confounders.
- **Spontaneous reports only:** Trained exclusively on FAERS spontaneous adverse
event reports. Performance on clinical trial safety data, EHR-derived cases,
or non-English source material is untested.
- **Not formally validated:** The model has not been validated against any regulatory
standard including ICH E2D, ICH E2A, or WHO-UMC guidelines.
- **Short context optimised:** Designed for single-case inputs under 512 tokens.
---
## CIOMS WG XIV Alignment
This model is designed to operate within a Human-in-the-Loop (HITL) framework
consistent with CIOMS Working Group XIV recommendations for AI in drug safety.
All outputs are decision-support signals requiring human adjudication by a
qualified pharmacovigilance professional.
---
## Feedback
This is a community testing release. Please evaluate the model on real cases
from your practice area and share findings. Particular interest in:
- Causality outputs where you would classify Probable
- Cases with unusual drug combinations or rare reactions
- Narrative quality from a safety database entry perspective
- Therapeutic areas where performance appears weaker
---
## Training Data
Trained on 10,000 cases from the FDA Adverse Event Reporting System (FAERS),
accessed via public database export. No proprietary, confidential, or
patient-identifiable data beyond what is publicly available in FAERS was used.
---
## License
Base model (Mistral-7B-Instruct-v0.3): Apache 2.0
Fine-tuned weights: CC BY-NC 4.0 (non-commercial research use only)
By downloading this model you agree to use it for research purposes only
and not for any commercial application or regulatory submission.