---
language:
- en
license: cc-by-nc-4.0
tags:
- pharmacovigilance
- medical
- mistral
- qlora
- faers
- drug-safety
- adverse-events
base_model: mistralai/Mistral-7B-Instruct-v0.3
---

# pv-biomistral-7b

A pharmacovigilance-specialised language model fine-tuned from [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) on 100,000 FAERS-derived training examples across five structured PV tasks.

This is the community testing release. It contains only the Q4_K_M quantized GGUF for local inference via Ollama or llama-cpp-python.

---

## ⚠️ Important Disclaimer

This model is a **research prototype** intended for pharmacovigilance professionals to evaluate and provide feedback on. It is **not a validated system** and must not be used for:

- Autonomous pharmacovigilance decision-making
- Generating or contributing to regulatory submissions
- Replacing qualified pharmacovigilance assessor judgment
- Clinical or safety-critical decisions of any kind

All model outputs require review by a qualified pharmacovigilance professional. This tool is for exploratory and research purposes only.

---

## Model Details

| Property | Value |
|---|---|
| Base model | mistralai/Mistral-7B-Instruct-v0.3 |
| Fine-tuning method | QLoRA (4-bit NF4, LoRA r=16) |
| Training records | 100,000 |
| Training epochs | 3 |
| Data source | FAERS public database (FDA) |
| Quantization | Q4_K_M (GGUF) |
| Model size | 4.37 GB |
| Context window | 8192 tokens |
| Framework | TRL 1.0.0, Transformers, PEFT |

## Setup — Ollama (Recommended)

### Requirements

- [Ollama](https://ollama.com/download) installed
- ~5 GB free disk space
- 8 GB RAM minimum, 16 GB recommended
- GPU optional but recommended for faster inference

### Installation

**Step 1 — Download both files from this repository:**

- `pv-biomistral-7b-Q4_K_M.gguf` (4.37 GB)
- `Modelfile`

Place both in the same folder.
**Step 2 — Create the Ollama model:**

```bash
cd /path/to/downloaded/files
ollama create pv-mistral-v2 -f Modelfile
```

**Step 3 — Run:**

```bash
ollama run pv-mistral-v2
```

**Windows users:** use the full path, e.g. `cd C:\Users\YourName\Downloads\pv-model\`.

---

## Setup — llama-cpp-python (Alternative)

```bash
pip install "llama-cpp-python[server]"
python -m llama_cpp.server \
  --model pv-biomistral-7b-Q4_K_M.gguf \
  --chat_format mistral-instruct \
  --n_gpu_layers -1 \
  --n_ctx 8192
```

Then open `http://localhost:8000/docs` for the Swagger UI.

---

## Setup — Jan App (Windows/Mac)

1. Download [Jan](https://jan.ai)
2. Import Model → select the GGUF file
3. Set temperature to 0.1 in chat settings
4. Add the system prompt from the Modelfile `SYSTEM` field

---

## Expected Performance by Hardware

| Hardware | Speed | Response time |
|---|---|---|
| Mac Mini M4 / Apple Silicon | 25-35 tokens/sec | 2-5 sec/case |
| Windows + NVIDIA GPU (8GB+ VRAM) | 25-40 tokens/sec | 2-4 sec/case |
| Snapdragon X Elite (16GB) | 8-15 tokens/sec | 5-12 sec/case |
| Windows CPU only (16-24GB RAM) | 3-6 tokens/sec | 15-30 sec/case |

---

## Known Limitations

- **Probable causality underrepresented:** Training data contained only 70 Probable causality examples out of 100,000 records, reflecting real-world FAERS spontaneous reporting patterns. The model may default to Possible even for cases with confirmed positive dechallenge and no confounders.
- **Spontaneous reports only:** Trained exclusively on FAERS spontaneous adverse event reports. Performance on clinical trial safety data, EHR-derived cases, or non-English source material is untested.
- **Not formally validated:** The model has not been validated against any regulatory standard, including ICH E2D, ICH E2A, or WHO-UMC guidelines.
- **Short context optimised:** Designed for single-case inputs under 512 tokens.
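The llama-cpp-python server described above exposes an OpenAI-compatible chat endpoint, which makes it easy to batch cases programmatically rather than typing them interactively. Below is a minimal sketch using only the standard library; it assumes the server is running on its default port 8000, and the system prompt and case narrative shown are illustrative placeholders, not the actual `SYSTEM` text from the Modelfile.

```python
import json
import urllib.request

# Default endpoint for a locally running llama_cpp.server instance.
SERVER_URL = "http://localhost:8000/v1/chat/completions"

# Placeholder -- replace with the SYSTEM text from the repository's Modelfile.
SYSTEM_PROMPT = "You are a pharmacovigilance assistant. Assess the case concisely."

def build_request(case_narrative: str, temperature: float = 0.1) -> dict:
    """Build an OpenAI-style chat-completions payload for a single PV case."""
    return {
        # Model name is informational when the server hosts a single model.
        "model": "pv-biomistral-7b",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": case_narrative},
        ],
        "temperature": temperature,  # low temperature for consistent assessments
        "max_tokens": 512,           # matches the single-case input guidance
    }

def assess_case(case_narrative: str) -> str:
    """POST one case to the local server and return the model's reply text."""
    payload = json.dumps(build_request(case_narrative)).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `assess_case("65-year-old female, rash after drug X, positive dechallenge...")` returns the model's assessment text; all such outputs still require review by a qualified PV professional, per the disclaimer above.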
---

## CIOMS WG XIV Alignment

This model is designed to operate within a Human-in-the-Loop (HITL) framework consistent with CIOMS Working Group XIV recommendations for AI in drug safety. All outputs are decision-support signals requiring human adjudication by a qualified pharmacovigilance professional.

---

## Feedback

This is a community testing release. Please evaluate the model on real cases from your practice area and share findings. Areas of particular interest:

- Causality outputs where you would classify Probable
- Cases with unusual drug combinations or rare reactions
- Narrative quality from a safety database entry perspective
- Therapeutic areas where performance appears weaker

---

## Training Data

Trained on 100,000 cases from the FDA Adverse Event Reporting System (FAERS), accessed via public database export. No proprietary, confidential, or patient-identifiable data beyond what is publicly available in FAERS was used.

---

## License

- Base model (Mistral-7B-Instruct-v0.3): Apache 2.0
- Fine-tuned weights: CC BY-NC 4.0 (non-commercial research use only)

By downloading this model you agree to use it for research purposes only and not for any commercial application or regulatory submission.