---
license: apache-2.0
base_model: microsoft/Phi-4-mini-instruct
tags:
- phi-4
- lora
- iec-62304
- medical-device
- regulatory
- compliance
- healthcare
- fine-tuned
library_name: peft
pipeline_tag: text-generation
---

# IEC 62304 Compliance Support Model

A fine-tuned LoRA adapter for [Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct) specialised in generating IEC 62304-compliant medical device software documentation.

## Overview

This model assists with authoring regulatory documentation for medical device software under IEC 62304:2006+AMD1:2015. It handles three core tasks:

- **Prose generation**: drafting sections of Software Development Plans (SDP), Software Requirements Specifications (SRS), Software Architecture Descriptions (SAD), FMEA, Threat Models, and Traceability matrices
- **Requirement expansion**: expanding brief requirement statements into structured fields (description, rationale, priority, acceptance criterion, source)
- **Compliance review**: evaluating documentation excerpts for correctness and issuing PASS/FAIL verdicts with explanations

The model is trained to be class-aware, applying different verification and documentation expectations for IEC 62304 Safety Classes A, B, and C.

## Model Details

| Parameter | Value |
|-----------|-------|
| Base model | `microsoft/Phi-4-mini-instruct` (3.8B parameters) |
| Adapter | LoRA (r=32, alpha=32) |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantisation | 4-bit (QLoRA) during training; Q4_K_M GGUF for inference |
| Context window | 512 tokens (training); model supports up to 16K |
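
To make the adapter row concrete: in the standard LoRA formulation the low-rank update is scaled by alpha / r, so r=32 with alpha=32 gives a scaling factor of 1.0. The sketch below is plain Python; `AdapterSettings` is a hypothetical container that only mirrors the table and is not a PEFT or Unsloth class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterSettings:
    """Hypothetical container mirroring the Model Details table (not a PEFT class)."""

    r: int = 32
    alpha: int = 32
    target_modules: tuple = (
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    )

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update B @ A by alpha / r.
        return self.alpha / self.r


settings = AdapterSettings()
print(f"LoRA scaling factor: {settings.scaling}")
print(f"Adapted projections per layer: {len(settings.target_modules)}")
```

With alpha equal to r, the adapter output is added to the frozen weights without extra amplification or damping.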

## Training

### Dataset

10,302 examples (9,268 train / 1,034 validation), stratified 90/10 split by task type and safety class.

| Task Type | Count | Proportion |
|-----------|-------|-----------|
| Prose generation | 4,800 | 46.6% |
| Compliance review (reflection) | 3,589 | 34.8% |
| Requirement expansion | 1,800 | 17.5% |
| Standards knowledge | 113 | 1.1% |

**Safety class distribution:** Class A 46.7%, Class B 30.6%, Class C 21.6%, General 1.1%
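
The counts and percentages above can be cross-checked with a few lines of arithmetic. This is a small sketch using only the numbers quoted in this section; the variable and function names are illustrative.

```python
# Counts copied from the Dataset section.
TRAIN, VAL = 9_268, 1_034
TASK_COUNTS = {
    "Prose generation": 4_800,
    "Compliance review (reflection)": 3_589,
    "Requirement expansion": 1_800,
    "Standards knowledge": 113,
}


def proportions(counts):
    """Return each count as a percentage of the total, rounded to one decimal."""
    n = sum(counts.values())
    return {name: round(100 * c / n, 1) for name, c in counts.items()}


total = TRAIN + VAL
task_share = proportions(TASK_COUNTS)
val_share = round(100 * VAL / total, 1)

print("total examples:", total)
print("task shares (%):", task_share)
print("validation share (%):", val_share)
```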

The dataset covers 120 fictional medical device projects spanning embedded firmware, mobile apps, desktop software, web applications, cloud SaaS, hybrid edge+cloud systems, IVD software, and AI/ML SaMD. Examples are parameterised from templates to ensure diversity while maintaining regulatory accuracy.

**Quality controls applied during generation:**

- Zero instances of "Per IEC 62304" as a paragraph opening (anti-repetition)
- Zero references to "IEC 62304-1" or other non-existent sub-parts
- Zero instances of filler language ("as appropriate", "as needed") in model outputs
- Class A examples explicitly state that unit and integration testing are not required
- FAIL verdicts in compliance reviews correctly identify hallucinated standards, wrong-class content, and generic filler

### Configuration

| Parameter | Value |
|-----------|-------|
| Epochs | 7 |
| Batch size | 32 |
| Gradient accumulation | 2 (effective batch 64) |
| Learning rate | 2e-4 (cosine schedule) |
| Warmup steps | 30 |
| Optimiser | AdamW 8-bit |
| Weight decay | 0.01 |
| Sequence packing | Enabled (padding-free) |
| Precision | BF16 |
| Hardware | NVIDIA RTX A6000 (48 GB) |
| Training time | ~10 hours |
| Response-only training | Enabled (loss computed on assistant tokens only) |
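
For readers unfamiliar with the schedule named in the table, here is a minimal sketch of a cosine learning-rate schedule with linear warmup. The peak rate and warmup length come from the table; the exact schedule shape used in training and the total step count are assumptions, since the card states neither (sequence packing also changes the number of optimiser steps).

```python
import math

PEAK_LR = 2e-4      # from the Configuration table
WARMUP_STEPS = 30   # from the Configuration table


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


# TOTAL_STEPS is hypothetical: the card does not state the optimiser step count.
TOTAL_STEPS = 1_000
for step in (0, 15, 30, 515, 1_000):
    print(f"step {step:>5}: lr = {lr_at(step, TOTAL_STEPS):.2e}")
```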

### Loss Curve

| Epoch | Train Loss | Eval Loss |
|-------|-----------|-----------|
| 1 | 0.448 | 0.407 |
| 2 | 0.128 | 0.148 |
| 3 | 0.075 | 0.092 |
| 4 | 0.051 | 0.069 |
| 5 | 0.042 | 0.061 |
| 6 | 0.033 | 0.058 |
| 7 | 0.027 | 0.058 |

Eval loss plateaued at 0.058 from epoch 6 while train loss continued to fall, suggesting 7 epochs was appropriate for this dataset size.
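
The plateau reading can be reproduced from the table itself. The sketch below uses the per-epoch values quoted above; the 0.001 tolerance and the function names are illustrative choices, not part of the training setup.

```python
# Per-epoch losses copied from the Loss Curve table.
TRAIN_LOSS = [0.448, 0.128, 0.075, 0.051, 0.042, 0.033, 0.027]
EVAL_LOSS = [0.407, 0.148, 0.092, 0.069, 0.061, 0.058, 0.058]


def plateau_epoch(eval_loss, tol=0.001):
    """Return the first 1-based epoch after which eval loss improves by less than tol."""
    for i in range(len(eval_loss) - 1):
        if eval_loss[i] - eval_loss[i + 1] < tol:
            return i + 1
    return None


def generalisation_gap(train_loss, eval_loss):
    """Eval minus train loss per epoch; a widening gap hints at overfitting."""
    return [round(e - t, 3) for t, e in zip(train_loss, eval_loss)]


print("plateau epoch:", plateau_epoch(EVAL_LOSS))
print("gap per epoch:", generalisation_gap(TRAIN_LOSS, EVAL_LOSS))
```

The gap between eval and train loss widens slightly in the last two epochs, which is consistent with stopping at epoch 7 rather than training longer.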

## Evaluation

Eight inference tests covering the model's target capabilities:

| # | Test | Result |
|---|------|--------|
| 1 | Class A SDP: should not mandate unit/integration testing | FAIL |
| 2 | Class C SRS: safety requirements | PASS |
| 3 | Requirement expansion: structured field output | FAIL |
| 4 | Compliance review: detect Class A violation | FAIL |
| 5 | Class B SAD: architecture description | PASS |
| 6 | Class A testing: should not mandate unit/integration testing | FAIL |
| 7 | Standards knowledge: detect hallucinated "IEC 62304-2" | PASS |
| 8 | Standards knowledge: confirm correct standard citations | PASS |

**4/8 tests passed.**
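
Tests 1, 4, and 6 all hinge on whether Class A output mandates unit or integration testing. A rough keyword screen for that pattern might look like the sketch below. It is a hypothetical heuristic for triaging generated text, not the method used to produce the results above, and the term lists are assumptions that would need tuning.

```python
import re

# Hypothetical heuristic, not the evaluation method used for the table above.
TESTING_TERMS = re.compile(
    r"\b(unit|integration)\s+(test|tests|testing|verification)\b", re.I
)
MANDATE_TERMS = re.compile(
    r"\b(shall|must|is required|are required|mandatory)\b", re.I
)
NEGATION_TERMS = re.compile(
    r"\b(not required|no requirement|need not|not mandated)\b", re.I
)


def class_a_violations(text: str) -> list[str]:
    """Return sentences that appear to mandate unit/integration testing."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        s for s in sentences
        if TESTING_TERMS.search(s)
        and MANDATE_TERMS.search(s)
        and not NEGATION_TERMS.search(s)
    ]


sample = (
    "Each software unit shall be verified. "
    "Unit testing shall be performed for every module."
)
print(class_a_violations(sample))
```

A screen like this only narrows down which sentences need attention; it cannot judge regulatory correctness.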

### Observations

**Strengths:**

- Standards knowledge is solid. The model correctly identifies hallucinated standard references (e.g. "IEC 62304-2:2015") and validates correct citations (IEC 62304:2006+AMD1:2015, ISO 14971:2019, ISO 13485:2016).
- Prose generation for Class B and C produces well-structured, relevant content with appropriate clause references.
- Safety requirement generation for Class C is accurate and specific to the device context.

**Known limitations:**

- The model does not reliably suppress unit and integration testing recommendations for Class A devices. IEC 62304 only requires system-level verification for Class A, but the base model's general software engineering knowledge tends to override the fine-tuned behaviour. This was expected to be the hardest failure mode: the model must learn to *omit* standard good practices when the regulatory framework does not require them.
- Requirement expansion sometimes outputs prose paragraphs rather than the expected structured fields (Description, Rationale, Priority, Acceptance Criterion, Source).
- The compliance review task does not reliably detect Class A violations where unit testing is incorrectly mandated.

These limitations reflect the challenge of overriding strong priors in the base model with domain-specific regulatory rules. For production use, Class A verification strategy sections should be reviewed by a regulatory specialist.
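
Because requirement expansion sometimes returns prose instead of structured fields, it can help to check each response before using it. The sketch below assumes a "Field: value" layout with one field per line (optionally bulleted or bolded); that layout is an assumption, as the card names the fields but not the exact output format. The function names and the sample requirement are illustrative.

```python
import re

EXPECTED_FIELDS = (
    "Description", "Rationale", "Priority", "Acceptance Criterion", "Source",
)

# Assumed layout: one "Field: value" pair per line.
FIELD_LINE = re.compile(
    r"^(?P<name>" + "|".join(map(re.escape, EXPECTED_FIELDS)) + r")\s*:\s*(?P<value>.+)$",
    re.I,
)


def parse_fields(output: str) -> dict[str, str]:
    """Extract the expected fields from a model response."""
    fields = {}
    for line in output.splitlines():
        # Drop Markdown bold markers and a leading bullet before matching.
        clean = line.replace("*", "").strip().lstrip("-").strip()
        match = FIELD_LINE.match(clean)
        if match:
            canonical = next(
                f for f in EXPECTED_FIELDS if f.lower() == match["name"].lower()
            )
            fields[canonical] = match["value"].strip()
    return fields


def missing_fields(output: str) -> list[str]:
    """List expected fields that the response does not contain."""
    parsed = parse_fields(output)
    return [f for f in EXPECTED_FIELDS if f not in parsed]


response = "- **Description:** The pump shall stop on occlusion.\n- **Priority:** High"
print("missing:", missing_fields(response))
```

A non-empty result is a cue to retry the prompt or route the requirement to manual authoring.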

## Files

```
lora_v3/                          # LoRA adapter (apply on top of base model)
  adapter_config.json
  adapter_model.safetensors
  tokenizer files
phi-4-mini-instruct.Q4_K_M.gguf   # Full merged model, quantised Q4_K_M for local inference
training_data_v3/                 # Training artifacts
  train.jsonl                     # 9,268 training examples
  val.jsonl                       # 1,034 validation examples
  training_summary.json           # Hyperparameters and loss curve
  trainer_state.json              # Full training log (every 10 steps)
  inference_report_v3.json        # Detailed evaluation results with model responses
```

### Weights

The full unquantised base model is available at [`microsoft/Phi-4-mini-instruct`](https://huggingface.co/microsoft/Phi-4-mini-instruct) on Hugging Face. To obtain the full fine-tuned model at original precision, load the base model and apply the LoRA adapter from `lora_v3/`. For a self-contained quantised model ready for local inference, use the GGUF file directly.

## Usage

### With Unsloth / Transformers

```python
from unsloth import FastLanguageModel

# Load the model in 4-bit together with the LoRA adapter stored under lora_v3/.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="cpiuk/htech_compliance",
    adapter_name="lora_v3",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch Unsloth to inference mode

messages = [
    {"role": "system", "content": "You are an IEC 62304 regulatory documentation expert."},
    {"role": "user", "content": "Write a Software Development Plan section on verification strategy for a Class B patient monitoring system."},
]

# Render the chat template and move the prompt tokens to the GPU.
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

# do_sample=True is needed for temperature and top_p to take effect.
outputs = model.generate(
    input_ids=inputs, max_new_tokens=512, do_sample=True, temperature=0.3, top_p=0.9
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```

### With GGUF (llama.cpp / Ollama)

Download `phi-4-mini-instruct.Q4_K_M.gguf` and use it with any llama.cpp-compatible runtime. For Ollama, the commands below expect a `Modelfile` in the working directory whose `FROM` line points at the downloaded GGUF file:

```bash
ollama create iec62304 -f Modelfile
ollama run iec62304 "Write safety requirements for a Class C infusion pump controller."
```
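
The `Modelfile` passed to `ollama create` is not listed among the repository files, so it has to be written locally. The sketch below generates a minimal one from Python. The temperature and system prompt are borrowed from the Unsloth example in this card as an assumption; `write_modelfile` is a hypothetical helper, and the chat template is left for Ollama to read from the GGUF metadata.

```python
import tempfile
from pathlib import Path

GGUF_NAME = "phi-4-mini-instruct.Q4_K_M.gguf"  # file name from the Files section


def write_modelfile(directory: Path, temperature: float = 0.3) -> Path:
    """Write a minimal Ollama Modelfile that points at the downloaded GGUF file."""
    lines = [
        f"FROM ./{GGUF_NAME}",
        f"PARAMETER temperature {temperature}",
        'SYSTEM "You are an IEC 62304 regulatory documentation expert."',
    ]
    path = directory / "Modelfile"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


# Demo in a temporary directory; pass the folder holding the GGUF file in practice.
with tempfile.TemporaryDirectory() as tmp:
    print(write_modelfile(Path(tmp)).read_text(encoding="utf-8"))
```

Calling `write_modelfile(Path("."))` in the folder that holds the GGUF file produces the `Modelfile` that the `ollama create` command expects.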

## Citation

If referencing this work, please cite:

```
IEC 62304 Compliance Support Model
CPI (UK) - https://uk-cpi.com
Fine-tuned LoRA adapter for Phi-4-mini-instruct
Dataset: 10,302 IEC 62304 regulatory documentation examples
```

## Disclaimer

This model is a development aid and does not replace qualified regulatory expertise. All generated documentation must be reviewed by appropriately qualified personnel before submission to regulatory bodies. The model may produce content that is incorrect, incomplete, or non-compliant. Users are responsible for verifying all outputs against the applicable version of IEC 62304 and relevant national regulations.