# HCUP LLM Humanizer Adapter

## Model Overview
- **Base Model:** mistralai/Mistral-7B-v0.3
- **Adapter Type:** LoRA (QLoRA 4-bit)
- **Purpose:** Humanized, non-AI-sounding clinical manuscript generation from HCUP administrative data

## Training Data
- **400 instruction pairs** from the HCUP corpus (trinetx, nis, neds, hcup_general)
- **50 DPO preference pairs** for humanization style transfer

## Humanization Rules
- AVOID: Furthermore, It is important to note, In conclusion, Delve into, Notably
- USE: short declarative sentences, clinical connectors (Then, But, So)
- TONE: confident, direct, patient-centered clinical framing

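The AVOID list above can be checked mechanically on generated drafts. The following is a minimal sketch, not part of the adapter itself: the helper name `find_avoided_phrases` is hypothetical, and the matching (case-insensitive, whole words, exact phrase only) is an assumption about how one might apply the rule.

```python
import re
from typing import List

# Phrases listed under AVOID in the humanization rules.
AVOID_PHRASES = [
    "Furthermore",
    "It is important to note",
    "In conclusion",
    "Delve into",
    "Notably",
]


def find_avoided_phrases(text: str) -> List[str]:
    """Return the AVOID phrases that occur in text, in list order.

    Hypothetical helper: matches case-insensitively on word boundaries.
    """
    found = []
    for phrase in AVOID_PHRASES:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(phrase)
    return found


draft = "Furthermore, it is important to note that readmissions rose."
print(find_avoided_phrases(draft))
# ['Furthermore', 'It is important to note']
```

Inflected forms such as "delves into" are not caught by this exact-phrase match; a stricter check would need stemming or extra patterns.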
## LoRA Config
- rank=16, alpha=32
- target_modules: q_proj, v_proj
- dropout: 0.05

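To give a sense of what this configuration implies, the sketch below estimates the number of trainable adapter parameters and the LoRA scaling factor. It assumes the standard Mistral-7B attention shapes (hidden size 4096, 32 layers, 8 key/value heads of dimension 128), a plain LoRA update with no bias terms, and the usual `alpha / rank` scaling; none of these are stated in this card.

```python
# Assumed Mistral-7B attention shapes (not stated in this model card).
HIDDEN = 4096        # q_proj maps 4096 -> 4096
KV_DIM = 8 * 128     # v_proj maps 4096 -> 1024 (grouped-query attention)
NUM_LAYERS = 32

# Values from the LoRA config above.
RANK = 16
ALPHA = 32


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


per_layer = lora_params(HIDDEN, HIDDEN, RANK) + lora_params(HIDDEN, KV_DIM, RANK)
total = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # standard LoRA scaling of the low-rank update

print(total)    # 6815744
print(scaling)  # 2.0
```

Under these assumptions the adapter trains roughly 6.8M parameters. Dropout adds no parameters.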
## Usage
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in half precision; device_map="auto" needs the accelerate package.
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.3", torch_dtype=torch.float16, device_map="auto")
# Attach the LoRA adapter weights to the base model.
model = PeftModel.from_pretrained(base, "Sharpener9290/hcup-llm-humanizer")
# The tokenizer is loaded from the base model repository.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.3")
```