rgthelen commited on
Commit
ef578e9
·
verified ·
1 Parent(s): 5f727a0

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +253 -0
README.md ADDED
@@ -0,0 +1,253 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ tags:
6
+ - gguf
7
+ - ollama
8
+ - fda
9
+ - regulatory
10
+ - task-extraction
11
+ - llama
12
+ datasets:
13
+ - fda-documents
14
+ pipeline_tag: text-generation
15
+ model_type: llama
16
+ quantization: Q8_0
17
+ ---
18
+
19
+ # FDA Task Classifier - GGUF
20
+
21
+ A specialized language model fine-tuned for extracting regulatory tasks from FDA correspondence documents.
22
+
23
+ ## Model Details
24
+
25
+ - **Model Type:** LlamaForCausalLM
26
+ - **Parameters:** 361.82M
27
+ - **Quantization:** Q8_0 GGUF
28
+ - **Context Window:** 4096 tokens
29
+ - **File Size:** 369 MB
30
+ - **License:** Apache 2.0
31
+
32
+ ## Quick Start with Ollama
33
+
34
+ The easiest way to use this model is with [Ollama](https://ollama.com):
35
+
36
+ ```bash
37
+ # Pull the Modelfile from this repo
38
+ wget https://huggingface.co/llama-farm/fda-task-classifier-gguf/raw/main/Modelfile
39
+
40
+ # Create the model in Ollama
41
+ ollama create fda-task-classifier -f Modelfile
42
+
43
+ # Run the model
44
+ ollama run fda-task-classifier
45
+ ```
46
+
47
+ ### Or download manually:
48
+
49
+ ```bash
50
+ # Download the GGUF file
51
+ wget https://huggingface.co/llama-farm/fda-task-classifier-gguf/resolve/main/model.gguf
52
+
53
+ # Create a Modelfile
54
+ cat > Modelfile << 'EOF'
55
+ FROM ./model.gguf
56
+
57
+ PARAMETER temperature 0.3
58
+ PARAMETER top_p 0.9
59
+ PARAMETER top_k 40
60
+ PARAMETER num_ctx 4096
61
+ PARAMETER num_predict 512
62
+
63
+ SYSTEM """You are an FDA regulatory task extraction specialist. Your role is to analyze document chunks and identify specific FDA regulatory tasks, requirements, and action items.
64
+
65
+ When analyzing text, focus on:
66
+ - Regulatory submissions and deadlines
67
+ - Clinical trial requirements
68
+ - Manufacturing and quality control tasks
69
+ - Compliance and reporting obligations
70
+ - Safety monitoring requirements
71
+ - Documentation and record-keeping tasks
72
+
73
+ Extract tasks in a structured format with:
74
+ - Task description
75
+ - Regulatory category (e.g., clinical, manufacturing, compliance)
76
+ - Priority level if mentioned
77
+ - Deadline if specified
78
+ - Relevant FDA regulation references
79
+
80
+ Be precise and factual. Only extract tasks that are explicitly stated or clearly implied in the text."""
81
+ EOF
82
+
83
+ # Create model in Ollama
84
+ ollama create fda-task-classifier -f Modelfile
85
+ ```
86
+
87
+ ## Usage Examples
88
+
89
+ ### Simple Task Extraction
90
+
91
+ ```bash
92
+ ollama run fda-task-classifier "Extract all FDA regulatory tasks from this text:
93
+
94
+ The sponsor must submit a complete Chemistry, Manufacturing, and Controls (CMC)
95
+ section as part of the IND application within 30 days of this notice. Additionally,
96
+ the clinical protocol must be amended to include enhanced safety monitoring procedures."
97
+ ```
98
+
99
+ **Output:**
100
+ ```
101
+ 1. Submit complete CMC section within 30 days
102
+ Category: Manufacturing/Submission
103
+ Priority: Critical
104
+ Deadline: 30 days from notice
105
+
106
+ 2. Amend clinical protocol to include enhanced safety monitoring
107
+ Category: Clinical/Safety
108
+ Priority: High
109
+ ```
110
+
111
+ ### API Usage
112
+
113
+ ```python
114
+ import requests
115
+
116
+ response = requests.post('http://localhost:11434/api/generate', json={
117
+ "model": "fda-task-classifier",
118
+ "prompt": "Extract tasks from: The sponsor should provide updated stability data...",
119
+ "stream": False
120
+ })
121
+
122
+ print(response.json()['response'])
123
+ ```
124
+
125
+ ## Model Specialization
126
+
127
+ This model is specifically trained to identify:
128
+
129
+ ✅ **Submission Requirements**
130
+ - IND/NDA submissions
131
+ - Supplemental applications
132
+ - Annual reports
133
+
134
+ ✅ **Clinical Trial Directives**
135
+ - Protocol amendments
136
+ - Safety monitoring
137
+ - Patient enrollment criteria
138
+
139
+ ✅ **Manufacturing Tasks**
140
+ - CMC requirements
141
+ - Quality control procedures
142
+ - GMP compliance
143
+
144
+ ✅ **Regulatory Compliance**
145
+ - 21 CFR citations
146
+ - Inspection responses
147
+ - CAPA plans
148
+
149
+ ✅ **Safety Obligations**
150
+ - Adverse event reporting
151
+ - REMS requirements
152
+ - Risk assessments
153
+
154
+ ## Integration with LlamaFarm
155
+
156
+ This model is designed to work seamlessly with [LlamaFarm](https://github.com/llama-farm/llamafarm):
157
+
158
+ ```yaml
159
+ # llamafarm.yaml
160
+ runtime:
161
+ models:
162
+ - name: fda-task-classifier
163
+ provider: ollama
164
+ model: fda-task-classifier
165
+ base_url: http://localhost:11434/v1
166
+
167
+ agents:
168
+ - name: fda_document_analyzer
169
+ type: document_analyzer
170
+ model: fda-task-classifier
171
+ description: Extracts FDA regulatory tasks from documents
172
+ ```
173
+
174
+ ## Performance
175
+
176
+ - **Speed:** ~2-3 seconds per document chunk on M1 Mac
177
+ - **Accuracy:** Optimized for FDA regulatory language
178
+ - **Context:** 4096 tokens (sufficient for most FDA letter sections)
179
+ - **Memory:** ~500MB RAM usage
180
+
181
+ ## Files in This Repository
182
+
183
+ - `model.gguf` - Quantized model weights (Q8_0)
184
+ - `Modelfile` - Ollama model configuration
185
+ - `README.md` - Original documentation
186
+ - `USAGE.md` - Detailed usage examples
187
+ - `model_info.json` - Model metadata
188
+
189
+ ## Technical Details
190
+
191
+ **Architecture:** LlamaForCausalLM
192
+ **Quantization:** Q8_0 (8-bit quantization)
193
+ **Base Model:** [Undisclosed]
194
+ **Training Data:** FDA correspondence, deficiency letters, meeting minutes
195
+
196
+ **Recommended Parameters:**
197
+ - `temperature: 0.3` - More deterministic outputs
198
+ - `top_p: 0.9` - Focused sampling
199
+ - `num_ctx: 4096` - Optimized context window
200
+ - `num_predict: 512` - Concise task lists
201
+
202
+ ## Use Cases
203
+
204
+ 1. **Regulatory Document Processing**
205
+ - Extract action items from FDA deficiency letters
206
+ - Identify compliance obligations
207
+ - Track submission deadlines
208
+
209
+ 2. **Quality Assurance**
210
+ - Parse inspection observations (483s)
211
+ - Extract CAPA requirements
212
+ - Identify GMP violations
213
+
214
+ 3. **Clinical Operations**
215
+ - Extract protocol amendment requirements
216
+ - Identify safety reporting obligations
217
+ - Track clinical trial milestones
218
+
219
+ 4. **Automated Compliance**
220
+ - Build task tracking systems
221
+ - Create regulatory calendars
222
+ - Generate compliance reports
223
+
224
+ ## Limitations
225
+
226
+ - Optimized for FDA documents (US regulatory text)
227
+ - May not generalize well to other regulatory bodies (EMA, PMDA)
228
+ - Works best with formal regulatory correspondence
229
+ - Limited to English language
230
+
231
+ ## Citation
232
+
233
+ If you use this model in your research or application, please cite:
234
+
235
+ ```bibtex
236
+ @software{fda_task_classifier_2025,
237
+ title={FDA Task Classifier GGUF},
238
+ author={LlamaFarm Team},
239
+ year={2025},
240
+ url={https://huggingface.co/llama-farm/fda-task-classifier-gguf}
241
+ }
242
+ ```
243
+
244
+ ## License
245
+
246
+ Apache 2.0 - See LICENSE file for details
247
+
248
+ ## Links
249
+
250
+ - **LlamaFarm:** https://github.com/llama-farm/llamafarm
251
+ - **Ollama:** https://ollama.com
252
+ - **Issues:** https://github.com/llama-farm/llamafarm/issues
253
+ - **Discord:** https://discord.gg/RrAUXTCVNF