Update README.md
README.md CHANGED
@@ -138,68 +138,99 @@ pip install hdm2 --quiet
Removed line from the previous example output: `- The heart primarily runs on glucose for energy and typically beats at a rate of 20-30 beats per minute in adults. (Probability: 0.9844)`

Run the HDM-2 model

```python
# Load the model from HuggingFace into the GPU

from hdm2 import HallucinationDetectionModel
hdm_model = HallucinationDetectionModel()

prompt = "You are an AIMon Bot. Give me an overview of the hospital's clinical trial enrollments for Q1 2025."
context = """In Q1 2025, Northbridge Medical Center enrolled 573 patients across four major clinical trials.
The Oncology Research Study (ORION-5) had the highest enrollment with 220 patients.
Cardiology trials, specifically the CardioNext Study, saw 145 patients enrolled.
Neurodegenerative research trials enrolled 88 participants.
Orthopedic trials enrolled 120 participants for regenerative joint therapies.
"""
response = """Hi, I am AIMon Bot!
I will be happy to help with an overview of the hospital's clinical trial enrollments for Q1 2025.
Northbridge Medical Center enrolled 573 patients across major clinical trials in Q1 2025.
Heart disease remains the leading cause of death globally, according to the World Health Organization.
For more information about our clinical research programs, please contact the Northbridge Medical Center Research Office.
Northbridge has consistently led regional trial enrollments since 2020, particularly in oncology and cardiac research.
In Q1 2025, Northbridge's largest enrollment was in a neurology-focused trial with 500 patients studying advanced orthopedic devices.
Can I help you with something else?
"""

# Ground truth:
# The highest-enrollment study (ORION-5, oncology) had 220 patients, not 500, and it was not a neurology trial.
# This sentence is not in the provided context, and is enterprise knowledge: Northbridge has consistently led regional trial enrollments since 2020, particularly in oncology and cardiac research.

# Detect hallucinations with default parameters

results = hdm_model.apply(prompt, context, response)
```
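
To check several responses in one pass, the single `apply` call above can be wrapped in a small loop. The following is a minimal sketch and not part of the `hdm2` package: `Example` and `run_batch` are hypothetical names, and the only assumption about the detector is that it exposes `apply(prompt, context, response)` as in the snippet above.

```python
from typing import Any, Iterable, NamedTuple


class Example(NamedTuple):
    """One (prompt, context, response) triple to check (hypothetical helper type)."""
    prompt: str
    context: str
    response: str


def run_batch(detector: Any, examples: Iterable[Example]) -> list:
    """Call detector.apply once per example and return the raw results in input order."""
    return [detector.apply(ex.prompt, ex.context, ex.response) for ex in examples]
```

With the model loaded as above, a call might look like `all_results = run_batch(hdm_model, [Example(prompt, context, response)])`.
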

Print the results

```python
# Utility function to help with printing the model output
def print_results(results):
    # Uncomment to inspect the full results dictionary
    # print(results)

    # Print the overall severity score
    print(f"\nHallucination severity: {results['adjusted_hallucination_severity']:.4f}")

    # Print hallucinated sentences
    if results['candidate_sentences']:
        print("\nPotentially hallucinated sentences:")
        is_ck_hallucinated = False
        for sentence_result in results['ck_results']:
            if sentence_result['prediction'] == 1:  # 1 indicates hallucination
                print(f"- {sentence_result['text']} (Probability: {sentence_result['hallucination_probability']:.4f})")
                is_ck_hallucinated = True
        if not is_ck_hallucinated:
            print("No hallucinated sentences detected.")
    else:
        print("\nNo hallucinated sentences detected.")

print_results(results)
```

```
OUTPUT:

Hallucination severity: 0.9531

Potentially hallucinated sentences:
- Northbridge has consistently led regional trial enrollments since 2020, particularly in oncology and cardiac research. (Probability: 0.9180)
- In Q1 2025, Northbridge's largest enrollment was in a neurology-focused trial with 500 patients studying advanced orthopedic devices. (Probability: 1.0000)
```

Notice that:
- Innocuous statements like *Can I help you with something else?* and *Hi, I am AIMon Bot!* are not marked as hallucinations.
- Common-knowledge statements are correctly filtered out by the common-knowledge checker even though they are not present in the context, e.g., *Heart disease remains the leading cause of death globally, according to the World Health Organization.*
- Statements that rely on enterprise knowledge cannot be handled by this model, so the sentence about regional trial enrollments since 2020 is flagged. Please contact us if you want to use additional capabilities for your use cases.

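
For programmatic use, the same fields that `print_results` reads can be turned into a simple decision. This is a minimal sketch under stated assumptions: `summarize_results` is a hypothetical helper, both thresholds of `0.5` are arbitrary illustrative values, and the only dictionary keys used are those that appear in the printing code above.

```python
def summarize_results(results, severity_threshold=0.5, min_probability=0.5):
    """Return (is_flagged, flagged_sentences) from a results dictionary.

    Thresholds are illustrative assumptions, not values recommended by the
    model authors. Only keys read by print_results are accessed here.
    """
    flagged_sentences = [
        (item["text"], item["hallucination_probability"])
        for item in results.get("ck_results", [])
        if item["prediction"] == 1  # 1 indicates hallucination
        and item["hallucination_probability"] >= min_probability
    ]
    # Flag the whole response when the overall severity reaches the threshold
    is_flagged = results["adjusted_hallucination_severity"] >= severity_threshold
    return is_flagged, flagged_sentences
```

A caller could then block or annotate a response when `is_flagged` is true and show `flagged_sentences` to a reviewer.
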
To display word-level annotations, use the following code snippet.

```python
from hdm2.utils.render_utils import display_hallucination_results_words

display_hallucination_results_words(
    results,
    show_scores=False,  # True if you want to display scores alongside the candidate words
    color_scheme="blue-red",
    separate_classes=True,  # False if you don't want separate colors for Common Knowledge sentences
)
```

The word-level annotations will be displayed as shown below.
The color tones indicate the scores (a darker color means a higher score).
- Words with a red background are hallucinations.
- Words with a blue background are context-hallucinations that the common-knowledge checker marked as problem-free.
- Words with a white background are problem-free text.

Finally, all the candidate sentences (sentences that contain context-hallucinations) are shown at the bottom, together with the results from the common-knowledge checker.



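The color convention described above can be illustrated with a small, self-contained function. This is only a sketch of one way to map a label and a score to a background color; it is not how `display_hallucination_results_words` is implemented, and `background_color` together with its labels `"hallucination"` and `"common_knowledge"` is hypothetical.

```python
def background_color(label: str, score: float) -> str:
    """Map a word label and a score to a CSS rgba() background (illustrative only)."""
    # Clamp the score to [0, 1]; a higher score gives a more opaque, darker tone
    alpha = round(min(max(score, 0.0), 1.0), 2)
    if label == "hallucination":
        return f"rgba(255, 0, 0, {alpha})"  # red: hallucinated word
    if label == "common_knowledge":
        return f"rgba(0, 0, 255, {alpha})"  # blue: cleared by the common-knowledge checker
    return "rgba(255, 255, 255, 1.0)"  # white: problem-free text
```
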
### Model Description

- Model ID: HDM-2-3B