Safetensors
obalcells committed on
Commit
db581ce
verified
1 Parent(s): 19842a2

Updated README

  1. README.md +52 -0
README.md CHANGED
@@ -0,0 +1,52 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - obalcells/longfact-augmented-annotations
+ - obalcells/longfact-annotations
+ - obalcells/longfact-augmented-prompts
+ ---
+ # Hallucination Detection Probes
+
+ This repository contains hallucination detection probes for various large language models. These probes are trained to detect factual inaccuracies in model outputs.
+
+ ## Probe Types
+
+ We provide three types of probes for each model:
+
+ ### 1. **Linear Probes** (`*_linear`)
+ Simple linear classifiers trained on model hidden states to detect hallucinations.
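
As a rough illustration of what a linear probe computes, the sketch below scores a single hidden-state vector with a logistic classifier. The sigmoid output, the function name `linear_probe_score`, and the toy weights are assumptions made for illustration only; the released probes may be parameterized differently.

```python
import math

def linear_probe_score(hidden_state, weights, bias):
    """Map one hidden-state vector to a probability in (0, 1).

    Assumption: the probe is a logistic classifier (dot product plus sigmoid).
    """
    if len(hidden_state) != len(weights):
        raise ValueError("hidden_state and weights must have the same length")
    logit = sum(w * h for w, h in zip(weights, hidden_state)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Toy 4-dimensional example; real hidden states are far larger.
hidden_state = [0.5, -1.0, 0.25, 2.0]
zero_probe = linear_probe_score(hidden_state, [0.0, 0.0, 0.0, 0.0], 0.0)
positive_probe = linear_probe_score(hidden_state, [0.0, 0.0, 0.0, 1.0], 0.0)
```

With all-zero weights the score sits exactly at the decision boundary, and a weight that aligns with a large activation pushes the score toward 1.
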
+
+ ### 2. **LoRA Probes with KL Regularization** (`*_lora_lambda_kl_0_05`)
+ LoRA adapters trained with KL divergence regularization (λ=0.05) to maintain proximity to the base model while learning to detect hallucinations.
+
+ ### 3. **LoRA Probes with LM Regularization** (`*_lora_lambda_lm_0_01`)
+ LoRA adapters trained with cross-entropy loss regularization (λ=0.01) to preserve language modeling capabilities while detecting hallucinations.
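
The two regularized objectives can be pictured as a detection loss plus a weighted penalty. The sketch below is one plausible reading, not the training code: the additive form, the direction of the KL term (base model relative to adapted model), the helper names, and all numeric inputs except the two λ values are assumptions.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def cross_entropy(q, target_index):
    """Negative log-likelihood of the target token under distribution q."""
    return -math.log(q[target_index])

def regularized_loss(probe_loss, regularizer, lam):
    """Detection loss plus a weighted regularization term (assumed additive)."""
    return probe_loss + lam * regularizer

# Hypothetical next-token distributions over a 3-token vocabulary.
base_probs = [0.7, 0.2, 0.1]      # base model
adapted_probs = [0.6, 0.3, 0.1]   # model with the LoRA adapter applied
probe_loss = 0.4                  # hypothetical hallucination-detection loss

# KL variant: penalize drift of the adapted distribution from the base model.
kl_loss = regularized_loss(probe_loss, kl_divergence(base_probs, adapted_probs), lam=0.05)
# LM variant: penalize poor next-token prediction by the adapted model.
lm_loss = regularized_loss(probe_loss, cross_entropy(adapted_probs, target_index=0), lam=0.01)
```

In both variants a small λ keeps the detection loss dominant while still discouraging the adapter from changing the model's generations.
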
+
+ ## Supported Models
+
+ - Llama 3.3 70B
+ - Llama 3.1 8B
+ - Gemma 2 9B
+ - Mistral Small 24B
+ - Qwen 2.5 7B
+
+ ## Usage
+
+ For loading and using these probes, see the reference implementation:
+ [probe_loader.py](https://github.com/obalcells/hallucination_probes/blob/main/utils/probe_loader.py)
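
Before loading a probe, it can help to tell the three variants apart by name. The helper below is a minimal sketch that relies only on the suffixes listed under Probe Types; the function name `probe_type` and the example name `example_model_linear` are hypothetical, and this is not part of `probe_loader.py`.

```python
# Suffixes taken from the Probe Types section; labels are descriptive only.
PROBE_SUFFIXES = {
    "_lora_lambda_kl_0_05": "LoRA probe with KL regularization",
    "_lora_lambda_lm_0_01": "LoRA probe with LM regularization",
    "_linear": "linear probe",
}

def probe_type(name):
    """Return a label for a probe name based on its suffix, or None if unknown."""
    for suffix, label in PROBE_SUFFIXES.items():
        if name.endswith(suffix):
            return label
    return None

# Hypothetical name; the actual prefixes used in the repository are not stated here.
label = probe_type("example_model_linear")
```
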
+
+ ## Citation
+
+ If you find this useful in your research, please consider citing:
+
+ ```bibtex
+ @misc{obeso2025realtimedetectionhallucinatedentities,
+ title={Real-Time Detection of Hallucinated Entities in Long-Form Generation},
+ author={Oscar Obeso and Andy Arditi and Javier Ferrando and Joshua Freeman and Cameron Holmes and Neel Nanda},
+ year={2025},
+ eprint={2509.03531},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2509.03531},
+ }
+ ```