---
license: llama3.1
language: en
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
---

# Self-Corrective Llama 3.1 8B

This is a fine-tuned version of `meta-llama/Meta-Llama-3.1-8B-Instruct`, augmented with a novel self-correction mechanism designed to mitigate hallucinations. The LoRA adapter has been merged into the base model for easy deployment.

This model features a custom **hallucination detection head** that works in parallel with the main language model. When it detects a potential error in its own generated text, it can insert special instructions such as `[rewrite sentence]` or `[rewrite response]` into the output, flagging its own mistakes for correction. This is intended to make the model more reliable for tasks requiring factual accuracy.

## How it Works

The model, an instance of the custom `SelfCorrectiveLlama` class, adds a small, efficient hallucination detection module to the standard Llama architecture. This module analyzes the model's hidden states at each generation step to predict the likelihood of a hallucination.

The model's custom `generate` method then uses these predictions. If a hallucination is likely, it overrides the standard token generation process to insert a corrective instruction. Detection runs in the same forward pass that produces each token, so the entire process fits in a single `generate` call. This makes it significantly more efficient than multi-step, agent-based correction pipelines that require multiple LLM calls.

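
The control flow described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the code in `modeling.py`: `next_token`, `hallucination_score`, the `EOS` marker, the `0.5` threshold, and the rule for when to intervene are all hypothetical stand-ins for the real language model, detection head, and override logic.

```python
from typing import Callable, List

REWRITE_SENTENCE = "[rewrite sentence]"  # instruction string named in this card
EOS = "<eos>"  # hypothetical end-of-sequence marker for the toy model


def generate_with_detector(
    next_token: Callable[[List[str]], str],
    hallucination_score: Callable[[List[str]], float],
    max_new_tokens: int = 20,
    threshold: float = 0.5,  # hypothetical decision threshold
) -> List[str]:
    """Toy decoding loop: consult the detector once per generation step."""
    tokens: List[str] = []
    for _ in range(max_new_tokens):
        # Assumption: a score above the threshold overrides normal decoding
        # and emits the corrective instruction instead of a sampled token.
        if hallucination_score(tokens) > threshold:
            tokens.append(REWRITE_SENTENCE)
            continue
        token = next_token(tokens)
        if token == EOS:
            break
        tokens.append(token)
    return tokens


# Toy stand-ins for the language model and the detection head.
SCRIPT = ["Paris", "is", "in", "Spain.", "Paris", "is", "in", "France.", EOS]


def toy_next_token(tokens: List[str]) -> str:
    # Replays SCRIPT, skipping over any inserted instructions.
    emitted = sum(1 for t in tokens if t != REWRITE_SENTENCE)
    return SCRIPT[emitted]


def toy_score(tokens: List[str]) -> float:
    # Flags exactly one "hallucinated" sentence ending.
    return 0.9 if tokens and tokens[-1] == "Spain." else 0.1


demo_tokens = generate_with_detector(toy_next_token, toy_score)
print(" ".join(demo_tokens))
```

The point of the sketch is only that detection and intervention share one loop: no second model call is made to critique the draft.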
## Intended Use & Prompting

This model is intended for tasks where factual accuracy and faithfulness to a source context are critical, such as question answering or summarization.

While it can be used with standard prompts, its self-correction behavior was reinforced during training using a specific instruction. For best results, include the following note in your system prompt or at the beginning of your input:

<br>

> **Note on Self-Correction**: As you generate your response, you may encounter an automated instruction. This indicates a potential error was detected.
> - If you see the instruction `[rewrite sentence]`, it means the preceding sentence is incorrect. You must immediately provide a new, corrected version of that sentence.
> - If you see the instruction `[rewrite response]`, it means the entire preceding response is incorrect. You must immediately provide a new, complete response from the beginning.

<br>

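
One way to avoid retyping the note is to keep it in a constant and prepend it to every input. The sketch below is a minimal example: the note text is copied from this card, while the helper name `build_prompt` and the `Context:` / `Question:` layout are assumptions chosen for illustration, not a format the model requires.

```python
# Note text copied from this model card.
SELF_CORRECTION_NOTE = (
    "Note on Self-Correction: As you generate your response, you may encounter "
    "an automated instruction. This indicates a potential error was detected.\n"
    "- If you see the instruction `[rewrite sentence]`, it means the preceding "
    "sentence is incorrect. You must immediately provide a new, corrected "
    "version of that sentence.\n"
    "- If you see the instruction `[rewrite response]`, it means the entire "
    "preceding response is incorrect. You must immediately provide a new, "
    "complete response from the beginning."
)


def build_prompt(context: str, question: str) -> str:
    """Prepend the self-correction note to a context/question pair.

    Hypothetical helper: the layout below is an assumption for illustration.
    """
    return (
        f"{SELF_CORRECTION_NOTE}\n\n"
        "---\n\n"
        f"Context: {context}\n\n"
        f"Question: {question}\n"
    )


example_prompt = build_prompt(
    context="The Eiffel Tower is a wrought-iron lattice tower in Paris, France.",
    question="Where is the Eiffel Tower?",
)
print(example_prompt)
```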
## How to Use

Because this model uses a custom architecture with a modified `generate` method, you **must** use `trust_remote_code=True` when loading it. The required `modeling.py` file is included in this repository.

**Note:** The custom `generate` method currently only supports a **batch size of 1**.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "MathBite/self_corrective_llama_3.1_8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Important: You must trust the remote code
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16  # or your preferred dtype
).to("cuda")  # move model to GPU

# Example prompt with the self-correction instruction
prompt = """
...
Note on Self-Correction: As you generate your response, you may encounter an automated instruction. This indicates a potential error was detected.
- If you see the instruction `[rewrite sentence]`, it means the preceding sentence is incorrect. You must immediately provide a new, corrected version of that sentence.
- If you see the instruction `[rewrite response]`, it means the entire preceding response is incorrect. You must immediately provide a new, complete response from the beginning.

---

Context: The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France. It is named after the engineer Gustave Eiffel, whose company designed and built the tower.

Question: Who was the first person to climb the Eiffel Tower?
"""

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# The custom generate method requires the tokenizer instance
generated_ids = model.generate(
    inputs.input_ids,
    tokenizer=tokenizer,
    max_new_tokens=100,
    temperature=0.7
)

generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(generated_text)
```
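
If you want a clean final answer rather than the raw trace, the flagged text can be dropped in post-processing. The sketch below is one possible approach. It assumes that the instructions appear as literal `[rewrite sentence]` and `[rewrite response]` strings in the decoded response, that each correction directly follows its instruction, and that sentences end with `.`, `!`, or `?`. The helper name `resolve_corrections` is hypothetical and is not part of `modeling.py`.

```python
import re

REWRITE_SENTENCE = "[rewrite sentence]"
REWRITE_RESPONSE = "[rewrite response]"


def resolve_corrections(text: str) -> str:
    """Return the response with flagged sentences or drafts removed."""
    # A full-response rewrite discards everything generated before it.
    if REWRITE_RESPONSE in text:
        text = text.rsplit(REWRITE_RESPONSE, 1)[1]

    # Every piece except the last ends with a sentence that was flagged.
    pieces = text.split(REWRITE_SENTENCE)
    kept = []
    for piece in pieces[:-1]:
        sentences = re.split(r"(?<=[.!?])\s+", piece.strip())
        kept.extend(sentences[:-1])  # drop the flagged final sentence
    kept.append(pieces[-1].strip())
    return " ".join(s for s in kept if s)


raw = (
    "The Eiffel Tower is in Paris. It is named after Thomas Edison. "
    "[rewrite sentence] It is named after Gustave Eiffel."
)
print(resolve_corrections(raw))
```

The helper operates on the response text only, so strip the prompt from the decoded output before calling it.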

## Model Details

This model was programmatically merged and uploaded using a deployment script. The custom class `SelfCorrectiveLlama` can be found in the `modeling.py` file included in this repository.

The code in `modeling.py` is licensed under the Apache 2.0 License. The model weights are subject to the original license of the base model.