---
license: llama3.1
language: en
base_model: meta-llama/Llama-3.1-8B-Instruct
---

# MathBite/self_corrective_llama_3.1_8B_untrained

This is a version of `meta-llama/Llama-3.1-8B-Instruct` modified with a custom architecture to support self-correction via hallucination detection.

This model, an instance of `SelfCorrectiveLlama`, includes a hallucination detection head that can intervene during generation to insert corrective instructions like `[delete previous sentence]`.
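
The control flow described above can be pictured with a toy sketch. It is not the code in `modeling.py`: the names `generate_with_detector` and `score_fn` are hypothetical, the threshold is arbitrary, and the sketch works on whole sentences, whereas the real detection head presumably operates on model hidden states during token generation.

```python
from typing import Callable, Iterable, List

CORRECTION = "[delete previous sentence]"


def generate_with_detector(
    sentences: Iterable[str],
    score_fn: Callable[[str], float],  # hypothetical stand-in for the detection head
    threshold: float = 0.5,            # arbitrary value chosen for illustration
) -> List[str]:
    """Emit each sentence and append a corrective instruction when it is flagged."""
    output: List[str] = []
    for sentence in sentences:
        output.append(sentence)
        # The detector intervenes right after the suspicious sentence.
        if score_fn(sentence) > threshold:
            output.append(CORRECTION)
    return output
```

With a scorer that flags only the second sentence, the output is the first sentence, the second sentence, and then the corrective instruction.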

## How to Use

Because this model uses a custom architecture with a modified `generate` method, you **must** use `trust_remote_code=True` when loading it. The required `modeling.py` file is included in this repository.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "MathBite/self_corrective_llama_3.1_8B_untrained"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Important: trust_remote_code=True is required to load the custom SelfCorrectiveLlama class.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # or your preferred dtype
).to(device)

# Tokenize the prompt and move the tensors to the same device as the model.
prompt = "YOUR PROMPT HERE"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# The custom generate method requires the tokenizer instance
generated_ids = model.generate(
    inputs.input_ids,
    tokenizer=tokenizer,
    max_new_tokens=100,
    temperature=0.7
)

generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(generated_text)
```
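
If corrective instructions such as `[delete previous sentence]` appear as literal text in the decoded output, you may want to apply them before showing the result. The helper below is a minimal sketch under that assumption. `apply_corrections` is a hypothetical name, not part of this repository, and the sentence splitting is a naive punctuation-based heuristic.

```python
import re
from typing import List

MARKER = "[delete previous sentence]"
# Naive sentence boundary: whitespace that follows ., ! or ?
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def apply_corrections(text: str, marker: str = MARKER) -> str:
    """Drop the sentence that immediately precedes each marker, then remove the marker."""
    kept: List[str] = []
    segments = text.split(marker)
    for index, segment in enumerate(segments):
        kept.extend(s for s in _SENTENCE_END.split(segment.strip()) if s)
        followed_by_marker = index < len(segments) - 1
        if followed_by_marker and kept:
            kept.pop()  # remove the sentence the marker refers to
    return " ".join(kept)
```

For example, `apply_corrections(generated_text)` returns the text with each flagged sentence and its marker removed. Text without markers passes through unchanged apart from whitespace normalization.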

## Model Details

This model was programmatically converted and uploaded using a deployment script. The custom class `SelfCorrectiveLlama` can be found in the `modeling.py` file.

The code in `modeling.py` is licensed under Apache 2.0. The model weights remain subject to the license of the base model, `meta-llama/Llama-3.1-8B-Instruct`.