---
license: llama3.1
language: en
base_model: meta-llama/Llama-3.1-8B-Instruct
---

# MathBite/self_corrective_llama_3.1_8B_untrained

This is a version of `meta-llama/Llama-3.1-8B-Instruct` modified with a custom architecture to support self-correction via hallucination detection. The model, an instance of `SelfCorrectiveLlama`, includes a hallucination detection head that can intervene during generation to insert corrective instructions such as `[delete previous sentence]`.

## How to Use

Because this model uses a custom architecture with a modified `generate` method, you **must** pass `trust_remote_code=True` when loading it. The required `modeling.py` file is included in this repository.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "MathBite/self_corrective_llama_3.1_8B_untrained"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Important: you must trust the remote code to load the custom architecture
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # or your preferred dtype
).to("cuda")  # move the model to GPU

prompt = "YOUR PROMPT HERE"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# The custom generate method requires the tokenizer instance
generated_ids = model.generate(
    inputs.input_ids,
    tokenizer=tokenizer,
    max_new_tokens=100,
    temperature=0.7,
)

generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(generated_text)
```

## Model Details

This model was programmatically converted and uploaded using a deployment script. The custom class `SelfCorrectiveLlama` is defined in the `modeling.py` file.

The code in `modeling.py` is licensed under the Apache 2.0 License. The model weights remain subject to the original license of the base model.
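
## Handling Correction Markers

When the detection head fires, a corrective instruction such as `[delete previous sentence]` appears verbatim in the decoded text. Below is a minimal sketch of one way to apply such markers in post-processing. The marker string and the regex-based sentence splitting are illustrative assumptions, not part of this repository's API; check `modeling.py` for the exact markers the model emits.

```python
import re

# Hypothetical post-processor: applies "[delete previous sentence]" markers
# by dropping the sentence immediately before each marker. The marker string
# and the naive sentence split below are assumptions for illustration.
MARKER = "[delete previous sentence]"

def apply_corrections(text: str) -> str:
    while MARKER in text:
        head, _, tail = text.partition(MARKER)
        # Roughly split everything before the marker into sentences.
        sentences = re.split(r"(?<=[.!?])\s+", head.strip())
        # Remove the sentence the marker points at, if there is one.
        text = " ".join(sentences[:-1]) + tail
    return text.strip()

print(apply_corrections(
    "Paris is the capital of Germany. [delete previous sentence] "
    "Paris is the capital of France."
))
# -> "Paris is the capital of France."
```

A simple string-based pass like this is enough for demos; a production pipeline would likely want a more robust sentence segmenter and should handle markers that land mid-sentence.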