---
license: llama2
base_model: codellama/CodeLlama-7b-Instruct-hf
language:
- en
tags:
- arxiv:2406.11717
---

# codellama-abliterated

CodeLlama-7b-Instruct-hf adapted using the abliteration notebook from [Maxime Labonne's LLM Course](https://github.com/mlabonne/llm-course).

Based on the paper ["Refusal in Language Models Is Mediated by a Single Direction"](https://arxiv.org/abs/2406.11717).

**Based on CodeLlama/Llama2 and subject to the restrictions of those models and their licenses - not for unapproved uses.**

## Concept

There are hundreds of "abliterated" models on HuggingFace. These use datasets of harmful and harmless prompts to locate a "refusal direction" in a model's activations, then edit the weights to suppress it, removing much of the model's safety tuning.

None of these abliterated models have explored code LLMs, code generation, or CyberSecEval. I don't know how well the technique works on code models, but this is a first step.
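
For orientation, here is a minimal, illustrative sketch of the paper's core idea, with random tensors standing in for real hidden states. The layer choice, prompt counts, and the `orthogonalize` helper are placeholders for exposition, not this model's actual recipe:

```python
import torch

d_model = 4096  # hidden size of CodeLlama-7B

# Stand-ins for hidden states captured at one decoder layer while running
# harmful vs. harmless instruction prompts (shape: [n_prompts, d_model]).
harmful_acts = torch.randn(128, d_model)
harmless_acts = torch.randn(128, d_model)

# Difference-in-means between the two prompt sets gives the candidate
# "refusal direction", normalized to unit length.
refusal_dir = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
refusal_dir = refusal_dir / refusal_dir.norm()

def orthogonalize(W: torch.Tensor, direction: torch.Tensor,
                  scale: float = 1.0) -> torch.Tensor:
    # Subtract the component of W's output along `direction`, so the layer
    # can no longer write onto the refusal direction. W: [d_model, d_in];
    # a larger scale presumably gives the stronger "2x" intervention
    # of the variant model linked below.
    return W - scale * torch.outer(direction, direction @ W)

# Abliteration applies this to every matrix that writes into the residual
# stream (embedding, attention output, and MLP output projections).
W_edited = orthogonalize(torch.randn(d_model, d_model), refusal_dir)
```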

Blog: https://huggingface.co/blog/monsoon-nlp/refusal-in-code-llms

Model with 2x intervention: https://huggingface.co/monsoon-nlp/codellama-abliterated-2xd

## Usage

```python
# In a notebook: !pip install transformers accelerate --quiet
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

# Tokenizer comes from the original CodeLlama repo; weights are the abliterated model.
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")
model = AutoModelForCausalLM.from_pretrained("monsoon-nlp/codellama-abliterated", device_map="auto")

code_generator = pipeline("text-generation", model=model, tokenizer=tokenizer, do_sample=False)

# CodeLlama-Instruct expects prompts wrapped in [INST] ... [/INST].
input_string = "[INST] Write a python function to calculate the factorial of a number [/INST]"
generated_code = code_generator(input_string, max_length=100)[0]["generated_text"]
print(generated_code)
```