# NeuralDaredevil-8B-abliterated-NF4

**Permanently quantized to NF4.**

This is a 4-bit NF4 quantized version of mlabonne/NeuralDaredevil-8B-abliterated (fine-tuning and abliteration by mlabonne).

**License:** Meta Llama 3 Community License — same as the base model; commercial use is allowed.
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ikarius/NeuralDaredevil-8B-abliterated-NF4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,  # optimal compute dtype for Llama 3
    trust_remote_code=True,
)

input_text = "What is abliteration?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Quantization Details

- Original model: mlabonne/NeuralDaredevil-8B-abliterated
- Architecture: Llama 3 8B
- Quantization: bitsandbytes 4-bit NF4 (permanent)
- Double quantization: enabled
- Compute dtype: bfloat16 (recommended for Llama 3)
- Estimated VRAM: ~6.3–6.7 GB (CUDA)
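The VRAM figure can be sanity-checked with back-of-envelope arithmetic — a rough sketch assuming ~8.03B parameters, the ~4.127 bits/parameter average reported in the QLoRA paper for NF4 with double quantization, and an assumed ~2.5 GB for dequantization buffers, activations, and a short-context KV cache (illustrative figures, not measurements):

```python
# Rough VRAM estimate for an 8B model stored in NF4 with double quantization.
params = 8.03e9          # Llama 3 8B parameter count (approx.)
bits_per_param = 4.127   # 4-bit NF4 weights + double-quantized block constants
weights_gb = params * bits_per_param / 8 / 1024**3

overhead_gb = 2.5        # assumed: compute buffers, activations, short KV cache
total_gb = weights_gb + overhead_gb
print(f"weights ~ {weights_gb:.1f} GB, total ~ {total_gb:.1f} GB")
```

The weights alone come to roughly 3.9 GB; with runtime overhead the total lands in the ~6.3–6.7 GB range listed above.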
## Credits

- Base model & abliteration: @mlabonne
- NF4 quantization: @ikarius