awras/awras-chat-v0

awras-chat-v0 is a conversational language model built specifically for Algerian Darija, the everyday spoken dialect of Algeria. It is the first chat release from the awras project, which aims to build open Arabic dialect AI for the Algerian community.

The model is built on a Gemma3 4B architecture and fine-tuned in two phases: first to learn the Darija language and its patterns, then to follow instructions and hold conversations.

Because of the model's tokenization and generation settings, running it with the unsloth library is strongly recommended: unsloth applies the ChatML chat template the model expects and speeds up inference.
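For readers unfamiliar with ChatML, the sketch below shows the general prompt layout that the template produces. It is a plain-Python illustration, not the template shipped with unsloth: render_chatml is a hypothetical helper, and the real template may differ in details such as system-prompt handling or a leading BOS token.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the standard ChatML layout.

    Hypothetical helper for illustration only; in practice
    tokenizer.apply_chat_template does this work.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|>
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # An open assistant turn tells the model it is its turn to speak
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml(
    [{"role": "user", "content": "كيفاش ندير الحريرة الوهرانية؟"}]
)
print(prompt)
```

The model closes its own turn with <|im_end|>, which is why generation is usually stopped on that token.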

Installation

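Run the following notebook cell to install unsloth and its dependencies. It works in local Jupyter setups and in cloud environments such as Google Colab. The %%capture and !pip lines are IPython syntax, so outside a notebook run the equivalent pip commands in a shell instead.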

%%capture
import os, re
if "COLAB_" not in "".join(os.environ.keys()):
    # Not running on Colab: a plain install pulls in unsloth with its dependencies
    !pip install unsloth
else:
    # On Colab: pick the xformers build that matches the preinstalled torch (major.minor)
    import torch; v = re.match(r'[\d]{1,}\.[\d]{1,}', str(torch.__version__)).group(0)
    xformers = 'xformers==' + {'2.10':'0.0.34','2.9':'0.0.33.post1','2.8':'0.0.32.post2'}.get(v, "0.0.34")
    !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth
# Pin the transformers and trl versions in both environments
!pip install transformers==4.56.2
!pip install --no-deps trl==0.22.2
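The Colab branch of the setup cell picks an xformers pin from the major.minor part of the installed torch version. The sketch below isolates that lookup so it can be tried without installing anything. pick_xformers is a hypothetical name, the version table is copied from the cell above, and the fallback for an unparseable version string is an added assumption.

```python
import re

# Same torch -> xformers table as in the setup cell
XFORMERS_BY_TORCH = {"2.10": "0.0.34", "2.9": "0.0.33.post1", "2.8": "0.0.32.post2"}


def pick_xformers(torch_version, default="0.0.34"):
    """Return the xformers requirement for a torch version string."""
    # Keep only "major.minor", e.g. "2.9.1+cu126" -> "2.9"
    match = re.match(r"\d+\.\d+", torch_version)
    key = match.group(0) if match else None
    # Unknown or unparseable versions fall back to the default pin
    return "xformers==" + XFORMERS_BY_TORCH.get(key, default)


print(pick_xformers("2.9.1+cu126"))  # xformers==0.0.33.post1
print(pick_xformers("2.10.0"))       # xformers==0.0.34
print(pick_xformers("2.7.0"))        # xformers==0.0.34 (fallback)
```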

Usage

import unsloth
from unsloth.chat_templates import get_chat_template

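import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "awras/awras-chat-v0"
# Optional Hugging Face access token, read from the HF_TOKEN environment variable (None if unset)
hf_token = os.environ.get("HF_TOKEN")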

# 1. Load tokenizer and force ChatML template
tokenizer = AutoTokenizer.from_pretrained(model_id, token=hf_token)
tokenizer = get_chat_template(tokenizer, chat_template="chatml")

# 2. Load the model on CUDA to prevent accelerate hook conflicts
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda", 
    token=hf_token
)

# 3. Define your prompt
messages = [
    {"role": "user", "content": "كيفاش ندير الحريرة الوهرانية؟"}
]

# 4. Format inputs
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True,
    return_dict=True
).to(model.device)

# ChatML end-of-turn token, passed to generate() as the stop token
im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")

# Suppress pad_token_id warnings
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

# 5. Generate response
output = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    eos_token_id=im_end_id,
    pad_token_id=tokenizer.pad_token_id
)

# 6. Decode and print only the new generated tokens
input_length = inputs['input_ids'].shape[-1]
generated_tokens = output[0][input_length:]

print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
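To continue the conversation, one common pattern is to append the decoded reply as an assistant turn, add the next user message, and repeat steps 4 to 6 with the longer list. The sketch below shows only the bookkeeping. continue_conversation is a hypothetical helper, the placeholder strings stand in for real text, and it assumes the model handles multi-turn ChatML history, which the card implies but does not spell out.

```python
def continue_conversation(messages, assistant_reply, next_user_message):
    """Return a new message list with the model's reply and the next user turn."""
    return messages + [
        {"role": "assistant", "content": assistant_reply.strip()},
        {"role": "user", "content": next_user_message},
    ]


history = [{"role": "user", "content": "كيفاش ندير الحريرة الوهرانية؟"}]
# "..." stands in for the text decoded in step 6
history = continue_conversation(history, "...", "<next user question>")
print([m["role"] for m in history])  # ['user', 'assistant', 'user']
```

The extended list can then be passed to tokenizer.apply_chat_template exactly as in step 4.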

Developed by: bahaeddine09
