---
license: afl-3.0
language:
- kk
base_model: nur-dev/llama-1.9B-kaz
library_name: transformers
---

# LLaMA 1.9B Kazakh Instruct Model

This repository contains the LLaMA 1.9B model fine-tuned on a Kazakh-language dataset for instruction-based tasks. The model is trained to provide helpful, relevant, and context-aware responses to prompts in Kazakh. It is particularly effective at answering questions, providing explanations, and assisting in educational and professional contexts.

The model ships with an integrated chat template that structures conversations into the input format the model expects. The tokenizer supports this template, so messages can be formatted automatically before they are passed to the model.

The template follows this structure:

```jinja
{%- if messages[0]['role'] == 'system' %}
{%- set offset = 1 %}
{%- else %}
{%- set offset = 0 %}
{%- endif %}
<|begin_of_text|>
{%- for message in messages %}
{{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' + message['content'] | trim + '<|eot_id|>' }}
{%- endfor %}
{{- '<|start_header_id|>' + 'көмекші' + '<|end_header_id|>\n\n' }}
```

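Note that the template closes with the Kazakh assistant role `көмекші` ("assistant") as the generation prompt, and the examples below use `пайдаланушы` ("user") as the user role. As a minimal sketch, the snippet below prints the rendered prompt for a one-message conversation; the expected output shape (shown in the comments) follows from the template above, assuming the tokenizer ships with that template:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nur-dev/llama-1.9B-kaz-instruct")

# A single user turn; "Сәлем!" means "Hello!" in Kazakh
messages = [{"role": "пайдаланушы", "content": "Сәлем!"}]

print(tokenizer.apply_chat_template(messages, tokenize=False))
# Expected shape:
# <|begin_of_text|><|start_header_id|>пайдаланушы<|end_header_id|>
#
# Сәлем!<|eot_id|><|start_header_id|>көмекші<|end_header_id|>
```
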
## Model Details

- **Model Name**: LLaMA 1.9B Kazakh Instruct
- **Model ID**: `nur-dev/llama-1.9B-kaz-instruct`
- **Parameters**: 1.94 billion
- **Architecture**: Causal Language Model (LLaMA)
- **Tokenizer**: LLaMA tokenizer
- **Language**: Kazakh

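As a quick sanity check, the parameter count above can be reproduced by summing the sizes of all weight tensors after loading the model:

```python
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("nur-dev/llama-1.9B-kaz-instruct")

# Sum the element counts of every parameter tensor (~1.94B expected)
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e9:.2f}B parameters")
```
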
## Training Data

The model was fine-tuned on a dataset of 22,000 samples designed for instruction-based tasks. The dataset includes a diverse set of prompts and responses that help the model handle a wide range of topics, from everyday queries to specialized questions.

## How to Use

### Using the Model Directly for Inference

This example uses the `LlamaForCausalLM` and `AutoTokenizer` classes to load the model, format a conversation with the chat template, and generate a response with sampling parameters such as `top_k`, `top_p`, and `temperature`.

```python
from transformers import LlamaForCausalLM, AutoTokenizer
import torch

# Load the model and tokenizer
model_directory = "nur-dev/llama-1.9B-kaz-instruct"
model = LlamaForCausalLM.from_pretrained(model_directory)
tokenizer = AutoTokenizer.from_pretrained(model_directory)

# Set the model to evaluation mode and move it to the appropriate device
model.eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Example conversation in Kazakh; the user role is the Kazakh word
# "пайдаланушы" ("user"), matching the chat template above
conversation_history = [
    # "You are a reliable AI assistant who answers questions and provides information."
    {"role": "system", "content": "Сіз сұрақтарға жауап беріп, ақпарат ұсынатын сенімді AI көмекшісісіз."},
    # "What changes can artificial intelligence bring to the healthcare sector?"
    {"role": "пайдаланушы", "content": "Жасанды интеллект денсаулық сақтау саласына қандай өзгерістер енгізе алады?"}
]

# Format the conversation using the model's built-in chat template
formatted_conversation = tokenizer.apply_chat_template(conversation_history, tokenize=False)

# Tokenize the input
input_ids = tokenizer.encode(formatted_conversation, return_tensors="pt").to(device)

# Generate a response from the model
with torch.no_grad():
    output = model.generate(
        input_ids,
        max_length=1000,
        num_return_sequences=1,
        pad_token_id=tokenizer.eos_token_id,
        no_repeat_ngram_size=2,
        do_sample=True,
        top_k=10,
        top_p=0.5,
        eos_token_id=tokenizer.eos_token_id,
        temperature=1.3
    )

# Decode and print the model's response
# (special tokens are kept to show the template structure)
response = tokenizer.decode(output[0], skip_special_tokens=False)
print(response)
```

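Continuing the example above, if you want only the newly generated reply without the echoed prompt, you can slice off the input tokens before decoding:

```python
# Keep only the tokens generated after the prompt
new_tokens = output[0][input_ids.shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
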
### Using the Pipeline for Text Generation

The `pipeline` API abstracts away most of the setup, letting you generate responses with less boilerplate. Here the assistant answers the same Kazakh question about AI in healthcare.

```python
from transformers import pipeline

# Initialize the text-generation pipeline
pipe = pipeline("text-generation", model="nur-dev/llama-1.9B-kaz-instruct")

# Define the conversation messages (same system and user prompts as above)
messages = [
    {"role": "system", "content": "Сіз сұрақтарға жауап беріп, ақпарат ұсынатын сенімді AI көмекшісісіз."},
    {"role": "пайдаланушы", "content": "Жасанды интеллект денсаулық сақтау саласына қандай өзгерістер енгізе алады?"}
]

response = pipe(messages, max_new_tokens=128)[0]['generated_text']

print(response)
```

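With chat-style (list-of-messages) input, recent `transformers` releases return the whole conversation under `generated_text`, in which case the assistant's reply is the last entry. A defensive way to handle both return formats (this depends on the installed version):

```python
# `generated_text` may be a plain string (older versions) or the full
# message list (newer versions with chat input)
if isinstance(response, list):
    print(response[-1]["content"])  # assistant's reply only
else:
    print(response)
```
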
## Citation

```bibtex
@misc{nurgali_kadyrbek_2024,
  author    = { {NURGALI Kadyrbek} },
  title     = { llama-1.9B-kaz-instruct (Revision 4059a4e) },
  year      = 2024,
  url       = { https://huggingface.co/nur-dev/llama-1.9B-kaz-instruct },
  doi       = { 10.57967/hf/3114 },
  publisher = { Hugging Face }
}
```