| --- |
| license: apache-2.0 |
| language: |
| - en |
| library_name: transformers |
| pipeline_tag: text-generation |
| --- |
|
|
| # QuantFactory/BabyMistral-GGUF |
| This is a quantized version of [OEvortex/BabyMistral](https://huggingface.co/OEvortex/BabyMistral), created using llama.cpp. |
|
|
| # Original Model Card |
|
|
|
|
| # BabyMistral Model Card |
|
|
| ## Model Overview |
|
|
| **BabyMistral** is a compact language model designed for efficient text generation. Built on the Mistral architecture, it has 1.5 billion parameters and was trained from scratch on 1.5 trillion tokens. |
|
|
| ### Key Specifications |
|
|
| - **Parameters:** 1.5 billion |
| - **Training Data:** 1.5 trillion tokens |
| - **Architecture:** Based on Mistral |
| - **Training Duration:** 70 days |
| - **Hardware:** 4x NVIDIA A100 GPUs |
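
As a rough illustration of what the parameter count above means for storage, the sketch below estimates the size of the weights alone at a few bit widths. The bit widths and the helper name `weight_size_gb` are illustrative assumptions, not values taken from this repository. Actual GGUF files also store metadata and scaling factors, so real file sizes will generally differ from these figures.

```python
# Rough estimate of weight storage for a 1.5B-parameter model.
# Assumption: every weight uses exactly `bits` bits; real quantized files
# add metadata and scaling factors, so these are only ballpark figures.

N_PARAMS = 1_500_000_000  # parameter count from the Key Specifications list


def weight_size_gb(n_params: int, bits: int) -> float:
    """Return the size of the weights in gigabytes (1 GB = 10**9 bytes)."""
    return n_params * bits / 8 / 1e9


# Illustrative bit widths (hypothetical choices, not the files in this repo)
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_size_gb(N_PARAMS, bits):.2f} GB")
```

The estimate covers weights only. Memory use at inference time is higher, since the context cache and activations also need space.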
|
|
| ## Model Details |
|
|
| ### Architecture |
|
|
| BabyMistral uses the Mistral architecture, which is known for its efficiency and performance. At 1.5 billion parameters, the model aims to balance capability with computational cost. |
|
|
| ### Training |
| - **Dataset Size:** 1.5 trillion tokens |
| - **Training Approach:** Trained from scratch |
| - **Hardware:** 4x NVIDIA A100 GPUs |
| - **Duration:** 70 days of continuous training |
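
The training figures above imply an average throughput that can be worked out directly. The sketch below does that arithmetic, assuming the 70 days were spent entirely on training, all four GPUs were in use throughout, and each token was processed once. None of these assumptions is stated in the original card, and the variable names are illustrative.

```python
# Average throughput implied by the training figures listed above.
# Assumptions (not stated in the card): all 70 days were spent training,
# all four GPUs were used throughout, and each token was processed once.

TOKENS = 1.5e12   # dataset size in tokens
DAYS = 70         # reported training duration
NUM_GPUS = 4      # reported number of A100 GPUs

seconds = DAYS * 24 * 60 * 60
tokens_per_second = TOKENS / seconds
tokens_per_gpu_second = tokens_per_second / NUM_GPUS

print(f"Total training time: {seconds:,} s")
print(f"Average throughput:  {tokens_per_second:,.0f} tokens/s")
print(f"Per GPU:             {tokens_per_gpu_second:,.0f} tokens/s")
```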
|
|
| ### Capabilities |
|
|
| BabyMistral is designed for a wide range of natural language processing tasks, including: |
|
|
| - Text completion and generation |
| - Creative writing assistance |
| - Dialogue systems |
| - Question answering |
| - Language understanding tasks |
|
|
| ## Usage |
|
|
| ### Getting Started |
|
|
| To use BabyMistral with the Hugging Face Transformers library: |
|
|
| ```python |
| from transformers import AutoModelForCausalLM, AutoTokenizer |
| |
| # Load the original (unquantized) model and its tokenizer from the Hub |
| model = AutoModelForCausalLM.from_pretrained("OEvortex/BabyMistral") |
| tokenizer = AutoTokenizer.from_pretrained("OEvortex/BabyMistral") |
| |
| # Define the chat input; uncomment the system message to set a persona |
| chat = [ |
|     # {"role": "system", "content": "You are BabyMistral"}, |
|     {"role": "user", "content": "Hey there! How are you?"}, |
| ] |
| |
| # Apply the chat template and move the token IDs to the model's device |
| inputs = tokenizer.apply_chat_template( |
|     chat, |
|     add_generation_prompt=True, |
|     return_tensors="pt", |
| ).to(model.device) |
| |
| # Generate text |
| outputs = model.generate( |
|     inputs, |
|     max_new_tokens=256, |
|     do_sample=True, |
|     temperature=0.6, |
|     top_p=0.9, |
|     eos_token_id=tokenizer.eos_token_id, |
| ) |
| |
| # Decode only the newly generated tokens, skipping the prompt |
| response = outputs[0][inputs.shape[-1]:] |
| print(tokenizer.decode(response, skip_special_tokens=True)) |
| |
| # Example output (sampling is enabled, so the text will vary): |
| # I am doing well! How can I assist you today? |
| ``` |
|
|
| ### Ethical Considerations |
|
|
| While BabyMistral is a powerful tool, users should be aware of its limitations and potential biases: |
|
|
| - The model may reproduce biases present in its training data |
| - It should not be used as a sole source of factual information |
| - Generated content should be reviewed for accuracy and appropriateness |
|
|
|
|
| ### Limitations |
|
|
| - May struggle with very specialized or technical domains |
| - Has no knowledge of events after its training data was collected |
| - May generate plausible-sounding but incorrect information |
|
|
|
|
|
|