| --- |
| license: apache-2.0 |
| tags: |
| - Drenel |
| - Hippo |
| - LLM |
| - MultiLingual |
| - Drenel/Hippo-6B |
| base_model: |
- Drenel/Hippo-6B
| library_name: transformers |
| --- |
| |
| ## Model Details |
|
|
| Hippo-6B is a cutting-edge, transformer-based language model designed to provide state-of-the-art performance across a wide range of natural language processing tasks. With 6.2 billion parameters, Hippo-6B strikes a balance between computational efficiency and high performance, making it a versatile model for various applications. |
|
|
**Context Length:** Supports a context length of up to 4K tokens
|
|
| **Publisher:** Drenel |
|
|
| **Paper:** [Model Paper](https://huggingface.co/Drenel/Hippo-6B/blob/main/Hippo-6B_%20A-6.2B-Parameter-Language-Model-with-Efficient-Attention-and-Mixture-of-Experts.pdf) |
|
|
| ## Key Features and Technologies |
|
|
| ### 1. Efficient Attention Mechanism |
|
|
- **Flash Attention:** Hippo-6B leverages FlashAttention kernels (`flash_attn_func` and `flash_attn_varlen_func`) to compute attention scores efficiently. This reduces computational overhead and memory usage, enabling the model to handle longer context lengths without performance degradation; see the loading sketch after this list.
| - **Support for Window Size:** The model includes conditional support for attention windows, allowing for flexible and scalable attention mechanisms based on the available hardware and task requirements. |
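If the `flash-attn` package is installed, the FlashAttention-2 code path can typically be requested at load time through the standard `transformers` loading argument. A minimal sketch, assuming a CUDA GPU and fp16/bf16 weights (the exact attention class Hippo-6B selects is handled by its remote code):

```python
# Sketch: requesting the FlashAttention-2 code path at load time.
# Assumes the flash-attn package is installed and a supported CUDA GPU.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Drenel/Hippo-6B",
    torch_dtype=torch.bfloat16,               # flash attention requires fp16/bf16
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    device_map="cuda",
    trust_remote_code=True,
)
```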
|
|
| ### 2. Rotary Embeddings |
|
|
- **Rotary Position Embeddings:** Hippo-6B employs rotary position embeddings (`RotaryEmbedding`) to encode positional information in a continuous, differentiable manner, enhancing the model's ability to capture long-range dependencies (see the sketch after this list).
| - **Scaled Rotary Embeddings:** Variations such as `SuScaledRotaryEmbedding` and `YarnScaledRotaryEmbedding` adapt the rotary embeddings to different scaling factors, providing finer control over the embedding space. |
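For intuition, here is a minimal sketch of the standard rotary embedding computation; Hippo-6B's `RotaryEmbedding` and its scaled variants may differ in details such as the base frequency and the scaling schedule:

```python
# Minimal sketch of rotary position embeddings (RoPE) in the standard
# interleaved formulation; details of Hippo-6B's implementation may differ.
import torch

def rotary_embed(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary embeddings to x of shape (seq_len, dim), dim even."""
    seq_len, dim = x.shape
    # Per-pair rotation frequencies: theta_i = base^(-2i/dim)
    inv_freq = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```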
|
|
| ### 3. RMS Norm |
|
|
| - **RMS Normalization:** The model utilizes Root Mean Square (RMS) normalization layers (`RMSNorm`) to stabilize training and improve convergence. RMS normalization helps in maintaining consistent gradient flow across layers, leading to more efficient training dynamics. |
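As a reference, a minimal RMS normalization layer looks like the following; the exact epsilon and weight handling in Hippo-6B's `RMSNorm` are assumptions here:

```python
# Minimal sketch of RMS normalization as commonly implemented.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))  # learnable scale
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square of the features (no mean centering).
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```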
|
|
| ### 4. Modular and Scalable Design |
|
|
| - **Modular Attention Classes:** Hippo-6B features a modular design with different attention classes (`Attention`, `FlashAttention2`, `SdpaAttention`). This modularity allows easy customization and scalability of the attention mechanisms based on specific use cases. |
- **MLP Layers:** The model incorporates Multi-Layer Perceptron (MLP) layers with gating mechanisms to enhance its expressive power. The `MLP` class combines techniques such as expert gating and intermediate projections to build richer representations.
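The gating idea can be illustrated with a SwiGLU-style block, sketched below as a generic example rather than Hippo-6B's exact `MLP` class:

```python
# Sketch of a gated MLP block in the SwiGLU style used by many recent LLMs;
# an illustration of gating, not Hippo-6B's exact MLP implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedMLP(nn.Module):
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The gate branch modulates the up projection elementwise.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```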
|
|
| ### 5. Caching and Memory Efficiency |
|
|
| - **Dynamic Caching:** The model supports dynamic caching strategies (`Cache`, `DynamicCache`) to optimize memory usage during inference, allowing for faster and more efficient processing of long sequences. |
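With recent `transformers` releases, a `DynamicCache` can be passed explicitly to the forward pass. The sketch below shows the general pattern; the exact cache integration inside Hippo-6B is an assumption:

```python
# Sketch: prefilling a DynamicCache so subsequent decoding steps
# reuse cached key/value states instead of recomputing them.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

model = AutoModelForCausalLM.from_pretrained("Drenel/Hippo-6B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("Drenel/Hippo-6B")

inputs = tokenizer("The capital of France is", return_tensors="pt")
cache = DynamicCache()  # grows dynamically as tokens are processed
with torch.no_grad():
    out = model(**inputs, past_key_values=cache, use_cache=True)
next_token = out.logits[:, -1].argmax(dim=-1)  # greedy pick of the next token
```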
|
|
| ### 6. Loss Functions |
|
|
| - **Cross-Entropy Loss:** The model uses Cross-Entropy Loss for classification tasks, ensuring accurate and efficient learning of categorical distributions. |
- **Mean Squared Error (MSE) Loss:** For regression tasks, MSE Loss is employed to minimize the difference between predicted and actual values, providing robust performance in continuous prediction tasks. Both losses are illustrated in the snippet after this list.
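Both are the standard PyTorch losses; a quick illustration:

```python
# Quick illustration of the two loss functions in plain PyTorch.
import torch
import torch.nn as nn

logits = torch.randn(4, 10)                      # batch of 4 examples, 10 classes
labels = torch.randint(0, 10, (4,))
ce_loss = nn.CrossEntropyLoss()(logits, labels)  # classification

preds = torch.randn(4)
targets = torch.randn(4)
mse_loss = nn.MSELoss()(preds, targets)          # regression
print(ce_loss.item(), mse_loss.item())
```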
|
|
| ## Usage |
|
|
| Hippo-6B can be used for a variety of NLP tasks, including but not limited to: |
|
|
| - Text Generation |
| - Language Translation |
| - Sentiment Analysis |
| - Named Entity Recognition |
| - Text Classification |
|
|
| ### Chat Format |
|
|
You can provide the prompt as a question using the following generic template:
| ```markdown |
| <|user|>\nQuestion<|end|>\n<|assistant|> |
| ``` |
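If the repository ships a chat template matching this format (an assumption here), `tokenizer.apply_chat_template` builds the prompt for you:

```python
# Sketch: building the prompt via the tokenizer's chat template,
# assuming the Hippo-6B repo defines one matching the format above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Drenel/Hippo-6B")
messages = [{"role": "user", "content": "What is the capital of France?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # expected: <|user|>\nWhat is the capital of France?<|end|>\n<|assistant|>
```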
|
|
| ## Example |
|
|
| Here is a quick example of how to use Hippo-6B for text generation: |
|
|
```python
# Install dependencies:
# pip install -q transformers accelerate flash-attn

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)
model_name = "Drenel/Hippo-6B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The pipeline applies the chat template, so the special tokens
# (<|user|>, <|end|>, <|assistant|>) must not appear in the message content.
messages = [
    {"role": "user", "content": "What is the capital of France?"},
]

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
generation_args = {
    "max_new_tokens": 50,
    "return_full_text": False,
    "do_sample": False,  # greedy; set True for temperature/top_k/top_p to apply
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.95,
}
output = pipe(messages, **generation_args)
print(output[0]["generated_text"])
```
|
|
|
|
| ## License |
|
|
Hippo-6B is distributed under the Apache-2.0 license.