---
license: apache-2.0
language: en
tags:
- text-generation
- causal-lm
- fine-tuning
- unsupervised
---

# olabs-ai/reflection_model

## Model Description

`olabs-ai/reflection_model` is a language model fine-tuned from [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) using LoRA (Low-Rank Adaptation). It is designed for text generation and can be used in applications such as conversational agents and content creation.

## Model Details

- **Base Model**: Meta-Llama-3.1-8B-Instruct
- **Fine-Tuning Method**: LoRA (Low-Rank Adaptation; see the sketch after this list)
- **Architecture**: LlamaForCausalLM
- **Number of Parameters**: 8 billion (base model)
- **Training Data**: [Details about the training data used for fine-tuning, if available]

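For context on the fine-tuning method: LoRA freezes the base weights and learns a low-rank update, so a frozen linear map `W x` becomes `W x + (alpha / r) * B(A x)` with small trainable matrices `A` and `B` of rank `r`. The following is a minimal illustrative sketch of that idea in PyTorch, not the actual training code used for this model:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a trainable low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        # Low-rank factors: only these r * (in + out) parameters are trained.
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # adapter starts as a no-op
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))
```

Because `B` is initialized to zero, the adapter contributes nothing at first and the model initially behaves exactly like the base model.
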
## Usage

To use this model, you need the `transformers` and `unsloth` libraries installed.

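Both can typically be installed from PyPI (note that `unsloth` assumes a CUDA-capable NVIDIA GPU):

```bash
pip install unsloth transformers
```

You can then load the model and tokenizer as follows:
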
```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Load the fine-tuned model; pointing Unsloth at this repository resolves
# the base weights and the LoRA adapter automatically.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="olabs-ai/reflection_model",
    max_seq_length=2048,   # adjust to your context-length needs
    load_in_4bit=True,     # optional 4-bit quantization to reduce memory use
)

# Enable Unsloth's optimized inference mode
FastLanguageModel.for_inference(model)

# Prepare inputs on the same device as the model
custom_prompt = "What is a famous tall tower in Paris?"
inputs = tokenizer([custom_prompt], return_tensors="pt").to("cuda")

# Stream tokens to stdout as they are generated
text_streamer = TextStreamer(tokenizer)
outputs = model.generate(**inputs, streamer=text_streamer, max_new_tokens=1000)
```
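
If you prefer not to depend on `unsloth`, the adapter can usually also be applied with the `peft` library on top of the base model. This is a minimal sketch that assumes this repository contains a standard PEFT adapter and that the base weights are the `meta-llama` release named above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Assumption: the base model is the meta-llama release named in this card.
base_model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the LoRA adapter from this repository to the frozen base weights.
model = PeftModel.from_pretrained(base_model, "olabs-ai/reflection_model")

inputs = tokenizer(["What is a famous tall tower in Paris?"], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since the base model is an Instruct variant, formatting prompts with `tokenizer.apply_chat_template` will generally produce better results than passing raw strings.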