KoLama: Fine-Tuned Llama3.1-8B Model
Overview
KoLama is a fine-tuned version of the Meta-Llama-3.1-8B-bnb-4bit model, developed by Neetree. It was trained with the Unsloth library, which accelerated training, and Hugging Face's TRL (Transformer Reinforcement Learning) library. The model is intended for text generation tasks and is licensed under Apache-2.0.
Model Details
- Base Model: unsloth/Meta-Llama-3.1-8B-bnb-4bit
- Fine-Tuned by: Neetree
- License: Apache-2.0
- Language: English
- Training Dataset: Neetree/raw_enko_opus_CCM
Key Features
- Efficient Training: The model was trained 2x faster using Unsloth.
- Text Generation: Optimized for text generation tasks, leveraging the power of the Llama3.1 architecture.
- Reinforcement Learning: Fine-tuned using Hugging Face's TRL library, which incorporates reinforcement learning techniques to improve model performance.
Usage
To use KoLama for text generation, you can load the model using the transformers library:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "Neetree/KoLama"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
input_text = "Once upon a time"
inputs = tokenizer(input_text, return_tensors="pt")
# Generate text
outputs = model.generate(**inputs, max_length=50)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
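Note that `max_length` in the snippet above counts the prompt tokens as well as the generated ones, so `max_length=50` leaves fewer than 50 tokens for the continuation. The following is a minimal sketch of the alternative, `max_new_tokens`, which bounds only the generated part. The helper names `new_token_budget` and `generate_text` are illustrative and not part of this model card, and the generation call assumes `transformers` and `torch` are installed and that the model fits in memory.

```python
def new_token_budget(prompt_tokens: int, max_length: int) -> int:
    """Tokens left for generation when `max_length` caps prompt + output."""
    return max(max_length - prompt_tokens, 0)


def generate_text(prompt: str, max_new_tokens: int = 50) -> str:
    """Hypothetical helper: generate a continuation with Neetree/KoLama."""
    # Imported lazily so the budget helper above works without these packages.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Neetree/KoLama"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer(prompt, return_tensors="pt")
    # max_new_tokens bounds only the generated part, independent of prompt length.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


# A 5-token prompt with max_length=50 leaves room for at most 45 new tokens.
print(new_token_budget(5, 50))  # 45
```

Calling `generate_text("Once upon a time")` would download the model on first use, so it is left out of the example run.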
Training Details
- Training Speed: 2x faster training using Unsloth.
- Fine-Tuning Method: Supervised Fine-Tuning (SFT) with reinforcement learning via Hugging Face's TRL library.
- Dataset: The model was fine-tuned on the Neetree/raw_enko_opus_CCM dataset, which contains English-Korean parallel text data.
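The card does not state how the English-Korean pairs were turned into training text. As one possible sketch, assuming each record exposes hypothetical `en` and `ko` fields and using an illustrative prompt template (neither is confirmed by the source), a pair could be rendered into a single string for supervised fine-tuning like this:

```python
# Illustrative template; the actual prompt format used for KoLama is not documented.
TEMPLATE = "### English:\n{en}\n\n### Korean:\n{ko}"


def format_pair(record: dict, eos_token: str = "") -> str:
    """Render one parallel record into a single training string.

    `en` and `ko` are assumed field names, not the dataset's confirmed schema.
    """
    text = TEMPLATE.format(en=record["en"].strip(), ko=record["ko"].strip())
    # Appending an end-of-sequence marker signals where the target text stops.
    return text + eos_token


# "<eos>" is a placeholder; in practice the tokenizer's own EOS token would be used.
sample = {"en": "Hello.", "ko": "안녕하세요."}
print(format_pair(sample, eos_token="<eos>"))
```

A function of this shape could be mapped over the dataset to produce a text column before passing it to a trainer.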
License
This model is licensed under the Apache-2.0 license. For more details, please refer to the LICENSE file.
Acknowledgments
- Unsloth: For providing the tools to accelerate the training process.
- Hugging Face: For the TRL library and the transformers framework.
- Meta: For the original Llama3.1-8B model.
Model tree for Neetree/KoLama
Base model
meta-llama/Llama-3.1-8B