Introduction
We introduce Llama-3-Motif, a new family of language models from Moreh that specializes in Korean and English.
Llama-3-Motif-102B-Instruct is a chat model tuned from the base model Llama-3-Motif-102B.
Training Platform
- The Llama-3-Motif-102B model family was trained on the MoAI platform; refer to the link for more information.
Quick Usage
You can chat directly with our model Llama-3-Motif through our Model hub.
Details
More details will be provided in the upcoming technical report.
The effective context length is 32k (avg. 81), based on the RULER benchmark.
Release Date
2024.12.02
Benchmark Results
| Provider | Model | kmmlu_direct score | Note |
|---|---|---|---|
| Moreh | Llama-3-Motif-102B | 64.74 | + |
| Moreh | Llama-3-Motif-102B-Instruct | 64.81 | + |
| Meta | Llama3-70B-instruct | 54.5* | |
| Meta | Llama3.1-70B-instruct | 52.1* | |
| Meta | Llama3.1-405B-instruct | 65.8* | |
| Alibaba | Qwen2-72B-instruct | 64.1* | |
| OpenAI | GPT-4-0125-preview | 59.95* | |
| OpenAI | GPT-4o-2024-05-13 | 64.11** | |
| Google | Gemini Pro | 50.18* | |
| LG | exaone 3.0 | 44.5* | + |
| Naver | HyperCLOVA X | 53.4* | + |
| Upstage | SOLAR-10.7B | 41.65* | + |
* : Community report
** : Measured by Moreh
+ : Claimed to have better capability in Korean
How to use
Use with vLLM
- Refer to this link to install vLLM.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
# Set tensor_parallel_size to the number of GPUs you have available
model = LLM("moreh/Llama-3-Motif-102B-Instruct", tensor_parallel_size=4)
tokenizer = AutoTokenizer.from_pretrained("moreh/Llama-3-Motif-102B-Instruct")
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    # Korean prompt: "Explain the concept of the Big Bang theory to a kindergartener"
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},
]
# Render the conversation into a single prompt string with the model's chat template
messages_batch = [tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)]
# vLLM does not use the Hugging Face generation_config here, so the sampling parameters are set explicitly
sampling_params = SamplingParams(max_tokens=512, temperature=0, repetition_penalty=1.0, stop_token_ids=[tokenizer.eos_token_id])
responses = model.generate(messages_batch, sampling_params=sampling_params)
print(responses[0].outputs[0].text)
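The `tensor_parallel_size=4` setting above can be sanity-checked with a rough memory estimate. The sketch below is a back-of-the-envelope calculation, not something taken from this model card. It assumes 102 billion parameters stored in 16-bit precision, 80 GiB accelerators, and that about 90% of each device's memory is usable (mirroring vLLM's default `gpu_memory_utilization` of 0.9). It counts the weights only and ignores the KV cache and activations, so real requirements are higher. Rounding up to a power of two is also an assumption, made because the tensor-parallel size has to divide the model's attention-head count. Both helper names are hypothetical.

```python
import math


def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the weights alone, in GiB (no KV cache, no activations)."""
    return num_params * bytes_per_param / 2**30


def min_tensor_parallel_size(
    num_params: float,
    gpu_mem_gib: float,
    bytes_per_param: int = 2,       # assumption: 16-bit weights
    usable_fraction: float = 0.9,   # assumption: same as vLLM's default utilization
) -> int:
    """Smallest power-of-two GPU count whose usable memory holds the weights."""
    needed = weight_memory_gib(num_params, bytes_per_param)
    min_gpus = max(1, math.ceil(needed / (gpu_mem_gib * usable_fraction)))
    # Round up to the next power of two (1, 2, 4, 8, ...)
    return 1 << (min_gpus - 1).bit_length()


if __name__ == "__main__":
    print(f"{weight_memory_gib(102e9):.0f} GiB of weights")
    print(min_tensor_parallel_size(102e9, gpu_mem_gib=80))
```

Under these assumptions the weights take roughly 190 GiB and the estimate comes out at 4 devices, which matches the value used in the example above. Treat it as a lower bound rather than a recommendation.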
Use with transformers
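from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_id = "moreh/Llama-3-Motif-102B-Instruct"
# Generation settings are read from the model's generation_config.json
# A 102B model is unlikely to fit on one GPU; device_map="auto" (requires the accelerate package) spreads it across the visible GPUs
# bfloat16 is an assumption here; use whichever dtype matches the released checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    # Korean prompt: "Explain the concept of the Big Bang theory to a kindergartener"
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},
]
# Render the conversation into a single prompt string with the model's chat template
messages_batch = tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)
input_ids = tokenizer(messages_batch, padding=True, return_tensors='pt')['input_ids'].to(model.device)
outputs = model.generate(input_ids)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))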
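Both snippets above call `apply_chat_template(..., tokenize=False)`, which turns the message list into a single prompt string. To illustrate what that string may look like, here is a minimal sketch that assumes the model keeps the stock Llama 3 chat format. This card does not confirm that, and the template shipped with the tokenizer is authoritative. `render_llama3_chat` is a hypothetical helper written for this example, not part of transformers.

```python
from typing import Dict, List

# Assumption: stock Llama 3 special tokens; the real tokenizer may differ.
BOS = "<|begin_of_text|>"


def render_llama3_chat(
    messages: List[Dict[str, str]], add_generation_prompt: bool = True
) -> str:
    """Approximate the prompt string produced by a Llama 3 style chat template."""
    parts = [BOS]
    for message in messages:
        # Each turn is a role header, a blank line, the trimmed content, and an end-of-turn token
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header cues the model to write the next turn
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    demo = [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Who are you?"},
    ]
    print(render_llama3_chat(demo))
```

Comparing this output with `tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)` is a quick way to check whether the assumption holds for this model.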
