Tags: Text Generation, Transformers, PyTorch, Korean, llama, llama-2, instruct, instruction, text-generation-inference
Instructions to use 42MARU/llama-2-ko-7b-instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use 42MARU/llama-2-ko-7b-instruct with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="42MARU/llama-2-ko-7b-instruct")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("42MARU/llama-2-ko-7b-instruct")
model = AutoModelForCausalLM.from_pretrained("42MARU/llama-2-ko-7b-instruct")
- Notebooks
- Google Colab
- Kaggle
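Building on the Transformers snippet above, here is a minimal generation sketch; the Korean prompt, its "### User: / ### Assistant:" formatting, and the sampling settings are illustrative assumptions rather than part of the original card (the template itself is documented further down).

# A minimal sketch, assuming the pipeline usage shown above;
# the prompt text and sampling parameters are illustrative.
from transformers import pipeline

pipe = pipeline("text-generation", model="42MARU/llama-2-ko-7b-instruct")

prompt = "### User:\n한국의 수도는 어디인가요?\n\n### Assistant:\n"
result = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.5)
print(result[0]["generated_text"])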
- Local Apps
- vLLM
How to use 42MARU/llama-2-ko-7b-instruct with vLLM:
Install from pip and serve the model:
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "42MARU/llama-2-ko-7b-instruct"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "42MARU/llama-2-ko-7b-instruct",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
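As an additional sketch beyond the curl call above, the same OpenAI-compatible endpoint can be queried from Python with the openai client; the base_url and the placeholder API key are assumptions tied to the local server started in the previous snippet.

# Query the local vLLM server via its OpenAI-compatible API (assumes `pip install openai`, v1+).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="42MARU/llama-2-ko-7b-instruct",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(response.choices[0].text)

The same call should also work against the SGLang server shown below by changing the port in base_url to 30000.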
- SGLang
How to use 42MARU/llama-2-ko-7b-instruct with SGLang:
Install from pip and serve the model:
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "42MARU/llama-2-ko-7b-instruct" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "42MARU/llama-2-ko-7b-instruct",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "42MARU/llama-2-ko-7b-instruct" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "42MARU/llama-2-ko-7b-instruct",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use 42MARU/llama-2-ko-7b-instruct with Docker Model Runner:
docker model run hf.co/42MARU/llama-2-ko-7b-instruct
llama-2-ko-7b-instruct
Model Details
- Developed by: 42MARU
- Backbone Model: llama-2-ko-7b
- Library: transformers
Used Datasets
- Orca-style dataset
- KOpen-platypus
Prompt Template
### User:
{User}
### Assistant:
{Assistant}
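A minimal sketch of filling in this template and generating a reply with Transformers follows; the user question, the exact whitespace around the template markers, and the generation settings are assumptions rather than values specified by the card.

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "42MARU/llama-2-ko-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" assumes `accelerate` is installed; drop it to load on the default device.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Fill the {User} slot and leave the {Assistant} slot empty for the model to complete.
user_message = "한국의 전통 음식 세 가지를 소개해 주세요."
prompt = f"### User:\n{user_message}\n\n### Assistant:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.5)
# Decode only the newly generated tokens, i.e. the reply that follows "### Assistant:".
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))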
Introducing 42MARU
- At 42Maru we study QA (Question Answering) and develop advanced search paradigms that use AI and Deep Learning to understand natural language and user intent, helping users spend less time searching.
- About Us
- Contact Us
License
USE_POLICY
Responsible Use Guide