Instructions to use woojin0412/common with libraries, inference providers, notebooks, and local apps.

Libraries

How to use woojin0412/common with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="woojin0412/common")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("woojin0412/common")
model = AutoModelForCausalLM.from_pretrained("woojin0412/common")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use woojin0412/common with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "woojin0412/common"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "woojin0412/common",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/woojin0412/common

SGLang

How to use woojin0412/common with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "woojin0412/common" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "woojin0412/common",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "woojin0412/common" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "woojin0412/common",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
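
The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library, assuming a server is already running at the given base URL (port 8000 for the vLLM example, 30000 for SGLang). The helper names `build_chat_request` and `send` are hypothetical, and `send` assumes the usual OpenAI-style response layout.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the text of the first choice.

    Assumes an OpenAI-compatible response body.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Building the request does not contact the server; call send(request) to do so.
request = build_chat_request(
    "http://localhost:30000",
    "woojin0412/common",
    "What is the capital of France?",
)
```

Keeping request construction separate from sending makes it easy to point the same code at either server by changing only the base URL.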



	---
	language: ko
	license: apache-2.0
	base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
	datasets: woojin0412/common
	pipeline_tag: text-generation
	tags:
	- llama
	- korean
	- fine-tuning
	- instruction-tuning
	library_name: transformers
	---

	This model is a fine-tuned version of the `meta-llama/Meta-Llama-3.1-8B-Instruct` model, specifically adapted for enhanced Korean text generation and question answering. It has been trained on the `woojin0412/common` dataset.

	## Features

	This model offers the following key features.

	- Korean Text Generation: Optimized for generating coherent and contextually relevant text in the Korean language.
	- Instruction Following: Fine-tuned to understand and respond to user instructions effectively.
	- Question Answering: Capable of providing answers to questions posed in Korean.
	- Efficient Inference: Designed for efficient inference by utilizing 4-bit quantization and float16 precision.
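
As a rough illustration of why lower precision helps, weight memory scales with the number of bits stored per parameter. The sketch below assumes about 8 billion parameters (inferred from the base model's name, not measured) and counts weights only, ignoring activations, the KV cache, and any quantization overhead. The function name is hypothetical.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 8_000_000_000  # assumption: roughly 8B parameters

for label, bits in [("float16", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(N_PARAMS, bits):.0f} GB")
```

Under these assumptions the weights take about 16 GB in float16 and about 4 GB at 4 bits per parameter.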

	## How to Get Started with the Model

	1. Import the `huggingface_hub` library and log in to Hugging Face

	This code imports the `huggingface_hub` library, which is used to interact with the Hugging Face Hub, a platform for sharing machine learning models, datasets, and tokenizers. The `huggingface_hub.login()` function then prompts for a Hugging Face access token. Logging in is typically needed only to access private or gated models or to upload to the Hub.

	```py
	import huggingface_hub
	huggingface_hub.login()
	```
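
In scripts or CI jobs where an interactive prompt is not available, the same login can be done by passing a token explicitly. The sketch below assumes the token is stored in an environment variable named `HF_TOKEN`; that name is an assumption, not something this README specifies. `read_token` and `login_from_env` are hypothetical helpers.

```python
import os


def read_token(env=os.environ, var_name="HF_TOKEN"):
    """Return the access token stored in `var_name`, or None if unset or blank."""
    token = env.get(var_name, "").strip()
    return token or None


def login_from_env():
    """Log in with the token from the environment, else prompt interactively."""
    import huggingface_hub  # imported lazily so read_token works without it

    token = read_token()
    if token is not None:
        huggingface_hub.login(token=token)
    else:
        huggingface_hub.login()  # falls back to the interactive prompt
```

Calling `login_from_env()` once at the start of a script then replaces the bare `huggingface_hub.login()` call.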

	2. Run Korean text generation

	This code loads the fine-tuned causal language model and generates a response to a Korean prompt. It imports the required classes from `transformers` along with `torch`, then loads `woojin0412/common` with `AutoModelForCausalLM.from_pretrained`, using `low_cpu_mem_usage=True`, `torch_dtype=torch.float16`, and `device_map="auto"` to reduce memory use and place the weights on the available devices. It also loads the matching tokenizer. An example Korean input is wrapped in a chat-style message and converted into a prompt with the tokenizer's chat template. The model then generates a response with `model.generate` using the specified sampling parameters. Finally, the generated tokens are decoded back into text, trimmed, and printed to the console.

	```py
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	# Specify the model name
	loadModel = "woojin0412/common"

	# Load the fine-tuned model in float16 and place it on the available devices
	model = AutoModelForCausalLM.from_pretrained(
	    loadModel,
	    low_cpu_mem_usage=True,
	    return_dict=True,
	    torch_dtype=torch.float16,
	    device_map="auto",
	)
	model.eval()  # Set the model to evaluation mode

	# Load the tokenizer
	tokenizer = AutoTokenizer.from_pretrained(loadModel, trust_remote_code=True)
	tokenizer.pad_token = tokenizer.eos_token
	tokenizer.padding_side = "right"

	# Example input
	input_text = "한국어로 '안녕하세요'를 세 번 반복해서 말해줘."

	messages = [{"role": "user", "content": input_text}]

	# Render the chat template as text (it already contains the special tokens)
	prompt_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

	# add_special_tokens=False avoids adding a second beginning-of-text token
	inputs = tokenizer(prompt_text, return_tensors="pt", add_special_tokens=False).to(model.device)

	outputs = model.generate(
	    **inputs,
	    max_new_tokens=512,
	    do_sample=True,
	    temperature=0.1,
	    top_p=0.95,
	    eos_token_id=tokenizer.eos_token_id,
	)

	# Decode only the newly generated tokens, skipping the prompt
	prompt_length = inputs["input_ids"].shape[-1]
	response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)

	# Print the output
	print(response.strip())
	```
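
Separating the model's reply from the prompt is the step most likely to go wrong. For a decoder-only model, `generate` returns the prompt tokens followed by the new tokens, so slicing by prompt length isolates the reply, whereas splitting the decoded text on a role name breaks whenever the reply itself contains that word. The sketch below illustrates the difference with plain Python lists standing in for token IDs; the toy tokens and helper names are hypothetical.

```python
# Toy stand-ins: each "token" is a word, and decoding joins them with spaces.
def decode(tokens):
    return " ".join(tokens)


def reply_by_slicing(output_tokens, prompt_length):
    """Keep only the tokens generated after the prompt."""
    return decode(output_tokens[prompt_length:])


def reply_by_splitting(output_tokens, marker="assistant"):
    """Decode everything, then keep the text after the last marker."""
    return decode(output_tokens).split(marker)[-1].strip()


prompt = ["user", "Who", "are", "you?", "assistant"]
generated = ["I", "am", "an", "assistant", "for", "Korean", "text."]
output = prompt + generated  # mimics generate(): prompt followed by new tokens

print(reply_by_slicing(output, len(prompt)))  # full reply
print(reply_by_splitting(output))             # truncated at the word "assistant"
```

In this toy case slicing keeps the whole reply, while splitting drops everything up to the second occurrence of "assistant".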