kgrammar-2-9b
kgrammar-2-9b is a model for evaluating Korean sentences. It focuses on identifying responses that switch to a different language or mix multiple languages within a sentence. The model is based on the gemma-2-9b architecture and aims to provide accurate assessments of language consistency and clarity in Korean text.
Model Details
- Model Name: kgrammar-2-9b
- Base Model: Google/Gemma-2-9b-it
- Fine-Tuning Techniques: Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO)
Benchmarks and Dataset
keval leverages the custom-built ko-bench dataset, which draws inspiration from MT-Bench but has been tailored specifically for Korean language assessments. This dataset includes tasks spanning a wide range of user scenarios to effectively evaluate key elements like multi-turn conversation ability and instruction adherence.
Usage Application Form
To use this model, please complete the application form and submit it via email [davidkim205@gmail.com].
Access will be granted after your application is reviewed and approved.
We appreciate your cooperation and look forward to assisting you.
1. **Name:**
- (e.g., John Doe)
2. **Date of Birth:**
- (e.g., January 1, 1990)
3. **Affiliation:**
- Are you applying as a company or an individual? [ ] Company [ ] Individual
- Company Name (if applicable):
- Department (if applicable):
4. **Position/Role:**
- (e.g., Data Scientist, Researcher, etc.)
5. **Contact Information:**
- Email:
- Phone Number:
6. **Purpose of Use:**
- (e.g., Research and Development, Commercial use, Educational purposes, etc.)
7. **Detailed Reason for Use:**
- 1. Name and version of the model you wish to use:
- 2. Reason for selecting this model:
- 3. Objectives to achieve using this model:
- 4. Expected use cases (please describe in as much detail as possible):
8. **Data Security and Ethical Use Plan:**
- (Please describe your plans for data protection and ethical use.)
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "davidkim205/kgrammar-2-9b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the model in 4-bit precision for memory efficiency (requires bitsandbytes)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
prompt = "ํ๊ตญ์ด ๋ฌธ๋งฅ์ ๋ถ์์ฐ์ค๋ฌ์ด ๋ถ๋ถ์ ์ฐพ์ผ์์ค. ์ค๋ฅ ๋ฌธ์ฅ๊ณผ ๊ฐ์๋ <incorrect grammar> </incorrect grammar> tag, ์ฆ <incorrect grammar> - ์ค๋ฅ ๋ฌธ์ฅ๊ณผ ์ค๋ช
</incorrect grammar> ์์ ๋ด๊ฒจ ์์ผ๋ฉฐ, <wrong count> </wrong count> tag, ์ฆ <wrong count> ์ค๋ฅ ๊ฐ์ </wrong count> ์ด๋ค."
text = "์๋๋ ์ฒซ ๋ฒ์งธ ์์ ์์ libros๋ฅผ ์ฑ
๋น $19.6923077์ ๊ตฌ์
ํ์ต๋๋ค (1280 รท 65). ๊ทธ๋
์ ํ๊ท ์ฑ
๋น ๊ฐ๊ฒฉ์ $18์๊ธฐ ๋๋ฌธ์ ๋ ๋ฒ์งธ ์์ ์์ ์ฑ
์ ์ด $907.5์ ๊ตฌ์
ํ์ต๋๋ค (18 ร 120 - 1280 = 907.5)."
conversation = [
    {"role": "system", "content": ""},
    {"role": "user", "content": prompt + text},
]
# Render the conversation with the model's chat template (as text, not token ids)
formatted_conversation = tokenizer.apply_chat_template(
    conversation, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(formatted_conversation, return_tensors="pt", add_special_tokens=False)
inputs = {key: tensor.to(model.device) for key, tensor in inputs.items()}
with torch.no_grad():
    # Generate the response; temperature only takes effect when sampling is enabled
    outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].size(1):], skip_special_tokens=True
))
Example output:
<incorrect grammar>
- "libros"는 스페인어로 "책"을 의미하며, 문맥상 한국어로 대체할 필요가 있습니다.
</incorrect grammar> <wrong count>1</wrong count>
(In English: "libros" is Spanish for "book" and, in this context, should be replaced with Korean.)
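Because the response uses the `<incorrect grammar>` and `<wrong count>` tags, it can be post-processed with a small parser. The following is a minimal sketch, not part of the model's tooling: the function name `parse_kgrammar_output` is hypothetical, the sample string is an English stand-in for a real response, and the parser assumes each explanation sits on its own line starting with `-`.

```python
import re


def parse_kgrammar_output(output):
    """Extract the explanation lines and the error count from a response.

    Returns (explanations, count); count is None when the <wrong count>
    tag is missing or does not contain an integer.
    """
    body = re.search(r"<incorrect grammar>(.*?)</incorrect grammar>", output, re.DOTALL)
    count = re.search(r"<wrong count>\s*(\d+)\s*</wrong count>", output)
    explanations = []
    if body:
        for line in body.group(1).splitlines():
            line = line.strip()
            if line.startswith("-"):
                # Drop the leading bullet marker and surrounding whitespace
                explanations.append(line.lstrip("- ").strip())
    return explanations, (int(count.group(1)) if count else None)


# Hypothetical sample response (English stand-in for the Korean explanation)
sample = (
    "<incorrect grammar>\n"
    '- "libros" should be replaced with a Korean word.\n'
    "</incorrect grammar> <wrong count>1</wrong count>"
)
print(parse_kgrammar_output(sample))
```

A count of `None` can then be treated as a malformed answer, which is what the wrong column in the evaluation tables tracks.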
Evaluation
Diff
Diff is the difference between the label score and the predicted score for each example, summarized as a single score. The wrong column counts answers that do not match the required output format, and length is the total number of test examples. The numbered columns (0 through 12) give the count and percentage of examples whose label and predicted scores differ by that amount.
The score is calculated by:
- Calculating the difference between the label and predicted score for each pair.
- Assigning full points for a difference of 0, and half a point for a difference of 1.
- The total score is the sum of all points divided by the number of data points.
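The steps above can be sketched in Python. This is a minimal illustration, not the authors' evaluation script: the function name `diff_score` and the input format (a mapping from difference to example count) are assumptions. Malformed answers are assumed to earn no points while still counting toward `length`, which is consistent with the rows of the Diff table.

```python
def diff_score(diff_counts, length):
    """Return the diff score as a fraction of `length`.

    diff_counts maps the difference between label and predicted score to
    the number of examples with that difference. A difference of 0 earns
    1 point, a difference of 1 earns 0.5 points, anything larger earns 0.
    """
    points = {0: 1.0, 1: 0.5}
    total = sum(points.get(diff, 0.0) * count for diff, count in diff_counts.items())
    return total / length


# Difference counts from the kgrammar-2-9b row of the Diff table
kgrammar_9b = {0: 52, 1: 20, 2: 5, 3: 1, 4: 1, 6: 1}
print(f"{diff_score(kgrammar_9b, length=80):.1%}")  # 77.5%
```

For kgrammar-2-9b this gives (52 × 1 + 20 × 0.5) / 80 = 62 / 80 = 77.5%, matching the reported score.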
| | model | wrong | score | length | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | kgrammar-2-9b | 0 (0.0%) | 77.5% | 80 | 52 (65.0%) | 20 (25.0%) | 5 (6.2%) | 1 (1.2%) | 1 (1.2%) | 0 | 1 (1.2%) | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | kgrammar-2-3b | 0 (0.0%) | 74.4% | 80 | 51 (63.7%) | 17 (21.2%) | 8 (10.0%) | 1 (1.2%) | 1 (1.2%) | 0 | 1 (1.2%) | 1 (1.2%) | 0 | 0 | 0 | 0 | 0 |
| 2 | kgrammar-2-1b | 1 (1.2%) | 67.5% | 80 | 44 (55.0%) | 20 (25.0%) | 8 (10.0%) | 2 (2.5%) | 2 (2.5%) | 1 (1.2%) | 0 | 2 (2.5%) | 0 | 0 | 0 | 0 | 0 |
| 3 | gpt-4o | 1 (1.2%) | 56.9% | 80 | 34 (42.5%) | 23 (28.7%) | 14 (17.5%) | 3 (3.8%) | 2 (2.5%) | 2 (2.5%) | 0 | 0 | 0 | 0 | 0 | 0 | 1 (1.2%) |
| 4 | gpt-4o-mini | 0 (0.0%) | 44.4% | 80 | 19 (23.8%) | 33 (41.2%) | 18 (22.5%) | 3 (3.8%) | 1 (1.2%) | 3 (3.8%) | 0 | 1 (1.2%) | 0 | 1 (1.2%) | 1 (1.2%) | 0 | 0 |
Accuracy
The score column is the ratio of correctly predicted labels to the total number of data points. The wrong column shows the count and percentage of incorrectly formatted answers. The columns labeled "0" through "12" give, for each label value, the number of correct predictions and the percentage of examples with that label that were predicted correctly.
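As a rough sketch of how such a report could be computed (the function name `accuracy_report` and the toy data are hypothetical, not taken from the benchmark), the snippet below returns the overall accuracy and a per-label pair of correct and total counts. It assumes a malformed answer is represented as `None` and counted as incorrect.

```python
from collections import Counter


def accuracy_report(labels, predictions):
    """Return overall accuracy and per-label (correct, total) counts.

    A prediction of None stands for an answer that did not match the
    required format; it is counted as incorrect.
    """
    totals = Counter(labels)
    correct = Counter(label for label, pred in zip(labels, predictions) if pred == label)
    overall = sum(correct.values()) / len(labels)
    per_label = {label: (correct[label], totals[label]) for label in sorted(totals)}
    return overall, per_label


# Toy data (hypothetical): labels and predictions are error counts
labels = [0, 0, 0, 1, 1, 2]
predictions = [0, 0, 1, 1, None, 2]
overall, per_label = accuracy_report(labels, predictions)
print(f"{overall:.1%}")  # 66.7%
print(per_label)         # {0: (2, 3), 1: (1, 2), 2: (1, 1)}
```

Dividing each correct count by its total gives the per-label percentages shown in the numbered columns.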
| | model | wrong | score | length | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | kgrammar-2-9b | 0 (0.0%) | 65.0% | 80 | 35 (97.2%) | 5 (71.4%) | 7 (50.0%) | 3 (37.5%) | 2 (40.0%) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | kgrammar-2-3b | 0 (0.0%) | 63.7% | 80 | 35 (97.2%) | 2 (28.6%) | 8 (57.1%) | 3 (37.5%) | 2 (40.0%) | 1 (50.0%) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | kgrammar-2-1b | 1 (1.2%) | 55.0% | 80 | 34 (94.4%) | 3 (42.9%) | 4 (28.6%) | 2 (25.0%) | 0 | 0 | 1 (50.0%) | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | gpt-4o | 1 (1.2%) | 42.5% | 80 | 9 (25.0%) | 6 (85.7%) | 8 (57.1%) | 7 (87.5%) | 1 (20.0%) | 2 (100.0%) | 0 | 1 (100.0%) | 0 | 0 | 0 | 0 | 0 |
| 4 | gpt-4o-mini | 0 (0.0%) | 23.8% | 80 | 1 (2.8%) | 5 (71.4%) | 8 (57.1%) | 5 (62.5%) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Error Detection Accuracy
This accuracy metric evaluates a model's error detection performance by measuring how well it identifies the presence or absence of errors. It differs from conventional accuracy by focusing on correct and incorrect error predictions rather than overall classification correctness.
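The exact formula is not given here, so the following is only one plausible reading, sketched in Python: an example counts as a hit when the label and the prediction agree on whether the text contains any error at all (count greater than zero) or none. The function name `error_detection_accuracy`, the `None` convention for malformed answers, and the toy data are all assumptions.

```python
def error_detection_accuracy(labels, predictions):
    """Fraction of examples where label and prediction agree on whether
    the text contains any error (count > 0) or none (count == 0).

    A prediction of None (malformed answer) is counted as a miss.
    """
    hits = sum(
        1
        for label, pred in zip(labels, predictions)
        if pred is not None and (label > 0) == (pred > 0)
    )
    return hits / len(labels)


# Toy data (hypothetical): predicting 3 for label 1 still counts as a hit,
# because both say "this text contains errors".
labels = [0, 0, 1, 2, 0]
predictions = [0, 1, 3, 2, None]
print(error_detection_accuracy(labels, predictions))  # 0.6
```

Under this reading, a model can score well on error detection even when its exact error counts are off, which is why this metric is higher than the accuracy score for every model in the tables.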
| | model | accuracy | wrong | length |
|---|---|---|---|---|
| 0 | kgrammar-2-9b | 95.0% | 0 | 80 |
| 1 | kgrammar-2-3b | 93.8% | 0 | 80 |
| 2 | kgrammar-2-1b | 92.5% | 1 | 80 |
| 3 | gpt-4o | 65.0% | 1 | 80 |
| 4 | gpt-4o-mini | 55.0% | 0 | 80 |