komt: Korean multi-task instruction tuning model
Recently, due to the success of ChatGPT, numerous large language models have emerged in an attempt to catch up with ChatGPT's capabilities. However, when it comes to Korean language performance, it has been observed that many models still struggle to provide accurate answers or generate Korean text effectively. This study addresses these challenges by introducing a multi-task instruction technique that leverages supervised datasets from various tasks to create training data for Large Language Models (LLMs).
Model Details
- Model Developers : davidkim(changyeon kim)
- Repository : https://github.com/davidkim205/komt
- Model Architecture : komt-mistral-7b-v1 is a fine-tuned version of Mistral-7B-Instruct-v0.1.
Dataset
Korean multi-task instruction dataset
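This document does not describe the dataset format, so the following is only a hypothetical illustration of the general idea: turning labeled examples from different supervised tasks into instruction/response pairs. The task names, the templates, and the `InstructionExample` and `to_instruction_example` names are all assumptions made for the sketch, not taken from the komt repository.

```python
from dataclasses import dataclass


@dataclass
class InstructionExample:
    instruction: str
    response: str


# Hypothetical task templates; the real komt templates are not given in this document.
TEMPLATES = {
    "sentiment": "Classify the sentiment of the following review as positive or negative.\n\n{text}",
    "summarization": "Summarize the following article in one sentence.\n\n{text}",
}


def to_instruction_example(task: str, text: str, label: str) -> InstructionExample:
    """Turn one supervised (input, label) pair into an instruction/response pair."""
    instruction = TEMPLATES[task].format(text=text)
    return InstructionExample(instruction=instruction, response=label)


example = to_instruction_example("sentiment", "The food was great.", "positive")
print(example.instruction)
print(example.response)
```

Keeping one template per task means that a single model sees many task types in the same instruction format, which is the property the multi-task approach relies on.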
Hardware and Software
- NVIDIA driver : 535.54.03
- CUDA version : 12.2
Training
Refer to https://github.com/davidkim205/komt
Prompt template: Mistral
<s>[INST] {prompt} [/INST]</s>
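As a minimal sketch of the template above, the helpers below build a prompt string and strip the echoed prompt from decoded text. The names `build_prompt` and `extract_response` are hypothetical and not part of the komt repository. The spacing follows the `[INST]{x} [/INST]` string used in the Usage code, and the sketch assumes the tokenizer adds `<s>` itself, so it is not written into the prompt.

```python
INST_OPEN = "[INST]"
INST_CLOSE = "[/INST]"


def build_prompt(user_message: str) -> str:
    """Wrap a user message in Mistral instruction tags."""
    return f"{INST_OPEN}{user_message.strip()} {INST_CLOSE}"


def extract_response(decoded: str) -> str:
    """Return the text after the last [/INST] tag, without the </s> marker."""
    # rpartition returns the whole string as the tail when the tag is absent.
    tail = decoded.rpartition(INST_CLOSE)[2]
    return tail.replace("</s>", "").strip()


prompt = build_prompt("Who are you?")
print(prompt)
print(extract_response("<s> " + prompt + "I am an assistant.</s>"))
```

Splitting on the closing tag rather than on a fixed character offset keeps the extraction independent of how the tokenizer renders whitespace around `<s>`.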
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import TextStreamer, GenerationConfig

model_name = 'davidkim205/komt-mistral-7b-v1'
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Print tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer)

def gen(x):
    generation_config = GenerationConfig(
        temperature=0.8,
        top_p=0.8,
        top_k=100,
        max_new_tokens=1024,
        early_stopping=True,  # only affects beam search; kept from the original config
        do_sample=True,
    )
    # Wrap the request in the Mistral instruction tags.
    q = f"[INST]{x} [/INST]"
    inputs = tokenizer(
        q,
        return_tensors='pt',
        return_token_type_ids=False
    ).to(model.device)  # follow device_map="auto" instead of hard-coding 'cuda'
    gened = model.generate(
        **inputs,
        generation_config=generation_config,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        streamer=streamer,
    )
    result_str = tokenizer.decode(gened[0], skip_special_tokens=True)
    # Keep only the text that follows the instruction block.
    start_tag = "[/INST]"
    start_index = result_str.find(start_tag)
    if start_index != -1:
        result_str = result_str[start_index + len(start_tag):].strip()
    return result_str

# Prompt: "I want to travel to Jeju Island alone for two days and one night; plan an itinerary for me."
print(gen('제주도를 1박2일로 혼자 여행하려고 하는데 여행 코스를 만들어줘'))
output
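<s> [INST]제주도를 1박2일로 혼자 여행하려고 하는데 여행 코스를 만들어줘 [/INST]1박 2일 여행 코스
제주 여행 코스는 다음과 같습니다:
첫째 날:
* 아침: 제주시 신라로 이동하여 한라산 국립공원에 도착합니다. 오른쪽 발자국 광장에서 눈에 띄는 맥주숍을 찾아 한 병의 맥주를 마시고 계단으로 올라가 한라산 정상에 도착합니다.
* 점심: 정상에서 계단을 내려와 중도에 위치한 골목에서 음식을 즐깁니다. 젓갈, 곰탕, 전복 등 맛있는 제주 특산물을 맛보며 힐링을 즐깁니다.
* 오후: 중도에서 남하하여 남서쪽 제주로 이동합니다. 남서쪽 제주는 섬의 자연 경관과 백사장, 아름다운 바다 등 아름다운 풍광이 있는 곳입니다. 숙소에 숙박합니다.
둘째 날:
* 아침: 남서쪽 제주에서 북서쪽 제주로 이동합니다. 이 지역은 푸른 잔디 해변과 투명한 바다가 펼쳐져 있는 아름다운 풍광입니다. 숙소에서 아침을 먹고 바다로 향하여 해변에서 휴식을 취합니다.
* 점심: 북서쪽 제주의 앞바다에서 시원한 바다를 보며 한 끼의 해산물을 맛보고 계절에 따라 해산물 요리를 즐깁니다.
* 오후: 남서쪽 제주에서 이동하여 서쪽 제주로 이동합니다. 서쪽 제주는 절벽과 암석 해변, 수수께끼의 돌 정원 등 독특한 경관이 있는 곳입니다. 이곳에서는 서쪽 제주의 대표적인 명소인 청량사를 방문하고 서쪽 제주의 아름다운 풍광을 감상합니다.
* 저녁: 서쪽 제주에서 제주 시로 이동하여 한 끼의 제주 특산물을 맛보고 도착한 제주 도심에서 저녁을 즐깁니다.
* 야간: 제주 시의 도심에서 야간 활동을 즐기며 1박 2일의 여행을 마무리합니다.
이렇게 제주를 1박 2일로 혼자 여행하면 제주의 아름다운 풍광, 푸른 잔디 해변, 투명한 바다 등을 경험할 수 있습니다.

English translation of the output:
<s> [INST]I want to travel to Jeju Island alone for two days and one night; plan an itinerary for me [/INST]Two-day, one-night itinerary
The Jeju itinerary is as follows:
Day 1:
* Morning: Travel to Silla in Jeju City and arrive at Hallasan National Park. Find the eye-catching beer shop at the footprint plaza on the right, drink a bottle of beer, then climb the stairs to the summit of Hallasan.
* Lunch: Come down the stairs from the summit and enjoy a meal in an alley located midway. Relax while tasting delicious Jeju specialties such as jeotgal (salted seafood), gomtang, and abalone.
* Afternoon: Head south from the midway point to southwestern Jeju. Southwestern Jeju offers the island's natural scenery, white sand beaches, and a beautiful sea. Stay overnight at your accommodation.
Day 2:
* Morning: Travel from southwestern Jeju to northwestern Jeju. This area has beautiful scenery with green grassy beaches and a clear sea. Have breakfast at your accommodation, then head to the sea and rest on the beach.
* Lunch: Enjoy a seafood meal while looking out over the cool sea off northwestern Jeju, with seafood dishes that vary by season.
* Afternoon: Travel from southwestern Jeju to western Jeju. Western Jeju has distinctive scenery, including cliffs, rocky beaches, and a mysterious stone garden. Here, visit Cheongnyangsa, a representative attraction of western Jeju, and take in the scenery.
* Evening: Travel from western Jeju to Jeju City, taste a meal of Jeju specialties, and enjoy the evening in downtown Jeju.
* Night: Enjoy nighttime activities in downtown Jeju City to wrap up the two-day, one-night trip.
Traveling alone in Jeju for two days and one night this way, you can experience Jeju's beautiful scenery, green grassy beaches, and clear sea.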
Evaluation
For objective model evaluation, we initially used EleutherAI's lm-evaluation-harness but obtained unsatisfactory results. Consequently, we conducted evaluations using ChatGPT, a widely used model, as described in "Self-Alignment with Instruction Backtranslation" and "Three Ways of Using Large Language Models to Evaluate Chat".
| model | score | average (0-5) | percentage |
|---|---|---|---|
| gpt-3.5-turbo(close) | 147 | 3.97 | 79.45% |
| naver Cue(close) | 140 | 3.78 | 75.67% |
| clova X(close) | 136 | 3.67 | 73.51% |
| WizardLM-13B-V1.2(open) | 96 | 2.59 | 51.89% |
| Llama-2-7b-chat-hf(open) | 67 | 1.81 | 36.21% |
| Llama-2-13b-chat-hf(open) | 73 | 1.91 | 38.37% |
| nlpai-lab/kullm-polyglot-12.8b-v2(open) | 70 | 1.89 | 37.83% |
| kfkas/Llama-2-ko-7b-Chat(open) | 96 | 2.59 | 51.89% |
| beomi/KoAlpaca-Polyglot-12.8B(open) | 100 | 2.70 | 54.05% |
| komt-llama2-7b-v1 (open)(ours) | 117 | 3.16 | 63.24% |
| komt-llama2-13b-v1 (open)(ours) | 129 | 3.48 | 69.72% |
| komt-llama-30b-v1 (open)(ours) | 129 | 3.16 | 63.24% |
| komt-mistral-7b-v1 (open)(ours) | 131 | 3.54 | 70.81% |
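The document does not state how the average and percentage columns are derived from the score. As a hedged sketch, the arithmetic below assumes 37 evaluation prompts, each rated from 0 to 5 (a maximum total of 185), with results truncated to two decimals. That assumption is an inference from the numbers, not something the source states, although it is consistent with rows such as gpt-3.5-turbo and komt-mistral-7b-v1. The names `NUM_PROMPTS`, `truncate`, and `summarize` are hypothetical.

```python
import math

# Assumption: 37 prompts, each scored 0-5, so the maximum total is 185.
NUM_PROMPTS = 37
MAX_PER_PROMPT = 5


def truncate(value: float, digits: int = 2) -> float:
    """Cut a value off after the given number of decimals without rounding up."""
    factor = 10 ** digits
    return math.floor(value * factor) / factor


def summarize(score: int) -> tuple[float, float]:
    """Return (average per prompt, percentage of the maximum total) for a score."""
    average = score / NUM_PROMPTS
    percentage = 100 * score / (NUM_PROMPTS * MAX_PER_PROMPT)
    return truncate(average), truncate(percentage)


print(summarize(147))  # gpt-3.5-turbo row
print(summarize(131))  # komt-mistral-7b-v1 row
```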