llm-jp-3-13b-it / README.md

harutoshi

Update

856b5ed verified over 1 year ago

preview code

raw

history blame contribute delete

2.11 kB

metadata

base_model: llm-jp/llm-jp-3-13b
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - llama
  - trl
license: apache-2.0
language:
  - jp

Uploaded model

Developed by: harutoshi
License: apache-2.0
Finetuned from model : llm-jp/llm-jp-3-13b

This llama model was trained 2x faster with Unsloth and Huggingface's TRL library.

Sample Use

以下は、推論を実行するためのサンプルコードです。

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
from unsloth import FastLanguageModel
from tqdm import tqdm
import json

HF_TOKEN = {あなたのHugging Faceのトークン}
model_name = "harutoshi/llm-jp-3-13b-it"

max_seq_length = 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = "HF token",
)
FastLanguageModel.for_inference(model)

datasets = []
with open("./elyza-tasks-100-TV_0.jsonl", "r") as f:
    item = ""
    for line in f:
      line = line.strip()
      item += line
      if item.endswith("}"):
        datasets.append(json.loads(item))
        item = ""

results = []
for dt in tqdm(datasets):
  input = dt["input"]

  prompt = f"""### 指示\n{input}\n### 回答\n"""

  inputs = tokenizer([prompt], return_tensors = "pt").to(model.device)

  outputs = model.generate(**inputs, max_new_tokens = 512, use_cache = True, do_sample=False, repetition_penalty=1.2)
  prediction = tokenizer.decode(outputs[0], skip_special_tokens=True).split('\n### 回答')[-1]

  results.append({"task_id": dt["task_id"], "input": input, "output": prediction})

with open(f"{new_model_id}_output.jsonl", 'w', encoding='utf-8') as f:
    for result in results:
        json.dump(result, f, ensure_ascii=False)
        f.write('\n')