k2s3_test_24001

language:
  - ko
license: llama2
model-index:
  - name: k2s3_test_24001
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 55.72
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Changgil/k2s3_test_24001
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 80.69
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Changgil/k2s3_test_24001
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 54.6
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Changgil/k2s3_test_24001
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 43.57
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Changgil/k2s3_test_24001
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 75.69
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Changgil/k2s3_test_24001
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 29.8
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Changgil/k2s3_test_24001
          name: Open LLM Leaderboard

Developed by:

  • Changgil Song

Model Number:

  • k2s3_test_24001

Base Model:

  • meta-llama/Llama-2-13b-chat-hf

Training Data

  • The model was trained on a diverse dataset of approximately 800 million tokens, including the Standard Korean Dictionary, KULLM training data from Korea University, abstracts of master's and doctoral theses, and Korean language samples from AI Hub.

Training Method

  • This model was fine-tuned from the "meta-llama/Llama-2-13b-chat-hf" base model using PEFT (Parameter-Efficient Fine-Tuning) with LoRA (Low-Rank Adaptation).

Hardware and Software

  • Hardware: trained on two NVIDIA A100 80 GB GPUs.
  • Training Factors: fine-tuned with PEFT LoRA using the Hugging Face SFTTrainer with FSDP applied. Key hyperparameters: LoRA r = 8, LoRA alpha = 16, 2 epochs, per-device batch size 1, and gradient accumulation of 32.

Caution

  • When further fine-tuning this model, match the hyperparameters used during training, such as the LoRA r and LoRA alpha values, to ensure compatibility and optimal performance.

Additional Information

  • Training leveraged FSDP (Fully Sharded Data Parallel) through the Hugging Face SFTTrainer for efficient memory usage and accelerated training.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric                            | Value |
| --------------------------------- | ----: |
| Avg.                              | 56.68 |
| AI2 Reasoning Challenge (25-Shot) | 55.72 |
| HellaSwag (10-Shot)               | 80.69 |
| MMLU (5-Shot)                     | 54.60 |
| TruthfulQA (0-shot)               | 43.57 |
| Winogrande (5-shot)               | 75.69 |
| GSM8k (5-shot)                    | 29.80 |
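The reported "Avg." is the unweighted mean of the six benchmark scores, which is easy to verify:

```python
# The leaderboard "Avg." is the unweighted mean of the six benchmark scores.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 55.72,
    "HellaSwag (10-Shot)": 80.69,
    "MMLU (5-Shot)": 54.60,
    "TruthfulQA (0-shot)": 43.57,
    "Winogrande (5-shot)": 75.69,
    "GSM8k (5-shot)": 29.80,
}
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 56.68, matching the reported average
```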