k2s3_test_24001

language:
  - ko
license: llama2
model-index:
  - name: k2s3_test_24001
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 55.72
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Changgil/k2s3_test_24001
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 80.69
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Changgil/k2s3_test_24001
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 54.6
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Changgil/k2s3_test_24001
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 43.57
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Changgil/k2s3_test_24001
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 75.69
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Changgil/k2s3_test_24001
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 29.8
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Changgil/k2s3_test_24001
          name: Open LLM Leaderboard

Developed by:

  • Changgil Song

Model Number:

  • k2s3_test_24001

Base Model:

  • meta-llama/Llama-2-13b-chat-hf

Training Data

  • The model was trained on a diverse dataset of approximately 800 million tokens, including the Standard Korean Dictionary, KULLM training data from Korea University, abstracts of master's and doctoral theses, and Korean language samples from AI Hub.

Training Method

  • This model was fine-tuned from the "meta-llama/Llama-2-13b-chat-hf" base model using PEFT (Parameter-Efficient Fine-Tuning) with LoRA (Low-Rank Adaptation).

Hardware and Software

  • Hardware: trained on two NVIDIA A100 80 GB GPUs.
  • Training Factors: fine-tuned with PEFT LoRA using the Hugging Face SFTTrainer with FSDP applied. Key hyperparameters: LoRA r = 8, LoRA alpha = 16, 2 epochs, per-device batch size 1, and gradient accumulation of 32.

Caution

  • When further fine-tuning this model, match the hyperparameters used during training, such as the LoRA r and LoRA alpha values, to ensure compatibility and optimal performance.

Additional Information

  • Training leveraged FSDP (Fully Sharded Data Parallel) through the Hugging Face SFTTrainer for efficient memory usage and accelerated training.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric                            | Value |
| --------------------------------- | ----: |
| Avg.                              | 56.68 |
| AI2 Reasoning Challenge (25-Shot) | 55.72 |
| HellaSwag (10-Shot)               | 80.69 |
| MMLU (5-Shot)                     | 54.60 |
| TruthfulQA (0-shot)               | 43.57 |
| Winogrande (5-shot)               | 75.69 |
| GSM8k (5-shot)                    | 29.80 |
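The reported "Avg." is the unweighted mean of the six benchmark scores, which is easy to verify:

```python
# The leaderboard "Avg." is the unweighted mean of the six benchmark scores.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 55.72,
    "HellaSwag (10-Shot)": 80.69,
    "MMLU (5-Shot)": 54.60,
    "TruthfulQA (0-shot)": 43.57,
    "Winogrande (5-shot)": 75.69,
    "GSM8k (5-shot)": 29.80,
}
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 56.68, matching the reported average
```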