metadata
title: KAIdol Thinking Experiment
emoji: 🎀
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
license: apache-2.0
tags:
  - roleplay
  - korean
  - llm-evaluation
  - a-b-testing

KAIdol A/B Test Arena

An A/B comparison and evaluation platform for K-pop idol roleplay chatbot models.

Features

  • A/B Arena: compare the responses of two models side by side
  • Blind Mode: hide model names so responses are judged on quality alone
  • ELO Ranking: model ranking based on votes
  • 5 Characters: Kang Yul (강율), Seo Ian (서이안), Lee Jihu (이지후), Cha Doha (차도하), Choi Min (최민)
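The vote-based ranking can be illustrated with a standard Elo update. This is a minimal sketch, not this Space's actual code: the function names, the starting rating of 1000, and the K-factor of 32 are assumptions chosen for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after one vote.

    score_a is 1.0 if A won the vote, 0.0 if B won, and 0.5 for a tie.
    k (assumed value) controls how far a single vote moves the ratings.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the rating total is conserved.
    return rating_a + delta, rating_b - delta


# Two models start level (assumed 1000 each); a vote for A moves 16 points.
new_a, new_b = update_elo(1000.0, 1000.0, score_a=1.0)
print(new_a, new_b)
```

Because the update depends on the expected score, a win over a higher-rated model moves the ratings more than a win over a lower-rated one.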

Models (19 small Student models)

DPO v5 (7-14B)

  • qwen2.5-7b/14b-dpo-v5
  • exaone-7.8b-dpo-v5
  • qwen3-8b-dpo-v5
  • solar-10.7b-dpo-v5

SFT Thinking (7-14B)

  • qwen2.5-7b/14b-thinking
  • exaone-7.8b-thinking

Phase 7 Kimi Students

  • qwen2.5-7b/14b-kimi
  • exaone-7.8b-kimi

V7 Students

  • qwen2.5-7b/14b-v7
  • exaone-7.8b-v7
  • qwen3-8b-v7
  • varco-8b-v7

Usage

  1. Select a character and a scenario
  2. Enter a message or use a random scenario
  3. Compare the responses of the two models
  4. Vote for the better response

Tech Stack

  • Gradio 4.x
  • Python 3.11