
RAMEN-MISO

MISO ใฏใ€ๆพๅฐพ็ ” LLM ้–‹็™บใ‚ณใƒณใƒš 2025 ใซใŠใ„ใฆ Team RAMEN (Reasoning AI Model Engineering Network) ใŒ้–‹็™บใ—ใŸๅคง่ฆๆจก่จ€่ชžใƒขใƒ‡ใƒซใงใ‚ใ‚‹ใ€‚้ซ˜้›ฃๅบฆ้ ˜ๅŸŸใซใŠใ‘ใ‚‹ๆŽจ่ซ–ๆ€ง่ƒฝใฎๆœ€ๅคงๅŒ–ใ‚’็›ฎ็š„ใจใ—ใฆใ€Qwen3 ็ณป Mixture-of-Experts (MoE) ใ‚’ๅŸบ็›คใซ Chain-of-Thought Supervised Fine-Tuning (CoT-SFT) ใงๆœ€้ฉๅŒ–ใ—ใฆใ„ใ‚‹ใ€‚ๆ•ฐ็†ใƒป่‡ช็„ถ็ง‘ๅญฆใƒปไบบๆ–‡็คพไผšใชใฉๅคšๆง˜ใชใƒ‰ใƒกใ‚คใƒณใซใŠใ‘ใ‚‹้•ทๆ–‡ใ‹ใค้ซ˜่ฒ ่ทใชๆŽจ่ซ–ใ‚’ๅ‰ๆใซ่จญ่จˆใ—ใŸใ€‚


1. Model Specifications

  • Base model: https://huggingface.co/Qwen/Qwen3-235B-A22B
  • Inference
    • Resources: H100 80GB × 16 (2 nodes) or H100 80GB × 8 (1 node)
  • Training
    • Method: CoT-SFT
    • Resources: H100 × 16 (2 nodes)
    • Data: public and synthetic data at graduate level and above in mathematics, the natural sciences, and the humanities and social sciences
      https://huggingface.co/datasets/weblab-llm-competition-2025-bridge/RAMEN-phase1

2. Evaluation Method

  • Common runtime environment: vLLM 0.10.1.1 (verified with the recommended configuration described in this README)

2.1 Humanity's Last Exam (HLE)

  • Evaluation code: https://github.com/matsuolab/llm_bridge_prod/tree/master/eval_hle
  • Operational note: if all 2,401 text-only questions do not finish in a single run, split the evaluation across multiple runs.
  • Try 2 nodes first; fall back to 1 node if that fails.
  • The configuration file and Slurm templates in this README are a recommended setup tuned from the official code provided by the organizers. Use these settings when running the evaluation.
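
The note above recommends splitting the 2,401 text-only questions across several runs when one run cannot finish. A minimal sketch of such a split, assuming each index range maps to one Slurm submission (the chunk size of 600 is an arbitrary illustration, not a value from the official code):

```python
# Sketch: split the 2401 HLE text-only questions into fixed-size chunks,
# one chunk per evaluation run. Chunk size is an assumption; pick whatever
# fits within a single job's time limit.
def chunk_ranges(total: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return (start, end) half-open index ranges covering `total` items."""
    return [(start, min(start + chunk_size, total))
            for start in range(0, total, chunk_size)]

if __name__ == "__main__":
    for start, end in chunk_ranges(2401, 600):
        # Each range could correspond to one submission, e.g. by adjusting
        # max_samples / a sample offset in conf/config.yaml per run.
        print(f"run questions [{start}, {end})")
```

The last chunk is shorter than the rest so that all 2,401 questions are covered exactly once.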

Configuration file (conf/config.yaml)

dataset: cais/hle

provider: vllm # [vllm]
base_url: http://localhost:8000/v1

model: weblab-llm-competition-2025-bridge/RAMEN-MISO
max_completion_tokens: 35000
reasoning: true

# the full set including multimodal samples is 2500, so the text-only subset is about 2400
num_workers: 2500
max_samples: 2500

judge: o3-mini-2025-01-31

Slurm templates

1-node run

#!/bin/bash
#SBATCH --job-name=qwen3_8gpu
#SBATCH --partition=P01
#SBATCH --nodelist=osk-gpu51
#SBATCH --nodes=1
#SBATCH --gpus-per-node=8
#SBATCH --cpus-per-task=240
#SBATCH --time=24:00:00
#SBATCH --output=/home/Competition2025/adm/X006/logs/%x-%j.out
#SBATCH --error=/home/Competition2025/adm/X006/logs/%x-%j.err
#SBATCH --export=OPENAI_API_KEY="<your OpenAI API key>"
#--- Modules & Conda ------------------------------------------------
module purge
module load cuda/12.6 miniconda/24.7.1-py312
module load cudnn/9.6.0
module load nccl/2.24.3
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate llmbench

# Hugging Face authentication
export HF_TOKEN="<your Hugging Face token>"
export HF_HOME=${SLURM_TMPDIR:-$HOME}/.hf_cache
export TRANSFORMERS_CACHE=$HF_HOME
export HUGGINGFACE_HUB_TOKEN=$HF_TOKEN
mkdir -p "$HF_HOME"
echo "HF cache dir : $HF_HOME"                   # for debugging

#--- Memory / performance tuning ------------------------------------
# Recommended in the best configuration: expandable segments to reduce memory fragmentation
export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True"

#--- GPU monitoring -------------------------------------------------
nvidia-smi -i 0,1,2,3,4,5,6,7 -l 3 > nvidia-smi.log &
pid_nvsmi=$!

#--- Launch vLLM (8 GPUs) -------------------------------------------
# Recommended in the best configuration: no rope-scaling / no reasoning-parser
#  - tensor-parallel=4, pipeline-parallel=2
#  - enable-expert-parallel
#  - disable-custom-all-reduce
#  - gpu-memory-utilization=0.90
vllm serve weblab-llm-competition-2025-bridge/RAMEN-MISO \
  --tensor-parallel-size 4 \
  --pipeline-parallel-size 2 \
  --enable-expert-parallel \
  --gpu-memory-utilization 0.90 \
  --disable-custom-all-reduce \
  > vllm.log 2>&1 &
pid_vllm=$!

#--- Health check ---------------------------------------------------
until curl -s http://127.0.0.1:8000/health >/dev/null; do
  echo "$(date +%T) vLLM starting …"
  sleep 10
done
echo "vLLM READY"

#--- Inference ------------------------------------------------------
python predict.py > predict.log 2>&1

#--- Evaluation -----------------------------------------------------
OPENAI_API_KEY=xxx python judge.py

#--- Cleanup --------------------------------------------------------
kill $pid_vllm
kill $pid_nvsmi
wait

2-node run (Ray cluster, per the organizers' instructions)
Start a Ray cluster with jobs/ray_cluster.sh, setting the partition, nodelist, and log files as appropriate.
Then SSH into the head node that the script reports, load the modules and conda environment, and launch vLLM as usual; vLLM detects the Ray cluster automatically.
After that, remove the vLLM launch and health-check sections from the script above, adjust the config, and run inference.

2.2 Do-Not-Answer (DNA)

Slurm template

#!/bin/bash
#SBATCH --job-name=qwen3_8gpu
#SBATCH --partition=P01
#SBATCH --nodelist=osk-gpu51
#SBATCH --nodes=1
#SBATCH --gpus-per-node=8
#SBATCH --cpus-per-task=240
#SBATCH --time=24:00:00
#SBATCH --output=/home/Competition2025/adm/X006/logs/%x-%j.out
#SBATCH --error=/home/Competition2025/adm/X006/logs/%x-%j.err
#SBATCH --export=OPENAI_API_KEY="<your OpenAI API key>"
#--- Modules & Conda ------------------------------------------------
module purge
module load cuda/12.6 miniconda/24.7.1-py312
module load cudnn/9.6.0
module load nccl/2.24.3
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate llmbench

# Hugging Face authentication
export HF_TOKEN="<your Hugging Face token>"
export HF_HOME=${SLURM_TMPDIR:-$HOME}/.hf_cache
export TRANSFORMERS_CACHE=$HF_HOME
export HUGGINGFACE_HUB_TOKEN=$HF_TOKEN
mkdir -p "$HF_HOME"
echo "HF cache dir : $HF_HOME"                   # for debugging

#--- Memory / performance tuning ------------------------------------
# Recommended in the best configuration: expandable segments to reduce memory fragmentation
export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True"

#--- GPU monitoring -------------------------------------------------
nvidia-smi -i 0,1,2,3,4,5,6,7 -l 3 > nvidia-smi.log &
pid_nvsmi=$!

#--- Create required directories ------------------------------------
mkdir -p evaluation_results

#--- Launch vLLM (8 GPUs) -------------------------------------------
# Recommended in the best configuration: no rope-scaling / no reasoning-parser
#  - tensor-parallel=4, pipeline-parallel=2
#  - enable-expert-parallel
#  - disable-custom-all-reduce
#  - gpu-memory-utilization=0.90
vllm serve weblab-llm-competition-2025-bridge/RAMEN-MISO \
  --tensor-parallel-size 4 \
  --pipeline-parallel-size 2 \
  --enable-expert-parallel \
  --gpu-memory-utilization 0.90 \
  --disable-custom-all-reduce \
  > vllm.log 2>&1 &
pid_vllm=$!

#--- Health check ---------------------------------------------------
until curl -s http://127.0.0.1:8000/health >/dev/null; do
  echo "$(date +%T) vLLM starting …"
  sleep 10
done
echo "vLLM READY"

#--- Inference ------------------------------------------------------
python llm-compe-eval/evaluate_huggingface_models.py \
    --model_name "weblab-llm-competition-2025-bridge/RAMEN-MISO" \
    --dataset_path datasets/Instruction/do_not_answer_en.csv \
    --output_dir evaluation_results \
    --use_vllm \
    --max_questions 939 \
    --vllm_base_url http://localhost:8000/v1 > predict.log 2>&1

#--- Cleanup --------------------------------------------------------
kill $pid_vllm
kill $pid_nvsmi
wait

3. Evaluation Results

| Benchmark | DeepSeek R1 0528 Qwen3 8B | Qwen3 235B A22B | MISO |
| --- | --- | --- | --- |
| Humanity's Last Exam (text-only) | 6.46 ±1.96 | 11.75 ±1.36 | 11.12 |
| Humanity's Last Exam Extract120 (text-only) | 4.85 ±4.15 | 11.54 ±6.14 | 19.33 ±7.10 |
| Do-Not-Answer | 97.2 | 97.9 | 92.0 |

Note: HLE-Extract120 is a subset of 120 questions drawn from the text-only questions of the official dataset (cais/hle) by stratified sampling that preserves the category ratios.
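
The stratified extraction described in the note can be sketched as proportional per-category sampling. This is an illustration with hypothetical category labels, not the official extraction script; the real categories come from the cais/hle metadata:

```python
import random
from collections import defaultdict

def stratified_sample(items, key, k, seed=0):
    """Draw k items, allocating per-category quotas proportional to each
    category's share of `items` (floor quotas, largest-remainder top-up)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for it in items:
        groups[key(it)].append(it)
    total = len(items)
    # Floor of the proportional quota per category.
    quotas = {c: k * len(g) // total for c, g in groups.items()}
    # Hand out any remaining slots to the categories with the largest remainders.
    by_remainder = sorted(groups, key=lambda c: k * len(groups[c]) % total,
                          reverse=True)
    for c in by_remainder:
        if sum(quotas.values()) >= k:
            break
        quotas[c] += 1
    sample = []
    for c, g in groups.items():
        sample.extend(rng.sample(g, quotas[c]))
    return sample
```

For example, a pool of 60 math, 30 physics, and 30 humanities questions sampled down to 12 yields quotas of 6, 3, and 3, preserving the 2:1:1 ratio.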

