RAMEN-MISO
MISO ใฏใๆพๅฐพ็ LLM ้็บใณใณใ 2025 ใซใใใฆ Team RAMEN (Reasoning AI Model Engineering Network) ใ้็บใใๅคง่ฆๆจก่จ่ชใขใใซใงใใใ้ซ้ฃๅบฆ้ ๅใซใใใๆจ่ซๆง่ฝใฎๆๅคงๅใ็ฎ็ใจใใฆใQwen3 ็ณป Mixture-of-Experts (MoE) ใๅบ็คใซ Chain-of-Thought Supervised Fine-Tuning (CoT-SFT) ใงๆ้ฉๅใใฆใใใๆฐ็ใป่ช็ถ็งๅญฆใปไบบๆ็คพไผใชใฉๅคๆงใชใใกใคใณใซใใใ้ทๆใใค้ซ่ฒ ่ทใชๆจ่ซใๅๆใซ่จญ่จใใใ
1. ใขใใซไปๆง
- ใใผในใขใใซ: https://huggingface.co/Qwen/Qwen3-235B-A22B
- ๆจ่ซ
- ใชใฝใผใน: H100 80GB ร 16 (2 node) ใพใใฏ H100 80GB ร 8 (1 node)
- ๅญฆ็ฟ
- ๆๆณ: CoT-SFT
- ใชใฝใผใน: H100 ร 16 (2 node)
- ใใผใฟ: ๅคงๅญฆ้ขใฌใใซไปฅไธใฎๆฐ็ใป่ช็ถ็งๅญฆใปไบบๆ็คพไผ็ณปใฎๅ ฌ้๏ผๅๆใใผใฟ
https://huggingface.co/datasets/weblab-llm-competition-2025-bridge/RAMEN-phase1
2. ่ฉไพกๆนๆณ
- ๅ ฑ้ๅฎ่ก็ฐๅข: vLLM 0.10.1.1๏ผๆฌ README ่จ่ผใฎๆจๅฅจๆงๆใงๆค่จผๆธใฟ๏ผ
2.1 Humanityโs Last Exam๏ผHLE๏ผ
- ่ฉไพกใณใผใ: https://github.com/matsuolab/llm_bridge_prod/tree/master/eval_hle
- ้็จใกใข: 2401 ไปถ๏ผtext-only๏ผใฎๅ จๅใไธๅบฆใงๅฎไบใใชใๅ ดๅใฏใ่คๆฐๅใซๅใใฆๅฎ่กใใฆใใ ใใใ
- ใพใ 2ใใผใ ใ่ฉฆใใๅคฑๆใใใ 1ใใผใ ใงๅฎ่กใใฆใใ ใใใ
- ๆฌ README ใซ่จ่ผใฎ่จญๅฎใใกใคใซใใใณ Slurm ใใณใใฌใผใใฏใ้ๅถๆไพใฎๅ ฌๅผใณใผใใๅบใซๆ้ฉๅใใๆจๅฅจๆงๆใงใใ่ฉไพกใๅฎๆฝใใ้ใฏใๆฌ่จญๅฎใๅฉ็จใใฆใใ ใใใ
่จญๅฎใใกใคใซ๏ผconf/config.yaml๏ผ
# HLE evaluation settings (conf/config.yaml) for the eval_hle harness.
dataset: cais/hle
provider: vllm # [vllm]
# OpenAI-compatible endpoint served by the local vLLM instance below.
base_url: http://localhost:8000/v1
model: weblab-llm-competition-2025-bridge/RAMEN-MISO
# Per-sample generation budget; long chain-of-thought outputs need a large cap.
max_completion_tokens: 35000
reasoning: true
# sample with multimodal is 2500, so text-only sample is about 2400
# NOTE(review): num_workers equals max_samples — presumably one worker per
# sample; confirm against the harness's concurrency handling.
num_workers: 2500
max_samples: 2500
# Model used as the automatic judge.
judge: o3-mini-2025-01-31
Slurm ใใณใใฌใผใ
1ใใผใๅฎ่ก
#!/bin/bash
# HLE evaluation: 1-node / 8-GPU Slurm job.
# Starts a local vLLM server, waits for it to become healthy, runs
# predict.py against it, then judges the predictions with judge.py.
#SBATCH --job-name=qwen3_8gpu
#SBATCH --partition=P01
#SBATCH --nodelist=osk-gpu51
#SBATCH --nodes=1
#SBATCH --gpus-per-node=8
#SBATCH --cpus-per-task=240
#SBATCH --time=24:00:00
#SBATCH --output=/home/Competition2025/adm/X006/logs/%x-%j.out
#SBATCH --error=/home/Competition2025/adm/X006/logs/%x-%j.err
#SBATCH --export=OPENAI_API_KEY="openai_api_keyใใใใซ"
#--- Modules & Conda ------------------------------------------------
module purge
module load cuda/12.6 miniconda/24.7.1-py312
module load cudnn/9.6.0
module load nccl/2.24.3
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate llmbench
# Hugging Face authentication.
# BUGFIX: removed the space after '=' — `export HF_TOKEN= "<token>"`
# exports an EMPTY HF_TOKEN and passes the token string to `export` as a
# (invalid) variable name, so downstream HF downloads would fail auth.
export HF_TOKEN="<huggingface_tokenใใใใซ>"
export HF_HOME="${SLURM_TMPDIR:-$HOME}/.hf_cache"
export TRANSFORMERS_CACHE="$HF_HOME"
export HUGGINGFACE_HUB_TOKEN="$HF_TOKEN"
mkdir -p "$HF_HOME"
echo "HF cache dir : $HF_HOME" # for debugging
#--- Memory / performance tuning ------------------------------------
# Recommended on the tested configuration: expandable segments reduce
# CUDA allocator fragmentation.
export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True"
#--- GPU monitoring -------------------------------------------------
nvidia-smi -i 0,1,2,3,4,5,6,7 -l 3 > nvidia-smi.log &
pid_nvsmi=$!
#--- Start vLLM (8 GPUs) --------------------------------------------
# Tested configuration: no rope-scaling / no reasoning-parser.
# - tensor-parallel=4, pipeline-parallel=2
# - enable-expert-parallel
# - disable-custom-all-reduce
# - gpu-memory-utilization=0.90
# BUGFIX: model id was misspelled "REMEN-MISO"; the repository is
# weblab-llm-competition-2025-bridge/RAMEN-MISO (as used everywhere else).
vllm serve weblab-llm-competition-2025-bridge/RAMEN-MISO \
  --tensor-parallel-size 4 \
  --pipeline-parallel-size 2 \
  --enable-expert-parallel \
  --gpu-memory-utilization 0.90 \
  --disable-custom-all-reduce \
  > vllm.log 2>&1 &
pid_vllm=$!
#--- Health check ---------------------------------------------------
# Abort instead of spinning forever if the server process dies on startup.
until curl -s http://127.0.0.1:8000/health >/dev/null; do
  if ! kill -0 "$pid_vllm" 2>/dev/null; then
    echo "vLLM exited before becoming healthy; see vllm.log" >&2
    kill "$pid_nvsmi" 2>/dev/null
    exit 1
  fi
  echo "$(date +%T) vLLM starting โฆ"
  sleep 10
done
echo "vLLM READY"
#--- Inference ------------------------------------------------------
python predict.py > predict.log 2>&1
#--- Judging --------------------------------------------------------
# NOTE(review): this inline assignment overrides the OPENAI_API_KEY
# exported via #SBATCH above — confirm "xxx" is replaced with a real key.
OPENAI_API_KEY=xxx python judge.py
#--- Cleanup --------------------------------------------------------
kill "$pid_vllm"
kill "$pid_nvsmi"
wait
2ใใผใๅฎ่ก๏ผRay ใฏใฉในใฟๆนๅผใป้ๅถๆ็คบๆบๆ ๏ผjobs/ray_cluster.shใไฝฟ็จใใฆray clusterใ่ตทๅใใฆใใ ใใใใใฎๆpartitionใnodelistใใญใฐใใกใคใซใ่จญๅฎใใฆใใ ใใใ
ใใใฆsshใงๅบๅใใใheadใใผใใฎใใผใใซๆฅ็ถใใใขใธใฅใผใซใจcondaใ่ชญใฟ่พผใฟใvLLMใใใคใ้ใ่ตทๅใใฆใใ ใใใray clusterใ่ชๅใง่ช่ญใใพใใ
ใใฎๅพใไธ่จในใฏใชใใใใvLLM่ตทๅใจใใซในใใงใใฏใๅ้คใใconfigใไฟฎๆญฃใใฆใใๆจ่ซใใฆใใ ใใใ
2.2 Do-Not-Answer๏ผDNA๏ผ
- ่ฉไพกใณใผใ: https://github.com/matsuolab/llm_bridge_prod/tree/master/eval_dna
Slurm ใใณใใฌใผใ
#!/bin/bash
# Do-Not-Answer (DNA) evaluation: 1-node / 8-GPU Slurm job.
# Starts a local vLLM server, waits for it to become healthy, then runs
# the eval_dna harness against the OpenAI-compatible endpoint.
#SBATCH --job-name=qwen3_8gpu
#SBATCH --partition=P01
#SBATCH --nodelist=osk-gpu51
#SBATCH --nodes=1
#SBATCH --gpus-per-node=8
#SBATCH --cpus-per-task=240
#SBATCH --time=24:00:00
#SBATCH --output=/home/Competition2025/adm/X006/logs/%x-%j.out
#SBATCH --error=/home/Competition2025/adm/X006/logs/%x-%j.err
#SBATCH --export=OPENAI_API_KEY="openai_api_keyใใใใซ"
#--- Modules & Conda ------------------------------------------------
module purge
module load cuda/12.6 miniconda/24.7.1-py312
module load cudnn/9.6.0
module load nccl/2.24.3
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate llmbench
# Hugging Face authentication.
# BUGFIX: removed the space after '=' — `export HF_TOKEN= "<token>"`
# exports an EMPTY HF_TOKEN and passes the token string to `export` as a
# (invalid) variable name, so downstream HF downloads would fail auth.
export HF_TOKEN="<huggingface_tokenใใใใซ>"
export HF_HOME="${SLURM_TMPDIR:-$HOME}/.hf_cache"
export TRANSFORMERS_CACHE="$HF_HOME"
export HUGGINGFACE_HUB_TOKEN="$HF_TOKEN"
mkdir -p "$HF_HOME"
echo "HF cache dir : $HF_HOME" # for debugging
#--- Memory / performance tuning ------------------------------------
# Recommended on the tested configuration: expandable segments reduce
# CUDA allocator fragmentation.
export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True"
#--- GPU monitoring -------------------------------------------------
nvidia-smi -i 0,1,2,3,4,5,6,7 -l 3 > nvidia-smi.log &
pid_nvsmi=$!
#--- Create required directories ------------------------------------
mkdir -p evaluation_results
#--- Start vLLM (8 GPUs) --------------------------------------------
# Tested configuration: no rope-scaling / no reasoning-parser.
# - tensor-parallel=4, pipeline-parallel=2
# - enable-expert-parallel
# - disable-custom-all-reduce
# - gpu-memory-utilization=0.90
# BUGFIX: model id was misspelled "REMEN-MISO"; the repository is
# weblab-llm-competition-2025-bridge/RAMEN-MISO (as used everywhere else).
vllm serve weblab-llm-competition-2025-bridge/RAMEN-MISO \
  --tensor-parallel-size 4 \
  --pipeline-parallel-size 2 \
  --enable-expert-parallel \
  --gpu-memory-utilization 0.90 \
  --disable-custom-all-reduce \
  > vllm.log 2>&1 &
pid_vllm=$!
#--- Health check ---------------------------------------------------
# Abort instead of spinning forever if the server process dies on startup.
until curl -s http://127.0.0.1:8000/health >/dev/null; do
  if ! kill -0 "$pid_vllm" 2>/dev/null; then
    echo "vLLM exited before becoming healthy; see vllm.log" >&2
    kill "$pid_nvsmi" 2>/dev/null
    exit 1
  fi
  echo "$(date +%T) vLLM starting โฆ"
  sleep 10
done
echo "vLLM READY"
#--- Inference ------------------------------------------------------
python llm-compe-eval/evaluate_huggingface_models.py \
  --model_name "weblab-llm-competition-2025-bridge/RAMEN-MISO" \
  --dataset_path datasets/Instruction/do_not_answer_en.csv \
  --output_dir evaluation_results \
  --use_vllm \
  --max_questions 939 \
  --vllm_base_url http://localhost:8000/v1 > predict.log 2>&1
#--- Cleanup --------------------------------------------------------
kill "$pid_vllm"
kill "$pid_nvsmi"
wait
3. ่ฉไพก็ตๆ
| Benchmark | DeepSeek R1 0528 Qwen3 8B | Qwen3 235B A22B | MISO |
|---|---|---|---|
| Humanity's Last Exam (text-only) | 6.46 ยฑ1.96 | 11.75 ยฑ1.36 | 11.12 |
| Humanity's Last Exam Extract120 (text-only) | 4.85 ยฑ4.15 | 11.54 ยฑ6.14 | 19.33 ยฑ7.10 |
| Do-Not-Answer | 97.2 | 97.9 | 92.0 |
ๆณจ: HLE-Extract120 ใฏใๅ ฌๅผใใผใฟใปใใ๏ผcais/hle๏ผใฎ text-only ๅ้กใใใซใใดใชๆฏ็ใ็ถญๆใใฆ 120ๅ ใๅฑคๅๆฝๅบใใใตใใปใใใงใใ
- Downloads last month
- -
Model tree for weblab-llm-competition-2025-bridge/RAMEN-MISO
Base model
Qwen/Qwen3-235B-A22B