[English](README.md) | [δΈ­ζ–‡](README-zh.md)
---
# UNO Evaluation Framework
To facilitate generalized evaluation across Omni benchmarks, we have built a lightweight Omni evaluation framework and released a high-performance scoring model to support it. You can easily add new datasets or evaluation models on top of this framework. Below, we use **UNO-Bench** and **Qwen-2.5-Omni-7B** as examples to demonstrate how to run the framework.
# πŸš€ Quick Start
## πŸ› οΈ Environment Preparation
Before running, please ensure the following core Python dependencies are installed. Note: since installing vLLM pulls in PyTorch, CUDA, and other complex dependencies, we recommend working in a fresh virtual environment to avoid conflicts.
```bash
pip install -r requirements.txt
```
Download the required models and datasets with the following commands (replace `xxx` with the corresponding Hugging Face repository IDs):
```bash
huggingface-cli download xxx --repo-type dataset --local-dir /path/to/UNO-Bench
huggingface-cli download xxx --local-dir /path/to/UNO-Scorer
huggingface-cli download Qwen/Qwen2.5-Omni-7B --local-dir /path/to/Qwen2.5-Omni
```
## 🎯 Reproducing Experimental Results
By executing the following code, you can reproduce the experimental results of **Qwen-2.5-Omni-7B** presented in the paper. Remember to replace **MODEL_PATH**, **DATASET_LOCAL_DIR**, and **SCORER_MODEL_PATH** with your local paths.
```bash
bash examples/run_unobench_qwen_omni_hf.sh
```
For better performance, we recommend running the vLLM version of the inference service.
```bash
bash examples/run_unobench_qwen_omni_vllm.sh
```
* The pipeline runs sequentially: `Start Inference Service -> Generate Results -> Release Resources -> Start Scoring Service -> Calculate Scores -> Release Resources`.
* It supports **resumable runs** (checkpointing): both inference and scoring progress are saved locally at regular intervals.
## πŸ“ˆ Compositional Law
To reproduce the Compositional Law fitting curve, run:
```bash
python3 compositional_law.py
```
## πŸ€– Using Only the Scoring Model
We recommend serving the scorer with vLLM for higher efficiency:
```bash
bash examples/test_scorer_vllm.sh
```
Alternatively, use the Transformers-based approach, which is slower:
```bash
python3 examples/test_scorer_hf.py
```
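Since the scorer is served behind vLLM's OpenAI-compatible API, you can also query it directly from Python. Below is a minimal sketch assuming the scorer listens on `SCORER_PORT` 8001; the prompt shown is a placeholder, not the official scoring template (see `examples/test_scorer_vllm.sh` for that):
```python
# Minimal sketch: query a locally served UNO-Scorer through vLLM's
# OpenAI-compatible endpoint. The prompt below is a PLACEHOLDER, not the
# official scoring template -- consult examples/test_scorer_vllm.sh for that.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8001/v1", api_key="EMPTY")

prompt = (
    "Question: What color is the sky?\n"
    "Reference answer: Blue.\n"
    "Model answer: The sky is blue.\n"
    "Judge whether the model answer matches the reference."
)

response = client.chat.completions.create(
    model="/path/to/UNO-Scorer",  # vLLM registers the served model under its path/name
    messages=[{"role": "user", "content": prompt}],
    temperature=0.0,
)
print(response.choices[0].message.content)
```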
## βš™οΈ Configuration Guide
Before running, you **must** modify the configuration section at the top of `run_unobench_qwen_omni_*.sh` to adapt to your environment.
### 1. Inference Model Configuration (Target Model)
| Variable Name | Description | Example |
| :--- | :--- | :--- |
| `MODEL_NAME` | Model registration name (corresponds to a name defined in `models/`) | `"Qwen-2.5-Omni-7B"`, `"VLLMClient"` |
| `MODEL_PATH` | Local absolute path to the model weights | `/path/to/Qwen2.5-Omni` |
| `INFERENCE_BACKEND` | Inference backend selection: `"vllm"` or `"hf"` | `"vllm"` |
| `TARGET_GPU_IDS` | GPU IDs used for the inference stage | `"0,1"` |
| `TARGET_TP_SIZE` | Tensor Parallelism size for the inference model | `2` |
| `TARGET_PORT` | vLLM service port | `8000` |
### 2. Scoring Model Configuration (Scorer)
| Variable Name | Description | Example |
| :--- | :--- | :--- |
| `SCORER_MODEL_PATH` | Path to the scoring model (e.g., UNO-Scorer) | `/path/to/UNO-Scorer` |
| `SCORER_GPU_IDS` | GPU IDs used for the scoring stage | `"0,1"` |
| `SCORER_PORT` | vLLM service port for the scorer | `8001` |
### 3. Dataset and Paths
| Variable Name | Description |
| :--- | :--- |
| `DATASET_NAME` | Evaluation dataset name (e.g., `"UNO-Bench"`) |
| `HF_CACHE_DIR` | HuggingFace cache or multimedia data directory; automatically downloaded datasets will be saved here |
| `DATASET_LOCAL_DIR` | Local path for the dataset. The program prioritizes reading from `DATASET_LOCAL_DIR`; otherwise, it automatically downloads to `HF_CACHE_DIR` |
| `EXP_MARKING` | Experiment marking suffix (e.g., `_20251024`), used to distinguish experimental settings and output filenames |
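To make the `DATASET_LOCAL_DIR` / `HF_CACHE_DIR` priority concrete, here is a minimal sketch of the resolution rule (the function name and arguments are hypothetical, not the framework's actual API):
```python
# Hypothetical sketch of the path-resolution rule above: prefer
# DATASET_LOCAL_DIR when it exists, otherwise download into HF_CACHE_DIR.
import os
from huggingface_hub import snapshot_download

def resolve_dataset_dir(repo_id: str, local_dir: str, hf_cache_dir: str) -> str:
    if local_dir and os.path.isdir(local_dir):
        return local_dir  # local copy takes priority
    # Fall back to downloading the dataset into the cache directory.
    return snapshot_download(repo_id=repo_id, repo_type="dataset",
                             cache_dir=hf_cache_dir)
```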
## πŸŒ€ Running Evaluation
After configuration, run the script:
```bash
bash run_eval.sh
```
### Detailed Script Execution Flow
1. **Stage 1: Inference**
* If `vllm` mode is selected, the script starts the target model's API Server in the background.
* Runs `eval.py --mode inference` to perform data inference.
    * **Key Step**: After inference completes, the script automatically kills the target model's vLLM process to fully release GPU memory (see the lifecycle sketch after this list).
2. **Stage 2: Scorer Setup**
* Starts the Scoring Model's (Scorer) vLLM service in the background.
3. **Stage 3: Evaluation (Scoring)**
* Runs `eval.py --mode scoring` to send the generated results to the scoring model for evaluation.
4. **Cleanup**
* Upon task completion, automatically shuts down the scoring model service.
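For reference, the same start -> run -> kill lifecycle can be sketched in Python, assuming vLLM's `vllm serve` CLI; the actual orchestration lives in `run_eval.sh`:
```python
# Illustrative Python rendering of run_eval.sh's start -> run -> kill
# pattern (the real orchestration is the bash script itself).
import subprocess
import time

def run_stage(model_path: str, port: int, stage_cmd: list) -> None:
    # Start a vLLM OpenAI-compatible server in the background.
    server = subprocess.Popen(["vllm", "serve", model_path, "--port", str(port)])
    try:
        time.sleep(60)  # crude readiness wait; the real script polls the service
        subprocess.run(stage_cmd, check=True)
    finally:
        server.terminate()  # kill the server to fully release GPU memory
        server.wait()

# Stage 1 (inference), then Stages 2-3 (scoring) on a fresh server.
run_stage("/path/to/Qwen2.5-Omni", 8000, ["python3", "eval.py", "--mode", "inference"])
run_stage("/path/to/UNO-Scorer", 8001, ["python3", "eval.py", "--mode", "scoring"])
```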
## πŸ“Š Output Results
Evaluation results will be generated as JSON files, saved by default in the `./eval_results/` directory.
* **Filename Format**: `{MODEL_NAME}{EXP_MARKING}:{DATASET_NAME}.json` (see the loading snippet below)
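For example, to locate and load a results file under this naming scheme (the values below are illustrative, and the JSON schema is defined by the framework):
```python
# Build the documented output filename and load the results for inspection.
import json
import os

MODEL_NAME, EXP_MARKING, DATASET_NAME = "Qwen-2.5-Omni-7B", "_20251024", "UNO-Bench"
path = os.path.join("eval_results", f"{MODEL_NAME}{EXP_MARKING}:{DATASET_NAME}.json")

with open(path, encoding="utf-8") as f:
    results = json.load(f)
# The JSON schema is defined by the framework; just peek at what is inside.
print(type(results).__name__)
```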
## πŸ“‚ Minimalist Development Guide
```text
.
β”œβ”€β”€ run_eval.sh # [Main Program] Manages config parameters, service lifecycle, and flow control
β”œβ”€β”€ eval.py # [Execution Script] Handles data loading, API interaction, and result storage
β”œβ”€β”€ utils/ # [Dependencies] General utility functions
β”œβ”€β”€ models/ # [Dependencies] Model registration and loading
└── benchmarks/ # [Dependencies] Dataset registration and loading
```
The project is mainly divided into benchmarks (evaluation sets) and evaluation models. You can register new datasets in `benchmarks/` and new models in `models/`.
### Adding New Datasets
1. Create a new dataset `.py` file in `benchmarks/`, such as `unobench.py`. Inherit from the `BaseDataset` class and implement the abstract methods (a skeleton follows this list):
* `load_and_prepare`: Download and load the dataset, organizing each item into the `utils.EvaluationRecord` format.
* `build_message`: Construct the message sent to the model side (OpenAI Chat Message format).
* `build_score_message`: Construct the message sent to the scoring model (OpenAI Chat Message format).
* `compute_score`: Calculate the score for a single data item.
* `compute_metrics`: Calculate metrics for the entire dataset.
2. Register the dataset in `__init__.py`.
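A skeletal example is shown below; the import paths and method signatures are assumptions for illustration, so follow the actual `BaseDataset` definition in the repository:
```python
# benchmarks/my_bench.py -- skeletal dataset (illustrative: import paths and
# method signatures are assumptions; match the real BaseDataset definition).
from benchmarks import BaseDataset      # assumed import path
from utils import EvaluationRecord      # record format mentioned above

class MyBench(BaseDataset):
    def load_and_prepare(self):
        # Download/load raw data; wrap each item as an EvaluationRecord.
        ...

    def build_message(self, record):
        # OpenAI Chat Message format sent to the target model.
        return [{"role": "user", "content": "..."}]

    def build_score_message(self, record):
        # OpenAI Chat Message format sent to the scoring model.
        ...

    def compute_score(self, record):
        # Score a single data item from the scorer's output.
        ...

    def compute_metrics(self, records):
        # Aggregate per-item scores into dataset-level metrics.
        ...
```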
### Adding New Models
1. Create a new model `.py` file in `models/`, such as `qwen_2d5_omni_7b.py`. Inherit from the `BaseModel` class and implement the abstract methods (a skeleton follows this list):
* `load_model`: Load the model.
* `generate`: Call the model interface once to generate text.
* `generate_batch`: Batch call the model interface to generate text.
2. Register the model in `__init__.py`.
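Analogously, a skeletal model file might look like this (import path and signatures are again illustrative):
```python
# models/my_model.py -- skeletal model (illustrative: import path and method
# signatures are assumptions; match the real BaseModel definition).
from models import BaseModel  # assumed import path

class MyModel(BaseModel):
    def load_model(self):
        # Load weights or initialize the backend client.
        ...

    def generate(self, messages):
        # Single call: OpenAI-style messages in, generated text out.
        ...

    def generate_batch(self, batch_of_messages):
        # Batched calls for throughput; returns a list of generated texts.
        ...
```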
## ⚠️ Precautions
* **Path Check**: Please ensure that the paths in the script have been modified to match the actual paths on your server.