---
language:
- zh
- en
license: apache-2.0
base_model: Qwen/Qwen3-14B
library_name: transformers
tags:
- qwen
- scoring
- grading
- evaluation
- llm-judge
pipeline_tag: text-generation
---
# UNO-Scorer: A Unified General Scoring Model for UNO-Bench

## 📖 Introduction

**UNO-Scorer** is a lightweight yet high-precision general scoring model developed as part of **UNO-Bench**. It is designed to automate the evaluation of Large Multimodal Models (LMMs) with minimal computational overhead.

Built on the **Qwen3-14B** backbone, UNO-Scorer is fine-tuned on 13K high-quality in-house data samples. It overcomes the limitations of traditional Overall Reward Models (ORMs) by supporting **6 distinct question types**, and it is particularly strong on **Multi-Step Open-Ended Questions (MO)**.

## 📊 Performance

UNO-Scorer performs strongly in automated evaluation, particularly on complex **Multi-Step Open-Ended Questions**. We compared the accuracy of our scorer against other advanced evaluators:

| Model | Accuracy |
| :--- | :--- |
| Seed-1.5-VL | 0.9118 |
| GPT-4.1 | 0.9457 |
| **UNO-Scorer (Ours)** | **0.9505** |

In this evaluation setting, UNO-Scorer surpasses even proprietary frontier models such as GPT-4.1, at lower cost.

## 💻 Usage

### Run Inference

We provide an example script based on **vLLM** for efficient model inference. Run the following command to test the scorer:

```bash
bash examples/test_scorer.sh
```

### Adapt Your Reference Answer

The most critical step in using UNO-Scorer is formatting the Reference Answer properly:

1. Assign point values to the answer components. The points for a question should typically sum to 10.
2. Optionally, add detailed scoring criteria to each reference answer to suit your needs (e.g., clarifying how to judge cases where the final choice is correct but the reasoning is flawed).

Note: Because the model is trained primarily on Chinese corpora, it follows these instructions more accurately when the point assignments and scoring criteria are written in Chinese.

You can structure the Reference Answer as follows:

| Question Type | Scenario | **Reference Answer** | Example |
| :--- | :--- | :--- | :--- |
| **Single Question** | The model only needs to check whether the final result matches. | Format as a single sub-question (Sub-question 1) worth exactly 10 points.<br><br>Template:<br>`小问1:{Answer},总分10分,无需关注推理过程,最终答案正确即可`<br><br>English gloss: "Sub-question 1: {Answer}, 10 points in total; ignore the reasoning process, a correct final answer is sufficient." | **Raw Answer:** "C"<br>**Input Answer:** `小问1:C,总分10分,无需关注推理过程,最终答案正确即可` |
| **Multiple Question** | The model needs to grade specific checkpoints. | Break down the answer into numbered sub-steps with assigned points (summing to exactly 10).<br><br>Template:<br>`1. {Sub-Answer A} ({X} points); 2. {Sub-Answer B} ({Y} points).` | **Raw Answer:** "5 apples, 6 bananas"<br>**Input Answer:** `1. 5 apples (4 points); 2. 6 bananas (6 points).` |
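
The two templates above can be produced programmatically. The following is a minimal sketch, not part of the UNO-Scorer code base: the helper names `format_single` and `format_multi` and the constant `TOTAL_POINTS` are hypothetical, and the strict 10-point check follows the "summing to exactly 10" rule in the table.

```python
from typing import Iterable, List, Tuple

# Assumption: every question is graded out of 10 points, as described above.
TOTAL_POINTS = 10


def format_single(answer: str) -> str:
    """Wrap a raw answer as one sub-question worth the full 10 points.

    Hypothetical helper; the string mirrors the Single Question template.
    """
    return f"小问1:{answer},总分{TOTAL_POINTS}分,无需关注推理过程,最终答案正确即可"


def format_multi(parts: Iterable[Tuple[str, int]]) -> str:
    """Join (sub_answer, points) pairs into the Multiple Question template.

    Hypothetical helper; raises ValueError if the points do not sum to 10.
    """
    pairs: List[Tuple[str, int]] = list(parts)
    if not pairs:
        raise ValueError("at least one sub-answer is required")
    total = sum(points for _, points in pairs)
    if total != TOTAL_POINTS:
        raise ValueError(f"points sum to {total}, expected {TOTAL_POINTS}")
    items = [
        f"{index}. {text} ({points} points)"
        for index, (text, points) in enumerate(pairs, start=1)
    ]
    return "; ".join(items) + "."


if __name__ == "__main__":
    print(format_single("C"))
    # 小问1:C,总分10分,无需关注推理过程,最终答案正确即可
    print(format_multi([("5 apples", 4), ("6 bananas", 6)]))
    # 1. 5 apples (4 points); 2. 6 bananas (6 points).
```

Validating the point total before calling the scorer catches malformed reference answers early. If you add custom scoring criteria, append them to the returned string yourself; this sketch does not model them.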

---

**Disclaimer:** This model is based on Qwen3-14B. Please strictly follow the license and usage policy of the original Qwen model series.