Wasn't able to reproduce MMLU-Pro benchmarks
Only got a 71.4% average for GLM-4.7-Flash-NVFP4. It seems to hit many timeouts due to infinite loops.
What parameters and hardware were you running on, @GadflyII?
One system with 2x RTX Pro Blackwell GPUs, and one with 2x RTX 4090s. The MMLU-Pro bench was done on the dual-Blackwell machine.
Not sure why you would have infinite loops and timeouts. That is not something I have seen at all.
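If it is the generations running away, one thing worth trying is a hard cap on generated tokens per request. Here is a minimal sketch of the simple_evaluate call from the script below with a cap added; the gen_kwargs string override exists in recent lm_eval releases, but exact parameter handling may differ across versions, so treat it as an assumption:

results = evaluator.simple_evaluate(
    model=model,
    tasks=["mmlu_pro"],
    num_fewshot=0,
    batch_size="auto",
    # assumption: max_gen_toks is honored by the vLLM wrapper and caps each
    # request so a looping model cannot generate forever
    gen_kwargs="max_gen_toks=1024",
    log_samples=True,
)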
Below is the entire MMLU-Pro script I ran to test both the NVFP4 and BF16 models.
Note: the lm_eval wrapper has some compatibility issues with transformers 5.
#!/usr/bin/env python3
"""
MMLU-Pro Evaluation Script for GLM-4.7-Flash NVFP4
Sets proper multiprocessing start method before importing CUDA modules.
"""
import multiprocessing
multiprocessing.set_start_method('spawn', force=True)
import os
import json
import argparse
from datetime import datetime

# Must be set before importing CUDA modules
os.environ.setdefault('CUDA_VISIBLE_DEVICES', '0')

from lm_eval import evaluator
from lm_eval.models.vllm_causallms import VLLM
def run_eval(model_path: str, output_dir: str, model_name: str = "nvfp4"):
    """Run MMLU-Pro evaluation on a model."""
    print("=" * 80)
    print(f"GLM-4.7-Flash {model_name.upper()} - MMLU-Pro Evaluation")
    print("=" * 80)
    print(f"Model: {model_path}")
    print(f"Output: {output_dir}")
    print(f"Start time: {datetime.now().isoformat()}")
    print("=" * 80)

    # Create output directory
    os.makedirs(output_dir, exist_ok=True)

    # Initialize vLLM model
    print("\nLoading model...")
    model = VLLM(
        pretrained=model_path,
        tensor_parallel_size=1,
        trust_remote_code=True,
        max_model_len=4096,
        gpu_memory_utilization=0.90,
        enforce_eager=False,  # Allow CUDA graphs for speed
        dtype="auto",
    )
    print("Model loaded. Starting MMLU-Pro evaluation...")

    # Run MMLU-Pro evaluation
    results = evaluator.simple_evaluate(
        model=model,
        tasks=["mmlu_pro"],
        num_fewshot=0,
        batch_size="auto",
        log_samples=True,
    )

    # Save results
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    results_file = os.path.join(output_dir, f"mmlu_pro_results_{model_name}_{timestamp}.json")

    # Extract and organize results
    mmlu_pro_results = results.get("results", {})

    # Collect per-category accuracies
    categories = {}
    for task_name, metrics in mmlu_pro_results.items():
        if isinstance(metrics, dict) and "acc,none" in metrics:
            acc = metrics["acc,none"]
            # Extract category from task name (e.g., mmlu_pro_biology -> biology)
            if task_name.startswith("mmlu_pro_"):
                category = task_name.replace("mmlu_pro_", "")
            else:
                category = task_name
            categories[category] = acc

    # Overall score
    if "mmlu_pro" in mmlu_pro_results:
        overall_acc = mmlu_pro_results["mmlu_pro"].get("acc,none", 0)
    else:
        overall_acc = sum(categories.values()) / len(categories) if categories else 0

    serializable_results = {
        "timestamp": timestamp,
        "model": model_path,
        "model_name": model_name,
        "task": "mmlu_pro",
        "num_fewshot": 0,
        "overall_accuracy": overall_acc,
        "category_results": categories,
        "raw_results": {k: {kk: vv for kk, vv in v.items() if not callable(vv)}
                        for k, v in mmlu_pro_results.items() if isinstance(v, dict)},
        "configs": {k: str(v) for k, v in results.get("configs", {}).items()},
        "versions": results.get("versions", {}),
    }

    with open(results_file, 'w') as f:
        json.dump(serializable_results, f, indent=2, default=str)
    print(f"\nResults saved to: {results_file}")

    # Print summary
    print("\n" + "=" * 80)
    print("MMLU-PRO RESULTS SUMMARY")
    print("=" * 80)
    print(f"\nOverall Accuracy: {overall_acc:.4f} ({overall_acc*100:.2f}%)")
    if categories:
        print("\nCategory Results:")
        for cat, acc in sorted(categories.items(), key=lambda x: -x[1]):
            print(f"  {cat}: {acc:.4f} ({acc*100:.2f}%)")
    print("\n" + "=" * 80)
    print(f"End time: {datetime.now().isoformat()}")
    return serializable_results

def main():
    parser = argparse.ArgumentParser(description="Run MMLU-Pro evaluation on GLM-4.7-Flash")
    parser.add_argument("--model", choices=["nvfp4", "bf16", "both"], default="nvfp4",
                        help="Which model to evaluate")
    args = parser.parse_args()

    NVFP4_PATH = "/home/quant/AI/glm-4.7-flash/nvfp4"
    BF16_PATH = "/home/quant/AI/glm-4.7-flash/bf16"
    OUTPUT_DIR = "/home/quant/AI/glm-4.7-flash/eval_results"

    results = {}
    if args.model in ["nvfp4", "both"]:
        print("\n>>> Evaluating NVFP4 model...")
        results["nvfp4"] = run_eval(NVFP4_PATH, OUTPUT_DIR, "nvfp4")
    if args.model in ["bf16", "both"]:
        print("\n>>> Evaluating BF16 model...")
        results["bf16"] = run_eval(BF16_PATH, OUTPUT_DIR, "bf16")

    if args.model == "both" and len(results) == 2:
        # Compare results
        print("\n" + "=" * 80)
        print("COMPARISON: BF16 vs NVFP4")
        print("=" * 80)
        bf16_acc = results["bf16"]["overall_accuracy"]
        nvfp4_acc = results["nvfp4"]["overall_accuracy"]
        diff = nvfp4_acc - bf16_acc
        print(f"\nBF16 Overall:  {bf16_acc:.4f} ({bf16_acc*100:.2f}%)")
        print(f"NVFP4 Overall: {nvfp4_acc:.4f} ({nvfp4_acc*100:.2f}%)")
        print(f"Difference:    {diff:+.4f} ({diff*100:+.2f}%)")

        # Save comparison
        comparison_file = os.path.join(
            OUTPUT_DIR,
            f"mmlu_pro_comparison_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json",
        )
        with open(comparison_file, 'w') as f:
            json.dump({
                "bf16_accuracy": bf16_acc,
                "nvfp4_accuracy": nvfp4_acc,
                "accuracy_difference": diff,
                "bf16_results": results["bf16"],
                "nvfp4_results": results["nvfp4"],
            }, f, indent=2)
        print(f"\nComparison saved to: {comparison_file}")

    return results

if __name__ == "__main__":
    main()
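To run it: python mmlu_pro_eval.py --model both (mmlu_pro_eval.py is just whatever name you save the script under). On the transformers 5 note above, here is a minimal preflight guard you could drop in near the top of the script; the "anything >= 5.0 is broken" bound is my assumption, so pin to whatever your lm_eval release actually supports:

import transformers
from packaging import version

# assumption: the lm_eval vLLM wrapper only behaves with transformers 4.x
if version.parse(transformers.__version__) >= version.parse("5.0.0"):
    raise SystemExit(
        f"transformers {transformers.__version__} detected; "
        'try: pip install "transformers<5"'
    )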
Did you get it to run, @zenmagnets?
Haven't tried again yet. The previous test took 23 hours, so I'm wary of starting another on a whim. But it looks like the main difference between your parameters and mine is max_model_len=4096, whereas I had mine set to 200,000.
That doesn't sound right. It should take 2-5 minutes per model, not 23 hours; use my script and see if it helps.
I re-ran the test with max_model_len=200000. I had to set the KV cache to FP8 and use both GPUs (TP=2), so it is not apples to apples with the first run; but even with the FP8 KV cache, the differences between the two runs are under 1%.
Total test time for both models was under 5 minutes.
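For reference, the only change from the script above was the model init, roughly like this. Whether kv_cache_dtype passes through the lm_eval wrapper to the vLLM engine is an assumption on my part, so check your version; you also need to drop or widen the CUDA_VISIBLE_DEVICES pin so both GPUs are visible:

model = VLLM(
    pretrained=model_path,
    tensor_parallel_size=2,   # both GPUs
    trust_remote_code=True,
    max_model_len=200000,     # long-context run
    kv_cache_dtype="fp8",     # assumption: forwarded to the vLLM engine
    gpu_memory_utilization=0.90,
    enforce_eager=False,
    dtype="auto",
)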
Results (200K context, TP=2, FP8 KV cache):
┌───────┬──────────┬───────────────┐
│ Model │ Accuracy │ Correct/Total │
├───────┼──────────┼───────────────┤
│ BF16  │ 24.54%   │ 2953/12032    │
│ NVFP4 │ 23.56%   │ 2835/12032    │
│ Δ     │ -0.98%   │ -118          │
└───────┴──────────┴───────────────┘
By Category:
┌─────────────────┬────────┬────────┬────────┐
│ Category        │ BF16   │ NVFP4  │ Δ      │
├─────────────────┼────────┼────────┼────────┤
│ Social Sciences │ 32.99% │ 31.14% │ -1.85% │
│ Other           │ 31.46% │ 30.48% │ -0.98% │
│ Humanities      │ 23.27% │ 22.01% │ -1.26% │
│ STEM            │ 19.43% │ 18.90% │ -0.53% │
└─────────────────┴────────┴────────┴────────┘
MMLU-Pro by Subject (200K context, TP=2, FP8 KV cache):
┌──────────────────┬────────┬────────┬────────┬───────────┐
│ Subject          │ BF16   │ NVFP4  │ Δ      │ Questions │
├──────────────────┼────────┼────────┼────────┼───────────┤
│ Biology          │ 50.63% │ 47.00% │ -3.63% │ 717       │
│ Psychology       │ 45.74% │ 41.48% │ -4.26% │ 798       │
│ Economics        │ 36.37% │ 33.65% │ -2.72% │ 844       │
│ Health           │ 34.72% │ 33.62% │ -1.10% │ 818       │
│ History          │ 34.65% │ 30.97% │ -3.68% │ 381       │
│ Philosophy       │ 30.06% │ 27.86% │ -2.20% │ 499       │
│ Other            │ 28.57% │ 27.71% │ -0.86% │ 924       │
│ Computer Science │ 24.15% │ 20.98% │ -3.17% │ 410       │
│ Business         │ 18.00% │ 18.00% │ 0.00%  │ 789       │
│ Law              │ 16.26% │ 16.26% │ 0.00%  │ 1101      │
│ Engineering      │ 15.27% │ 15.27% │ 0.00%  │ 969       │
│ Physics          │ 14.78% │ 15.09% │ +0.31% │ 1299      │
│ Math             │ 13.84% │ 13.69% │ -0.15% │ 1351      │
│ Chemistry        │ 13.52% │ 14.05% │ +0.53% │ 1132      │
└──────────────────┴────────┴────────┴────────┴───────────┘
Overall Comparison: Short Context vs Long Context
┌───────┬────────────────────────────┬─────────────────────────────┬────────┐
│ Model │ 4K Context (TP=1, BF16 KV) │ 200K Context (TP=2, FP8 KV) │ Δ      │
├───────┼────────────────────────────┼─────────────────────────────┼────────┤
│ BF16  │ 24.83%                     │ 24.54%                      │ -0.29% │
│ NVFP4 │ 23.55%                     │ 23.56%                      │ +0.01% │
└───────┴────────────────────────────┴─────────────────────────────┴────────┘
---
By Subject - Full Comparison:
┌──────────────────┬───────────┬─────────────┬────────┬────────────┬──────────────┬────────┐
│ Subject          │ BF16 (4K) │ BF16 (200K) │ Δ      │ NVFP4 (4K) │ NVFP4 (200K) │ Δ      │
├──────────────────┼───────────┼─────────────┼────────┼────────────┼──────────────┼────────┤
│ Biology          │ 50.35%    │ 50.63%      │ +0.28% │ 47.42%     │ 47.00%       │ -0.42% │
│ Psychology       │ 44.99%    │ 45.74%      │ +0.75% │ 42.48%     │ 41.48%       │ -1.00% │
│ Economics        │ 36.37%    │ 36.37%      │ 0.00%  │ 34.48%     │ 33.65%       │ -0.83% │
│ Health           │ 35.21%    │ 34.72%      │ -0.49% │ 34.84%     │ 33.62%       │ -1.22% │
│ History          │ 33.60%    │ 34.65%      │ +1.05% │ 30.71%     │ 30.97%       │ +0.26% │
│ Philosophy       │ 31.46%    │ 30.06%      │ -1.40% │ 30.06%     │ 27.86%       │ -2.20% │
│ Other            │ 28.35%    │ 28.57%      │ +0.22% │ 25.87%     │ 27.71%       │ +1.84% │
│ Computer Science │ 26.10%    │ 24.15%      │ -1.95% │ 21.46%     │ 20.98%       │ -0.48% │
│ Business         │ 16.35%    │ 16.48%      │ +0.13% │ 16.98%     │ 18.00%       │ +1.02% │
│ Law              │ 16.89%    │ 16.26%      │ -0.63% │ 16.35%     │ 16.26%       │ -0.09% │
│ Engineering      │ 16.00%    │ 15.27%      │ -0.73% │ 14.04%     │ 15.27%       │ +1.23% │
│ Physics          │ 15.32%    │ 14.78%      │ -0.54% │ 14.70%     │ 15.09%       │ +0.39% │
│ Math             │ 14.06%    │ 13.84%      │ -0.22% │ 14.29%     │ 13.69%       │ -0.60% │
│ Chemistry        │ 14.13%    │ 13.52%      │ -0.61% │ 13.34%     │ 14.05%       │ +0.71% │
└──────────────────┴───────────┴─────────────┴────────┴────────────┴──────────────┴────────┘
Please give more details on how to get it to run. I tried both an existing and a fresh install on an Ubuntu Server VM; no luck :(