Apply for a GPU community grant: Company project

Owner May 12

quant-eval Agent Arena is a public Hugging Face Space by PBH Applied Systems that demonstrates side-by-side evaluation of quantized open-weight LLMs for agentic workflows.

The Space lets users compare two GGUF-based agents on the same prompt, view streaming ReAct-style traces, inspect pre-computed quant_eval behavioral scores, and review a methodology tab explaining how the models were evaluated across production-relevant task families such as reasoning, instruction following, structured output, tool dispatch, JSON generation, stateful follow-up, and coherence.

The project is designed to make practical LLM evaluation more transparent. Instead of presenting only chatbot outputs, the Space shows how different quantized models behave inside agent-style workflows and connects those behaviors to published model cards and evaluation results. The goal is to help developers, students, researchers, and small teams understand which open-weight models are suitable for specific deployment patterns before investing in infrastructure.

The current Space runs on ZeroGPU, but it reaches the daily quota quickly because each public comparison loads and runs two GGUF agents. If the Space gains community attention, users will hit the quota limit repeatedly, which prevents meaningful exploration and weakens the public educational value of the demo.
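
As a rough illustration of why two agents per comparison drain the quota quickly, here is a minimal sketch of the arithmetic. The function name and all numbers are hypothetical placeholders, not measured values from this Space or actual ZeroGPU limits.

```python
def comparisons_per_quota(
    quota_seconds: float,
    seconds_per_agent: float,
    agents_per_comparison: int = 2,
) -> int:
    """Return how many full comparisons fit in a GPU-time quota.

    All inputs are assumptions supplied by the caller; nothing here
    reflects real ZeroGPU quota values.
    """
    if quota_seconds < 0 or seconds_per_agent <= 0 or agents_per_comparison <= 0:
        raise ValueError("quota must be >= 0; per-agent time and agent count must be > 0")
    # Each comparison pays the GPU cost once per agent it runs.
    cost_per_comparison = seconds_per_agent * agents_per_comparison
    return int(quota_seconds // cost_per_comparison)


# Hypothetical example: a 300-second quota and 30 GPU-seconds per agent run.
side_by_side = comparisons_per_quota(300, 30)       # two agents per comparison
single_agent = comparisons_per_quota(300, 30, 1)    # one agent, for contrast
print(side_by_side, single_agent)
```

Under these placeholder numbers, a side-by-side comparison allows half as many runs as a single-agent demo would, which is the effect described above.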

A community GPU grant would allow the Space to remain usable for visitors, support more reliable public demonstrations, and make the evaluation methodology accessible to the Hugging Face community. The project is public-facing, built with Gradio, uses open-weight model artifacts hosted on Hugging Face, and is intended to promote transparent, production-oriented evaluation of quantized LLMs and agentic AI workflows.
