| <html lang="en"> | |
| <head> | |
| <meta charset="UTF-8"> | |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> | |
| <title>Zhiyin</title> | |
| <link rel="stylesheet" href="styles.css?v=3"> | |
| <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap" rel="stylesheet"> | |
| <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0/css/all.min.css"> | |
| </head> | |
| <body> | |
| <div class="container"> | |
| <header class="header"> | |
| <h1><i class="fas fa-feather"></i> Zhiyin</h1> | |
| <p>Exploring the Frontier of Chinese LLM Writing</p> | |
| </header> | |
| <div class="dashboard card"> | |
| <section class="overview-section"> | |
| <h2 class="section-title"> | |
| <span class="accent-bar"></span> | |
| Benchmark Overview | |
| <span class="section-title-spacer"></span> | |
| <span class="external-links-text"> | |
| <a href="https://github.com/zake7749/Chinese-Writing-Bench" target="_blank" rel="noopener" | |
| class="external-link-text">GitHub</a> | |
| <span class="divider">•</span> | |
| <a href="https://huggingface.co/datasets/zake7749/chinese-writing-benchmark" target="_blank" | |
| rel="noopener" class="external-link-text">Hugging Face</a> | |
| </span> | |
| </h2> | |
| <div class="overview-content"> | |
| <div class="overview-text"> | |
| <p> | |
| <strong>Zhiyin</strong> is an LLM-as-a-judge benchmark for Chinese writing evaluation, | |
| featuring 280 test cases across 18 diverse writing tasks in this V1 release. | |
| </p> | |
| <p> | |
| Our method relies on <strong>pairwise comparison</strong>. A strong language model acts as | |
| the judge, scoring each candidate model's response relative to a fixed baseline (GPT-4.1), which is | |
| anchored at a score of 5. | |
| </p> | |
| <h4>Scoring System</h4> | |
| <p>The judge assigns the model's response an integer score from 0 to 10, where:</p> | |
| <ul> | |
| <li>A score &gt; 5 indicates the response is <strong>superior</strong> to the baseline.</li> | |
| <li>A score = 5 indicates the response is <strong>on par</strong> with the baseline.</li> | |
| <li>A score &lt; 5 indicates the response is <strong>inferior</strong> to the baseline.</li> | |
| </ul> | |
| <h4>Evaluation Dimensions</h4> | |
| <p>The final score is informed by a multi-dimensional assessment. The judge evaluates each | |
| response against six criteria:</p> | |
| <ol> | |
| <li><strong>Comprehension & Relevance:</strong> How well the response understands the | |
| prompt's intent and stays on topic.</li> | |
| <li><strong>Structure & Coherence:</strong> How clear, logical, and well-organized the | |
| writing is.</li> | |
| <li><strong>Prose & Style:</strong> The quality of the language, grammar, and adherence to | |
| the requested tone.</li> | |
| <li><strong>Creativity & Originality:</strong> The novelty of the ideas and the uniqueness | |
| of the perspective.</li> | |
| <li><strong>Depth & Insight:</strong> The level of detail, analysis, and substance provided. | |
| </li> | |
| <li><strong>Helpfulness:</strong> How effectively the response fulfills the user's overall | |
| goal.</li> | |
| </ol> | |
| </div> | |
| </div> | |
| </section> | |
| <section class="dashboard-controls"> | |
| <div class="controls-header"> | |
| <div class="judge-selector"> | |
| <span class="control-label">Judge Model</span> | |
| <div class="judge-toggle"> | |
| <button class="judge-btn active" data-judge="gpt5.4">GPT-5.4</button> | |
| <button class="judge-btn" data-judge="o3">o3</button> | |
| </div> | |
| </div> | |
| <div class="search-container"> | |
| <i class="fas fa-search search-icon"></i> | |
| <input type="text" id="globalSearch" class="search-input" placeholder="Search models..."> | |
| </div> | |
| </div> | |
| <div class="tabs-container"> | |
| <div class="tabs-header"> | |
| <button class="tab-btn active" data-target="generalTableSection">All Writing Tasks</button> | |
| <button class="tab-btn" data-target="complicatedTableSection">Complicated Writing Tasks</button> | |
| </div> | |
| </div> | |
| </section> | |
| <section class="tab-content active" id="generalTableSection"> | |
| <div id="generalTable" class="table-container"> | |
| <!-- General Writing table will be populated here --> | |
| </div> | |
| </section> | |
| <section class="tab-content" id="complicatedTableSection"> | |
| <div id="complicatedTable" class="table-container"> | |
| <!-- Complicated Writing table will be populated here --> | |
| </div> | |
| </section> | |
| <section class="citation-section"> | |
| <h2 class="section-title"><span class="accent-bar"></span>Citation</h2> | |
| <div class="citation-content"> | |
| <p> | |
| If you use these results, please cite our paper:<br> | |
| <em>"Zhiyin: Exploring the Frontier of Chinese LLM Writing,"</em> 2025. | |
| https://github.com/zake7749/Chinese-Writing-Bench | |
| </p> | |
| </div> | |
| </section> | |
| </div> | |
| <div id="loading" class="loading"> | |
| <div class="spinner"></div> | |
| <p>Loading benchmark data...</p> | |
| </div> | |
| </div> | |
| <script src="script.js"></script> | |
| </body> | |
| </html> |