	<!DOCTYPE html>
	<html lang="en">

	<head>
	<meta charset="UTF-8">
	<meta name="viewport" content="width=device-width, initial-scale=1.0">
	<title>Zhiyin</title>
	<link rel="stylesheet" href="styles.css?v=3">
	<link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap" rel="stylesheet">
	<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0/css/all.min.css">
	</head>

	<body>
	<div class="container">
	<header class="header">
	<h1><i class="fas fa-feather"></i> Zhiyin</h1>
	<p>Exploring the Frontier of Chinese LLM Writing</p>
	</header>

	<div class="dashboard card">
	<section class="overview-section">
	<h2 class="section-title">
	<span class="accent-bar"></span>
	Benchmark Overview
	<span class="section-title-spacer"></span>
	<span class="external-links-text">
	<a href="https://github.com/zake7749/Chinese-Writing-Bench" target="_blank" rel="noopener"
	class="external-link-text">GitHub</a>
	<span class="divider">•</span>
	<a href="https://huggingface.co/datasets/zake7749/chinese-writing-benchmark" target="_blank"
	rel="noopener" class="external-link-text">Hugging Face</a>
	</span>
	</h2>
	<div class="overview-content">
	<div class="overview-text">
	<p>
	<strong>Zhiyin</strong> is an LLM-as-a-judge benchmark for evaluating Chinese writing.
	This V1 release contains 280 test cases across 18 writing tasks.
	</p>
	<p>
	Evaluation uses <strong>pairwise comparison</strong>: a judge model scores each model's
	response relative to a fixed baseline (GPT-4.1), which is anchored at a score of 5.
	</p>

	<h4>Scoring System</h4>
	<p>The judge assigns the model's response an integer score from 0 to 10, where:</p>
	<ul>
	<li>A score &gt; 5 indicates the response is <strong>superior</strong> to the baseline.</li>
	<li>A score = 5 indicates the response is <strong>on par</strong> with the baseline.</li>
	<li>A score &lt; 5 indicates the response is <strong>inferior</strong> to the baseline.</li>
	</ul>

	<h4>Evaluation Dimensions</h4>
	<p>The final score is informed by a multi-dimensional assessment: the judge evaluates each
	response against six criteria.</p>
	<ol>
	<li><strong>Comprehension &amp; Relevance:</strong> How well the response understands the
	prompt's intent and stays on topic.</li>
	<li><strong>Structure &amp; Coherence:</strong> How clear, logical, and well-organized the
	writing is.</li>
	<li><strong>Prose &amp; Style:</strong> The quality of the language, grammar, and adherence to
	the requested tone.</li>
	<li><strong>Creativity &amp; Originality:</strong> The novelty of the ideas and the uniqueness
	of the perspective.</li>
	<li><strong>Depth &amp; Insight:</strong> The level of detail, analysis, and substance provided.</li>
	<li><strong>Helpfulness:</strong> How effectively the response fulfills the user's overall
	goal.</li>
	</ol>
	</div>
	</div>
	</section>

	<section class="dashboard-controls">
	<div class="controls-header">
	<div class="judge-selector">
	<span class="control-label">Judge Model</span>
	<div class="judge-toggle">
	<button class="judge-btn active" data-judge="gpt5.4">GPT-5.4</button>
	<button class="judge-btn" data-judge="o3">o3</button>
	</div>
	</div>
	<div class="search-container">
	<i class="fas fa-search search-icon" aria-hidden="true"></i>
	<input type="text" id="globalSearch" class="search-input" placeholder="Search models..." aria-label="Search models">
	</div>
	</div>

	<div class="tabs-container">
	<div class="tabs-header">
	<button class="tab-btn active" data-target="generalTableSection">All Writing Tasks</button>
	<button class="tab-btn" data-target="complicatedTableSection">Complicated Writing Tasks</button>
	</div>
	</div>
	</section>

	<section class="tab-content active" id="generalTableSection">
	<div id="generalTable" class="table-container">
	<!-- General Writing table will be populated here -->
	</div>
	</section>

	<section class="tab-content" id="complicatedTableSection">
	<div id="complicatedTable" class="table-container">
	<!-- Complicated Writing table will be populated here -->
	</div>
	</section>

	<section class="citation-section">
	<h2 class="section-title"><span class="accent-bar"></span>Citation</h2>
	<div class="citation-content">
	<p>
	If you use these results, please cite our work:<br>
	<em>Zhiyin: Exploring the Frontier of Chinese LLM Writing</em>, 2025.
	<a href="https://github.com/zake7749/Chinese-Writing-Bench" target="_blank" rel="noopener">https://github.com/zake7749/Chinese-Writing-Bench</a>
	</p>
	</div>
	</section>
	</div>

	<div id="loading" class="loading">
	<div class="spinner"></div>
	<p>Loading benchmark data...</p>
	</div>
	</div>
	<script src="script.js"></script>
	</body>

	</html>