<!doctype html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width" />
<title>Single-Shot Brevity Training | LLM Response Optimization</title>
<link rel="stylesheet" href="style.css" />
</head>
<body>
<div class="container">
<header>
<h1>Single-Shot Brevity Training</h1>
<p class="subtitle">Using One Example to Train LLMs for Informational Brevity</p>
<div class="links">
<a href="https://github.com/danielrosehill/Single-Shot-Brevity-Training" target="_blank" class="btn">View on GitHub</a>
</div>
</header>
<section class="card">
<h2>The Problem</h2>
<p>Large Language Models often generate excessively verbose responses, even when concise, informative answers would be more valuable. This experiment explores a simple yet effective approach to guide models toward brevity without sacrificing information quality.</p>
</section>
<section class="card">
<h2>The Approach</h2>
<p>Rather than abstract instructions like "be concise," this framework uses <strong>single-shot training</strong>: demonstrating the desired format with one concrete example in the system prompt.</p>
<h3>Two-Phase Methodology</h3>
<div class="phase">
<h4>Phase 1: Baseline Evaluation</h4>
<p>Tested 14 models using a standardized product recommendation prompt (power bank selection) without any brevity instructions to establish natural response lengths.</p>
</div>
<div class="phase">
<h4>Phase 2: Single-Shot Training</h4>
<p>Selected models received system prompts containing one optimized example response, guiding their outputs toward similar brevity.</p>
</div>
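The single-shot setup described above can be sketched in Python. The helper name, prompt wording, and example pair below are illustrative assumptions, not taken from the repository's actual system prompts:

```python
def build_brevity_system_prompt(example_question: str, example_answer: str) -> str:
    """Assemble a system prompt that demonstrates the desired brevity
    with a single concrete question/answer example (single-shot)."""
    return (
        "Answer user questions with concise, information-dense responses.\n"
        "Match the length and style of this example:\n\n"
        f"Question: {example_question}\n"
        f"Answer: {example_answer}"
    )

# Hypothetical example pair for illustration only:
prompt = build_brevity_system_prompt(
    "Which power bank should I buy for weekend trips?",
    "A 20,000 mAh bank with USB-C Power Delivery covers two full phone "
    "charges; pick one under 500 g if weight matters.",
)
```

The resulting string would be passed as the system message in whatever chat API the deployment uses; the demonstration example does the work that abstract "be concise" instructions fail to do.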
</section>
<section class="card highlight">
<h2>Key Findings</h2>
<div class="stat-grid">
<div class="stat">
<div class="stat-number">5.5x</div>
<div class="stat-label">Difference between longest and shortest responses</div>
</div>
<div class="stat">
<div class="stat-number">794</div>
<div class="stat-label">Mean response length (words)</div>
</div>
<div class="stat">
<div class="stat-number">60-75%</div>
<div class="stat-label">Word reduction in optimized examples</div>
</div>
</div>
<h3>Model Response Length Comparison</h3>
<div class="chart-container">
<img src="verbosity_bar_chart.png" alt="Bar chart comparing word counts across 14 LLM models" class="chart-image" />
<p class="chart-caption">Comparison of response lengths across 14 evaluated models</p>
</div>
<h3>Comprehensive Verbosity Analysis</h3>
<div class="chart-container">
<img src="verbosity_analysis.png" alt="Four-panel analysis of response verbosity characteristics" class="chart-image" />
<p class="chart-caption">Multi-faceted examination of response characteristics and patterns</p>
</div>
<h3>Response Length Variation</h3>
<ul>
<li><strong>Longest:</strong> 1,632 words (OpenAI GPT-OSS-120B)</li>
<li><strong>Shortest:</strong> 295 words (AI21 Jamba Large)</li>
<li><strong>Standard deviation:</strong> 456 words</li>
</ul>
<h3>Most Concise Performers</h3>
<ol class="model-list">
<li><strong>AI21 Jamba Large</strong> - 295 words</li>
<li><strong>Mistral Large</strong> - 352 words</li>
<li><strong>Meta Llama 4 Maverick</strong> - 397 words</li>
</ol>
<h3>Most Verbose Performers</h3>
<ol class="model-list">
<li><strong>OpenAI GPT-OSS-120B</strong> - 1,632 words</li>
<li><strong>Google Gemini 2.5 Flash</strong> - 1,607 words</li>
</ol>
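The word-reduction figure reported above is a simple before/after ratio. A minimal sketch of the computation, using stand-in word counts rather than the experiment's data:

```python
def word_reduction(baseline: str, optimized: str) -> float:
    """Fraction of words removed going from the baseline response
    to the optimized (brevity-trained) response."""
    base_words = len(baseline.split())
    opt_words = len(optimized.split())
    return 1 - opt_words / base_words

# Stand-in responses: a 100-word baseline cut to a 30-word optimized answer.
baseline = " ".join(["word"] * 100)
optimized = " ".join(["word"] * 30)
print(f"{word_reduction(baseline, optimized):.0%}")  # prints "70%"
```

A reduction in this range (60-75%) is what the optimized examples in the repository target relative to each model's baseline output.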
</section>
<section class="card">
<h2>Repository Contents</h2>
<ul>
<li><strong>Raw Response Data:</strong> Complete baseline outputs from all tested models</li>
<li><strong>Optimized Examples:</strong> Demonstrating ideal brevity (60-75% word reduction)</li>
<li><strong>Model-Specific System Prompts:</strong> Implementing single-shot training for practical application</li>
<li><strong>Statistical Analysis:</strong> Comprehensive comparison of response lengths and patterns</li>
</ul>
</section>
<section class="card">
<h2>Practical Applications</h2>
<p>This approach offers several benefits for LLM deployment:</p>
<ul>
<li><strong>Cost Reduction:</strong> Shorter responses mean fewer output tokens and lower API costs</li>
<li><strong>User Experience:</strong> Concise responses are faster to read and process</li>
<li><strong>Simplicity:</strong> One concrete example is easier to write and maintain than complex prompt engineering</li>
<li><strong>Reusability:</strong> The framework can be adapted to different use cases and domains</li>
</ul>
</section>
<section class="card">
<h2>Get Involved</h2>
<p>This is an open experiment exploring effective LLM training techniques. The repository includes all data, prompts, and analysis for transparency and reproducibility.</p>
<div class="links">
<a href="https://github.com/danielrosehill/Single-Shot-Brevity-Training" target="_blank" class="btn btn-primary">Explore the Repository</a>
<a href="https://github.com/danielrosehill/Single-Shot-Brevity-Training/issues" target="_blank" class="btn">Share Feedback</a>
</div>
</section>
<footer>
<p>Created by <a href="https://danielrosehill.com" target="_blank">Daniel Rosehill</a></p>
<p>Part of ongoing research in LLM optimization and prompt engineering</p>
</footer>
</div>
</body>
</html>