Spaces · Sleeping
Bram van Es committed · Commit ef12530 · 1 Parent(s): 797ad44
bla
Files changed:

- ITERATIONS_REMOVAL_SUMMARY.md +189 -0
- MLM_EXPLANATION.md +190 -0
- QUICKSTART.md +91 -0
- README.md +155 -0
- SIMPLIFICATION_SUMMARY.md +129 -0
- __pycache__/app.cpython-310.pyc +0 -0
- __pycache__/app.cpython-312.pyc +0 -0
- __pycache__/config.cpython-310.pyc +0 -0
- __pycache__/config.cpython-312.pyc +0 -0
- __pycache__/launch.cpython-310.pyc +0 -0
- __pycache__/mlm_demo.cpython-310.pyc +0 -0
- __pycache__/run.cpython-310.pyc +0 -0
- __pycache__/test_app.cpython-310.pyc +0 -0
- app.py +54 -74
- color_test.html +53 -0
- demo.py +263 -0
- mlm_demo.py +199 -0
- simple_color_test.py +147 -0
- test_app.py +271 -0
- test_colors.py +198 -0
ITERATIONS_REMOVAL_SUMMARY.md
ADDED
@@ -0,0 +1,189 @@
# 🎯 Iterations Removal Summary - Final Simplification

## Change Request

The user correctly identified that since we now **mask one token at a time** for comprehensive analysis, there's **no need for a settable number of iterations**. This final simplification removes the iterations slider for the cleanest possible interface.

## Rationale

### Why Iterations Made Sense Before
- **Random sampling**: When using MLM probability, we needed multiple iterations to get stable averages
- **Statistical variance**: Random token selection meant results could vary between runs
- **Confidence intervals**: Multiple iterations helped estimate uncertainty

### Why Iterations Are Unnecessary Now
- **Deterministic analysis**: Each token is individually masked and analyzed
- **Complete coverage**: All content tokens are processed in a single pass
- **No randomness**: Results are identical on every run
- **Comprehensive by design**: A single iteration gives the complete picture

## What Was Removed

### 1. Iterations Slider
- **Before**: User could set iterations from 1 to 10
- **After**: No slider, single automatic analysis

### 2. Iteration Logic
- **Before**: Loop through iterations, calculate averages
- **After**: Direct single-pass calculation

### 3. Statistical Averaging
- **Before**: Average perplexity across multiple random samples
- **After**: Direct perplexity calculation from comprehensive analysis

## Code Changes Made

### Function Signatures Simplified
```python
# OLD
def calculate_decoder_perplexity(text, model, tokenizer, iterations=1)
def calculate_encoder_perplexity(text, model, tokenizer, iterations=1)
def process_text(text, model_name, model_type, iterations)

# NEW
def calculate_decoder_perplexity(text, model, tokenizer)
def calculate_encoder_perplexity(text, model, tokenizer)
def process_text(text, model_name, model_type)
```

### Decoder Model Changes
- **Before**: Multiple forward passes, average the losses
- **After**: Single forward pass, direct perplexity calculation
- **Result**: Faster and equally accurate
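
The single-pass math can be sketched with plain PyTorch; `perplexity_from_logits` is an illustrative helper, not the app's actual function:

```python
import torch
import torch.nn.functional as F

def perplexity_from_logits(logits: torch.Tensor, input_ids: torch.Tensor) -> float:
    """Single forward pass: shift logits/targets, exponentiate the mean loss."""
    shift_logits = logits[:, :-1, :]      # position i predicts token i+1
    shift_labels = input_ids[:, 1:]
    loss = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )
    return torch.exp(loss).item()

# Sanity check: uniform logits over a 10-token vocabulary give perplexity 10
logits = torch.zeros(1, 5, 10)
ids = torch.randint(0, 10, (1, 5))
print(round(perplexity_from_logits(logits, ids), 3))  # → 10.0
```

Since the model's loss is already the mean cross-entropy over the sequence, one pass suffices and repeating it would return the identical number.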

### Encoder Model Changes
- **Before**: Multiple iterations of random masking + averaging
- **After**: Single comprehensive pass masking each token
- **Result**: More accurate and deterministic
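
A minimal sketch of that single comprehensive pass, with the masked-LM scoring injected as a callback (`score_fn` and the names here are illustrative, not the app's code):

```python
import math

def pseudo_perplexity(input_ids, mask_token_id, special_ids, score_fn):
    """Mask each content token exactly once; exp of the mean negative log-prob."""
    nlls = []
    for i, tok in enumerate(input_ids):
        if tok in special_ids:             # skip [CLS], [SEP], etc.
            continue
        masked = list(input_ids)
        masked[i] = mask_token_id          # deterministic: every token, one pass
        nlls.append(-score_fn(masked, i))  # NLL of the true token at position i
    return math.exp(sum(nlls) / len(nlls))

# Toy check: if the model always assigns the true token probability 0.5,
# the pseudo-perplexity is exactly 2.0
ids = [101, 7592, 2088, 102]               # e.g. [CLS] hello world [SEP]
ppl = pseudo_perplexity(ids, mask_token_id=103, special_ids={101, 102},
                        score_fn=lambda m, i: math.log(0.5))
print(round(ppl, 3))  # → 2.0
```

Because the loop visits every content token exactly once, there is nothing left to average over multiple iterations.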

### UI Changes
- **Removed**: Iterations slider and related controls
- **Simplified**: Function calls and event handlers
- **Cleaner**: Examples no longer include an iterations parameter

## Performance Impact

### Decoder Models (GPT, etc.)
- ✅ **Faster**: No redundant iterations
- ✅ **Same accuracy**: Single pass gives true perplexity
- ✅ **Deterministic**: Consistent results every time

### Encoder Models (BERT, etc.)
- ✅ **More accurate**: Every token analyzed vs. random sampling
- ✅ **Deterministic**: No statistical variance
- ✅ **Comprehensive**: Complete picture in a single pass
- ⚠️ **Slightly slower**: But more thorough analysis

## User Experience

### Before (Confusing)
1. Enter text
2. Choose model
3. Adjust iterations (why?)
4. Analyze
5. Wonder if more iterations would be better

### After (Simple)
1. Enter text
2. Choose model
3. Analyze
4. Get complete results immediately

## Technical Benefits

### 1. **Deterministic Results**
- Same input always produces the same output
- No statistical variance to worry about
- Reproducible for research and debugging

### 2. **Optimal Performance**
- No wasted computation on redundant iterations
- A single comprehensive pass is most efficient
- Faster for decoder models, more thorough for encoder models

### 3. **Cleaner Codebase**
- Simpler function signatures
- Less parameter validation
- Fewer edge cases to handle

### 4. **Better User Understanding**
- Clear 1:1 relationship between input and output
- No abstract "iterations" concept to explain
- Results are intuitive and immediate

## Interface Comparison

### Complex Interface (Before)
```
Text: [input box]
Model: [dropdown]
Model Type: [decoder/encoder]
Iterations: [1-10 slider] ← Removed
MLM Probability: [0.1-0.5 slider] ← Already removed
[Analyze Button]
```

### Simple Interface (After)
```
Text: [input box]
Model: [dropdown]
Model Type: [decoder/encoder]
[Analyze Button]
```

## What Users Gain

### 1. **Simplicity**
- Minimal cognitive load
- No parameters to tune
- Immediate results

### 2. **Confidence**
- Results are comprehensive, not sampled
- No wondering about the "optimal" iteration count
- Deterministic and reproducible

### 3. **Speed**
- Faster workflow (fewer clicks)
- No time wasted on parameter adjustment
- Direct path to insights

## Files Modified

1. **`app.py`**: Removed the iterations parameter throughout
2. **`config.py`**: Removed iterations from examples and settings
3. **`README.md`**: Updated documentation
4. **`QUICKSTART.md`**: Simplified instructions

## Migration Notes

### For Users
- **Old workflow**: Text → Model → Iterations → Analyze
- **New workflow**: Text → Model → Analyze
- **Result**: Same quality, much simpler

### For Developers
- Function signatures simplified (no iterations parameter)
- No iteration loops in core functions
- Single-pass algorithms throughout

## Final State

The PerplexityViewer is now **maximally simplified**:

- ✅ **No MLM probability slider** (comprehensive token analysis)
- ✅ **No iterations slider** (single-pass analysis)
- ✅ **Clean interface** (text → model → analyze)
- ✅ **Deterministic results** (same input = same output)
- ✅ **Comprehensive analysis** (all tokens processed)

## Result

The app now has the **simplest possible interface** while providing **the most comprehensive analysis**. This is exactly what good software engineering achieves: maximum functionality with minimum complexity.

### User Benefits
- 🎯 **Simpler**: Just text and model selection
- 🚀 **Faster**: Direct workflow, no parameter tuning
- 🔍 **Complete**: Every token analyzed thoroughly
- 🎨 **Clear**: Color visualization of all results

The final interface is clean, intuitive, and powerful - perfect for exploring perplexity patterns in text! 🎉
MLM_EXPLANATION.md
ADDED
@@ -0,0 +1,190 @@
# 🎭 MLM Probability Fix - Complete Documentation

## Issue Identified

The user correctly observed that **changing the MLM probability did not affect the results at all** in the encoder model visualization. This was a significant bug in how the MLM probability parameter was being used.

## Root Cause Analysis

### What Was Wrong

The MLM probability setting had two separate effects that were not properly connected:

1. **Average Perplexity Calculation** ✅ (Working correctly)
   - Used random masking with the specified MLM probability
   - Affected the summary statistic shown to the user

2. **Per-Token Visualization** ❌ (Bug was here)
   - Always masked each token individually
   - Completely ignored the MLM probability setting
   - This meant changing the MLM probability had no visual effect

### The Disconnect
```python
# OLD CODE - MLM probability was ignored for visualization
for i in range(len(tokens)):
    if not special_token:
        # ALWAYS calculated detailed perplexity for every token
        masked_input[0, i] = tokenizer.mask_token_id
        # ... calculate perplexity
```

## The Fix

### 1. Made MLM Probability Affect Visualization

Now the MLM probability controls which tokens get detailed analysis:

```python
# NEW CODE - MLM probability affects visualization
for i in range(len(tokens)):
    if not special_token:
        if torch.rand(1).item() < mlm_probability:  # ✅ Now respects MLM prob
            # Calculate detailed perplexity for this token
            masked_input[0, i] = tokenizer.mask_token_id
            # ... calculate detailed perplexity
        else:
            # Use baseline perplexity for non-analyzed tokens
            token_perplexities.append(2.0)  # Neutral baseline
```

### 2. Visual Distinction
- **Analyzed tokens**: Colored by actual perplexity (green/yellow/red)
- **Non-analyzed tokens**: Gray color with baseline perplexity
- **Tooltip**: Shows whether the token was analyzed or not

### 3. Clear User Feedback
- Summary now shows: `MLM Probability: 0.15 (3/8 tokens analyzed in detail)`
- Legend updated: `🟢 Low → 🟡 Medium → 🔴 High → ⚫ Not analyzed`
- Improved help text: "Probability of detailed analysis per token"

## How It Works Now

### Low MLM Probability (0.15)
```
Input: "The capital of France is Paris"
Result: Only ~15% of tokens get detailed analysis
Visualization: Mostly gray tokens with a few colored ones
Effect: Fast analysis, matches BERT training conditions
```

### High MLM Probability (0.5)
```
Input: "The capital of France is Paris"
Result: ~50% of tokens get detailed analysis
Visualization: More colored tokens, fewer gray ones
Effect: More comprehensive but slower analysis
```

## User Experience Improvements

### Before the Fix
- User changes MLM probability from 0.15 → 0.5
- No visual change in token colors
- Only the summary statistic changed (confusing!)

### After the Fix
- User changes MLM probability from 0.15 → 0.5
- More tokens become colored (analyzed)
- Fewer tokens remain gray (non-analyzed)
- Summary shows the token count: "(3/8 tokens analyzed)"
- Clear visual feedback of the parameter's effect

## Testing the Fix

### 1. Quick Test
Try the same text with different MLM probabilities:
- Text: "Machine learning algorithms require computational resources"
- MLM 0.2: Few colored tokens
- MLM 0.8: Most tokens colored

### 2. Demo Script
```bash
python mlm_demo.py
```
Shows exactly how MLM probability affects the analysis.

### 3. Visual Examples
The app now includes example pairs:
- Same text with MLM 0.2 vs 0.8
- Shows a clear visual difference

## Technical Details

### Randomness Handling
- Uses `torch.rand()` for consistency with PyTorch
- Each token gets an independent random chance
- Reproducible with manual seeds for testing
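
For example, fixing the seed reproduces the same mask pattern on every run (the seed value 42 is arbitrary):

```python
import torch

torch.manual_seed(42)                    # fix the RNG before the masking loop
first_run = [torch.rand(1).item() < 0.15 for _ in range(8)]

torch.manual_seed(42)                    # same seed → identical token selection
second_run = [torch.rand(1).item() < 0.15 for _ in range(8)]

print(first_run == second_run)  # → True
```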

### Baseline Perplexity
- Non-analyzed tokens get perplexity = 2.0
- This represents "neutral" confidence
- Avoids misleading very low/high values

### Color Mapping
- Analyzed tokens: Full color spectrum based on actual perplexity
- Non-analyzed tokens: Gray (`rgb(200, 200, 200)`)
- Tooltips distinguish: "Perplexity: 5.2" vs "Not analyzed"
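
As a hedged sketch of such a mapping (the linear green→red interpolation and the 2.0/10.0 thresholds are illustrative; the app's exact palette may differ):

```python
def perplexity_to_rgb(ppl, analyzed=True, low=2.0, high=10.0):
    """Map perplexity to a green→red gradient; gray for non-analyzed tokens."""
    if not analyzed:
        return (200, 200, 200)           # gray, matching rgb(200, 200, 200) above
    # Clamp into [low, high] and normalize to 0..1
    t = min(max((ppl - low) / (high - low), 0.0), 1.0)
    return (int(255 * t), int(255 * (1 - t)), 0)  # t=0 → green, t=1 → red

print(perplexity_to_rgb(1.0))                  # → (0, 255, 0)   fully green
print(perplexity_to_rgb(10.0))                 # → (255, 0, 0)   fully red
print(perplexity_to_rgb(5.2, analyzed=False))  # → (200, 200, 200)
```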

## Performance Implications

### Lower MLM Probability (0.15)
- **Pros**: Faster, matches BERT training, realistic
- **Cons**: Sparse analysis, some tokens not evaluated

### Higher MLM Probability (0.8)
- **Pros**: Comprehensive analysis, more visual information
- **Cons**: Slower computation, unrealistic for MLM

### Recommendation
- **Default 0.15**: Standard BERT-like analysis
- **Increase to 0.3-0.5**: For more detailed exploration
- **Avoid >0.8**: Diminishing returns, very slow

## Impact on Model Types

### Decoder Models (GPT, etc.)
- **No change**: MLM probability only affects encoder models
- Always analyze all tokens for next-token prediction

### Encoder Models (BERT, etc.)
- **Major improvement**: MLM probability now has a clear visual effect
- Users can explore different analysis depths
- Better understanding of model confidence patterns

## User Guidance

### When to Use Different MLM Probabilities

**0.15 (Standard)**
- Quick analysis
- Matches BERT training
- Good for initial exploration

**0.3-0.4 (Detailed)**
- More comprehensive view
- Better for understanding difficult texts
- Reasonable computation time

**0.5+ (Comprehensive)**
- Maximum detail
- Research/analysis purposes
- Slower but thorough

## Future Enhancements

### Possible Improvements
1. **Adaptive MLM**: Adjust probability based on text difficulty
2. **Token importance**: Prioritize content words over function words
3. **Interactive selection**: Let users click tokens to analyze
4. **Batch analysis**: Process multiple MLM probabilities simultaneously

### Configuration Options
The fix is fully configurable via `config.py`:
- Default MLM probability
- Min/max ranges
- Baseline perplexity value
- Color scheme for non-analyzed tokens

## Conclusion

This fix transforms the MLM probability from a "hidden parameter" that only affected summary statistics into a **visible, interactive control** that directly impacts the visualization. Users now get immediate visual feedback when adjusting the MLM probability, making the parameter's purpose clear and the analysis more engaging.

The fix maintains backward compatibility while significantly improving the user experience for encoder model analysis. 🎉
QUICKSTART.md
ADDED
@@ -0,0 +1,91 @@
# 🚀 Quick Start Guide

## Installation & Launch (3 steps)

1. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

2. **Launch the app:**
   ```bash
   python launch.py
   ```

3. **Open your browser** to http://localhost:7860

## Alternative Launch Methods

If the above doesn't work, try these:

```bash
# Method 1: Full startup script
python run.py

# Method 2: Direct app launch
python app.py

# Method 3: With dependency installation
python run.py --install
```

## First Time Usage

1. **Enter text** in the input box (try: "The quick brown fox jumps over the lazy dog.")
2. **Select a model** (default: gpt2)
3. **Choose model type** (decoder for GPT-like, encoder for BERT-like)
4. **Click "Analyze"**

You'll see:
- 🟢 Green tokens = Low perplexity (model is confident)
- 🔴 Red tokens = High perplexity (model is uncertain)

## Troubleshooting

**Common Issues:**

- **"Module not found"** → Run: `pip install -r requirements.txt`
- **"Model download failed"** → Check your internet connection
- **"Launch failed"** → Try: `python launch.py` or `python app.py`
- **Out of memory** → Use smaller models like `distilgpt2` or `distilbert-base-uncased`

**GPU Support:**
- Automatically uses the GPU if available
- Falls back to CPU if no GPU is found

## Example Models to Try

**Decoder (GPT-like):**
- `gpt2` - Standard GPT-2
- `distilgpt2` - Smaller, faster
- `microsoft/DialoGPT-small` - Conversational

**Encoder (BERT-like):**
- `bert-base-uncased` - Standard BERT
- `distilbert-base-uncased` - Smaller, faster
- `roberta-base` - Improved BERT

## Need Help?

Run the test suite:
```bash
python test_app.py
```

Or try the command-line demo:
```bash
python demo.py
```

**Still having issues?** Check the full README.md for detailed instructions.

## ✅ Recent Updates

**Ultra-Simplified Interface!**
- Removed the MLM probability slider for a cleaner interface
- Removed the iterations slider - single comprehensive analysis per run
- Encoder models now analyze all tokens for complete results
- Decoder models provide single-pass perplexity calculation
- Tokens are properly colored by perplexity (green=confident, red=uncertain)
- If you see black/white tokens, try refreshing the browser
- Test the colors with: `python simple_color_test.py` (creates color_test.html)
README.md
CHANGED
@@ -12,3 +12,158 @@ short_description: Simple inspection of perplexity using color-gradients
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# PerplexityViewer 📈

A Gradio-based web application for visualizing text perplexity using color-coded gradients. Perfect for understanding how confident language models are about different parts of your text.

## Features

- **Dual Model Support**: Works with both decoder models (GPT, DialoGPT) and encoder models (BERT, RoBERTa)
- **Interactive Visualization**: Color-coded per-token perplexity using spaCy's displaCy
- **Configurable Analysis**: Adjustable iterations and MLM probability settings
- **Real-time Processing**: Instant analysis with cached models for faster subsequent runs
- **Multiple Model Types**:
  - **Decoder Models**: Calculate true perplexity for causal language models
  - **Encoder Models**: Calculate pseudo-perplexity using masked language modeling

## How It Works

- **Red tokens**: High perplexity (model is uncertain about this token)
- **Green tokens**: Low perplexity (model is confident about this token)
- **Gradient colors**: Show varying degrees of model confidence

## Installation

1. Clone this repository or download the files
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

## Quick Start

### Option 1: Using the startup script (recommended)
```bash
python run.py
```

### Option 2: Direct launch
```bash
python app.py
```

### Option 3: With dependency installation and testing
```bash
python run.py --install --test
```

## Usage

1. **Enter your text** in the input box
2. **Select a model** from the dropdown or enter a custom HuggingFace model name
3. **Choose model type**:
   - **Decoder**: For GPT-like models (true perplexity)
   - **Encoder**: For BERT-like models (pseudo-perplexity via MLM)
4. **Adjust settings** (optional)
5. **Click "Analyze"** to see the results

## Supported Models

### Decoder Models (Causal LM)
- `gpt2`, `distilgpt2`
- `microsoft/DialoGPT-small`, `microsoft/DialoGPT-medium`
- `openai-gpt`
- Any HuggingFace causal language model

### Encoder Models (Masked LM)
- `bert-base-uncased`, `bert-base-cased`
- `distilbert-base-uncased`
- `roberta-base`
- `albert-base-v2`
- Any HuggingFace masked language model

## Understanding the Results

### Perplexity Interpretation
- **Lower perplexity**: Model is more confident (text is more predictable)
- **Higher perplexity**: Model is less confident (text is more surprising)

### Color Coding
- **Green**: Low perplexity (≤ 2.0) - very predictable
- **Yellow/Orange**: Medium perplexity (2.0-10.0) - somewhat predictable
- **Red**: High perplexity (≥ 10.0) - surprising or difficult to predict

## Technical Details

### Decoder Models (True Perplexity)
- Uses next-token prediction to calculate perplexity
- Formula: `PPL = exp(average_cross_entropy_loss)`
- Each token's perplexity is based on how well the model predicted it given the previous context
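
The formula can be checked numerically; the per-token losses below are made-up values, not output from any model:

```python
import math

token_losses = [0.5, 1.2, 0.8, 2.1]    # hypothetical per-token cross-entropy (nats)
avg_loss = sum(token_losses) / len(token_losses)
ppl = math.exp(avg_loss)               # PPL = exp(average_cross_entropy_loss)
print(round(ppl, 3))  # → 3.158
```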

### Encoder Models (Pseudo-Perplexity)
- Uses masked language modeling (MLM)
- Masks each token individually and measures prediction confidence
- Pseudo-perplexity approximates true perplexity for bidirectional models
- All content tokens are analyzed for comprehensive results
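
In formula form (notation ours, following the usual pseudo-perplexity definition): for a sentence of N content tokens, each probability is taken with token *i* masked and all other tokens visible, giving `PPPL = exp(-(1/N) * Σᵢ log P(tokenᵢ | all other tokens))`.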

## Testing

Run the test suite to verify everything works:
```bash
python test_app.py
```

Or use the startup script with testing:
```bash
python run.py --test
```

## Configuration

The app uses sensible defaults but can be customized via `config.py`:
- Default model lists
- Processing settings
- Visualization colors and settings
- UI configuration

## Requirements

- Python 3.7+
- PyTorch
- Transformers
- Gradio 4.0+
- spaCy
- pandas
- numpy

## GPU Support

The app automatically uses GPU acceleration when available, falling back to CPU processing otherwise.

## Troubleshooting

### Common Issues

1. **Model loading errors**: Ensure you have an internet connection for first-time model downloads
2. **Memory issues**: Try smaller models like `distilgpt2` or `distilbert-base-uncased`
3. **CUDA out of memory**: Reduce the text length or use CPU-only mode
4. **Encoder models slow**: This is normal - each token is analyzed individually for accuracy
5. **Single analysis**: The app now performs one comprehensive analysis per run (no iterations needed)

### Getting Help

If you encounter issues:
1. Check the console output for error messages
2. Try running the test suite: `python test_app.py`
3. Ensure all dependencies are installed: `pip install -r requirements.txt`

## Examples

Try these example texts to see the app in action:

- **"The quick brown fox jumps over the lazy dog."** (Common phrase - should show low perplexity)
- **"Quantum entanglement defies classical intuition."** (Technical content - may show higher perplexity)
- **"Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo."** (Grammatically complex - interesting perplexity patterns)
SIMPLIFICATION_SUMMARY.md
ADDED
@@ -0,0 +1,129 @@
| 1 |
+
# 🎯 Simplification Summary - MLM Probability Removal

## Change Request

The user requested to **remove the MLM probability slider** and **analyze all tokens** for encoder models, simplifying the interface and making results more consistent.

## What Was Removed

### 1. MLM Probability Slider
- **Before**: User could adjust MLM probability from 0.1 to 0.5
- **After**: No slider, cleaner interface

### 2. Random Token Selection
- **Before**: Only ~15-50% of tokens analyzed based on MLM probability
- **After**: ALL content tokens analyzed for comprehensive results

### 3. Complex Configuration
- **Before**: MLM probability settings, thresholds, explanations
- **After**: Simplified configuration focused on core functionality

## Code Changes Made

### `app.py`
- **Removed**: `mlm_probability` parameter from all functions
- **Simplified**: `calculate_encoder_perplexity()` now analyzes all tokens
- **Cleaned**: UI no longer shows/hides MLM probability slider
- **Updated**: Process function signature simplified

### `config.py`
- **Removed**: All MLM probability related settings
- **Simplified**: Examples no longer include MLM probability values
- **Cleaned**: Processing settings streamlined

### UI Changes
- **Removed**: MLM probability slider and related controls
- **Updated**: Help text and examples
- **Simplified**: Model type change handler

## New Behavior

### Encoder Models (BERT, etc.)
1. **Comprehensive Analysis**: Every content token is individually masked and analyzed
2. **Consistent Results**: No randomness in token selection
3. **Full Visualization**: All tokens get proper perplexity colors (no gray "not analyzed" tokens)
4. **Better Performance**: No need to run multiple iterations for statistical sampling
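
The per-token masking strategy amounts to: mask each content position in turn, read the model's probability for the original token, accumulate negative log-probabilities, and exponentiate the mean. A minimal sketch in plain Python (no model download; `score_fn` is a hypothetical stand-in for the MLM forward pass used in `calculate_encoder_perplexity()`):

```python
import math

def mlm_pseudo_perplexity(tokens, score_fn, special_tokens=("[CLS]", "[SEP]")):
    """Mask each content token in turn and aggregate the losses.

    score_fn(tokens, i) stands in for the model: it returns the softmax
    probability of the original token at position i when that position
    is masked. Pseudo-perplexity = exp(mean negative log-probability).
    """
    losses = [
        -math.log(score_fn(tokens, i) + 1e-10)  # epsilon guards against log(0)
        for i, tok in enumerate(tokens)
        if tok not in special_tokens
    ]
    return math.exp(sum(losses) / len(losses))

# Toy "model" that is 25% confident about every masked token
toy = ["[CLS]", "the", "capital", "of", "france", "is", "paris", "[SEP]"]
print(mlm_pseudo_perplexity(toy, lambda toks, i: 0.25))  # ≈ 4.0
```

Because every content position is scored exactly once, the result is deterministic, which is what removes the need for random sampling and repeated iterations.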

### Decoder Models (GPT, etc.)
- **No change**: Still analyzes all tokens as before
- **Consistent interface**: Same workflow for both model types

## Benefits of Simplification

### 1. **User Experience**
- ✅ Cleaner, less confusing interface
- ✅ Consistent results every time
- ✅ No need to understand MLM probability concept
- ✅ Faster workflow (fewer parameters to adjust)

### 2. **Technical Benefits**
- ✅ More comprehensive analysis (100% of tokens)
- ✅ Deterministic results (no randomness)
- ✅ Simplified codebase (easier to maintain)
- ✅ Better visualization (all tokens colored)

### 3. **Performance**
- ✅ More predictable compute time
- ✅ No wasted computation on statistical sampling
- ✅ Single iteration gives complete picture

## Impact on Existing Functionality

### What Still Works
- ✅ All model types supported
- ✅ Color visualization working perfectly
- ✅ Iterations parameter still available
- ✅ Model caching still functional
- ✅ All examples still work

### What's Improved
- 🎯 Encoder model analysis is now comprehensive
- 🎯 No more confusing "not analyzed" gray tokens
- 🎯 Simpler parameter space to explore
- 🎯 More consistent results

## Migration Notes

### For Users
- **Old workflow**: Adjust MLM probability → Analyze → Interpret partial results
- **New workflow**: Select text → Choose model → Analyze → Get complete results

### For Developers
- Function signatures simplified (removed `mlm_probability` parameter)
- Configuration streamlined (removed MLM-related settings)
- UI event handlers simplified (no MLM probability visibility toggle)

## Files Modified

1. **`app.py`**: Core functionality and UI
2. **`config.py`**: Configuration and examples
3. **`README.md`**: Updated documentation
4. **`QUICKSTART.md`**: Simplified instructions

## Files Created

1. **`SIMPLIFICATION_SUMMARY.md`**: This documentation

## Testing

The simplification maintains all existing functionality while providing better results:

```bash
# Test the simplified interface
python launch.py

# Try encoder models - all tokens now analyzed:
# Text: "The capital of France is Paris"
# Model: bert-base-uncased
# Type: encoder
# Result: All content tokens get proper colors!
```

## Result

The app is now **simpler, faster, and more comprehensive** - exactly what the user requested! 🎉

- 🎯 **Simpler**: Removed confusing MLM probability parameter
- 🚀 **Faster**: More direct workflow
- 🔍 **Comprehensive**: All tokens analyzed for complete picture
- 🎨 **Better visualization**: No more gray "not analyzed" tokens

The interface is cleaner, the results are more complete, and the user experience is significantly improved.
__pycache__/app.cpython-310.pyc
ADDED
Binary file (11.6 kB)

__pycache__/app.cpython-312.pyc
ADDED
Binary file (20 kB)

__pycache__/config.cpython-310.pyc
ADDED
Binary file (2.23 kB)

__pycache__/config.cpython-312.pyc
ADDED
Binary file (2.44 kB)

__pycache__/launch.cpython-310.pyc
ADDED
Binary file (1.28 kB)

__pycache__/mlm_demo.cpython-310.pyc
ADDED
Binary file (6.11 kB)

__pycache__/run.cpython-310.pyc
ADDED
Binary file (4.79 kB)

__pycache__/test_app.cpython-310.pyc
ADDED
Binary file (7.47 kB)
app.py
CHANGED
@@ -33,18 +33,16 @@ except ImportError:
     "displacy_options": {"ents": ["PP"], "colors": {}}
 }
 PROCESSING_SETTINGS = {
-    "default_iterations": 1,
-    "max_iterations": 10,
     "epsilon": 1e-10
 }
 UI_SETTINGS = {
-    "title": "📈 Perplexity Viewer
-    "description": "Visualize per-token perplexity using color gradients.
+    "title": "📈 Perplexity Viewer",
+    "description": "Visualize per-token perplexity using color gradients.",
     "examples": [
-        {"text": "The quick brown fox jumps over the lazy dog.", "model": "gpt2", "type": "decoder"
-        {"text": "The capital of France is Paris.", "model": "bert-base-uncased", "type": "encoder"
-        {"text": "Quantum entanglement defies classical physics intuition completely.", "model": "distilgpt2", "type": "decoder"
-        {"text": "Machine learning algorithms require computational resources.", "model": "distilbert-base-uncased", "type": "encoder"
+        {"text": "The quick brown fox jumps over the lazy dog.", "model": "gpt2", "type": "decoder"},
+        {"text": "The capital of France is Paris.", "model": "bert-base-uncased", "type": "encoder"},
+        {"text": "Quantum entanglement defies classical physics intuition completely.", "model": "distilgpt2", "type": "decoder"},
+        {"text": "Machine learning algorithms require computational resources.", "model": "distilbert-base-uncased", "type": "encoder"}
     ]
 }
 ERROR_MESSAGES = {
@@ -95,27 +93,24 @@ def load_model_and_tokenizer(model_name, model_type):
 
     return cached_models[cache_key], cached_tokenizers[cache_key]
 
-def calculate_decoder_perplexity(text, model, tokenizer, iterations=1):
+def calculate_decoder_perplexity(text, model, tokenizer):
     """Calculate perplexity for decoder models (like GPT)"""
     device = next(model.parameters()).device
 
-    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=MODEL_SETTINGS["max_length"])
-    input_ids = inputs.input_ids.to(device)
-    loss = outputs.loss
-    perplexity = torch.exp(loss).item()
-    perplexities.append(perplexity)
-    # Get token-level perplexities
+    # Tokenize the text
+    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=MODEL_SETTINGS["max_length"])
+    input_ids = inputs.input_ids.to(device)
+
+    if input_ids.size(1) < 2:
+        raise gr.Error("Text is too short for perplexity calculation.")
+
+    # Calculate overall perplexity
+    with torch.no_grad():
+        outputs = model(input_ids, labels=input_ids)
+        loss = outputs.loss
+        perplexity = torch.exp(loss).item()
+
+    # Get token-level perplexities
     with torch.no_grad():
         outputs = model(input_ids)
         logits = outputs.logits
@@ -142,46 +137,44 @@ def calculate_decoder_perplexity(text, model, tokenizer, iterations=1):
         else:
             cleaned_tokens.append(token)
 
-    return
+    return perplexity, cleaned_tokens, token_perplexities
 
-def calculate_encoder_perplexity(text, model, tokenizer, iterations=1):
+def calculate_encoder_perplexity(text, model, tokenizer):
     """Calculate pseudo-perplexity for encoder models (like BERT) using MLM on all tokens"""
     device = next(model.parameters()).device
 
-    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=MODEL_SETTINGS["max_length"])
-    input_ids = inputs.input_ids.to(device)
-    with torch.no_grad():
-        seq_length = input_ids.size(1)
-        special_token_ids = {tokenizer.cls_token_id, tokenizer.sep_token_id, tokenizer.pad_token_id}
+    # Tokenize the text
+    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=MODEL_SETTINGS["max_length"])
+    input_ids = inputs.input_ids.to(device)
+
+    if input_ids.size(1) < 3:  # Need at least [CLS] + 1 token + [SEP]
+        raise gr.Error("Text is too short for MLM perplexity calculation.")
+
+    # Calculate average perplexity by masking all content tokens
+    with torch.no_grad():
+        seq_length = input_ids.size(1)
+        special_token_ids = {tokenizer.cls_token_id, tokenizer.sep_token_id, tokenizer.pad_token_id}
+
+        all_token_losses = []
+
+        # Mask each non-special token individually and calculate loss
+        for i in range(seq_length):
+            if input_ids[0, i].item() not in special_token_ids:
+                masked_input = input_ids.clone()
+                original_token_id = input_ids[0, i]
+                masked_input[0, i] = tokenizer.mask_token_id
+
+                outputs = model(masked_input)
+                predictions = outputs.logits[0, i]
+                prob = F.softmax(predictions, dim=-1)[original_token_id]
+                loss = -torch.log(prob + PROCESSING_SETTINGS["epsilon"])
+                all_token_losses.append(loss.item())
+
+        if all_token_losses:
+            avg_loss = np.mean(all_token_losses)
+            perplexity = math.exp(avg_loss)
+        else:
+            perplexity = float('inf')
 
     # Calculate per-token pseudo-perplexity for visualization (analyze all tokens)
     with torch.no_grad():
@@ -212,7 +205,7 @@ def calculate_encoder_perplexity(text, model, tokenizer, iterations=1):
         else:
             cleaned_tokens.append(token)
 
-    return
+    return perplexity, cleaned_tokens, np.array(token_perplexities)
 
 def create_visualization(tokens, perplexities):
     """Create custom HTML visualization with color-coded perplexities"""
@@ -318,26 +311,23 @@ def create_visualization(tokens, perplexities):
 
     return "".join(html_parts)
 
-def process_text(text, model_name, model_type, iterations):
+def process_text(text, model_name, model_type):
     """Main processing function"""
     if not text.strip():
         return ERROR_MESSAGES["empty_text"], "", pd.DataFrame()
 
     try:
-        # Validate inputs
-        iterations = max(1, min(iterations, PROCESSING_SETTINGS["max_iterations"]))
-
         # Load model and tokenizer
         model, tokenizer = load_model_and_tokenizer(model_name, model_type)
 
         # Calculate perplexity
         if model_type == "decoder":
             avg_perplexity, tokens, token_perplexities = calculate_decoder_perplexity(
-                text, model, tokenizer, iterations
+                text, model, tokenizer
             )
         else:  # encoder
             avg_perplexity, tokens, token_perplexities = calculate_encoder_perplexity(
-                text, model, tokenizer, iterations
+                text, model, tokenizer
             )
 
         # Create visualization
@@ -351,7 +341,6 @@ def process_text(text, model_name, model_type, iterations):
 **Model Type:** {model_type.title()}
 **Average Perplexity:** {avg_perplexity:.4f}
 **Number of Tokens:** {len(tokens)}
-**Iterations:** {iterations}
 """
 
 
@@ -397,15 +386,6 @@ with gr.Blocks(title=UI_SETTINGS["title"], theme=gr.themes.Soft()) as demo:
                     info="Decoder for causal LM, Encoder for masked LM"
                 )
 
-                with gr.Row():
-                    iterations = gr.Slider(
-                        label="Iterations",
-                        minimum=1,
-                        maximum=PROCESSING_SETTINGS["max_iterations"],
-                        value=PROCESSING_SETTINGS["default_iterations"],
-                        step=1,
-                        info="Number of iterations to average over"
-                    )
                 analyze_btn = gr.Button("🔍 Analyze Perplexity", variant="primary", size="lg")
 
             with gr.Column(scale=3):
@@ -433,20 +413,20 @@ with gr.Blocks(title=UI_SETTINGS["title"], theme=gr.themes.Soft()) as demo:
     # Set up the analysis function
     analyze_btn.click(
         fn=process_text,
-        inputs=[text_input, model_name, model_type, iterations],
+        inputs=[text_input, model_name, model_type],
        outputs=[summary_output, viz_output, table_output]
    )

    # Add examples
    with gr.Accordion("📝 Example Texts", open=False):
        examples_data = [
-            [ex["text"], ex["model"], ex["type"]
+            [ex["text"], ex["model"], ex["type"]]
            for ex in UI_SETTINGS["examples"]
        ]

        gr.Examples(
            examples=examples_data,
-            inputs=[text_input, model_name, model_type, iterations],
+            inputs=[text_input, model_name, model_type],
            outputs=[summary_output, viz_output, table_output],
            fn=process_text,
            cache_examples=False,
@@ -468,7 +448,7 @@ with gr.Blocks(title=UI_SETTINGS["title"], theme=gr.themes.Soft()) as demo:
     - Models are cached after first use
     - Very long texts are truncated to 512 tokens
     - GPU acceleration is used when available
-    -
+    - All tokens are analyzed in a single pass for accurate results
     """)
 
 if __name__ == "__main__":
color_test.html
ADDED
@@ -0,0 +1,53 @@
<!DOCTYPE html>
<html>
<head>
    <title>Color Test</title>
    <style>
        body { font-family: Arial, sans-serif; margin: 20px; }
        .test-section { margin: 20px 0; padding: 15px; border: 1px solid #ccc; }
    </style>
</head>
<body>
    <h1>🎨 Perplexity Color Test</h1>

    <div class="test-section">
        <h2>Low Perplexity (Green - Confident)</h2>
        <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.2">The</span>
        <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.8">quick</span>
        <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 2.1">brown</span>
    </div>

    <div class="test-section">
        <h2>Medium Perplexity (Yellow - Uncertain)</h2>
        <span style="background-color: rgba(241, 196, 15, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 5.4">machine</span>
        <span style="background-color: rgba(241, 196, 15, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 7.2">learning</span>
        <span style="background-color: rgba(241, 196, 15, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 8.9">requires</span>
    </div>

    <div class="test-section">
        <h2>High Perplexity (Red - Very Uncertain)</h2>
        <span style="background-color: rgba(231, 76, 60, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 15.7">quantum</span>
        <span style="background-color: rgba(231, 76, 60, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 23.4">entanglement</span>
        <span style="background-color: rgba(231, 76, 60, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 31.2">defies</span>
    </div>

    <div class="test-section">
        <h2>Mixed Example Sentence</h2>
        <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.2">The</span>
        <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 2.3">capital</span>
        <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.8">of</span>
        <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 2.1">France</span>
        <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.5">is</span>
        <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.9">Paris</span>
    </div>

    <p><strong>Instructions:</strong> Hover over tokens to see perplexity values in tooltips!</p>
    <p><strong>Color Legend:</strong></p>
    <ul>
        <li>🟢 <strong>Green:</strong> Low perplexity (model is confident)</li>
        <li>🟡 <strong>Yellow:</strong> Medium perplexity (model is somewhat uncertain)</li>
        <li>🔴 <strong>Red:</strong> High perplexity (model is very uncertain)</li>
    </ul>
</body>
</html>
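
The RGBA backgrounds in color_test.html can be produced by a simple threshold mapping from perplexity to color. A sketch (the thresholds of 3 and 10 are illustrative defaults taken from the decoder demo in demo.py; the encoder demo uses 5 and 20):

```python
def perplexity_color(pp, low=3.0, high=10.0):
    """Map a token perplexity to the RGBA backgrounds used in color_test.html."""
    if pp < low:
        return "rgba(46, 204, 113, 0.7)"   # green - confident
    if pp < high:
        return "rgba(241, 196, 15, 0.7)"   # yellow - uncertain
    return "rgba(231, 76, 60, 0.7)"        # red - very uncertain

# Build one token span in the same style as the test page
span = (f'<span style="background-color: {perplexity_color(1.2)}; '
        f'padding: 4px 8px; margin: 2px; border-radius: 3px;" '
        f'title="Perplexity: 1.2">The</span>')
print(span)
```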
demo.py
ADDED
@@ -0,0 +1,263 @@
#!/usr/bin/env python3
"""
Demo script for PerplexityViewer - shows core functionality without GUI
"""

import torch
import numpy as np
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForMaskedLM
import warnings
warnings.filterwarnings("ignore")

def demo_decoder_perplexity():
    """Demo decoder model perplexity calculation"""
    print("="*60)
    print("🤖 Decoder Model Demo (GPT-2)")
    print("="*60)

    # Load model
    model_name = "distilgpt2"
    print(f"Loading {model_name}...")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    model.eval()

    # Test texts
    test_texts = [
        "The quick brown fox jumps over the lazy dog.",
        "Machine learning is revolutionizing artificial intelligence.",
        "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo.",
        "The capital of France is Paris."
    ]

    for i, text in enumerate(test_texts, 1):
        print(f"\n📝 Text {i}: {text}")

        # Tokenize
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        input_ids = inputs.input_ids

        # Calculate perplexity
        with torch.no_grad():
            outputs = model(input_ids, labels=input_ids)
            loss = outputs.loss
            perplexity = torch.exp(loss).item()

        print(f"  💯 Perplexity: {perplexity:.2f}")

        # Get token-level details
        tokens = tokenizer.convert_ids_to_tokens(input_ids[0][1:])  # Skip first token

        with torch.no_grad():
            outputs = model(input_ids)
            logits = outputs.logits
            shift_logits = logits[..., :-1, :].contiguous()
            shift_labels = input_ids[..., 1:].contiguous()

            loss_fct = torch.nn.CrossEntropyLoss(reduction='none')
            token_losses = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
            token_perplexities = torch.exp(token_losses).cpu().numpy()

        print("  🎯 Token details:")
        for token, pp in zip(tokens[:5], token_perplexities[:5]):  # Show first 5
            clean_token = token.replace('Ġ', ' ').replace('##', '')
            color = '🟢' if pp < 3 else '🟡' if pp < 10 else '🔴'
            print(f"    {color} '{clean_token}': {pp:.2f}")

        if len(tokens) > 5:
            print(f"    ... and {len(tokens) - 5} more tokens")

def demo_encoder_perplexity():
    """Demo encoder model pseudo-perplexity calculation"""
    print("\n" + "="*60)
    print("🤖 Encoder Model Demo (DistilBERT)")
    print("="*60)

    # Load model
    model_name = "distilbert-base-uncased"
    print(f"Loading {model_name}...")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)
    model.eval()

    # Test texts
    test_texts = [
        "The capital of France is Paris.",
        "Python is a programming language.",
        "The weather today is beautiful.",
        "Machine learning requires large datasets."
    ]

    mlm_probability = 0.15

    for i, text in enumerate(test_texts, 1):
        print(f"\n📝 Text {i}: {text}")

        # Tokenize
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        input_ids = inputs.input_ids

        # Create masked version
        masked_input_ids = input_ids.clone()
        original_tokens = input_ids.clone()

        # Randomly mask tokens (excluding special tokens)
        seq_length = input_ids.size(1)
        mask_indices = []
        special_token_ids = {tokenizer.cls_token_id, tokenizer.sep_token_id, tokenizer.pad_token_id}

        for j in range(seq_length):
            if input_ids[0, j].item() not in special_token_ids:
                if torch.rand(1).item() < mlm_probability:
                    mask_indices.append(j)
                    masked_input_ids[0, j] = tokenizer.mask_token_id

        if not mask_indices:  # Ensure at least one token is masked
            non_special_indices = [j for j in range(seq_length) if input_ids[0, j].item() not in special_token_ids]
            if non_special_indices:
                mask_idx = torch.randint(0, len(non_special_indices), (1,)).item()
                mask_indices = [non_special_indices[mask_idx]]
                masked_input_ids[0, mask_indices[0]] = tokenizer.mask_token_id

        # Calculate pseudo-perplexity
        with torch.no_grad():
            outputs = model(masked_input_ids)
            predictions = outputs.logits

            masked_token_losses = []
            for idx in mask_indices:
                target_id = original_tokens[0, idx]
                pred_scores = predictions[0, idx]
                prob = torch.softmax(pred_scores, dim=-1)[target_id]
                loss = -torch.log(prob + 1e-10)
                masked_token_losses.append(loss.item())

        if masked_token_losses:
            avg_loss = np.mean(masked_token_losses)
            pseudo_perplexity = np.exp(avg_loss)
        else:
            pseudo_perplexity = float('inf')

        print(f"  💯 Pseudo-perplexity: {pseudo_perplexity:.2f}")
        print(f"  🎭 Masked {len(mask_indices)} tokens")

        # Show some token-level pseudo-perplexities
        tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
        print("  🎯 Sample token pseudo-perplexities:")

        with torch.no_grad():
            sample_indices = list(range(1, min(6, len(tokens)-1)))  # Skip [CLS] and [SEP]
            for idx in sample_indices:
                if input_ids[0, idx].item() not in special_token_ids:
                    masked_input = input_ids.clone()
                    original_token_id = input_ids[0, idx]
                    masked_input[0, idx] = tokenizer.mask_token_id

                    outputs = model(masked_input)
                    predictions = outputs.logits[0, idx]
                    prob = torch.softmax(predictions, dim=-1)[original_token_id]
                    token_pseudo_perplexity = 1.0 / (prob.item() + 1e-10)

                    clean_token = tokens[idx].replace('##', '')
                    color = '🟢' if token_pseudo_perplexity < 5 else '🟡' if token_pseudo_perplexity < 20 else '🔴'
                    print(f"    {color} '{clean_token}': {token_pseudo_perplexity:.2f}")

def demo_comparison():
    """Compare perplexity across different model types"""
    print("\n" + "="*60)
    print("🔬 Model Comparison Demo")
    print("="*60)

    test_text = "The quick brown fox jumps over the lazy dog."
    print(f"📝 Comparing models on: {test_text}")

    models_to_test = [
        ("distilgpt2", "decoder"),
        ("distilbert-base-uncased", "encoder")
    ]

    results = []

    for model_name, model_type in models_to_test:
        print(f"\n🤖 Testing {model_name} ({model_type})...")

        try:
            tokenizer = AutoTokenizer.from_pretrained(model_name)

            if model_type == "decoder":
                model = AutoModelForCausalLM.from_pretrained(model_name)
                if tokenizer.pad_token is None:
                    tokenizer.pad_token = tokenizer.eos_token
            else:
                model = AutoModelForMaskedLM.from_pretrained(model_name)

            model.eval()

            inputs = tokenizer(test_text, return_tensors="pt", truncation=True, max_length=512)
            input_ids = inputs.input_ids

            if model_type == "decoder":
                with torch.no_grad():
                    outputs = model(input_ids, labels=input_ids)
                    loss = outputs.loss
                    perplexity = torch.exp(loss).item()
            else:  # encoder
                # Quick pseudo-perplexity calculation
                masked_input_ids = input_ids.clone()
                seq_length = input_ids.size(1)

                # Mask middle token
|
| 214 |
+
if seq_length > 2:
|
| 215 |
+
middle_idx = seq_length // 2
|
| 216 |
+
masked_input_ids[0, middle_idx] = tokenizer.mask_token_id
|
| 217 |
+
|
| 218 |
+
with torch.no_grad():
|
| 219 |
+
outputs = model(masked_input_ids)
|
| 220 |
+
predictions = outputs.logits[0, middle_idx]
|
| 221 |
+
prob = torch.softmax(predictions, dim=-1)[input_ids[0, middle_idx]]
|
| 222 |
+
perplexity = 1.0 / (prob.item() + 1e-10)
|
| 223 |
+
else:
|
| 224 |
+
perplexity = float('inf')
|
| 225 |
+
|
| 226 |
+
results.append((model_name, model_type, perplexity))
|
| 227 |
+
print(f" ✅ Perplexity: {perplexity:.2f}")
|
| 228 |
+
|
| 229 |
+
except Exception as e:
|
| 230 |
+
print(f" ❌ Error: {e}")
|
| 231 |
+
results.append((model_name, model_type, float('inf')))
|
| 232 |
+
|
| 233 |
+
print(f"\n📊 Summary for '{test_text}':")
|
| 234 |
+
for model_name, model_type, perplexity in results:
|
| 235 |
+
if perplexity != float('inf'):
|
| 236 |
+
confidence = "High" if perplexity < 5 else "Medium" if perplexity < 15 else "Low"
|
| 237 |
+
print(f" • {model_name} ({model_type}): {perplexity:.2f} - {confidence} confidence")
|
| 238 |
+
else:
|
| 239 |
+
print(f" • {model_name} ({model_type}): Failed")
|
| 240 |
+
|
| 241 |
+
def main():
|
| 242 |
+
"""Run all demos"""
|
| 243 |
+
print("🎭 PerplexityViewer Core Functionality Demo")
|
| 244 |
+
print("This demo shows how perplexity calculation works under the hood")
|
| 245 |
+
|
| 246 |
+
try:
|
| 247 |
+
demo_decoder_perplexity()
|
| 248 |
+
demo_encoder_perplexity()
|
| 249 |
+
demo_comparison()
|
| 250 |
+
|
| 251 |
+
print("\n" + "="*60)
|
| 252 |
+
print("🎉 Demo completed successfully!")
|
| 253 |
+
print("💡 To try the interactive web interface, run: python run.py")
|
| 254 |
+
print("="*60)
|
| 255 |
+
|
| 256 |
+
except KeyboardInterrupt:
|
| 257 |
+
print("\n👋 Demo interrupted by user")
|
| 258 |
+
except Exception as e:
|
| 259 |
+
print(f"\n❌ Demo failed with error: {e}")
|
| 260 |
+
print("💡 Make sure you have installed all dependencies: pip install -r requirements.txt")
|
| 261 |
+
|
| 262 |
+
if __name__ == "__main__":
|
| 263 |
+
main()
|
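The pseudo-perplexity formula the demo applies — the exponentiated mean negative log-likelihood over the masked positions — can be checked by hand on a few made-up probabilities. The values below are illustrative stand-ins, not actual model outputs:

```python
import math

# Hypothetical probabilities a masked LM might assign to three masked tokens
probs = [0.9, 0.5, 0.1]

# Per-token loss: negative log-likelihood, with the same 1e-10 guard the demo uses
losses = [-math.log(p + 1e-10) for p in probs]

# Pseudo-perplexity: exp of the mean loss, i.e. the geometric mean of 1/p
pppl = math.exp(sum(losses) / len(losses))
print(f"{pppl:.2f}")  # 2.81
```

Note that a single very unlikely token (here 0.1) dominates the score, which is why rare or out-of-domain words light up in the visualization.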
mlm_demo.py ADDED @@ -0,0 +1,199 @@
```python
#!/usr/bin/env python3
"""
Demo script showing how MLM probability affects encoder model analysis
"""

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM
import warnings
warnings.filterwarnings("ignore")

def demo_mlm_probability_effect():
    """Demonstrate how MLM probability affects the analysis"""
    print("🎭 MLM Probability Effect Demo")
    print("=" * 60)

    # Load a BERT model
    model_name = "distilbert-base-uncased"
    print(f"Loading {model_name}...")

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)
    model.eval()

    # Test text
    text = "The capital of France is Paris and it is beautiful."
    print(f"📝 Text: {text}")

    # Tokenize
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    input_ids = inputs.input_ids
    tokens = tokenizer.convert_ids_to_tokens(input_ids[0])

    print(f"🔤 Tokens: {tokens}")
    print()

    # Test different MLM probabilities
    mlm_probs = [0.1, 0.15, 0.3, 0.5, 0.8]

    for mlm_prob in mlm_probs:
        print(f"🎯 MLM Probability: {mlm_prob}")

        # Simulate the analysis process
        seq_length = input_ids.size(1)
        special_token_ids = {tokenizer.cls_token_id, tokenizer.sep_token_id, tokenizer.pad_token_id}

        # Count how many tokens would be analyzed
        analyzed_count = 0
        analyzed_tokens = []

        torch.manual_seed(42)  # For reproducible results

        for i in range(seq_length):
            token = tokens[i]
            if input_ids[0, i].item() not in special_token_ids:
                if torch.rand(1).item() < mlm_prob:
                    analyzed_count += 1
                    analyzed_tokens.append(f"'{token}'")

        total_content_tokens = sum(1 for i in range(seq_length) if input_ids[0, i].item() not in special_token_ids)

        print(f"   📊 Analyzed: {analyzed_count}/{total_content_tokens} content tokens ({analyzed_count/total_content_tokens*100:.1f}%)")
        print(f"   🎯 Analyzed tokens: {', '.join(analyzed_tokens[:5])}" + (f" + {len(analyzed_tokens)-5} more" if len(analyzed_tokens) > 5 else ""))
        print()

def simulate_perplexity_calculation():
    """Simulate how different MLM probabilities affect perplexity calculation"""
    print("🧮 Perplexity Calculation Simulation")
    print("=" * 60)

    # Load model
    model_name = "distilbert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)
    model.eval()

    text = "Machine learning is transforming artificial intelligence rapidly."
    inputs = tokenizer(text, return_tensors="pt")
    input_ids = inputs.input_ids

    print(f"📝 Text: {text}")
    print(f"🔤 Tokens: {tokenizer.convert_ids_to_tokens(input_ids[0])}")
    print()

    mlm_probs = [0.15, 0.3, 0.5]

    for mlm_prob in mlm_probs:
        print(f"🎭 MLM Probability: {mlm_prob}")

        # Simulate multiple iterations
        iteration_results = []

        for iteration in range(3):
            # Simulate masking
            masked_input_ids = input_ids.clone()
            original_tokens = input_ids.clone()
            seq_length = input_ids.size(1)

            mask_indices = []
            special_token_ids = {tokenizer.cls_token_id, tokenizer.sep_token_id, tokenizer.pad_token_id}

            torch.manual_seed(42 + iteration)  # Different seed per iteration

            for i in range(seq_length):
                if input_ids[0, i].item() not in special_token_ids:
                    if torch.rand(1).item() < mlm_prob:
                        mask_indices.append(i)
                        masked_input_ids[0, i] = tokenizer.mask_token_id

            if not mask_indices:
                # Ensure at least one token is masked
                non_special_indices = [i for i in range(seq_length)
                                       if input_ids[0, i].item() not in special_token_ids]
                if non_special_indices:
                    mask_idx = torch.randint(0, len(non_special_indices), (1,)).item()
                    mask_indices = [non_special_indices[mask_idx]]
                    masked_input_ids[0, mask_indices[0]] = tokenizer.mask_token_id

            # Calculate pseudo-perplexity for masked tokens
            with torch.no_grad():
                outputs = model(masked_input_ids)
                predictions = outputs.logits

            masked_token_losses = []
            masked_tokens = []

            for idx in mask_indices:
                target_id = original_tokens[0, idx]
                pred_scores = predictions[0, idx]
                prob = torch.softmax(pred_scores, dim=-1)[target_id]
                loss = -torch.log(prob + 1e-10)
                masked_token_losses.append(loss.item())

                token = tokenizer.convert_ids_to_tokens([target_id])[0]
                masked_tokens.append(token)

            if masked_token_losses:
                avg_loss = sum(masked_token_losses) / len(masked_token_losses)
                perplexity = torch.exp(torch.tensor(avg_loss)).item()
                iteration_results.append(perplexity)

                print(f"   Iteration {iteration + 1}: {len(mask_indices)} tokens masked")
                print(f"      Masked: {', '.join(masked_tokens[:3])}" + (f" + {len(masked_tokens)-3} more" if len(masked_tokens) > 3 else ""))
                print(f"      Pseudo-perplexity: {perplexity:.2f}")

        if iteration_results:
            avg_perplexity = sum(iteration_results) / len(iteration_results)
            print(f"   📊 Average pseudo-perplexity: {avg_perplexity:.2f}")
        print()

def explain_mlm_probability():
    """Explain what MLM probability actually does"""
    print("💡 Understanding MLM Probability")
    print("=" * 60)

    print("""
🎭 **What is MLM Probability?**
MLM (Masked Language Modeling) probability controls what fraction of tokens
get randomly selected for detailed perplexity analysis.

📊 **How it works:**
• Low MLM prob (0.15): Analyzes ~15% of tokens randomly
• High MLM prob (0.5): Analyzes ~50% of tokens randomly
• This affects both the average perplexity AND the visualization

🎯 **Why it matters:**
• Higher MLM prob = More tokens analyzed = More complete picture
• Lower MLM prob = Fewer tokens analyzed = Faster but less comprehensive
• The randomness simulates real MLM training conditions

🌈 **Visual Effect:**
• Analyzed tokens: Colored by their actual perplexity
• Non-analyzed tokens: Shown in gray (baseline)
• Try 0.15 vs 0.5 to see the difference!

⚖️ **Trade-offs:**
• MLM 0.15: Fast, matches BERT training, but sparse analysis
• MLM 0.5: Slower, more comprehensive, but artificial
• MLM 0.8: Very slow, nearly complete, but unrealistic
""")

def main():
    """Run MLM probability demonstration"""
    try:
        explain_mlm_probability()
        demo_mlm_probability_effect()
        simulate_perplexity_calculation()

        print("🎉 MLM Probability Demo Complete!")
        print("💡 Now try the app with different MLM probabilities:")
        print("   • Use 0.15 for standard analysis")
        print("   • Use 0.5 for more comprehensive analysis")
        print("   • Watch how the visualization changes!")

    except Exception as e:
        print(f"❌ Demo failed: {e}")
        print("💡 Make sure you have transformers installed: pip install transformers")

if __name__ == "__main__":
    main()
```
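Because masking is an independent coin flip per content token, the number of tokens analyzed in a single pass is binomially distributed with mean `mlm_prob × n_tokens`. A quick Monte-Carlo sanity check of that claim needs no model at all (pure stdlib sketch, not part of the Space):

```python
import random

random.seed(0)

def expected_masked(n_tokens, mlm_prob, trials=10_000):
    """Monte-Carlo estimate of how many tokens a single masking pass analyzes."""
    total = 0
    for _ in range(trials):
        # Each content token is selected independently, as in mlm_demo.py
        total += sum(1 for _ in range(n_tokens) if random.random() < mlm_prob)
    return total / trials

print(expected_masked(10, 0.15))  # close to 1.5 tokens per pass
print(expected_masked(10, 0.5))   # close to 5.0 tokens per pass
```

This is why short texts at low MLM probability sometimes select zero tokens, and why both demo scripts carry a fallback that forces at least one mask.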
simple_color_test.py ADDED @@ -0,0 +1,147 @@
```python
#!/usr/bin/env python3
"""
Simple test to verify color visualization is working (no external dependencies)
"""

def test_color_html():
    """Test the HTML color generation without imports"""
    print("🎨 Testing Color HTML Generation")
    print("=" * 50)

    # Simple test data
    tokens = ["The", "quick", "brown", "fox"]
    perplexities = [1.2, 5.8, 12.3, 2.1]

    # Manual color generation test (similar to app logic)
    max_perplexity = max(perplexities)
    normalized_perps = [p / max_perplexity for p in perplexities]

    print(f"Tokens: {tokens}")
    print(f"Perplexities: {perplexities}")
    print(f"Normalized: {[f'{n:.2f}' for n in normalized_perps]}")

    # Test HTML generation
    html_parts = ['<div>']

    for i, (token, perp, norm_perp) in enumerate(zip(tokens, perplexities, normalized_perps)):
        # Simple color mapping
        if norm_perp < 0.3:  # Green
            red, green, blue = 46, 204, 113
        elif norm_perp < 0.7:  # Yellow
            red, green, blue = 241, 196, 15
        else:  # Red
            red, green, blue = 231, 76, 60

        html_parts.append(
            f'<span style="background-color: rgba({red}, {green}, {blue}, 0.7); '
            f'padding: 2px 4px; margin: 1px; border-radius: 3px;" '
            f'title="Perplexity: {perp}">{token}</span> '
        )

    html_parts.append('</div>')
    html = ''.join(html_parts)

    print(f"\nGenerated HTML:")
    print(html)

    # Basic checks
    assert 'background-color' in html, "No background-color in HTML"
    assert 'rgba(' in html, "No rgba colors in HTML"
    assert 'title=' in html, "No tooltip in HTML"

    print("\n✅ Basic HTML generation works!")
    print("✅ Colors are included in the HTML!")
    print("✅ Tooltips are included!")

    return html

def create_test_html_file():
    """Create a test HTML file to visually verify colors"""
    html_content = """
<!DOCTYPE html>
<html>
<head>
    <title>Color Test</title>
    <style>
        body { font-family: Arial, sans-serif; margin: 20px; }
        .test-section { margin: 20px 0; padding: 15px; border: 1px solid #ccc; }
    </style>
</head>
<body>
    <h1>🎨 Perplexity Color Test</h1>

    <div class="test-section">
        <h2>Low Perplexity (Green - Confident)</h2>
        <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.2">The</span>
        <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.8">quick</span>
        <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 2.1">brown</span>
    </div>

    <div class="test-section">
        <h2>Medium Perplexity (Yellow - Uncertain)</h2>
        <span style="background-color: rgba(241, 196, 15, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 5.4">machine</span>
        <span style="background-color: rgba(241, 196, 15, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 7.2">learning</span>
        <span style="background-color: rgba(241, 196, 15, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 8.9">requires</span>
    </div>

    <div class="test-section">
        <h2>High Perplexity (Red - Very Uncertain)</h2>
        <span style="background-color: rgba(231, 76, 60, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 15.7">quantum</span>
        <span style="background-color: rgba(231, 76, 60, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 23.4">entanglement</span>
        <span style="background-color: rgba(231, 76, 60, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 31.2">defies</span>
    </div>

    <div class="test-section">
        <h2>Mixed Example Sentence</h2>
        <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.2">The</span>
        <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 2.3">capital</span>
        <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.8">of</span>
        <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 2.1">France</span>
        <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.5">is</span>
        <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.9">Paris</span>
    </div>

    <p><strong>Instructions:</strong> Hover over tokens to see perplexity values in tooltips!</p>
    <p><strong>Color Legend:</strong></p>
    <ul>
        <li>🟢 <strong>Green:</strong> Low perplexity (model is confident)</li>
        <li>🟡 <strong>Yellow:</strong> Medium perplexity (model is somewhat uncertain)</li>
        <li>🔴 <strong>Red:</strong> High perplexity (model is very uncertain)</li>
    </ul>
</body>
</html>
"""

    with open("color_test.html", "w") as f:
        f.write(html_content)

    print("💾 Created 'color_test.html' - open this in your browser!")
    print("   You should see colored text with different backgrounds")

def main():
    """Run the simple color test"""
    try:
        print("🎨 Simple Color Visualization Test")
        print("=" * 60)

        # Test HTML generation
        html = test_color_html()

        # Create visual test file
        create_test_html_file()

        print("\n" + "=" * 60)
        print("🎉 Color test completed successfully!")
        print("🌈 Open 'color_test.html' in your browser to see the colors")
        print("💡 If colors show up there, they should work in the app too!")
        print("=" * 60)

        return True

    except Exception as e:
        print(f"❌ Test failed: {e}")
        return False

if __name__ == "__main__":
    success = main()
    exit(0 if success else 1)
```
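The three-bucket mapping above jumps abruptly at the 0.3 and 0.7 thresholds. A continuous gradient between the same three anchor colors is a common alternative; the helper below is a hypothetical sketch (not part of the app) that linearly interpolates green → yellow → red over the normalized perplexity:

```python
def perplexity_to_rgba(norm_perp):
    # Continuous green -> yellow -> red gradient as an alternative to the
    # three-bucket mapping in simple_color_test.py (hypothetical helper).
    # norm_perp is expected in [0, 1]; values outside are clamped.
    norm_perp = max(0.0, min(1.0, norm_perp))
    if norm_perp < 0.5:
        # Interpolate green (46, 204, 113) -> yellow (241, 196, 15)
        t = norm_perp / 0.5
        r = int(46 + t * (241 - 46))
        g = int(204 + t * (196 - 204))
        b = int(113 + t * (15 - 113))
    else:
        # Interpolate yellow (241, 196, 15) -> red (231, 76, 60)
        t = (norm_perp - 0.5) / 0.5
        r = int(241 + t * (231 - 241))
        g = int(196 + t * (76 - 196))
        b = int(15 + t * (60 - 15))
    return f"rgba({r}, {g}, {b}, 0.7)"

print(perplexity_to_rgba(0.0))  # rgba(46, 204, 113, 0.7)
print(perplexity_to_rgba(1.0))  # rgba(231, 76, 60, 0.7)
```

A gradient makes small perplexity differences visible, at the cost of a slightly less readable legend.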
test_app.py ADDED @@ -0,0 +1,271 @@
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Test script for PerplexityViewer app
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import sys
|
| 7 |
+
import os
|
| 8 |
+
import torch
|
| 9 |
+
import numpy as np
|
| 10 |
+
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForMaskedLM
|
| 11 |
+
|
| 12 |
+
# Add the current directory to the path so we can import the app
|
| 13 |
+
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
|
| 14 |
+
|
| 15 |
+
try:
|
| 16 |
+
from app import (
|
| 17 |
+
load_model_and_tokenizer,
|
| 18 |
+
calculate_decoder_perplexity,
|
| 19 |
+
calculate_encoder_perplexity,
|
| 20 |
+
create_visualization,
|
| 21 |
+
process_text
|
| 22 |
+
)
|
| 23 |
+
from config import DEFAULT_MODELS, PROCESSING_SETTINGS
|
| 24 |
+
except ImportError as e:
|
| 25 |
+
print(f"Error importing app modules: {e}")
|
| 26 |
+
sys.exit(1)
|
| 27 |
+
|
| 28 |
+
def test_model_loading():
|
| 29 |
+
"""Test model and tokenizer loading"""
|
| 30 |
+
print("Testing model loading...")
|
| 31 |
+
|
| 32 |
+
# Test decoder model
|
| 33 |
+
try:
|
| 34 |
+
model, tokenizer = load_model_and_tokenizer("distilgpt2", "decoder")
|
| 35 |
+
print("✓ Decoder model (distilgpt2) loaded successfully")
|
| 36 |
+
assert model is not None
|
| 37 |
+
assert tokenizer is not None
|
| 38 |
+
except Exception as e:
|
| 39 |
+
print(f"✗ Failed to load decoder model: {e}")
|
| 40 |
+
return False
|
| 41 |
+
|
| 42 |
+
# Test encoder model
|
| 43 |
+
try:
|
| 44 |
+
model, tokenizer = load_model_and_tokenizer("distilbert-base-uncased", "encoder")
|
| 45 |
+
print("✓ Encoder model (distilbert-base-uncased) loaded successfully")
|
| 46 |
+
assert model is not None
|
| 47 |
+
assert tokenizer is not None
|
| 48 |
+
except Exception as e:
|
| 49 |
+
print(f"✗ Failed to load encoder model: {e}")
|
| 50 |
+
return False
|
| 51 |
+
|
| 52 |
+
return True
|
| 53 |
+
|
| 54 |
+
def test_decoder_perplexity():
|
| 55 |
+
"""Test decoder perplexity calculation"""
|
| 56 |
+
print("\nTesting decoder perplexity calculation...")
|
| 57 |
+
|
| 58 |
+
try:
|
| 59 |
+
model, tokenizer = load_model_and_tokenizer("distilgpt2", "decoder")
|
| 60 |
+
text = "The quick brown fox jumps over the lazy dog."
|
| 61 |
+
|
| 62 |
+
avg_perp, tokens, token_perps = calculate_decoder_perplexity(text, model, tokenizer, iterations=1)
|
| 63 |
+
|
| 64 |
+
print(f"✓ Average perplexity: {avg_perp:.4f}")
|
| 65 |
+
print(f"✓ Number of tokens: {len(tokens)}")
|
| 66 |
+
print(f"✓ Token perplexities shape: {token_perps.shape}")
|
| 67 |
+
|
| 68 |
+
assert avg_perp > 0
|
| 69 |
+
assert len(tokens) > 0
|
| 70 |
+
assert len(token_perps) == len(tokens)
|
| 71 |
+
assert all(p > 0 for p in token_perps)
|
| 72 |
+
|
| 73 |
+
return True
|
| 74 |
+
except Exception as e:
|
| 75 |
+
print(f"✗ Decoder perplexity test failed: {e}")
|
| 76 |
+
return False
|
| 77 |
+
|
| 78 |
+
def test_encoder_perplexity():
|
| 79 |
+
"""Test encoder perplexity calculation"""
|
| 80 |
+
print("\nTesting encoder perplexity calculation...")
|
| 81 |
+
|
| 82 |
+
try:
|
| 83 |
+
model, tokenizer = load_model_and_tokenizer("distilbert-base-uncased", "encoder")
|
| 84 |
+
text = "The capital of France is Paris."
|
| 85 |
+
|
| 86 |
+
avg_perp, tokens, token_perps = calculate_encoder_perplexity(
|
| 87 |
+
text, model, tokenizer, mlm_probability=0.15, iterations=1
|
| 88 |
+
)
|
| 89 |
+
|
| 90 |
+
print(f"✓ Average pseudo-perplexity: {avg_perp:.4f}")
|
| 91 |
+
print(f"✓ Number of tokens: {len(tokens)}")
|
| 92 |
+
print(f"✓ Token perplexities shape: {token_perps.shape}")
|
| 93 |
+
|
| 94 |
+
assert avg_perp > 0
|
| 95 |
+
assert len(tokens) > 0
|
| 96 |
+
assert len(token_perps) == len(tokens)
|
| 97 |
+
assert all(p > 0 for p in token_perps)
|
| 98 |
+
|
| 99 |
+
return True
|
| 100 |
+
except Exception as e:
|
| 101 |
+
print(f"✗ Encoder perplexity test failed: {e}")
|
| 102 |
+
return False
|
| 103 |
+
|
| 104 |
+
def test_visualization():
|
| 105 |
+
"""Test visualization creation"""
|
| 106 |
+
print("\nTesting visualization creation...")
|
| 107 |
+
|
| 108 |
+
try:
|
| 109 |
+
# Create dummy data
|
| 110 |
+
tokens = ["The", "quick", "brown", "fox", "jumps"]
|
| 111 |
+
perplexities = np.array([2.5, 1.8, 3.2, 4.1, 2.9])
|
| 112 |
+
|
| 113 |
+
html = create_visualization(tokens, perplexities)
|
| 114 |
+
|
| 115 |
+
print("✓ Visualization HTML generated")
|
| 116 |
+
assert isinstance(html, str)
|
| 117 |
+
assert len(html) > 0
|
| 118 |
+
assert "ent" in html.lower() # displaCy entity visualization
|
| 119 |
+
|
| 120 |
+
return True
|
| 121 |
+
except Exception as e:
|
| 122 |
+
print(f"✗ Visualization test failed: {e}")
|
| 123 |
+
return False
|
| 124 |
+
|
| 125 |
+
def test_edge_cases():
|
| 126 |
+
"""Test edge cases and error handling"""
|
| 127 |
+
print("\nTesting edge cases...")
|
| 128 |
+
|
| 129 |
+
# Test empty text
|
| 130 |
+
try:
|
| 131 |
+
summary, viz, table = process_text("", "distilgpt2", "decoder", 1, 0.15)
|
| 132 |
+
assert "enter some text" in summary.lower()
|
| 133 |
+
print("✓ Empty text handled correctly")
|
| 134 |
+
except Exception as e:
|
| 135 |
+
print(f"✗ Empty text test failed: {e}")
|
| 136 |
+
return False
|
| 137 |
+
|
| 138 |
+
# Test very short text
|
| 139 |
+
try:
|
| 140 |
+
model, tokenizer = load_model_and_tokenizer("distilgpt2", "decoder")
|
| 141 |
+
text = "Hi"
|
| 142 |
+
avg_perp, tokens, token_perps = calculate_decoder_perplexity(text, model, tokenizer, iterations=1)
|
| 143 |
+
print(f"✓ Short text handled: {len(tokens)} tokens")
|
| 144 |
+
except Exception as e:
|
| 145 |
+
print(f"✓ Short text error handled correctly: {e}")
|
| 146 |
+
|
| 147 |
+
# Test long text (should be truncated)
|
| 148 |
+
try:
|
| 149 |
+
long_text = " ".join(["word"] * 600) # More than max_length
|
| 150 |
+
model, tokenizer = load_model_and_tokenizer("distilgpt2", "decoder")
|
| 151 |
+
avg_perp, tokens, token_perps = calculate_decoder_perplexity(long_text, model, tokenizer, iterations=1)
|
| 152 |
+
print(f"✓ Long text truncated to {len(tokens)} tokens")
|
| 153 |
+
assert len(tokens) <= 512 # Should be truncated
|
| 154 |
+
except Exception as e:
|
| 155 |
+
print(f"✗ Long text test failed: {e}")
|
| 156 |
+
return False
|
| 157 |
+
|
| 158 |
+
return True
|
| 159 |
+
|
| 160 |
+
def test_process_text_integration():
    """Test the main process_text function"""
    print("\nTesting process_text integration...")

    test_cases = [
        {
            "text": "The quick brown fox jumps over the lazy dog.",
            "model": "distilgpt2",
            "type": "decoder",
            "iterations": 1,
            "mlm_prob": 0.15
        },
        {
            "text": "Machine learning is a subset of artificial intelligence.",
            "model": "distilbert-base-uncased",
            "type": "encoder",
            "iterations": 1,
            "mlm_prob": 0.2
        }
    ]

    for i, case in enumerate(test_cases):
        try:
            summary, viz_html, df = process_text(
                case["text"],
                case["model"],
                case["type"],
                case["iterations"],
                case["mlm_prob"]
            )

            print(f"✓ Test case {i+1} ({case['type']}) processed successfully")
            assert "Analysis Results" in summary
            assert len(viz_html) > 0
            assert len(df) > 0

        except Exception as e:
            print(f"✗ Test case {i+1} failed: {e}")
            return False

    return True

def test_configuration():
    """Test configuration loading"""
    print("\nTesting configuration...")

    try:
        assert "decoder" in DEFAULT_MODELS
        assert "encoder" in DEFAULT_MODELS
        assert len(DEFAULT_MODELS["decoder"]) > 0
        assert len(DEFAULT_MODELS["encoder"]) > 0
        assert PROCESSING_SETTINGS["default_iterations"] >= 1
        print("✓ Configuration loaded correctly")
        return True
    except Exception as e:
        print(f"✗ Configuration test failed: {e}")
        return False

def run_all_tests():
    """Run all tests"""
    print("=" * 50)
    print("Running PerplexityViewer Tests")
    print("=" * 50)

    tests = [
        ("Configuration", test_configuration),
        ("Model Loading", test_model_loading),
        ("Decoder Perplexity", test_decoder_perplexity),
        ("Encoder Perplexity", test_encoder_perplexity),
        ("Visualization", test_visualization),
        ("Edge Cases", test_edge_cases),
        ("Integration", test_process_text_integration)
    ]

    passed = 0
    failed = 0

    for test_name, test_func in tests:
        print(f"\n[{test_name}]")
        try:
            if test_func():
                passed += 1
                print(f"✓ {test_name} PASSED")
            else:
                failed += 1
                print(f"✗ {test_name} FAILED")
        except Exception as e:
            failed += 1
            print(f"✗ {test_name} FAILED with exception: {e}")

    print("\n" + "=" * 50)
    print(f"Test Results: {passed} passed, {failed} failed")
    print("=" * 50)

    return failed == 0

if __name__ == "__main__":
    # Check if PyTorch is available
    print(f"PyTorch version: {torch.__version__}")
    print(f"CUDA available: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        print(f"CUDA device: {torch.cuda.get_device_name()}")

    # Run tests
    success = run_all_tests()

    if success:
        print("\n🎉 All tests passed! The app should work correctly.")
        sys.exit(0)
    else:
        print("\n❌ Some tests failed. Please check the errors above.")
        sys.exit(1)
test_colors.py ADDED
@@ -0,0 +1,198 @@
#!/usr/bin/env python3
"""
Test script to verify color visualization is working correctly
"""

import numpy as np
import re
from app import create_visualization

def test_color_visualization():
    """Test that the visualization creates colored HTML"""
    print("🎨 Testing Color Visualization")
    print("=" * 50)

    # Test with sample data
    tokens = ["The", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
    perplexities = np.array([1.2, 2.5, 8.3, 3.1, 15.7, 2.0, 12.4, 1.8])

    print(f"📝 Tokens: {tokens}")
    print(f"📊 Perplexities: {perplexities}")

    # Generate visualization
    html = create_visualization(tokens, perplexities)

    # Check that HTML was generated
    assert len(html) > 100, "HTML output too short"
    print("✅ HTML generated successfully")

    # Check for color information in HTML
    color_pattern = r'rgba?\(\d+,\s*\d+,\s*\d+(?:,\s*[\d.]+)?\)'
    colors_found = re.findall(color_pattern, html)

    print(f"🎨 Colors found in HTML: {len(colors_found)}")
    for i, color in enumerate(colors_found[:5]):  # Show first 5
        print(f"  Color {i+1}: {color}")

    assert len(colors_found) > 0, "No colors found in HTML output"
    print("✅ Color information found in HTML")

    # Check for span elements with style attributes
    span_pattern = r'<span style="[^"]*background-color[^"]*"[^>]*>'
    spans_found = re.findall(span_pattern, html)

    print(f"🏷️ Styled spans found: {len(spans_found)}")
    assert len(spans_found) >= len(tokens) - 2, "Not enough styled spans found"  # Allow for some filtering
    print("✅ Styled spans with background colors found")

    # Check for tooltip information
    assert 'title="Perplexity:' in html, "No tooltip information found"
    print("✅ Tooltip information found")

    # Verify different colors for different perplexity ranges:
    # extract RGB values from rgba(r,g,b,a) or rgb(r,g,b)
    rgb_values = []
    for color in colors_found:
        numbers = re.findall(r'\d+', color)
        if len(numbers) >= 3:
            rgb_values.append((int(numbers[0]), int(numbers[1]), int(numbers[2])))

    if len(rgb_values) >= 2:
        # Check that we have different colors (not all the same)
        unique_colors = set(rgb_values)
        print(f"🌈 Unique colors found: {len(unique_colors)}")
        assert len(unique_colors) > 1, "All tokens have the same color"
        print("✅ Multiple different colors found")

        # Check that the color range makes sense
        red_values = [r for r, g, b in rgb_values]
        green_values = [g for r, g, b in rgb_values]

        print(f"🔴 Red range: {min(red_values)} - {max(red_values)}")
        print(f"🟢 Green range: {min(green_values)} - {max(green_values)}")

        # Should have variation in the color channels
        assert max(red_values) - min(red_values) > 20, "Not enough red variation"
        print("✅ Sufficient color variation found")

    return html

def test_edge_cases():
    """Test edge cases for color visualization"""
    print("\n🧪 Testing Edge Cases")
    print("=" * 50)

    # Test with very high perplexities
    tokens = ["unusual", "words", "here"]
    high_perplexities = np.array([100.0, 200.0, 50.0])

    html = create_visualization(tokens, high_perplexities)
    assert len(html) > 50, "HTML too short for high perplexities"
    print("✅ High perplexity values handled")

    # Test with very low perplexities
    low_perplexities = np.array([0.1, 0.2, 0.15])
    html = create_visualization(tokens, low_perplexities)
    assert len(html) > 50, "HTML too short for low perplexities"
    print("✅ Low perplexity values handled")

    # Test with a single token
    single_token = ["word"]
    single_perplexity = np.array([5.0])
    html = create_visualization(single_token, single_perplexity)
    assert len(html) > 50, "HTML too short for single token"
    print("✅ Single token handled")

    # Test with empty input
    empty_html = create_visualization([], np.array([]))
    assert "No tokens" in empty_html, "Empty case not handled properly"
    print("✅ Empty input handled")

def test_color_gradient():
    """Test that the color gradient works as expected"""
    print("\n🌈 Testing Color Gradient")
    print("=" * 50)

    # Create tokens with ascending perplexities
    tokens = [f"token_{i}" for i in range(10)]
    perplexities = np.array([i * 2.0 + 1.0 for i in range(10)])  # 1, 3, 5, ..., 19

    html = create_visualization(tokens, perplexities)

    # Extract all RGB values in order
    color_pattern = r'rgba?\((\d+),\s*(\d+),\s*(\d+)(?:,\s*[\d.]+)?\)'
    colors_found = re.findall(color_pattern, html)

    if len(colors_found) >= 5:
        # Convert to numeric values
        rgb_values = [(int(r), int(g), int(b)) for r, g, b in colors_found]

        low_perp_color = rgb_values[0]    # First token (lowest perplexity)
        high_perp_color = rgb_values[-1]  # Last token (highest perplexity)

        print(f"🟢 Low perplexity color (perp={perplexities[0]:.1f}): RGB{low_perp_color}")
        print(f"🔴 High perplexity color (perp={perplexities[-1]:.1f}): RGB{high_perp_color}")

        # Low perplexity should be more green (higher green value);
        # high perplexity should be more red (higher red value).
        if low_perp_color[1] > high_perp_color[1]:  # Green component
            print("✅ Low perplexity tokens are greener")
        else:
            print("⚠️ Expected low perplexity to be greener")

        if high_perp_color[0] > low_perp_color[0]:  # Red component
            print("✅ High perplexity tokens are redder")
        else:
            print("⚠️ Expected high perplexity to be redder")

def main():
    """Run all color visualization tests"""
    print("🎨 Color Visualization Test Suite")
    print("=" * 60)

    try:
        # Test basic functionality
        html = test_color_visualization()

        # Test edge cases
        test_edge_cases()

        # Test the color gradient
        test_color_gradient()

        print("\n" + "=" * 60)
        print("🎉 All color visualization tests passed!")
        print("🌈 The tokens should now appear with colored backgrounds!")
        print("=" * 60)

        # Save a sample HTML file for manual inspection
        with open("sample_visualization.html", "w") as f:
            f.write(f"""
<!DOCTYPE html>
<html>
<head>
    <title>Sample Perplexity Visualization</title>
</head>
<body>
    <h1>Sample Perplexity Visualization</h1>
    <p>This is what the colored visualization should look like:</p>
    {html}
</body>
</html>
""")
        print("💾 Sample visualization saved to 'sample_visualization.html'")
        print("   Open this file in your browser to see the colors!")

        return True

    except Exception as e:
        print(f"\n❌ Color visualization test failed: {e}")
        import traceback
        traceback.print_exc()
        return False

if __name__ == "__main__":
    success = main()
    exit(0 if success else 1)
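test_color_gradient only checks the direction of the mapping (greener for low perplexity, redder for high), not its exact shape. A hypothetical linear green-to-red interpolation that would satisfy those checks — `perplexity_to_rgb` is an illustrative sketch, not the actual code in `create_visualization` — looks like:

```python
def perplexity_to_rgb(p, p_min, p_max):
    # Normalize perplexity to [0, 1]; guard against a degenerate range.
    t = 0.0 if p_max == p_min else (p - p_min) / (p_max - p_min)
    # Interpolate: low perplexity -> green, high perplexity -> red.
    return (int(255 * t), int(255 * (1 - t)), 80)

print(perplexity_to_rgb(1.0, 1.0, 19.0))   # (0, 255, 80)
print(perplexity_to_rgb(19.0, 1.0, 19.0))  # (255, 0, 80)
```

Under this mapping the red channel strictly increases and the green channel strictly decreases with perplexity, so both directional assertions in the gradient test would pass.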