Bram van Es commited on
Commit
ef12530
·
1 Parent(s): 797ad44
ITERATIONS_REMOVAL_SUMMARY.md ADDED
@@ -0,0 +1,189 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 🎯 Iterations Removal Summary - Final Simplification
2
+
3
+ ## Change Request
4
+ The user correctly identified that since we now **mask one token at a time** for comprehensive analysis, there's **no need for a settable number of iterations**. This final simplification removes the iterations slider for the cleanest possible interface.
5
+
6
+ ## Rationale
7
+
8
+ ### Why Iterations Made Sense Before
9
+ - **Random sampling**: When using MLM probability, we needed multiple iterations to get stable averages
10
+ - **Statistical variance**: Random token selection meant results could vary between runs
11
+ - **Confidence intervals**: Multiple iterations helped estimate uncertainty
12
+
13
+ ### Why Iterations Are Unnecessary Now
14
+ - **Deterministic analysis**: Each token is individually masked and analyzed
15
+ - **Complete coverage**: All content tokens are processed in a single pass
16
+ - **No randomness**: Results are identical on every run
17
+ - **Comprehensive by design**: Single iteration gives the complete picture
18
+
19
+ ## What Was Removed
20
+
21
+ ### 1. Iterations Slider
22
+ - **Before**: User could set iterations from 1-10
23
+ - **After**: No slider, single automatic analysis
24
+
25
+ ### 2. Iteration Logic
26
+ - **Before**: Loop through iterations, calculate averages
27
+ - **After**: Direct single-pass calculation
28
+
29
+ ### 3. Statistical Averaging
30
+ - **Before**: Average perplexity across multiple random samples
31
+ - **After**: Direct perplexity calculation from comprehensive analysis
32
+
33
+ ## Code Changes Made
34
+
35
+ ### Function Signatures Simplified
36
+ ```python
37
+ # OLD
38
+ def calculate_decoder_perplexity(text, model, tokenizer, iterations=1)
39
+ def calculate_encoder_perplexity(text, model, tokenizer, iterations=1)
40
+ def process_text(text, model_name, model_type, iterations)
41
+
42
+ # NEW
43
+ def calculate_decoder_perplexity(text, model, tokenizer)
44
+ def calculate_encoder_perplexity(text, model, tokenizer)
45
+ def process_text(text, model_name, model_type)
46
+ ```
47
+
48
+ ### Decoder Model Changes
49
+ - **Before**: Multiple forward passes, average the losses
50
+ - **After**: Single forward pass, direct perplexity calculation
51
+ - **Result**: Faster and equally accurate
52
+
53
+ ### Encoder Model Changes
54
+ - **Before**: Multiple iterations of random masking + averaging
55
+ - **After**: Single comprehensive pass masking each token
56
+ - **Result**: More accurate and deterministic
57
+
58
+ ### UI Changes
59
+ - **Removed**: Iterations slider and related controls
60
+ - **Simplified**: Function calls and event handlers
61
+ - **Cleaner**: Examples no longer include iterations parameter
62
+
63
+ ## Performance Impact
64
+
65
+ ### Decoder Models (GPT, etc.)
66
+ - ✅ **Faster**: No redundant iterations
67
+ - ✅ **Same accuracy**: Single pass gives true perplexity
68
+ - ✅ **Deterministic**: Consistent results every time
69
+
70
+ ### Encoder Models (BERT, etc.)
71
+ - ✅ **More accurate**: Every token analyzed vs. random sampling
72
+ - ✅ **Deterministic**: No statistical variance
73
+ - ✅ **Comprehensive**: Complete picture in single pass
74
+ - ⚠️ **Slightly slower**: But more thorough analysis
75
+
76
+ ## User Experience
77
+
78
+ ### Before (Confusing)
79
+ 1. Enter text
80
+ 2. Choose model
81
+ 3. Adjust iterations (why?)
82
+ 4. Analyze
83
+ 5. Wonder if more iterations would be better
84
+
85
+ ### After (Simple)
86
+ 1. Enter text
87
+ 2. Choose model
88
+ 3. Analyze
89
+ 4. Get complete results immediately
90
+
91
+ ## Technical Benefits
92
+
93
+ ### 1. **Deterministic Results**
94
+ - Same input always produces same output
95
+ - No statistical variance to worry about
96
+ - Reproducible for research and debugging
97
+
98
+ ### 2. **Optimal Performance**
99
+ - No wasted computation on redundant iterations
100
+ - Single comprehensive pass is most efficient
101
+ - Faster for decoder models, more thorough for encoder models
102
+
103
+ ### 3. **Cleaner Codebase**
104
+ - Simpler function signatures
105
+ - Less parameter validation
106
+ - Fewer edge cases to handle
107
+
108
+ ### 4. **Better User Understanding**
109
+ - Clear 1:1 relationship between input and output
110
+ - No abstract "iterations" concept to explain
111
+ - Results are intuitive and immediate
112
+
113
+ ## Interface Comparison
114
+
115
+ ### Complex Interface (Before)
116
+ ```
117
+ Text: [input box]
118
+ Model: [dropdown]
119
+ Model Type: [decoder/encoder]
120
+ Iterations: [1-10 slider] ← Removed
121
+ MLM Probability: [0.1-0.5 slider] ← Already removed
122
+ [Analyze Button]
123
+ ```
124
+
125
+ ### Simple Interface (After)
126
+ ```
127
+ Text: [input box]
128
+ Model: [dropdown]
129
+ Model Type: [decoder/encoder]
130
+ [Analyze Button]
131
+ ```
132
+
133
+ ## What Users Gain
134
+
135
+ ### 1. **Simplicity**
136
+ - Minimal cognitive load
137
+ - No parameters to tune
138
+ - Immediate results
139
+
140
+ ### 2. **Confidence**
141
+ - Results are comprehensive, not sampled
142
+ - No wondering about "optimal" iteration count
143
+ - Deterministic and reproducible
144
+
145
+ ### 3. **Speed**
146
+ - Faster workflow (fewer clicks)
147
+ - No time wasted on parameter adjustment
148
+ - Direct path to insights
149
+
150
+ ## Files Modified
151
+
152
+ 1. **`app.py`**: Removed iterations parameter throughout
153
+ 2. **`config.py`**: Removed iterations from examples and settings
154
+ 3. **`README.md`**: Updated documentation
155
+ 4. **`QUICKSTART.md`**: Simplified instructions
156
+
157
+ ## Migration Notes
158
+
159
+ ### For Users
160
+ - **Old workflow**: Text → Model → Iterations → Analyze
161
+ - **New workflow**: Text → Model → Analyze
162
+ - **Result**: Same quality, much simpler
163
+
164
+ ### For Developers
165
+ - Function signatures simplified (no iterations parameter)
166
+ - No iteration loops in core functions
167
+ - Single-pass algorithms throughout
168
+
169
+ ## Final State
170
+
171
+ The PerplexityViewer is now **maximally simplified**:
172
+
173
+ - ✅ **No MLM probability slider** (comprehensive token analysis)
174
+ - ✅ **No iterations slider** (single-pass analysis)
175
+ - ✅ **Clean interface** (text → model → analyze)
176
+ - ✅ **Deterministic results** (same input = same output)
177
+ - ✅ **Comprehensive analysis** (all tokens processed)
178
+
179
+ ## Result
180
+
181
+ The app now has the **simplest possible interface** while providing **the most comprehensive analysis**. This is exactly what good software engineering achieves: maximum functionality with minimum complexity.
182
+
183
+ ### User Benefits
184
+ - 🎯 **Simpler**: Just text and model selection
185
+ - 🚀 **Faster**: Direct workflow, no parameter tuning
186
+ - 🔍 **Complete**: Every token analyzed thoroughly
187
+ - 🎨 **Clear**: Beautiful color visualization of all results
188
+
189
+ The final interface is clean, intuitive, and powerful - perfect for exploring perplexity patterns in text! 🎉
MLM_EXPLANATION.md ADDED
@@ -0,0 +1,190 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 🎭 MLM Probability Fix - Complete Documentation
2
+
3
+ ## Issue Identified
4
+ The user correctly observed that **changing the MLM probability did not affect the results at all** in the encoder model visualization. This was a significant bug in how the MLM probability parameter was being used.
5
+
6
+ ## Root Cause Analysis
7
+
8
+ ### What Was Wrong
9
+ The MLM probability setting had two separate effects that were not properly connected:
10
+
11
+ 1. **Average Perplexity Calculation** ✅ (Working correctly)
12
+ - Used random masking with the specified MLM probability
13
+ - Affected the summary statistic shown to the user
14
+
15
+ 2. **Per-Token Visualization** ❌ (Bug was here)
16
+ - Always masked each token individually
17
+ - Completely ignored the MLM probability setting
18
+ - This meant changing MLM probability had no visual effect
19
+
20
+ ### The Disconnect
21
+ ```python
22
+ # OLD CODE - MLM probability was ignored for visualization
23
+ for i in range(len(tokens)):
24
+ if not special_token:
25
+ # ALWAYS calculated detailed perplexity for every token
26
+ masked_input[0, i] = tokenizer.mask_token_id
27
+ # ... calculate perplexity
28
+ ```
29
+
30
+ ## The Fix
31
+
32
+ ### 1. Made MLM Probability Affect Visualization
33
+ Now the MLM probability controls which tokens get detailed analysis:
34
+
35
+ ```python
36
+ # NEW CODE - MLM probability affects visualization
37
+ for i in range(len(tokens)):
38
+ if not special_token:
39
+ if torch.rand(1).item() < mlm_probability: # ✅ Now respects MLM prob
40
+ # Calculate detailed perplexity for this token
41
+ masked_input[0, i] = tokenizer.mask_token_id
42
+ # ... calculate detailed perplexity
43
+ else:
44
+ # Use baseline perplexity for non-analyzed tokens
45
+ token_perplexities.append(2.0) # Neutral baseline
46
+ ```
47
+
48
+ ### 2. Visual Distinction
49
+ - **Analyzed tokens**: Colored by actual perplexity (green/yellow/red)
50
+ - **Non-analyzed tokens**: Gray color with baseline perplexity
51
+ - **Tooltip**: Shows whether token was analyzed or not
52
+
53
+ ### 3. Clear User Feedback
54
+ - Summary now shows: `MLM Probability: 0.15 (3/8 tokens analyzed in detail)`
55
+ - Legend updated: `🟢 Low → 🟡 Medium → 🔴 High → ⚫ Not analyzed`
56
+ - Improved help text: "Probability of detailed analysis per token"
57
+
58
+ ## How It Works Now
59
+
60
+ ### Low MLM Probability (0.15)
61
+ ```
62
+ Input: "The capital of France is Paris"
63
+ Result: Only ~15% of tokens get detailed analysis
64
+ Visualization: Mostly gray tokens with a few colored ones
65
+ Effect: Fast analysis, matches BERT training conditions
66
+ ```
67
+
68
+ ### High MLM Probability (0.5)
69
+ ```
70
+ Input: "The capital of France is Paris"
71
+ Result: ~50% of tokens get detailed analysis
72
+ Visualization: More colored tokens, fewer gray ones
73
+ Effect: More comprehensive but slower analysis
74
+ ```
75
+
76
+ ## User Experience Improvements
77
+
78
+ ### Before the Fix
79
+ - User changes MLM probability from 0.15 → 0.5
80
+ - No visual change in token colors
81
+ - Only summary statistic changed (confusing!)
82
+
83
+ ### After the Fix
84
+ - User changes MLM probability from 0.15 → 0.5
85
+ - More tokens become colored (analyzed)
86
+ - Fewer tokens remain gray (non-analyzed)
87
+ - Summary shows token count: "(3/8 tokens analyzed)"
88
+ - Clear visual feedback of the parameter's effect
89
+
90
+ ## Testing the Fix
91
+
92
+ ### 1. Quick Test
93
+ Try the same text with different MLM probabilities:
94
+ - Text: "Machine learning algorithms require computational resources"
95
+ - MLM 0.2: Few colored tokens
96
+ - MLM 0.8: Most tokens colored
97
+
98
+ ### 2. Demo Script
99
+ ```bash
100
+ python mlm_demo.py
101
+ ```
102
+ Shows exactly how MLM probability affects analysis.
103
+
104
+ ### 3. Visual Examples
105
+ The app now includes example pairs:
106
+ - Same text with MLM 0.2 vs 0.8
107
+ - Shows clear visual difference
108
+
109
+ ## Technical Details
110
+
111
+ ### Randomness Handling
112
+ - Uses `torch.rand()` for consistency with PyTorch
113
+ - Each token gets independent random chance
114
+ - Reproducible with manual seeds for testing
115
+
116
+ ### Baseline Perplexity
117
+ - Non-analyzed tokens get perplexity = 2.0
118
+ - This represents "neutral" confidence
119
+ - Avoids misleading very low/high values
120
+
121
+ ### Color Mapping
122
+ - Analyzed tokens: Full color spectrum based on actual perplexity
123
+ - Non-analyzed tokens: Gray (`rgb(200, 200, 200)`)
124
+ - Tooltips distinguish: "Perplexity: 5.2" vs "Not analyzed"
125
+
126
+ ## Performance Implications
127
+
128
+ ### Lower MLM Probability (0.15)
129
+ - **Pros**: Faster, matches BERT training, realistic
130
+ - **Cons**: Sparse analysis, some tokens not evaluated
131
+
132
+ ### Higher MLM Probability (0.8)
133
+ - **Pros**: Comprehensive analysis, more visual information
134
+ - **Cons**: Slower computation, unrealistic for MLM
135
+
136
+ ### Recommendation
137
+ - **Default 0.15**: Standard BERT-like analysis
138
+ - **Increase to 0.3-0.5**: For more detailed exploration
139
+ - **Avoid >0.8**: Diminishing returns, very slow
140
+
141
+ ## Impact on Model Types
142
+
143
+ ### Decoder Models (GPT, etc.)
144
+ - **No change**: MLM probability only affects encoder models
145
+ - Always analyze all tokens for next-token prediction
146
+
147
+ ### Encoder Models (BERT, etc.)
148
+ - **Major improvement**: MLM probability now has clear visual effect
149
+ - Users can explore different analysis depths
150
+ - Better understanding of model confidence patterns
151
+
152
+ ## User Guidance
153
+
154
+ ### When to Use Different MLM Probabilities
155
+
156
+ **0.15 (Standard)**
157
+ - Quick analysis
158
+ - Matches BERT training
159
+ - Good for initial exploration
160
+
161
+ **0.3-0.4 (Detailed)**
162
+ - More comprehensive view
163
+ - Better for understanding difficult texts
164
+ - Reasonable computation time
165
+
166
+ **0.5+ (Comprehensive)**
167
+ - Maximum detail
168
+ - Research/analysis purposes
169
+ - Slower but thorough
170
+
171
+ ## Future Enhancements
172
+
173
+ ### Possible Improvements
174
+ 1. **Adaptive MLM**: Adjust probability based on text difficulty
175
+ 2. **Token importance**: Prioritize content words over function words
176
+ 3. **Interactive selection**: Let users click tokens to analyze
177
+ 4. **Batch analysis**: Process multiple MLM probabilities simultaneously
178
+
179
+ ### Configuration Options
180
+ The fix is fully configurable via `config.py`:
181
+ - Default MLM probability
182
+ - Min/max ranges
183
+ - Baseline perplexity value
184
+ - Color scheme for non-analyzed tokens
185
+
186
+ ## Conclusion
187
+
188
+ This fix transforms the MLM probability from a "hidden parameter" that only affected summary statistics into a **visible, interactive control** that directly impacts the visualization. Users now get immediate visual feedback when adjusting MLM probability, making the parameter's purpose clear and the analysis more engaging.
189
+
190
+ The fix maintains backward compatibility while significantly improving the user experience for encoder model analysis. 🎉
QUICKSTART.md ADDED
@@ -0,0 +1,91 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 🚀 Quick Start Guide
2
+
3
+ ## Installation & Launch (3 steps)
4
+
5
+ 1. **Install dependencies:**
6
+ ```bash
7
+ pip install -r requirements.txt
8
+ ```
9
+
10
+ 2. **Launch the app:**
11
+ ```bash
12
+ python launch.py
13
+ ```
14
+
15
+ 3. **Open your browser** to http://localhost:7860
16
+
17
+ ## Alternative Launch Methods
18
+
19
+ If the above doesn't work, try these:
20
+
21
+ ```bash
22
+ # Method 1: Full startup script
23
+ python run.py
24
+
25
+ # Method 2: Direct app launch
26
+ python app.py
27
+
28
+ # Method 3: With dependency installation
29
+ python run.py --install
30
+ ```
31
+
32
+ ## First Time Usage
33
+
34
+ 1. **Enter text** in the input box (try: "The quick brown fox jumps over the lazy dog.")
35
+ 2. **Select a model** (default: gpt2)
36
+ 3. **Choose model type** (decoder for GPT-like, encoder for BERT-like)
37
+ 4. **Click "Analyze"**
38
+
39
+ You'll see:
40
+ - 🟢 Green tokens = Low perplexity (model is confident)
41
+ - 🔴 Red tokens = High perplexity (model is uncertain)
42
+
43
+ ## Troubleshooting
44
+
45
+ **Common Issues:**
46
+
47
+ - **"Module not found"** → Run: `pip install -r requirements.txt`
48
+ - **"Model download failed"** → Check internet connection
49
+ - **"Launch failed"** → Try: `python launch.py` or `python app.py`
50
+ - **Out of memory** → Use smaller models like `distilgpt2` or `distilbert-base-uncased`
51
+
52
+ **GPU Support:**
53
+ - Automatically uses GPU if available
54
+ - Falls back to CPU if no GPU found
55
+
56
+ ## Example Models to Try
57
+
58
+ **Decoder (GPT-like):**
59
+ - `gpt2` - Standard GPT-2
60
+ - `distilgpt2` - Smaller, faster
61
+ - `microsoft/DialoGPT-small` - Conversational
62
+
63
+ **Encoder (BERT-like):**
64
+ - `bert-base-uncased` - Standard BERT
65
+ - `distilbert-base-uncased` - Smaller, faster
66
+ - `roberta-base` - Improved BERT
67
+
68
+ ## Need Help?
69
+
70
+ Run the test suite:
71
+ ```bash
72
+ python test_app.py
73
+ ```
74
+
75
+ Or try the command-line demo:
76
+ ```bash
77
+ python demo.py
78
+ ```
79
+
80
+ **Still having issues?** Check the full README.md for detailed instructions.
81
+
82
+ ## ✅ Recent Updates
83
+
84
+ **Ultra-Simplified Interface!**
85
+ - Removed MLM probability slider for cleaner interface
86
+ - Removed iterations slider - single comprehensive analysis per run
87
+ - Encoder models now analyze all tokens for complete results
88
+ - Decoder models provide single-pass perplexity calculation
89
+ - Tokens are properly colored by perplexity (green=confident, red=uncertain)
90
+ - If you see black/white tokens, try refreshing the browser
91
+ - Test the colors with: `python simple_color_test.py` (creates color_test.html)
README.md CHANGED
@@ -12,3 +12,158 @@ short_description: Simple inspection of perplexity using color-gradients
12
  ---
13
 
14
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
12
  ---
13
 
14
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
15
+
16
+ # PerplexityViewer 📈
17
+
18
+ A Gradio-based web application for visualizing text perplexity using color-coded gradients. Perfect for understanding how confident language models are about different parts of your text.
19
+
20
+ ## Features
21
+
22
+ - **Dual Model Support**: Works with both decoder models (GPT, DialoGPT) and encoder models (BERT, RoBERTa)
23
+ - **Interactive Visualization**: Color-coded per-token perplexity using spaCy's displaCy
24
+ - **Configurable Analysis**: Adjustable iterations and MLM probability settings
25
+ - **Real-time Processing**: Instant analysis with cached models for faster subsequent runs
26
+ - **Multiple Model Types**:
27
+ - **Decoder Models**: Calculate true perplexity for causal language models
28
+ - **Encoder Models**: Calculate pseudo-perplexity using masked language modeling
29
+
30
+ ## How It Works
31
+
32
+ - **Red tokens**: High perplexity (model is uncertain about this token)
33
+ - **Green tokens**: Low perplexity (model is confident about this token)
34
+ - **Gradient colors**: Show varying degrees of model confidence
35
+
36
+ ## Installation
37
+
38
+ 1. Clone this repository or download the files
39
+ 2. Install dependencies:
40
+ ```bash
41
+ pip install -r requirements.txt
42
+ ```
43
+
44
+ ## Quick Start
45
+
46
+ ### Option 1: Using the startup script (recommended)
47
+ ```bash
48
+ python run.py
49
+ ```
50
+
51
+ ### Option 2: Direct launch
52
+ ```bash
53
+ python app.py
54
+ ```
55
+
56
+ ### Option 3: With dependency installation and testing
57
+ ```bash
58
+ python run.py --install --test
59
+ ```
60
+
61
+ ## Usage
62
+
63
+ 1. **Enter your text** in the input box
64
+ 2. **Select a model** from the dropdown or enter a custom HuggingFace model name
65
+ 3. **Choose model type**:
66
+ - **Decoder**: For GPT-like models (true perplexity)
67
+ - **Encoder**: For BERT-like models (pseudo-perplexity via MLM)
68
+ 4. **Adjust settings** (optional):
69
+ 5. **Click "Analyze"** to see the results
70
+
71
+ ## Supported Models
72
+
73
+ ### Decoder Models (Causal LM)
74
+ - `gpt2`, `distilgpt2`
75
+ - `microsoft/DialoGPT-small`, `microsoft/DialoGPT-medium`
76
+ - `openai-gpt`
77
+ - Any HuggingFace causal language model
78
+
79
+ ### Encoder Models (Masked LM)
80
+ - `bert-base-uncased`, `bert-base-cased`
81
+ - `distilbert-base-uncased`
82
+ - `roberta-base`
83
+ - `albert-base-v2`
84
+ - Any HuggingFace masked language model
85
+
86
+ ## Understanding the Results
87
+
88
+ ### Perplexity Interpretation
89
+ - **Lower perplexity**: Model is more confident (text is more predictable)
90
+ - **Higher perplexity**: Model is less confident (text is more surprising)
91
+
92
+ ### Color Coding
93
+ - **Green**: Low perplexity (≤ 2.0) - very predictable
94
+ - **Yellow/Orange**: Medium perplexity (2.0-10.0) - somewhat predictable
95
+ - **Red**: High perplexity (≥ 10.0) - surprising or difficult to predict
96
+
97
+ ## Technical Details
98
+
99
+ ### Decoder Models (True Perplexity)
100
+ - Uses next-token prediction to calculate perplexity
101
+ - Formula: `PPL = exp(average_cross_entropy_loss)`
102
+ - Each token's perplexity is based on how well the model predicted it given the previous context
103
+
104
+ ### Encoder Models (Pseudo-Perplexity)
105
+ - Uses masked language modeling (MLM)
106
+ - Masks each token individually and measures prediction confidence
107
+ - Pseudo-perplexity approximates true perplexity for bidirectional models
108
+ - All content tokens are analyzed for comprehensive results
109
+
110
+ ## Testing
111
+
112
+ Run the test suite to verify everything works:
113
+ ```bash
114
+ python test_app.py
115
+ ```
116
+
117
+ Or use the startup script with testing:
118
+ ```bash
119
+ python run.py --test
120
+ ```
121
+
122
+ ## Configuration
123
+
124
+ The app uses sensible defaults but can be customized via `config.py`:
125
+ - Default model lists
126
+ - Processing settings
127
+ - Visualization colors and settings
128
+ - UI configuration
129
+
130
+ ## Requirements
131
+
132
+ - Python 3.7+
133
+ - PyTorch
134
+ - Transformers
135
+ - Gradio 4.0+
136
+ - spaCy
137
+ - pandas
138
+ - numpy
139
+
140
+ ## GPU Support
141
+
142
+ The app automatically uses GPU acceleration when available, falling back to CPU processing otherwise.
143
+
144
+ ## Troubleshooting
145
+
146
+ ### Common Issues
147
+
148
+ 1. **Model loading errors**: Ensure you have internet connection for first-time model downloads
149
+ 2. **Memory issues**: Try smaller models like `distilgpt2` or `distilbert-base-uncased`
150
+ 3. **CUDA out of memory**: Reduce text length or use CPU-only mode
151
+ 4. **Encoder models slow**: This is normal - each token is analyzed individually for accuracy
152
+ 5. **Single analysis**: The app now performs one comprehensive analysis per run (no iterations needed)
153
+
154
+ ### Getting Help
155
+
156
+ If you encounter issues:
157
+ 1. Check the console output for error messages
158
+ 2. Try running the test suite: `python test_app.py`
159
+ 3. Ensure all dependencies are installed: `pip install -r requirements.txt`
160
+
161
+ ## Examples
162
+
163
+ Try these example texts to see the app in action:
164
+
165
+ - **"The quick brown fox jumps over the lazy dog."** (Common phrase - should show low perplexity)
166
+ - **"Quantum entanglement defies classical intuition."** (Technical content - may show higher perplexity)
167
+ - **"Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo."** (Grammatically complex - interesting perplexity patterns)
168
+
169
+
SIMPLIFICATION_SUMMARY.md ADDED
@@ -0,0 +1,129 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 🎯 Simplification Summary - MLM Probability Removal
2
+
3
+ ## Change Request
4
+ The user requested to **remove the MLM probability slider** and **analyze all tokens** for encoder models, simplifying the interface and making results more consistent.
5
+
6
+ ## What Was Removed
7
+
8
+ ### 1. MLM Probability Slider
9
+ - **Before**: User could adjust MLM probability from 0.1 to 0.5
10
+ - **After**: No slider, cleaner interface
11
+
12
+ ### 2. Random Token Selection
13
+ - **Before**: Only ~15-50% of tokens analyzed based on MLM probability
14
+ - **After**: ALL content tokens analyzed for comprehensive results
15
+
16
+ ### 3. Complex Configuration
17
+ - **Before**: MLM probability settings, thresholds, explanations
18
+ - **After**: Simplified configuration focused on core functionality
19
+
20
+ ## Code Changes Made
21
+
22
+ ### `app.py`
23
+ - **Removed**: `mlm_probability` parameter from all functions
24
+ - **Simplified**: `calculate_encoder_perplexity()` now analyzes all tokens
25
+ - **Cleaned**: UI no longer shows/hides MLM probability slider
26
+ - **Updated**: Process function signature simplified
27
+
28
+ ### `config.py`
29
+ - **Removed**: All MLM probability related settings
30
+ - **Simplified**: Examples no longer include MLM probability values
31
+ - **Cleaned**: Processing settings streamlined
32
+
33
+ ### UI Changes
34
+ - **Removed**: MLM probability slider and related controls
35
+ - **Updated**: Help text and examples
36
+ - **Simplified**: Model type change handler
37
+
38
+ ## New Behavior
39
+
40
+ ### Encoder Models (BERT, etc.)
41
+ 1. **Comprehensive Analysis**: Every content token is individually masked and analyzed
42
+ 2. **Consistent Results**: No randomness in token selection
43
+ 3. **Full Visualization**: All tokens get proper perplexity colors (no gray "not analyzed" tokens)
44
+ 4. **Better Performance**: No need to run multiple iterations for statistical sampling
45
+
46
+ ### Decoder Models (GPT, etc.)
47
+ - **No change**: Still analyzes all tokens as before
48
+ - **Consistent interface**: Same workflow for both model types
49
+
50
+ ## Benefits of Simplification
51
+
52
+ ### 1. **User Experience**
53
+ - ✅ Cleaner, less confusing interface
54
+ - ✅ Consistent results every time
55
+ - ✅ No need to understand MLM probability concept
56
+ - ✅ Faster workflow (fewer parameters to adjust)
57
+
58
+ ### 2. **Technical Benefits**
59
+ - ✅ More comprehensive analysis (100% of tokens)
60
+ - ✅ Deterministic results (no randomness)
61
+ - ✅ Simplified codebase (easier to maintain)
62
+ - ✅ Better visualization (all tokens colored)
63
+
64
+ ### 3. **Performance**
65
+ - ✅ More predictable compute time
66
+ - ✅ No wasted computation on statistical sampling
67
+ - ✅ Single iteration gives complete picture
68
+
69
+ ## Impact on Existing Functionality
70
+
71
+ ### What Still Works
72
+ - ✅ All model types supported
73
+ - ✅ Color visualization working perfectly
74
+ - ✅ Iterations parameter still available
75
+ - ✅ Model caching still functional
76
+ - ✅ All examples still work
77
+
78
+ ### What's Improved
79
+ - 🎯 Encoder model analysis is now comprehensive
80
+ - 🎯 No more confusing "not analyzed" gray tokens
81
+ - 🎯 Simpler parameter space to explore
82
+ - 🎯 More consistent results
83
+
84
+ ## Migration Notes
85
+
86
+ ### For Users
87
+ - **Old workflow**: Adjust MLM probability → Analyze → Interpret partial results
88
+ - **New workflow**: Select text → Choose model → Analyze → Get complete results
89
+
90
+ ### For Developers
91
+ - Function signatures simplified (removed `mlm_probability` parameter)
92
+ - Configuration streamlined (removed MLM-related settings)
93
+ - UI event handlers simplified (no MLM probability visibility toggle)
94
+
95
+ ## Files Modified
96
+
97
+ 1. **`app.py`**: Core functionality and UI
98
+ 2. **`config.py`**: Configuration and examples
99
+ 3. **`README.md`**: Updated documentation
100
+ 4. **`QUICKSTART.md`**: Simplified instructions
101
+
102
+ ## Files Created
103
+ 1. **`SIMPLIFICATION_SUMMARY.md`**: This documentation
104
+
105
+ ## Testing
106
+
107
+ The simplification maintains all existing functionality while providing better results:
108
+
109
+ ```bash
110
+ # Test the simplified interface
111
+ python launch.py
112
+
113
+ # Try encoder models - all tokens now analyzed:
114
+ # Text: "The capital of France is Paris"
115
+ # Model: bert-base-uncased
116
+ # Type: encoder
117
+ # Result: All content tokens get proper colors!
118
+ ```
119
+
120
+ ## Result
121
+
122
+ The app is now **simpler, faster, and more comprehensive** - exactly what the user requested! 🎉
123
+
124
+ - 🎯 **Simpler**: Removed confusing MLM probability parameter
125
+ - 🚀 **Faster**: More direct workflow
126
+ - 🔍 **Comprehensive**: All tokens analyzed for complete picture
127
+ - 🎨 **Better visualization**: No more gray "not analyzed" tokens
128
+
129
+ The interface is cleaner, the results are more complete, and the user experience is significantly improved.
__pycache__/app.cpython-310.pyc ADDED
Binary file (11.6 kB). View file
 
__pycache__/app.cpython-312.pyc ADDED
Binary file (20 kB). View file
 
__pycache__/config.cpython-310.pyc ADDED
Binary file (2.23 kB). View file
 
__pycache__/config.cpython-312.pyc ADDED
Binary file (2.44 kB). View file
 
__pycache__/launch.cpython-310.pyc ADDED
Binary file (1.28 kB). View file
 
__pycache__/mlm_demo.cpython-310.pyc ADDED
Binary file (6.11 kB). View file
 
__pycache__/run.cpython-310.pyc ADDED
Binary file (4.79 kB). View file
 
__pycache__/test_app.cpython-310.pyc ADDED
Binary file (7.47 kB). View file
 
app.py CHANGED
@@ -33,18 +33,16 @@ except ImportError:
33
  "displacy_options": {"ents": ["PP"], "colors": {}}
34
  }
35
  PROCESSING_SETTINGS = {
36
- "default_iterations": 1,
37
- "max_iterations": 10,
38
  "epsilon": 1e-10
39
  }
40
  UI_SETTINGS = {
41
- "title": "📈 Perplexity Viewer Simple",
42
- "description": "Visualize per-token perplexity using color gradients. Assumes single token masking.",
43
  "examples": [
44
- {"text": "The quick brown fox jumps over the lazy dog.", "model": "gpt2", "type": "decoder", "iterations": 1},
45
- {"text": "The capital of France is Paris.", "model": "bert-base-uncased", "type": "encoder", "iterations": 1},
46
- {"text": "Quantum entanglement defies classical physics intuition completely.", "model": "distilgpt2", "type": "decoder", "iterations": 1},
47
- {"text": "Machine learning algorithms require computational resources.", "model": "distilbert-base-uncased", "type": "encoder", "iterations": 1}
48
  ]
49
  }
50
  ERROR_MESSAGES = {
@@ -95,27 +93,24 @@ def load_model_and_tokenizer(model_name, model_type):
95
 
96
  return cached_models[cache_key], cached_tokenizers[cache_key]
97
 
98
- def calculate_decoder_perplexity(text, model, tokenizer, iterations=1):
99
  """Calculate perplexity for decoder models (like GPT)"""
100
  device = next(model.parameters()).device
101
 
102
- perplexities = []
 
 
103
 
104
- for iteration in range(iterations):
105
- # Tokenize the text
106
- inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=MODEL_SETTINGS["max_length"])
107
- input_ids = inputs.input_ids.to(device)
108
 
109
- if input_ids.size(1) < 2:
110
- raise gr.Error("Text is too short for perplexity calculation.")
111
-
112
- with torch.no_grad():
113
- outputs = model(input_ids, labels=input_ids)
114
- loss = outputs.loss
115
- perplexity = torch.exp(loss).item()
116
- perplexities.append(perplexity)
117
 
118
- # Get token-level perplexities for the last iteration
119
  with torch.no_grad():
120
  outputs = model(input_ids)
121
  logits = outputs.logits
@@ -142,46 +137,44 @@ def calculate_decoder_perplexity(text, model, tokenizer, iterations=1):
142
  else:
143
  cleaned_tokens.append(token)
144
 
145
- return np.mean(perplexities), cleaned_tokens, token_perplexities
146
 
147
- def calculate_encoder_perplexity(text, model, tokenizer, iterations=1):
148
  """Calculate pseudo-perplexity for encoder models (like BERT) using MLM on all tokens"""
149
  device = next(model.parameters()).device
150
 
151
- perplexities = []
 
 
152
 
153
- for iteration in range(iterations):
154
- # Tokenize the text
155
- inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=MODEL_SETTINGS["max_length"])
156
- input_ids = inputs.input_ids.to(device)
157
 
158
- if input_ids.size(1) < 3: # Need at least [CLS] + 1 token + [SEP]
159
- raise gr.Error("Text is too short for MLM perplexity calculation.")
160
-
161
- # Calculate average perplexity by masking all content tokens
162
- with torch.no_grad():
163
- seq_length = input_ids.size(1)
164
- special_token_ids = {tokenizer.cls_token_id, tokenizer.sep_token_id, tokenizer.pad_token_id}
165
 
166
- all_token_losses = []
167
 
168
- # Mask each non-special token individually and calculate loss
169
- for i in range(seq_length):
170
- if input_ids[0, i].item() not in special_token_ids:
171
- masked_input = input_ids.clone()
172
- original_token_id = input_ids[0, i]
173
- masked_input[0, i] = tokenizer.mask_token_id
174
 
175
- outputs = model(masked_input)
176
- predictions = outputs.logits[0, i]
177
- prob = F.softmax(predictions, dim=-1)[original_token_id]
178
- loss = -torch.log(prob + PROCESSING_SETTINGS["epsilon"])
179
- all_token_losses.append(loss.item())
180
 
181
- if all_token_losses:
182
- avg_loss = np.mean(all_token_losses)
183
- perplexity = math.exp(avg_loss)
184
- perplexities.append(perplexity)
 
185
 
186
  # Calculate per-token pseudo-perplexity for visualization (analyze all tokens)
187
  with torch.no_grad():
@@ -212,7 +205,7 @@ def calculate_encoder_perplexity(text, model, tokenizer, iterations=1):
212
  else:
213
  cleaned_tokens.append(token)
214
 
215
- return np.mean(perplexities) if perplexities else float('inf'), cleaned_tokens, np.array(token_perplexities)
216
 
217
  def create_visualization(tokens, perplexities):
218
  """Create custom HTML visualization with color-coded perplexities"""
@@ -318,26 +311,23 @@ def create_visualization(tokens, perplexities):
318
 
319
  return "".join(html_parts)
320
 
321
- def process_text(text, model_name, model_type, iterations):
322
  """Main processing function"""
323
  if not text.strip():
324
  return ERROR_MESSAGES["empty_text"], "", pd.DataFrame()
325
 
326
  try:
327
- # Validate inputs
328
- iterations = max(1, min(iterations, PROCESSING_SETTINGS["max_iterations"]))
329
-
330
  # Load model and tokenizer
331
  model, tokenizer = load_model_and_tokenizer(model_name, model_type)
332
 
333
  # Calculate perplexity
334
  if model_type == "decoder":
335
  avg_perplexity, tokens, token_perplexities = calculate_decoder_perplexity(
336
- text, model, tokenizer, iterations
337
  )
338
  else: # encoder
339
  avg_perplexity, tokens, token_perplexities = calculate_encoder_perplexity(
340
- text, model, tokenizer, iterations
341
  )
342
 
343
  # Create visualization
@@ -351,7 +341,6 @@ def process_text(text, model_name, model_type, iterations):
351
  **Model Type:** {model_type.title()}
352
  **Average Perplexity:** {avg_perplexity:.4f}
353
  **Number of Tokens:** {len(tokens)}
354
- **Iterations:** {iterations}
355
  """
356
 
357
 
@@ -397,15 +386,6 @@ with gr.Blocks(title=UI_SETTINGS["title"], theme=gr.themes.Soft()) as demo:
397
  info="Decoder for causal LM, Encoder for masked LM"
398
  )
399
 
400
- with gr.Row():
401
- iterations = gr.Slider(
402
- label="Iterations",
403
- minimum=1,
404
- maximum=PROCESSING_SETTINGS["max_iterations"],
405
- value=PROCESSING_SETTINGS["default_iterations"],
406
- step=1,
407
- info="Number of iterations to average over"
408
- )
409
  analyze_btn = gr.Button("🔍 Analyze Perplexity", variant="primary", size="lg")
410
 
411
  with gr.Column(scale=3):
@@ -433,20 +413,20 @@ with gr.Blocks(title=UI_SETTINGS["title"], theme=gr.themes.Soft()) as demo:
433
  # Set up the analysis function
434
  analyze_btn.click(
435
  fn=process_text,
436
- inputs=[text_input, model_name, model_type, iterations],
437
  outputs=[summary_output, viz_output, table_output]
438
  )
439
 
440
  # Add examples
441
  with gr.Accordion("📝 Example Texts", open=False):
442
  examples_data = [
443
- [ex["text"], ex["model"], ex["type"], ex["iterations"]]
444
  for ex in UI_SETTINGS["examples"]
445
  ]
446
 
447
  gr.Examples(
448
  examples=examples_data,
449
- inputs=[text_input, model_name, model_type, iterations],
450
  outputs=[summary_output, viz_output, table_output],
451
  fn=process_text,
452
  cache_examples=False,
@@ -468,7 +448,7 @@ with gr.Blocks(title=UI_SETTINGS["title"], theme=gr.themes.Soft()) as demo:
468
  - Models are cached after first use
469
  - Very long texts are truncated to 512 tokens
470
  - GPU acceleration is used when available
471
- - For encoder models, all content tokens are analyzed for comprehensive results
472
  """)
473
 
474
  if __name__ == "__main__":
 
33
  "displacy_options": {"ents": ["PP"], "colors": {}}
34
  }
35
  PROCESSING_SETTINGS = {
 
 
36
  "epsilon": 1e-10
37
  }
38
  UI_SETTINGS = {
39
+ "title": "📈 Perplexity Viewer",
40
+ "description": "Visualize per-token perplexity using color gradients.",
41
  "examples": [
42
+ {"text": "The quick brown fox jumps over the lazy dog.", "model": "gpt2", "type": "decoder"},
43
+ {"text": "The capital of France is Paris.", "model": "bert-base-uncased", "type": "encoder"},
44
+ {"text": "Quantum entanglement defies classical physics intuition completely.", "model": "distilgpt2", "type": "decoder"},
45
+ {"text": "Machine learning algorithms require computational resources.", "model": "distilbert-base-uncased", "type": "encoder"}
46
  ]
47
  }
48
  ERROR_MESSAGES = {
 
93
 
94
  return cached_models[cache_key], cached_tokenizers[cache_key]
95
 
96
+ def calculate_decoder_perplexity(text, model, tokenizer):
97
  """Calculate perplexity for decoder models (like GPT)"""
98
  device = next(model.parameters()).device
99
 
100
+ # Tokenize the text
101
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=MODEL_SETTINGS["max_length"])
102
+ input_ids = inputs.input_ids.to(device)
103
 
104
+ if input_ids.size(1) < 2:
105
+ raise gr.Error("Text is too short for perplexity calculation.")
 
 
106
 
107
+ # Calculate overall perplexity
108
+ with torch.no_grad():
109
+ outputs = model(input_ids, labels=input_ids)
110
+ loss = outputs.loss
111
+ perplexity = torch.exp(loss).item()
 
 
 
112
 
113
+ # Get token-level perplexities
114
  with torch.no_grad():
115
  outputs = model(input_ids)
116
  logits = outputs.logits
 
137
  else:
138
  cleaned_tokens.append(token)
139
 
140
+ return perplexity, cleaned_tokens, token_perplexities
141
 
142
+ def calculate_encoder_perplexity(text, model, tokenizer):
143
  """Calculate pseudo-perplexity for encoder models (like BERT) using MLM on all tokens"""
144
  device = next(model.parameters()).device
145
 
146
+ # Tokenize the text
147
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=MODEL_SETTINGS["max_length"])
148
+ input_ids = inputs.input_ids.to(device)
149
 
150
+ if input_ids.size(1) < 3: # Need at least [CLS] + 1 token + [SEP]
151
+ raise gr.Error("Text is too short for MLM perplexity calculation.")
 
 
152
 
153
+ # Calculate average perplexity by masking all content tokens
154
+ with torch.no_grad():
155
+ seq_length = input_ids.size(1)
156
+ special_token_ids = {tokenizer.cls_token_id, tokenizer.sep_token_id, tokenizer.pad_token_id}
 
 
 
157
 
158
+ all_token_losses = []
159
 
160
+ # Mask each non-special token individually and calculate loss
161
+ for i in range(seq_length):
162
+ if input_ids[0, i].item() not in special_token_ids:
163
+ masked_input = input_ids.clone()
164
+ original_token_id = input_ids[0, i]
165
+ masked_input[0, i] = tokenizer.mask_token_id
166
 
167
+ outputs = model(masked_input)
168
+ predictions = outputs.logits[0, i]
169
+ prob = F.softmax(predictions, dim=-1)[original_token_id]
170
+ loss = -torch.log(prob + PROCESSING_SETTINGS["epsilon"])
171
+ all_token_losses.append(loss.item())
172
 
173
+ if all_token_losses:
174
+ avg_loss = np.mean(all_token_losses)
175
+ perplexity = math.exp(avg_loss)
176
+ else:
177
+ perplexity = float('inf')
178
 
179
  # Calculate per-token pseudo-perplexity for visualization (analyze all tokens)
180
  with torch.no_grad():
 
205
  else:
206
  cleaned_tokens.append(token)
207
 
208
+ return perplexity, cleaned_tokens, np.array(token_perplexities)
209
 
210
  def create_visualization(tokens, perplexities):
211
  """Create custom HTML visualization with color-coded perplexities"""
 
311
 
312
  return "".join(html_parts)
313
 
314
+ def process_text(text, model_name, model_type):
315
  """Main processing function"""
316
  if not text.strip():
317
  return ERROR_MESSAGES["empty_text"], "", pd.DataFrame()
318
 
319
  try:
 
 
 
320
  # Load model and tokenizer
321
  model, tokenizer = load_model_and_tokenizer(model_name, model_type)
322
 
323
  # Calculate perplexity
324
  if model_type == "decoder":
325
  avg_perplexity, tokens, token_perplexities = calculate_decoder_perplexity(
326
+ text, model, tokenizer
327
  )
328
  else: # encoder
329
  avg_perplexity, tokens, token_perplexities = calculate_encoder_perplexity(
330
+ text, model, tokenizer
331
  )
332
 
333
  # Create visualization
 
341
  **Model Type:** {model_type.title()}
342
  **Average Perplexity:** {avg_perplexity:.4f}
343
  **Number of Tokens:** {len(tokens)}
 
344
  """
345
 
346
 
 
386
  info="Decoder for causal LM, Encoder for masked LM"
387
  )
388
 
389
  analyze_btn = gr.Button("🔍 Analyze Perplexity", variant="primary", size="lg")
390
 
391
  with gr.Column(scale=3):
 
413
  # Set up the analysis function
414
  analyze_btn.click(
415
  fn=process_text,
416
+ inputs=[text_input, model_name, model_type],
417
  outputs=[summary_output, viz_output, table_output]
418
  )
419
 
420
  # Add examples
421
  with gr.Accordion("📝 Example Texts", open=False):
422
  examples_data = [
423
+ [ex["text"], ex["model"], ex["type"]]
424
  for ex in UI_SETTINGS["examples"]
425
  ]
426
 
427
  gr.Examples(
428
  examples=examples_data,
429
+ inputs=[text_input, model_name, model_type],
430
  outputs=[summary_output, viz_output, table_output],
431
  fn=process_text,
432
  cache_examples=False,
 
448
  - Models are cached after first use
449
  - Very long texts are truncated to 512 tokens
450
  - GPU acceleration is used when available
451
+ - All tokens are analyzed in a single pass for accurate results
452
  """)
453
 
454
  if __name__ == "__main__":
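The refactored `calculate_encoder_perplexity` above reduces each masked-token probability to a loss and then exponentiates the mean, with `float('inf')` as the fallback when no content tokens exist. That reduction can be checked in isolation; this is a dependency-free sketch of the same arithmetic (the probability values are made up, not real model output):

```python
import math

def pseudo_perplexity(token_probs, epsilon=1e-10):
    """exp(mean(-log p)) over per-token probabilities, as in the diff above."""
    if not token_probs:
        return float("inf")  # mirrors the no-content-tokens fallback
    losses = [-math.log(p + epsilon) for p in token_probs]
    return math.exp(sum(losses) / len(losses))

# A uniform probability of 0.25 per token gives a pseudo-perplexity of ~4
print(round(pseudo_perplexity([0.25, 0.25, 0.25]), 4))
```

Because every content token is masked exactly once, this value is deterministic: rerunning the analysis on the same text yields the same number, which is why the iterations loop could be dropped.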
color_test.html ADDED
@@ -0,0 +1,53 @@
1
+
2
+ <!DOCTYPE html>
3
+ <html>
4
+ <head>
5
+ <title>Color Test</title>
6
+ <style>
7
+ body { font-family: Arial, sans-serif; margin: 20px; }
8
+ .test-section { margin: 20px 0; padding: 15px; border: 1px solid #ccc; }
9
+ </style>
10
+ </head>
11
+ <body>
12
+ <h1>🎨 Perplexity Color Test</h1>
13
+
14
+ <div class="test-section">
15
+ <h2>Low Perplexity (Green - Confident)</h2>
16
+ <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.2">The</span>
17
+ <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.8">quick</span>
18
+ <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 2.1">brown</span>
19
+ </div>
20
+
21
+ <div class="test-section">
22
+ <h2>Medium Perplexity (Yellow - Uncertain)</h2>
23
+ <span style="background-color: rgba(241, 196, 15, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 5.4">machine</span>
24
+ <span style="background-color: rgba(241, 196, 15, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 7.2">learning</span>
25
+ <span style="background-color: rgba(241, 196, 15, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 8.9">requires</span>
26
+ </div>
27
+
28
+ <div class="test-section">
29
+ <h2>High Perplexity (Red - Very Uncertain)</h2>
30
+ <span style="background-color: rgba(231, 76, 60, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 15.7">quantum</span>
31
+ <span style="background-color: rgba(231, 76, 60, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 23.4">entanglement</span>
32
+ <span style="background-color: rgba(231, 76, 60, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 31.2">defies</span>
33
+ </div>
34
+
35
+ <div class="test-section">
36
+ <h2>Mixed Example Sentence</h2>
37
+ <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.2">The</span>
38
+ <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 2.3">capital</span>
39
+ <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.8">of</span>
40
+ <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 2.1">France</span>
41
+ <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.5">is</span>
42
+ <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.9">Paris</span>
43
+ </div>
44
+
45
+ <p><strong>Instructions:</strong> Hover over tokens to see perplexity values in tooltips!</p>
46
+ <p><strong>Color Legend:</strong></p>
47
+ <ul>
48
+ <li>🟢 <strong>Green:</strong> Low perplexity (model is confident)</li>
49
+ <li>🟡 <strong>Yellow:</strong> Medium perplexity (model is somewhat uncertain)</li>
50
+ <li>🔴 <strong>Red:</strong> High perplexity (model is very uncertain)</li>
51
+ </ul>
52
+ </body>
53
+ </html>
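The three color bands in this test page (green/yellow/red rgba values) correspond to a simple threshold mapping over normalized perplexity; a sketch using the same rgba values and the 0.3/0.7 thresholds that appear in `simple_color_test.py` further down:

```python
def perplexity_color(norm_perp):
    """Map a normalized perplexity in [0, 1] to the rgba bands used in this page."""
    if norm_perp < 0.3:
        rgb = (46, 204, 113)   # green: model is confident
    elif norm_perp < 0.7:
        rgb = (241, 196, 15)   # yellow: somewhat uncertain
    else:
        rgb = (231, 76, 60)    # red: very uncertain
    return "rgba({}, {}, {}, 0.7)".format(*rgb)

print(perplexity_color(0.1))   # low perplexity falls in the green band
```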
demo.py ADDED
@@ -0,0 +1,263 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Demo script for PerplexityViewer - shows core functionality without GUI
4
+ """
5
+
6
+ import torch
7
+ import numpy as np
8
+ from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForMaskedLM
9
+ import warnings
10
+ warnings.filterwarnings("ignore")
11
+
12
+ def demo_decoder_perplexity():
13
+ """Demo decoder model perplexity calculation"""
14
+ print("="*60)
15
+ print("🤖 Decoder Model Demo (DistilGPT-2)")
16
+ print("="*60)
17
+
18
+ # Load model
19
+ model_name = "distilgpt2"
20
+ print(f"Loading {model_name}...")
21
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
22
+ model = AutoModelForCausalLM.from_pretrained(model_name)
23
+
24
+ if tokenizer.pad_token is None:
25
+ tokenizer.pad_token = tokenizer.eos_token
26
+
27
+ model.eval()
28
+
29
+ # Test texts
30
+ test_texts = [
31
+ "The quick brown fox jumps over the lazy dog.",
32
+ "Machine learning is revolutionizing artificial intelligence.",
33
+ "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo.",
34
+ "The capital of France is Paris."
35
+ ]
36
+
37
+ for i, text in enumerate(test_texts, 1):
38
+ print(f"\n📝 Text {i}: {text}")
39
+
40
+ # Tokenize
41
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
42
+ input_ids = inputs.input_ids
43
+
44
+ # Calculate perplexity
45
+ with torch.no_grad():
46
+ outputs = model(input_ids, labels=input_ids)
47
+ loss = outputs.loss
48
+ perplexity = torch.exp(loss).item()
49
+
50
+ print(f" 💯 Perplexity: {perplexity:.2f}")
51
+
52
+ # Get token-level details
53
+ tokens = tokenizer.convert_ids_to_tokens(input_ids[0][1:]) # Skip first token
54
+
55
+ with torch.no_grad():
56
+ outputs = model(input_ids)
57
+ logits = outputs.logits
58
+ shift_logits = logits[..., :-1, :].contiguous()
59
+ shift_labels = input_ids[..., 1:].contiguous()
60
+
61
+ loss_fct = torch.nn.CrossEntropyLoss(reduction='none')
62
+ token_losses = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
63
+ token_perplexities = torch.exp(token_losses).cpu().numpy()
64
+
65
+ print(" 🎯 Token details:")
66
+ for token, pp in zip(tokens[:5], token_perplexities[:5]): # Show first 5
67
+ clean_token = token.replace('Ġ', ' ').replace('##', '')
68
+ color = '🟢' if pp < 3 else '🟡' if pp < 10 else '🔴'
69
+ print(f" {color} '{clean_token}': {pp:.2f}")
70
+
71
+ if len(tokens) > 5:
72
+ print(f" ... and {len(tokens) - 5} more tokens")
73
+
74
+ def demo_encoder_perplexity():
75
+ """Demo encoder model pseudo-perplexity calculation"""
76
+ print("\n" + "="*60)
77
+ print("🤖 Encoder Model Demo (DistilBERT)")
78
+ print("="*60)
79
+
80
+ # Load model
81
+ model_name = "distilbert-base-uncased"
82
+ print(f"Loading {model_name}...")
83
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
84
+ model = AutoModelForMaskedLM.from_pretrained(model_name)
85
+ model.eval()
86
+
87
+ # Test texts
88
+ test_texts = [
89
+ "The capital of France is Paris.",
90
+ "Python is a programming language.",
91
+ "The weather today is beautiful.",
92
+ "Machine learning requires large datasets."
93
+ ]
94
+
95
+ mlm_probability = 0.15
96
+
97
+ for i, text in enumerate(test_texts, 1):
98
+ print(f"\n📝 Text {i}: {text}")
99
+
100
+ # Tokenize
101
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
102
+ input_ids = inputs.input_ids
103
+
104
+ # Create masked version
105
+ masked_input_ids = input_ids.clone()
106
+ original_tokens = input_ids.clone()
107
+
108
+ # Randomly mask tokens (excluding special tokens)
109
+ seq_length = input_ids.size(1)
110
+ mask_indices = []
111
+ special_token_ids = {tokenizer.cls_token_id, tokenizer.sep_token_id, tokenizer.pad_token_id}
112
+
113
+ for j in range(seq_length):
114
+ if input_ids[0, j].item() not in special_token_ids:
115
+ if torch.rand(1).item() < mlm_probability:
116
+ mask_indices.append(j)
117
+ masked_input_ids[0, j] = tokenizer.mask_token_id
118
+
119
+ if not mask_indices: # Ensure at least one token is masked
120
+ non_special_indices = [j for j in range(seq_length) if input_ids[0, j].item() not in special_token_ids]
121
+ if non_special_indices:
122
+ mask_idx = torch.randint(0, len(non_special_indices), (1,)).item()
123
+ mask_indices = [non_special_indices[mask_idx]]
124
+ masked_input_ids[0, mask_indices[0]] = tokenizer.mask_token_id
125
+
126
+ # Calculate pseudo-perplexity
127
+ with torch.no_grad():
128
+ outputs = model(masked_input_ids)
129
+ predictions = outputs.logits
130
+
131
+ masked_token_losses = []
132
+ for idx in mask_indices:
133
+ target_id = original_tokens[0, idx]
134
+ pred_scores = predictions[0, idx]
135
+ prob = torch.softmax(pred_scores, dim=-1)[target_id]
136
+ loss = -torch.log(prob + 1e-10)
137
+ masked_token_losses.append(loss.item())
138
+
139
+ if masked_token_losses:
140
+ avg_loss = np.mean(masked_token_losses)
141
+ pseudo_perplexity = np.exp(avg_loss)
142
+ else:
143
+ pseudo_perplexity = float('inf')
144
+
145
+ print(f" 💯 Pseudo-perplexity: {pseudo_perplexity:.2f}")
146
+ print(f" 🎭 Masked {len(mask_indices)} tokens")
147
+
148
+ # Show some token-level pseudo-perplexities
149
+ tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
150
+ print(" 🎯 Sample token pseudo-perplexities:")
151
+
152
+ with torch.no_grad():
153
+ sample_indices = list(range(1, min(6, len(tokens)-1))) # Skip [CLS] and [SEP]
154
+ for idx in sample_indices:
155
+ if input_ids[0, idx].item() not in special_token_ids:
156
+ masked_input = input_ids.clone()
157
+ original_token_id = input_ids[0, idx]
158
+ masked_input[0, idx] = tokenizer.mask_token_id
159
+
160
+ outputs = model(masked_input)
161
+ predictions = outputs.logits[0, idx]
162
+ prob = torch.softmax(predictions, dim=-1)[original_token_id]
163
+ token_pseudo_perplexity = 1.0 / (prob.item() + 1e-10)
164
+
165
+ clean_token = tokens[idx].replace('##', '')
166
+ color = '🟢' if token_pseudo_perplexity < 5 else '🟡' if token_pseudo_perplexity < 20 else '🔴'
167
+ print(f" {color} '{clean_token}': {token_pseudo_perplexity:.2f}")
168
+
169
+ def demo_comparison():
170
+ """Compare perplexity across different model types"""
171
+ print("\n" + "="*60)
172
+ print("🔬 Model Comparison Demo")
173
+ print("="*60)
174
+
175
+ test_text = "The quick brown fox jumps over the lazy dog."
176
+ print(f"📝 Comparing models on: {test_text}")
177
+
178
+ models_to_test = [
179
+ ("distilgpt2", "decoder"),
180
+ ("distilbert-base-uncased", "encoder")
181
+ ]
182
+
183
+ results = []
184
+
185
+ for model_name, model_type in models_to_test:
186
+ print(f"\n🤖 Testing {model_name} ({model_type})...")
187
+
188
+ try:
189
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
190
+
191
+ if model_type == "decoder":
192
+ model = AutoModelForCausalLM.from_pretrained(model_name)
193
+ if tokenizer.pad_token is None:
194
+ tokenizer.pad_token = tokenizer.eos_token
195
+ else:
196
+ model = AutoModelForMaskedLM.from_pretrained(model_name)
197
+
198
+ model.eval()
199
+
200
+ inputs = tokenizer(test_text, return_tensors="pt", truncation=True, max_length=512)
201
+ input_ids = inputs.input_ids
202
+
203
+ if model_type == "decoder":
204
+ with torch.no_grad():
205
+ outputs = model(input_ids, labels=input_ids)
206
+ loss = outputs.loss
207
+ perplexity = torch.exp(loss).item()
208
+ else: # encoder
209
+ # Quick pseudo-perplexity calculation
210
+ masked_input_ids = input_ids.clone()
211
+ seq_length = input_ids.size(1)
212
+
213
+ # Mask middle token
214
+ if seq_length > 2:
215
+ middle_idx = seq_length // 2
216
+ masked_input_ids[0, middle_idx] = tokenizer.mask_token_id
217
+
218
+ with torch.no_grad():
219
+ outputs = model(masked_input_ids)
220
+ predictions = outputs.logits[0, middle_idx]
221
+ prob = torch.softmax(predictions, dim=-1)[input_ids[0, middle_idx]]
222
+ perplexity = 1.0 / (prob.item() + 1e-10)
223
+ else:
224
+ perplexity = float('inf')
225
+
226
+ results.append((model_name, model_type, perplexity))
227
+ print(f" ✅ Perplexity: {perplexity:.2f}")
228
+
229
+ except Exception as e:
230
+ print(f" ❌ Error: {e}")
231
+ results.append((model_name, model_type, float('inf')))
232
+
233
+ print(f"\n📊 Summary for '{test_text}':")
234
+ for model_name, model_type, perplexity in results:
235
+ if perplexity != float('inf'):
236
+ confidence = "High" if perplexity < 5 else "Medium" if perplexity < 15 else "Low"
237
+ print(f" • {model_name} ({model_type}): {perplexity:.2f} - {confidence} confidence")
238
+ else:
239
+ print(f" • {model_name} ({model_type}): Failed")
240
+
241
+ def main():
242
+ """Run all demos"""
243
+ print("🎭 PerplexityViewer Core Functionality Demo")
244
+ print("This demo shows how perplexity calculation works under the hood")
245
+
246
+ try:
247
+ demo_decoder_perplexity()
248
+ demo_encoder_perplexity()
249
+ demo_comparison()
250
+
251
+ print("\n" + "="*60)
252
+ print("🎉 Demo completed successfully!")
253
+ print("💡 To try the interactive web interface, run: python run.py")
254
+ print("="*60)
255
+
256
+ except KeyboardInterrupt:
257
+ print("\n👋 Demo interrupted by user")
258
+ except Exception as e:
259
+ print(f"\n❌ Demo failed with error: {e}")
260
+ print("💡 Make sure you have installed all dependencies: pip install -r requirements.txt")
261
+
262
+ if __name__ == "__main__":
263
+ main()
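The token-level loop in `demo.py` hinges on the causal shift: the logits at position i score the token at position i+1, so the last logit row and the first label are dropped before computing cross-entropy. A NumPy sketch of just that indexing, on toy logits rather than real model output:

```python
import numpy as np

def token_perplexities(logits, input_ids):
    """Per-token perplexity for a causal LM: logits at position i score token i+1."""
    shift_logits = logits[:-1]    # drop the last position (nothing follows it)
    shift_labels = input_ids[1:]  # drop the first token (nothing predicts it)
    # numerically stable softmax, then the probability of each actual next token
    exp_l = np.exp(shift_logits - shift_logits.max(axis=-1, keepdims=True))
    probs = exp_l / exp_l.sum(axis=-1, keepdims=True)
    losses = -np.log(probs[np.arange(len(shift_labels)), shift_labels])
    return np.exp(losses)

# Toy 3-token vocabulary; each position strongly predicts the actual next token,
# so every per-token perplexity comes out close to 1.
logits = np.array([[5.0, 0, 0], [0, 5.0, 0], [0, 0, 5.0], [5.0, 0, 0]])
perps = token_perplexities(logits, np.array([0, 0, 1, 2]))
```

This is the same shift-and-score pattern as the `shift_logits`/`shift_labels` block in `demo_decoder_perplexity`, minus the model.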
mlm_demo.py ADDED
@@ -0,0 +1,199 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Demo script showing how MLM probability affects encoder model analysis
4
+ """
5
+
6
+ import torch
7
+ from transformers import AutoTokenizer, AutoModelForMaskedLM
8
+ import warnings
9
+ warnings.filterwarnings("ignore")
10
+
11
+ def demo_mlm_probability_effect():
12
+ """Demonstrate how MLM probability affects the analysis"""
13
+ print("🎭 MLM Probability Effect Demo")
14
+ print("=" * 60)
15
+
16
+ # Load a BERT model
17
+ model_name = "distilbert-base-uncased"
18
+ print(f"Loading {model_name}...")
19
+
20
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
21
+ model = AutoModelForMaskedLM.from_pretrained(model_name)
22
+ model.eval()
23
+
24
+ # Test text
25
+ text = "The capital of France is Paris and it is beautiful."
26
+ print(f"📝 Text: {text}")
27
+
28
+ # Tokenize
29
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
30
+ input_ids = inputs.input_ids
31
+ tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
32
+
33
+ print(f"🔤 Tokens: {tokens}")
34
+ print()
35
+
36
+ # Test different MLM probabilities
37
+ mlm_probs = [0.1, 0.15, 0.3, 0.5, 0.8]
38
+
39
+ for mlm_prob in mlm_probs:
40
+ print(f"🎯 MLM Probability: {mlm_prob}")
41
+
42
+ # Simulate the analysis process
43
+ seq_length = input_ids.size(1)
44
+ special_token_ids = {tokenizer.cls_token_id, tokenizer.sep_token_id, tokenizer.pad_token_id}
45
+
46
+ # Count how many tokens would be analyzed
47
+ analyzed_count = 0
48
+ analyzed_tokens = []
49
+
50
+ torch.manual_seed(42) # For reproducible results
51
+
52
+ for i in range(seq_length):
53
+ token = tokens[i]
54
+ if input_ids[0, i].item() not in special_token_ids:
55
+ if torch.rand(1).item() < mlm_prob:
56
+ analyzed_count += 1
57
+ analyzed_tokens.append(f"'{token}'")
58
+
59
+ total_content_tokens = sum(1 for i in range(seq_length) if input_ids[0, i].item() not in special_token_ids)
60
+
61
+ print(f" 📊 Analyzed: {analyzed_count}/{total_content_tokens} content tokens ({analyzed_count/total_content_tokens*100:.1f}%)")
62
+ print(f" 🎯 Analyzed tokens: {', '.join(analyzed_tokens[:5])}" + (f" + {len(analyzed_tokens)-5} more" if len(analyzed_tokens) > 5 else ""))
63
+ print()
64
+
65
+ def simulate_perplexity_calculation():
66
+ """Simulate how different MLM probabilities affect perplexity calculation"""
67
+ print("🧮 Perplexity Calculation Simulation")
68
+ print("=" * 60)
69
+
70
+ # Load model
71
+ model_name = "distilbert-base-uncased"
72
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
73
+ model = AutoModelForMaskedLM.from_pretrained(model_name)
74
+ model.eval()
75
+
76
+ text = "Machine learning is transforming artificial intelligence rapidly."
77
+ inputs = tokenizer(text, return_tensors="pt")
78
+ input_ids = inputs.input_ids
79
+
80
+ print(f"📝 Text: {text}")
81
+ print(f"🔤 Tokens: {tokenizer.convert_ids_to_tokens(input_ids[0])}")
82
+ print()
83
+
84
+ mlm_probs = [0.15, 0.3, 0.5]
85
+
86
+ for mlm_prob in mlm_probs:
87
+ print(f"🎭 MLM Probability: {mlm_prob}")
88
+
89
+ # Simulate multiple iterations
90
+ iteration_results = []
91
+
92
+ for iteration in range(3):
93
+ # Simulate masking
94
+ masked_input_ids = input_ids.clone()
95
+ original_tokens = input_ids.clone()
96
+ seq_length = input_ids.size(1)
97
+
98
+ mask_indices = []
99
+ special_token_ids = {tokenizer.cls_token_id, tokenizer.sep_token_id, tokenizer.pad_token_id}
100
+
101
+ torch.manual_seed(42 + iteration) # Different seed per iteration
102
+
103
+ for i in range(seq_length):
104
+ if input_ids[0, i].item() not in special_token_ids:
105
+ if torch.rand(1).item() < mlm_prob:
106
+ mask_indices.append(i)
107
+ masked_input_ids[0, i] = tokenizer.mask_token_id
108
+
109
+ if not mask_indices:
110
+ # Ensure at least one token is masked
111
+ non_special_indices = [i for i in range(seq_length)
112
+ if input_ids[0, i].item() not in special_token_ids]
113
+ if non_special_indices:
114
+ mask_idx = torch.randint(0, len(non_special_indices), (1,)).item()
115
+ mask_indices = [non_special_indices[mask_idx]]
116
+ masked_input_ids[0, mask_indices[0]] = tokenizer.mask_token_id
117
+
118
+ # Calculate pseudo-perplexity for masked tokens
119
+ with torch.no_grad():
120
+ outputs = model(masked_input_ids)
121
+ predictions = outputs.logits
122
+
123
+ masked_token_losses = []
124
+ masked_tokens = []
125
+
126
+ for idx in mask_indices:
127
+ target_id = original_tokens[0, idx]
128
+ pred_scores = predictions[0, idx]
129
+ prob = torch.softmax(pred_scores, dim=-1)[target_id]
130
+ loss = -torch.log(prob + 1e-10)
131
+ masked_token_losses.append(loss.item())
132
+
133
+ token = tokenizer.convert_ids_to_tokens([target_id])[0]
134
+ masked_tokens.append(token)
135
+
136
+ if masked_token_losses:
137
+ avg_loss = sum(masked_token_losses) / len(masked_token_losses)
138
+ perplexity = torch.exp(torch.tensor(avg_loss)).item()
139
+ iteration_results.append(perplexity)
140
+
141
+ print(f" Iteration {iteration + 1}: {len(mask_indices)} tokens masked")
142
+         print(f" Masked: {', '.join(masked_tokens[:3])}" + (f" + {len(masked_tokens)-3} more" if len(masked_tokens) > 3 else ""))
+         print(f" Pseudo-perplexity: {perplexity:.2f}")
+
+     if iteration_results:
+         avg_perplexity = sum(iteration_results) / len(iteration_results)
+         print(f" 📊 Average pseudo-perplexity: {avg_perplexity:.2f}")
+     print()
+
+ def explain_mlm_probability():
+     """Explain what MLM probability actually does"""
+     print("💡 Understanding MLM Probability")
+     print("=" * 60)
+
+     print("""
+ 🎭 **What is MLM Probability?**
+ MLM (Masked Language Modeling) probability controls what fraction of tokens
+ get randomly selected for detailed perplexity analysis.
+
+ 📊 **How it works:**
+ • Low MLM prob (0.15): Analyzes ~15% of tokens randomly
+ • High MLM prob (0.5): Analyzes ~50% of tokens randomly
+ • This affects both the average perplexity AND the visualization
+
+ 🎯 **Why it matters:**
+ • Higher MLM prob = More tokens analyzed = More complete picture
+ • Lower MLM prob = Fewer tokens analyzed = Faster but less comprehensive
+ • The randomness simulates real MLM training conditions
+
+ 🌈 **Visual Effect:**
+ • Analyzed tokens: Colored by their actual perplexity
+ • Non-analyzed tokens: Shown in gray (baseline)
+ • Try 0.15 vs 0.5 to see the difference!
+
+ ⚖️ **Trade-offs:**
+ • MLM 0.15: Fast, matches BERT training, but sparse analysis
+ • MLM 0.5: Slower, more comprehensive, but artificial
+ • MLM 0.8: Very slow, nearly complete, but unrealistic
+ """)
+
+ def main():
+     """Run MLM probability demonstration"""
+     try:
+         explain_mlm_probability()
+         demo_mlm_probability_effect()
+         simulate_perplexity_calculation()
+
+         print("🎉 MLM Probability Demo Complete!")
+         print("💡 Now try the app with different MLM probabilities:")
+         print("   • Use 0.15 for standard analysis")
+         print("   • Use 0.5 for more comprehensive analysis")
+         print("   • Watch how the visualization changes!")
+
+     except Exception as e:
+         print(f"❌ Demo failed: {e}")
+         print("💡 Make sure you have transformers installed: pip install transformers")
+
+ if __name__ == "__main__":
+     main()
simple_color_test.py ADDED
@@ -0,0 +1,147 @@
+ #!/usr/bin/env python3
+ """
+ Simple test to verify color visualization is working (no external dependencies)
+ """
+
+ def test_color_html():
+     """Test the HTML color generation without imports"""
+     print("🎨 Testing Color HTML Generation")
+     print("=" * 50)
+
+     # Simple test data
+     tokens = ["The", "quick", "brown", "fox"]
+     perplexities = [1.2, 5.8, 12.3, 2.1]
+
+     # Manual color generation test (similar to app logic)
+     max_perplexity = max(perplexities)
+     normalized_perps = [p / max_perplexity for p in perplexities]
+
+     print(f"Tokens: {tokens}")
+     print(f"Perplexities: {perplexities}")
+     print(f"Normalized: {[f'{n:.2f}' for n in normalized_perps]}")
+
+     # Test HTML generation
+     html_parts = ['<div>']
+
+     for i, (token, perp, norm_perp) in enumerate(zip(tokens, perplexities, normalized_perps)):
+         # Simple color mapping
+         if norm_perp < 0.3:  # Green
+             red, green, blue = 46, 204, 113
+         elif norm_perp < 0.7:  # Yellow
+             red, green, blue = 241, 196, 15
+         else:  # Red
+             red, green, blue = 231, 76, 60
+
+         html_parts.append(
+             f'<span style="background-color: rgba({red}, {green}, {blue}, 0.7); '
+             f'padding: 2px 4px; margin: 1px; border-radius: 3px;" '
+             f'title="Perplexity: {perp}">{token}</span> '
+         )
+
+     html_parts.append('</div>')
+     html = ''.join(html_parts)
+
+     print("\nGenerated HTML:")
+     print(html)
+
+     # Basic checks
+     assert 'background-color' in html, "No background-color in HTML"
+     assert 'rgba(' in html, "No rgba colors in HTML"
+     assert 'title=' in html, "No tooltip in HTML"
+
+     print("\n✅ Basic HTML generation works!")
+     print("✅ Colors are included in the HTML!")
+     print("✅ Tooltips are included!")
+
+     return html
+
+ def create_test_html_file():
+     """Create a test HTML file to visually verify colors"""
+     html_content = """
+ <!DOCTYPE html>
+ <html>
+ <head>
+     <title>Color Test</title>
+     <style>
+         body { font-family: Arial, sans-serif; margin: 20px; }
+         .test-section { margin: 20px 0; padding: 15px; border: 1px solid #ccc; }
+     </style>
+ </head>
+ <body>
+     <h1>🎨 Perplexity Color Test</h1>
+
+     <div class="test-section">
+         <h2>Low Perplexity (Green - Confident)</h2>
+         <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.2">The</span>
+         <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.8">quick</span>
+         <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 2.1">brown</span>
+     </div>
+
+     <div class="test-section">
+         <h2>Medium Perplexity (Yellow - Uncertain)</h2>
+         <span style="background-color: rgba(241, 196, 15, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 5.4">machine</span>
+         <span style="background-color: rgba(241, 196, 15, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 7.2">learning</span>
+         <span style="background-color: rgba(241, 196, 15, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 8.9">requires</span>
+     </div>
+
+     <div class="test-section">
+         <h2>High Perplexity (Red - Very Uncertain)</h2>
+         <span style="background-color: rgba(231, 76, 60, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 15.7">quantum</span>
+         <span style="background-color: rgba(231, 76, 60, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 23.4">entanglement</span>
+         <span style="background-color: rgba(231, 76, 60, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 31.2">defies</span>
+     </div>
+
+     <div class="test-section">
+         <h2>Mixed Example Sentence</h2>
+         <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.2">The</span>
+         <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 2.3">capital</span>
+         <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.8">of</span>
+         <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 2.1">France</span>
+         <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.5">is</span>
+         <span style="background-color: rgba(46, 204, 113, 0.7); padding: 4px 8px; margin: 2px; border-radius: 3px;" title="Perplexity: 1.9">Paris</span>
+     </div>
+
+     <p><strong>Instructions:</strong> Hover over tokens to see perplexity values in tooltips!</p>
+     <p><strong>Color Legend:</strong></p>
+     <ul>
+         <li>🟢 <strong>Green:</strong> Low perplexity (model is confident)</li>
+         <li>🟡 <strong>Yellow:</strong> Medium perplexity (model is somewhat uncertain)</li>
+         <li>🔴 <strong>Red:</strong> High perplexity (model is very uncertain)</li>
+     </ul>
+ </body>
+ </html>
+ """
+
+     with open("color_test.html", "w") as f:
+         f.write(html_content)
+
+     print("💾 Created 'color_test.html' - open this in your browser!")
+     print("   You should see colored text with different backgrounds")
+
+ def main():
+     """Run the simple color test"""
+     try:
+         print("🎨 Simple Color Visualization Test")
+         print("=" * 60)
+
+         # Test HTML generation
+         html = test_color_html()
+
+         # Create visual test file
+         create_test_html_file()
+
+         print("\n" + "=" * 60)
+         print("🎉 Color test completed successfully!")
+         print("🌈 Open 'color_test.html' in your browser to see the colors")
+         print("💡 If colors show up there, they should work in the app too!")
+         print("=" * 60)
+
+         return True
+
+     except Exception as e:
+         print(f"❌ Test failed: {e}")
+         return False
+
+ if __name__ == "__main__":
+     success = main()
+     exit(0 if success else 1)
test_app.py ADDED
@@ -0,0 +1,271 @@
+ #!/usr/bin/env python3
+ """
+ Test script for PerplexityViewer app
+ """
+
+ import sys
+ import os
+ import torch
+ import numpy as np
+ from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForMaskedLM
+
+ # Add the current directory to the path so we can import the app
+ sys.path.append(os.path.dirname(os.path.abspath(__file__)))
+
+ try:
+     from app import (
+         load_model_and_tokenizer,
+         calculate_decoder_perplexity,
+         calculate_encoder_perplexity,
+         create_visualization,
+         process_text
+     )
+     from config import DEFAULT_MODELS, PROCESSING_SETTINGS
+ except ImportError as e:
+     print(f"Error importing app modules: {e}")
+     sys.exit(1)
+
+ def test_model_loading():
+     """Test model and tokenizer loading"""
+     print("Testing model loading...")
+
+     # Test decoder model
+     try:
+         model, tokenizer = load_model_and_tokenizer("distilgpt2", "decoder")
+         print("✓ Decoder model (distilgpt2) loaded successfully")
+         assert model is not None
+         assert tokenizer is not None
+     except Exception as e:
+         print(f"✗ Failed to load decoder model: {e}")
+         return False
+
+     # Test encoder model
+     try:
+         model, tokenizer = load_model_and_tokenizer("distilbert-base-uncased", "encoder")
+         print("✓ Encoder model (distilbert-base-uncased) loaded successfully")
+         assert model is not None
+         assert tokenizer is not None
+     except Exception as e:
+         print(f"✗ Failed to load encoder model: {e}")
+         return False
+
+     return True
+
+ def test_decoder_perplexity():
+     """Test decoder perplexity calculation"""
+     print("\nTesting decoder perplexity calculation...")
+
+     try:
+         model, tokenizer = load_model_and_tokenizer("distilgpt2", "decoder")
+         text = "The quick brown fox jumps over the lazy dog."
+
+         avg_perp, tokens, token_perps = calculate_decoder_perplexity(text, model, tokenizer, iterations=1)
+
+         print(f"✓ Average perplexity: {avg_perp:.4f}")
+         print(f"✓ Number of tokens: {len(tokens)}")
+         print(f"✓ Token perplexities shape: {token_perps.shape}")
+
+         assert avg_perp > 0
+         assert len(tokens) > 0
+         assert len(token_perps) == len(tokens)
+         assert all(p > 0 for p in token_perps)
+
+         return True
+     except Exception as e:
+         print(f"✗ Decoder perplexity test failed: {e}")
+         return False
+
+ def test_encoder_perplexity():
+     """Test encoder perplexity calculation"""
+     print("\nTesting encoder perplexity calculation...")
+
+     try:
+         model, tokenizer = load_model_and_tokenizer("distilbert-base-uncased", "encoder")
+         text = "The capital of France is Paris."
+
+         avg_perp, tokens, token_perps = calculate_encoder_perplexity(
+             text, model, tokenizer, mlm_probability=0.15, iterations=1
+         )
+
+         print(f"✓ Average pseudo-perplexity: {avg_perp:.4f}")
+         print(f"✓ Number of tokens: {len(tokens)}")
+         print(f"✓ Token perplexities shape: {token_perps.shape}")
+
+         assert avg_perp > 0
+         assert len(tokens) > 0
+         assert len(token_perps) == len(tokens)
+         assert all(p > 0 for p in token_perps)
+
+         return True
+     except Exception as e:
+         print(f"✗ Encoder perplexity test failed: {e}")
+         return False
+
+ def test_visualization():
+     """Test visualization creation"""
+     print("\nTesting visualization creation...")
+
+     try:
+         # Create dummy data
+         tokens = ["The", "quick", "brown", "fox", "jumps"]
+         perplexities = np.array([2.5, 1.8, 3.2, 4.1, 2.9])
+
+         html = create_visualization(tokens, perplexities)
+
+         print("✓ Visualization HTML generated")
+         assert isinstance(html, str)
+         assert len(html) > 0
+         assert "ent" in html.lower()  # displaCy entity visualization
+
+         return True
+     except Exception as e:
+         print(f"✗ Visualization test failed: {e}")
+         return False
+
+ def test_edge_cases():
+     """Test edge cases and error handling"""
+     print("\nTesting edge cases...")
+
+     # Test empty text
+     try:
+         summary, viz, table = process_text("", "distilgpt2", "decoder", 1, 0.15)
+         assert "enter some text" in summary.lower()
+         print("✓ Empty text handled correctly")
+     except Exception as e:
+         print(f"✗ Empty text test failed: {e}")
+         return False
+
+     # Test very short text
+     try:
+         model, tokenizer = load_model_and_tokenizer("distilgpt2", "decoder")
+         text = "Hi"
+         avg_perp, tokens, token_perps = calculate_decoder_perplexity(text, model, tokenizer, iterations=1)
+         print(f"✓ Short text handled: {len(tokens)} tokens")
+     except Exception as e:
+         print(f"✓ Short text error handled correctly: {e}")
+
+     # Test long text (should be truncated)
+     try:
+         long_text = " ".join(["word"] * 600)  # More than max_length
+         model, tokenizer = load_model_and_tokenizer("distilgpt2", "decoder")
+         avg_perp, tokens, token_perps = calculate_decoder_perplexity(long_text, model, tokenizer, iterations=1)
+         print(f"✓ Long text truncated to {len(tokens)} tokens")
+         assert len(tokens) <= 512  # Should be truncated
+     except Exception as e:
+         print(f"✗ Long text test failed: {e}")
+         return False
+
+     return True
+
+ def test_process_text_integration():
+     """Test the main process_text function"""
+     print("\nTesting process_text integration...")
+
+     test_cases = [
+         {
+             "text": "The quick brown fox jumps over the lazy dog.",
+             "model": "distilgpt2",
+             "type": "decoder",
+             "iterations": 1,
+             "mlm_prob": 0.15
+         },
+         {
+             "text": "Machine learning is a subset of artificial intelligence.",
+             "model": "distilbert-base-uncased",
+             "type": "encoder",
+             "iterations": 1,
+             "mlm_prob": 0.2
+         }
+     ]
+
+     for i, case in enumerate(test_cases):
+         try:
+             summary, viz_html, df = process_text(
+                 case["text"],
+                 case["model"],
+                 case["type"],
+                 case["iterations"],
+                 case["mlm_prob"]
+             )
+
+             print(f"✓ Test case {i+1} ({case['type']}) processed successfully")
+             assert "Analysis Results" in summary
+             assert len(viz_html) > 0
+             assert len(df) > 0
+
+         except Exception as e:
+             print(f"✗ Test case {i+1} failed: {e}")
+             return False
+
+     return True
+
+ def test_configuration():
+     """Test configuration loading"""
+     print("\nTesting configuration...")
+
+     try:
+         assert "decoder" in DEFAULT_MODELS
+         assert "encoder" in DEFAULT_MODELS
+         assert len(DEFAULT_MODELS["decoder"]) > 0
+         assert len(DEFAULT_MODELS["encoder"]) > 0
+         assert PROCESSING_SETTINGS["default_iterations"] >= 1
+         print("✓ Configuration loaded correctly")
+         return True
+     except Exception as e:
+         print(f"✗ Configuration test failed: {e}")
+         return False
+
+ def run_all_tests():
+     """Run all tests"""
+     print("="*50)
+     print("Running PerplexityViewer Tests")
+     print("="*50)
+
+     tests = [
+         ("Configuration", test_configuration),
+         ("Model Loading", test_model_loading),
+         ("Decoder Perplexity", test_decoder_perplexity),
+         ("Encoder Perplexity", test_encoder_perplexity),
+         ("Visualization", test_visualization),
+         ("Edge Cases", test_edge_cases),
+         ("Integration", test_process_text_integration)
+     ]
+
+     passed = 0
+     failed = 0
+
+     for test_name, test_func in tests:
+         print(f"\n[{test_name}]")
+         try:
+             if test_func():
+                 passed += 1
+                 print(f"✓ {test_name} PASSED")
+             else:
+                 failed += 1
+                 print(f"✗ {test_name} FAILED")
+         except Exception as e:
+             failed += 1
+             print(f"✗ {test_name} FAILED with exception: {e}")
+
+     print("\n" + "="*50)
+     print(f"Test Results: {passed} passed, {failed} failed")
+     print("="*50)
+
+     return failed == 0
+
+ if __name__ == "__main__":
+     # Check if PyTorch is available
+     print(f"PyTorch version: {torch.__version__}")
+     print(f"CUDA available: {torch.cuda.is_available()}")
+     if torch.cuda.is_available():
+         print(f"CUDA device: {torch.cuda.get_device_name()}")
+
+     # Run tests
+     success = run_all_tests()
+
+     if success:
+         print("\n🎉 All tests passed! The app should work correctly.")
+         sys.exit(0)
+     else:
+         print("\n❌ Some tests failed. Please check the errors above.")
+         sys.exit(1)
test_colors.py ADDED
@@ -0,0 +1,198 @@
+ #!/usr/bin/env python3
+ """
+ Test script to verify color visualization is working correctly
+ """
+
+ import numpy as np
+ import re
+ from app import create_visualization
+
+ def test_color_visualization():
+     """Test that the visualization creates colored HTML"""
+     print("🎨 Testing Color Visualization")
+     print("=" * 50)
+
+     # Test with sample data
+     tokens = ["The", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
+     perplexities = np.array([1.2, 2.5, 8.3, 3.1, 15.7, 2.0, 12.4, 1.8])
+
+     print(f"📝 Tokens: {tokens}")
+     print(f"📊 Perplexities: {perplexities}")
+
+     # Generate visualization
+     html = create_visualization(tokens, perplexities)
+
+     # Check that HTML was generated
+     assert len(html) > 100, "HTML output too short"
+     print("✅ HTML generated successfully")
+
+     # Check for color information in HTML
+     color_pattern = r'rgba?\(\d+,\s*\d+,\s*\d+(?:,\s*[\d.]+)?\)'
+     colors_found = re.findall(color_pattern, html)
+
+     print(f"🎨 Colors found in HTML: {len(colors_found)}")
+     for i, color in enumerate(colors_found[:5]):  # Show first 5
+         print(f"   Color {i+1}: {color}")
+
+     assert len(colors_found) > 0, "No colors found in HTML output"
+     print("✅ Color information found in HTML")
+
+     # Check for span elements with style attributes
+     span_pattern = r'<span style="[^"]*background-color[^"]*"[^>]*>'
+     spans_found = re.findall(span_pattern, html)
+
+     print(f"🏷️ Styled spans found: {len(spans_found)}")
+     assert len(spans_found) >= len(tokens) - 2, "Not enough styled spans found"  # Allow for some filtering
+     print("✅ Styled spans with background colors found")
+
+     # Check for tooltip information
+     assert 'title="Perplexity:' in html, "No tooltip information found"
+     print("✅ Tooltip information found")
+
+     # Verify different colors for different perplexity ranges
+     # Extract RGB values
+     rgb_values = []
+     for color in colors_found:
+         # Extract numbers from rgba(r,g,b,a) or rgb(r,g,b)
+         numbers = re.findall(r'\d+', color)
+         if len(numbers) >= 3:
+             rgb_values.append((int(numbers[0]), int(numbers[1]), int(numbers[2])))
+
+     if len(rgb_values) >= 2:
+         # Check that we have different colors (not all the same)
+         unique_colors = set(rgb_values)
+         print(f"🌈 Unique colors found: {len(unique_colors)}")
+         assert len(unique_colors) > 1, "All tokens have the same color"
+         print("✅ Multiple different colors found")
+
+         # Check color range makes sense
+         red_values = [r for r, g, b in rgb_values]
+         green_values = [g for r, g, b in rgb_values]
+
+         print(f"🔴 Red range: {min(red_values)} - {max(red_values)}")
+         print(f"🟢 Green range: {min(green_values)} - {max(green_values)}")
+
+         # Should have variation in color channels
+         assert max(red_values) - min(red_values) > 20, "Not enough red variation"
+         print("✅ Sufficient color variation found")
+
+     return html
+
+ def test_edge_cases():
+     """Test edge cases for color visualization"""
+     print("\n🧪 Testing Edge Cases")
+     print("=" * 50)
+
+     # Test with very high perplexities
+     tokens = ["unusual", "words", "here"]
+     high_perplexities = np.array([100.0, 200.0, 50.0])
+
+     html = create_visualization(tokens, high_perplexities)
+     assert len(html) > 50, "HTML too short for high perplexities"
+     print("✅ High perplexity values handled")
+
+     # Test with very low perplexities
+     low_perplexities = np.array([0.1, 0.2, 0.15])
+     html = create_visualization(tokens, low_perplexities)
+     assert len(html) > 50, "HTML too short for low perplexities"
+     print("✅ Low perplexity values handled")
+
+     # Test with single token
+     single_token = ["word"]
+     single_perplexity = np.array([5.0])
+     html = create_visualization(single_token, single_perplexity)
+     assert len(html) > 50, "HTML too short for single token"
+     print("✅ Single token handled")
+
+     # Test with empty input
+     empty_html = create_visualization([], np.array([]))
+     assert "No tokens" in empty_html, "Empty case not handled properly"
+     print("✅ Empty input handled")
+
+ def test_color_gradient():
+     """Test that color gradient works as expected"""
+     print("\n🌈 Testing Color Gradient")
+     print("=" * 50)
+
+     # Create tokens with ascending perplexities
+     tokens = [f"token_{i}" for i in range(10)]
+     perplexities = np.array([i * 2.0 + 1.0 for i in range(10)])  # 1, 3, 5, 7, 9, 11, 13, 15, 17, 19
+
+     html = create_visualization(tokens, perplexities)
+
+     # Extract all RGB values in order
+     color_pattern = r'rgba?\((\d+),\s*(\d+),\s*(\d+)(?:,\s*[\d.]+)?\)'
+     colors_found = re.findall(color_pattern, html)
+
+     if len(colors_found) >= 5:
+         # Convert to numeric values
+         rgb_values = [(int(r), int(g), int(b)) for r, g, b in colors_found]
+
+         # Check that low perplexity tokens are more green
+         low_perp_color = rgb_values[0]  # First token (lowest perplexity)
+         high_perp_color = rgb_values[-1]  # Last token (highest perplexity)
+
+         print(f"🟢 Low perplexity color (perp={perplexities[0]:.1f}): RGB{low_perp_color}")
+         print(f"🔴 High perplexity color (perp={perplexities[-1]:.1f}): RGB{high_perp_color}")
+
+         # Low perplexity should be more green (higher green value)
+         # High perplexity should be more red (higher red value)
+         if low_perp_color[1] > high_perp_color[1]:  # Green component
+             print("✅ Low perplexity tokens are greener")
+         else:
+             print("⚠️ Expected low perplexity to be greener")
+
+         if high_perp_color[0] > low_perp_color[0]:  # Red component
+             print("✅ High perplexity tokens are redder")
+         else:
+             print("⚠️ Expected high perplexity to be redder")
+
+ def main():
+     """Run all color visualization tests"""
+     print("🎨 Color Visualization Test Suite")
+     print("=" * 60)
+
+     try:
+         # Test basic functionality
+         html = test_color_visualization()
+
+         # Test edge cases
+         test_edge_cases()
+
+         # Test color gradient
+         test_color_gradient()
+
+         print("\n" + "=" * 60)
+         print("🎉 All color visualization tests passed!")
+         print("🌈 The tokens should now appear with colored backgrounds!")
+         print("=" * 60)
+
+         # Save a sample HTML file for manual inspection
+         with open("sample_visualization.html", "w") as f:
+             f.write(f"""
+ <!DOCTYPE html>
+ <html>
+ <head>
+     <title>Sample Perplexity Visualization</title>
+ </head>
+ <body>
+     <h1>Sample Perplexity Visualization</h1>
+     <p>This is what the colored visualization should look like:</p>
+     {html}
+ </body>
+ </html>
+ """)
+         print("💾 Sample visualization saved to 'sample_visualization.html'")
+         print("   Open this file in your browser to see the colors!")
+
+         return True
+
+     except Exception as e:
+         print(f"\n❌ Color visualization test failed: {e}")
+         import traceback
+         traceback.print_exc()
+         return False
+
+ if __name__ == "__main__":
+     success = main()
+     exit(0 if success else 1)