Jellyfish042 committed 24da7ec · 1 parent: 350392a

bug fix and improvements

.gitignore CHANGED
@@ -25,3 +25,6 @@ Thumbs.db
 
  # Gradio
  flagged/
+
+ # Test artifacts
+ tests/_out/
examples/sample_texts.json DELETED
@@ -1,24 +0,0 @@
- {
- "examples": [
- {
- "name": "News",
- "text": "The rapid advancement of artificial intelligence has sparked both excitement and concern among researchers worldwide. While AI systems demonstrate remarkable capabilities in language understanding and generation, questions remain about their potential impact on employment and society."
- },
- {
- "name": "Code",
- "text": "def fibonacci(n):\n if n <= 1:\n return n\n return fibonacci(n-1) + fibonacci(n-2)\n\n# Calculate first 10 Fibonacci numbers\nfor i in range(10):\n print(f\"F({i}) = {fibonacci(i)}\")"
- },
- {
- "name": "Literature",
- "text": "It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness."
- },
- {
- "name": "Chinese",
- "text": "人工智能的快速发展在全球研究人员中引发了兴奋和担忧。虽然人工智能系统在语言理解和生成方面展现了非凡的能力,但关于其对就业和社会的潜在影响的问题仍然存在。"
- },
- {
- "name": "Mixed",
- "text": "The transformer architecture, introduced in the paper \"Attention Is All You Need\" (2017), revolutionized NLP. 这种架构使用自注意力机制来处理序列数据,比传统的RNN和LSTM更加高效。"
- }
- ]
- }
precomputed/example_metadata.json CHANGED
@@ -1,7 +1,7 @@
  {
  "example_text": "The Bitter Lesson\nRich Sutton\nMarch 13, 2019\nThe biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers' belated learning of this bitter lesson, and it is instructive to review some of the most prominent.\n\nIn computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search. At the time, this was looked upon with dismay by the majority of computer-chess researchers who had pursued methods that leveraged human understanding of the special structure of chess. When a simpler, search-based approach with special hardware and software proved vastly more effective, these human-knowledge-based chess researchers were not good losers. 
They said that ``brute force\" search may have won this time, but it was not a general strategy, and anyway it was not how people played chess. These researchers wanted methods based on human input to win and were disappointed when they did not.\n\nA similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale. Also important was the use of learning by self play to learn a value function (as it was in many other games and even in chess, although learning did not play a big role in the 1997 program that first beat a world champion). Learning by self play, and learning in general, is like search in that it enables massive computation to be brought to bear. Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research. In computer Go, as in computer chess, researchers' initial effort was directed towards utilizing human understanding (so that less search was needed) and only much later was much greater success had by embracing search and learning.\n\nIn speech recognition, there was an early competition, sponsored by DARPA, in the 1970s. Entrants included a host of special methods that took advantage of human knowledge---knowledge of words, of phonemes, of the human vocal tract, etc. On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs). Again, the statistical methods won out over the human-knowledge-based methods. This led to a major change in all of natural language processing, gradually over decades, where statistics and computation came to dominate the field. 
The recent rise of deep learning in speech recognition is the most recent step in this consistent direction. Deep learning methods rely even less on human knowledge, and use even more computation, together with learning on huge training sets, to produce dramatically better speech recognition systems. As in the games, researchers always tried to make systems that worked the way the researchers thought their own minds worked---they tried to put that knowledge in their systems---but it proved ultimately counterproductive, and a colossal waste of researcher's time, when, through Moore's law, massive computation became available and a means was found to put it to good use.\n\nIn computer vision, there has been a similar pattern. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.\n\nThis is a big lesson. As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes. We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. 
The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.\n\nOne thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.\n\nThe second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.\n",
- "qwen_inference_time": 20.51845383644104,
- "rwkv_inference_time": 32.119553327560425,
+ "qwen_inference_time": 19.49976086616516,
+ "rwkv_inference_time": 29.02472949028015,
  "qwen_compression_rate": 48.14428559434192,
  "rwkv_compression_rate": 47.62502588510778
  }
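This hunk changes only the two `*_inference_time` fields; the compression rates are identical on both sides. A small sketch for quantifying the timing change follows. It assumes the values are wall-clock seconds (the file does not state a unit), and `relative_change` and `summarize` are hypothetical helper names, not code from this repository.

```python
# Values copied from the old (-) and new (+) sides of precomputed/example_metadata.json.
# Assumption: the *_inference_time fields are wall-clock seconds.
OLD = {"qwen_inference_time": 20.51845383644104, "rwkv_inference_time": 32.119553327560425}
NEW = {"qwen_inference_time": 19.49976086616516, "rwkv_inference_time": 29.02472949028015}


def relative_change(old: float, new: float) -> float:
    """Return (new - old) / old; a negative value means the newer run was faster."""
    return (new - old) / old


def summarize(old: dict, new: dict) -> dict:
    """Map each timing field to its relative change between the two runs."""
    return {key: relative_change(old[key], new[key]) for key in old}


if __name__ == "__main__":
    for key, change in summarize(OLD, NEW).items():
        print(f"{key}: {change:+.1%}")
```

Run as a script, it prints `qwen_inference_time: -5.0%` and `rwkv_inference_time: -9.6%`, i.e. both precomputed timings went down in this commit.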
precomputed/example_visualization.html CHANGED
The diff for this file is too large to render.
 
test_sample.txt ADDED
@@ -0,0 +1,44 @@
+ BEGIN TEST
+ Leading spaces (2) + trailing spaces (2)··
+ TAB_LITERAL: [START] [END] (这里中间有一个真实的TAB)
+
+ Raw escape-like text: \n \r \t \\n \\r \\t \\x00 \\x1f \\x7f \\xff \u0000 \u202E \u200F \u200E
+ Bytes-ish hex: e5 bd 93 e7 84 b6 | 00 1f 7f ff | 0x00 0x1F 0x7F 0xFF | b"\x00\x1f\x7f\xff"
+
+ HTML tags (should render as text, not tags):
+ <think></think> <think>inner</think> <script>alert('x')</script> <style>body{color:red}</style>
+ <div class="x" data-x="1 & 2">Hello</div> <span>Span</span> <a href="https://example.com?q=1&x=<tag>">link</a>
+ <img src=x onerror=alert(1)> <br> <hr> <p>para</p> <table><tr><td>cell</td></tr></table>
+ Nested-ish: </span><span data-x="</span>">confuse</span>
+
+ HTML entities:
+ &lt;think&gt; &lt;/think&gt; &amp; &quot; &#39; &nbsp; &#10; &#x3C; &#x3E; &#x26;
+
+ Markdown-ish:
+ # H1
+ ## H2
+ - list item 1
+ - list item 2
+ > blockquote
+ --- (three hyphens)
+
+ Languages:
+ 中文 简体/繁體 日本語 かな カタカナ 한국어 العربية עברית हिन्दी ไทย Русский Ελληνικά Español Français Português Türkçe Việt
+ RTL mix: العربية ABC עברית 123 (注意混排方向)
+
+ Combining vs composed:
+ é (e + combining acute) vs é (single codepoint)
+ Å (A + combining ring) vs Å
+
+ Emoji / ZWJ:
+ 😀 😅 🧠 👩🏽‍💻 🏳️‍🌈 👨‍👩‍👧‍👦 🧑🏾‍🚀 🫠
+ Zero-width samples (括号里含真实不可见字符):
+ ZWS(​) ZWNJ(‌) ZWJ(‍) LRM(‎) RLM(‏)
+
+ Long line to test wrapping:
+ human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation; therefore human-should-not-wrap-weirdly-here.
+
+ Literal backslashes:
+ C:\path\to\file\name.txt and \\server\share\folder
+
+ End.
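The fixture exercises three separate concerns: HTML-like text that must display literally, invisible format characters (ZWS, ZWNJ, ZWJ, LRM, RLM), and canonically equivalent strings that differ in code points (é and Å, composed vs. combining). A minimal Python sketch of each check follows, assuming token text is turned into HTML on the Python side; `render_token_text`, `invisible_chars` and `same_after_nfc` are hypothetical helpers for illustration, not functions from this repository.

```python
import html
import unicodedata


def render_token_text(text: str) -> str:
    """Hypothetical helper: escape text so tags and entities in it display literally."""
    # quote=True also escapes " and ', which matters inside attribute values.
    return html.escape(text, quote=True)


def invisible_chars(text: str) -> list[str]:
    """Return U+XXXX labels for format (Cf) characters such as ZWSP, ZWJ, LRM, RLM."""
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Cf"]


def same_after_nfc(a: str, b: str) -> bool:
    """True if the two strings are equal once both are NFC-normalized."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(render_token_text("<script>alert('x')</script>"))
    print(invisible_chars("ZWS(\u200b) ZWJ(\u200d) LRM(\u200e)"))
    print(same_after_nfc("e\u0301", "\u00e9"))  # e + combining acute vs single code point
```

The design choice is to escape before wrapping a token in any markup, so entity text such as `&lt;think&gt;` becomes `&amp;lt;think&amp;gt;` and still reads as `&lt;think&gt;` on screen.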
tests/_out/stress.output.html DELETED
The diff for this file is too large to render.
 
tests/_out/stress.render_model.json DELETED
The diff for this file is too large to render.
 
visualization/assets/main.css CHANGED
@@ -100,7 +100,7 @@
  #tooltip {
  position: fixed;
  background-color: rgba(0, 0, 0, 0.9);
- color: white;
+ color: #e5e7eb;
  padding: 10px 14px;
  border-radius: 6px;
  font-size: 12px;
@@ -113,7 +113,7 @@
  box-shadow: 0 2px 10px rgba(0,0,0,0.3);
  }
  #tooltip .label {
- color: #aaa;
+ color: #9ca3af;
  font-weight: bold;
  }
  #tooltip .bytes {
@@ -121,18 +121,18 @@
  font-family: monospace;
  }
  #tooltip .loss-a {
- color: #86efac;
+ color: #fbbf24;
  font-family: monospace;
  }
  #tooltip .loss-b {
- color: #fca5a5;
+ color: #60a5fa;
  font-family: monospace;
  }
  #tooltip .model-a {
- color: #fcd34d;
+ color: #fbbf24;
  }
  #tooltip .model-b {
- color: #7dd3fc;
+ color: #60a5fa;
  }
  #tooltip .topk-section {
  margin-top: 8px;
@@ -142,22 +142,23 @@
  #tooltip .topk-container {
  display: flex;
  gap: 16px;
+ align-items: flex-start;
  }
  #tooltip .topk-column {
- flex: 1;
+ flex: 0 0 auto;
  min-width: 180px;
  }
  #tooltip .topk-title {
- color: #aaa;
+ color: #9ca3af;
  font-weight: bold;
  margin-bottom: 4px;
  font-size: 11px;
  }
  #tooltip .topk-title.model-a {
- color: #86efac;
+ color: #fbbf24;
  }
  #tooltip .topk-title.model-b {
- color: #fca5a5;
+ color: #60a5fa;
  }
  #tooltip .topk-list {
  font-size: 11px;
@@ -192,13 +193,13 @@
  white-space: nowrap;
  }
  #tooltip .token-prob {
- color: #86efac;
+ color: #a7f3d0;
  font-family: monospace;
  font-size: 11px;
  white-space: nowrap;
  }
  #tooltip .token-id {
- color: #888;
+ color: #9ca3af;
  font-family: monospace;
  white-space: nowrap;
  }
@@ -211,14 +212,14 @@
  word-break: normal;
  }
  #tooltip .topk-rank {
- color: #888;
+ color: #6b7280;
  min-width: 18px;
  }
  #tooltip .topk-rank.hit {
- color: #ffd700;
+ color: #22c55e;
  }
  #tooltip .topk-token {
- color: #a5f3fc;
+ color: #e5e7eb;
  white-space: pre-wrap;
  overflow-wrap: anywhere;
  word-break: break-word;
@@ -236,7 +237,7 @@
  color: #fb7185;
  }
  #tooltip .topk-prob {
- color: #86efac;
+ color: #a7f3d0;
  min-width: 45px;
  text-align: right;
  }
@@ -244,7 +245,7 @@
  color: #22c55e;
  }
  #tooltip .topk-miss {
- color: #ef4444;
+ color: #f87171;
  font-style: italic;
  }
 
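Most of the stylesheet changes swap tooltip text colors drawn on a near-black background. To compare old and new colors numerically, here is a sketch of the WCAG 2.x relative-luminance and contrast-ratio formulas. It assumes the `rgba(0, 0, 0, 0.9)` background is composited over a white page (about `#1a1a1a`), which the stylesheet itself does not guarantee, and the helper names are hypothetical.

```python
# Sketch of the WCAG 2.x contrast-ratio formula; helper names are hypothetical.

# Assumption: rgba(0, 0, 0, 0.9) over a white page gives 0.1 * 255 = 25.5 -> #1a1a1a.
TOOLTIP_BG = "#1a1a1a"


def _linearize(channel_8bit: int) -> float:
    """Convert one sRGB channel (0-255) to linear light, as WCAG 2.x defines it."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a #rgb or #rrggbb color."""
    h = hex_color.lstrip("#")
    if len(h) == 3:
        h = "".join(ch * 2 for ch in h)  # expand shorthand, e.g. #aaa -> #aaaaaa
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05); ranges from 1 to 21."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    # (selector, old color, new color) pairs copied from the main.css diff.
    changes = [
        ("#tooltip", "#fff", "#e5e7eb"),  # the old value was the keyword `white`
        ("#tooltip .label", "#aaa", "#9ca3af"),
        ("#tooltip .topk-rank", "#888", "#6b7280"),
        ("#tooltip .topk-miss", "#ef4444", "#f87171"),
    ]
    for selector, old, new in changes:
        print(f"{selector}: {contrast_ratio(old, TOOLTIP_BG):.2f} -> "
              f"{contrast_ratio(new, TOOLTIP_BG):.2f}")
```

WCAG 2.x level AA asks for at least 4.5:1 for normal-size text, so the printed ratios can be checked against that threshold for each selector.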