diff --git "a/precomputed/example_visualization.html" "b/precomputed/example_visualization.html" --- "a/precomputed/example_visualization.html" +++ "b/precomputed/example_visualization.html" @@ -97,14 +97,21 @@ .token:hover { outline: 1px dashed #666; } + .token-kind-control { + color: #f59e0b; + } + .token-kind-raw { + color: #fb7185; + } #tooltip { position: fixed; background-color: rgba(0, 0, 0, 0.9); - color: white; + color: #e5e7eb; padding: 10px 14px; border-radius: 6px; font-size: 12px; - max-width: 500px; + max-width: none; + width: max-content; z-index: 2000; pointer-events: none; display: none; @@ -112,7 +119,7 @@ box-shadow: 0 2px 10px rgba(0,0,0,0.3); } #tooltip .label { - color: #aaa; + color: #9ca3af; font-weight: bold; } #tooltip .bytes { @@ -120,18 +127,18 @@ font-family: monospace; } #tooltip .loss-a { - color: #86efac; + color: #fbbf24; font-family: monospace; } #tooltip .loss-b { - color: #fca5a5; + color: #60a5fa; font-family: monospace; } #tooltip .model-a { - color: #fcd34d; + color: #fbbf24; } #tooltip .model-b { - color: #7dd3fc; + color: #60a5fa; } #tooltip .topk-section { margin-top: 8px; @@ -141,22 +148,23 @@ #tooltip .topk-container { display: flex; gap: 16px; + align-items: flex-start; } #tooltip .topk-column { - flex: 1; + flex: 0 0 auto; min-width: 180px; } #tooltip .topk-title { - color: #aaa; + color: #9ca3af; font-weight: bold; margin-bottom: 4px; font-size: 11px; } #tooltip .topk-title.model-a { - color: #86efac; + color: #fbbf24; } #tooltip .topk-title.model-b { - color: #fca5a5; + color: #60a5fa; } #tooltip .topk-list { font-size: 11px; @@ -173,33 +181,51 @@ align-items: center; gap: 6px; white-space: nowrap; + flex-wrap: nowrap; + overflow-x: visible; } #tooltip .token-chips { display: flex; flex-wrap: nowrap; gap: 4px; + align-items: center; + flex: 0 0 auto; } #tooltip .token-chip-group { display: inline-flex; align-items: center; gap: 4px; + flex: 0 0 auto; + white-space: nowrap; + } + #tooltip .token-prob { + color: #a7f3d0; + font-family: monospace; + font-size: 11px; + white-space: nowrap; } #tooltip .token-id { - color: #888; + color: #9ca3af; font-family: monospace; + white-space: nowrap; } #tooltip .token-chip { max-width: 100%; } + #tooltip .token-chip-group .topk-token { + white-space: pre; + overflow-wrap: normal; + word-break: normal; + } #tooltip .topk-rank { - color: #888; + color: #6b7280; min-width: 18px; } #tooltip .topk-rank.hit { - color: #ffd700; + color: #22c55e; } #tooltip .topk-token { - color: #a5f3fc; + color: #e5e7eb; white-space: pre-wrap; overflow-wrap: anywhere; word-break: break-word; @@ -217,7 +243,7 @@ color: #fb7185; } #tooltip .topk-prob { - color: #86efac; + color: #a7f3d0; min-width: 45px; text-align: right; } @@ -225,65 +251,139 @@ color: #22c55e; } #tooltip .topk-miss { - color: #ef4444; + color: #f87171; font-style: italic; } +
-
-
-
Model A: RWKV7-G1C-1.5B
-
Model B: Qwen3-1.7B-Base
-
RWKV Compression: 9.35%
-
Qwen Compression: 9.47%
-
Avg Delta: -0.12%
-
-
-
-
- RWKV better than avg +
+
+
Model A: RWKV7-G1C-1.5B
+
Model B: Qwen3-1.7B-Base
+
RWKV Compression: 9.35%
+
Qwen Compression: 9.47%
+
Avg Delta: -0.12%
-
-
- Equal to avg -
-
-
- RWKV worse than avg -
-
- Color Range: - - 10% +
+
+
+ RWKV better than avg +
+
+
+ Equal to avg +
+
+
+ RWKV worse than avg +
+
+ Color Range: + + 10% +
-
-
The Bitter Lesson -Rich Sutton -March 13, 2019 -The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers' belated learning of this bitter lesson, and it is instructive to review some of the most prominent. - -In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search. At the time, this was looked upon with dismay by the majority of computer-chess researchers who had pursued methods that leveraged human understanding of the special structure of chess. When a simpler, search-based approach with special hardware and software proved vastly more effective, these human-knowledge-based chess researchers were not good losers. They said that ``brute force" search may have won this time, but it was not a general strategy, and anyway it was not how people played chess. These researchers wanted methods based on human input to win and were disappointed when they did not. - -A similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale. Also important was the use of learning by self play to learn a value function (as it was in many other games and even in chess, although learning did not play a big role in the 1997 program that first beat a world champion). Learning by self play, and learning in general, is like search in that it enables massive computation to be brought to bear. Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research. In computer Go, as in computer chess, researchers' initial effort was directed towards utilizing human understanding (so that less search was needed) and only much later was much greater success had by embracing search and learning. + +
+ +