Joschka Strueber committed
Commit 5623280 · 1 parent: 69fd3ae
[Ref] switch from mathjax in markdown to html block
app.py
CHANGED
@@ -78,11 +78,17 @@ with gr.Blocks(title="LLM Similarity Analyzer", css=app_util.custom_css) as demo
     )
 
     gr.Markdown("## Information")
-    gr.
+    gr.HTML("""
+    <script type="text/javascript" async
+    src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML">
+    </script>
+
+    <p>We propose Chance Adjusted Probabilistic Agreement (<span>\(\operatorname{CAPA}\)</span>, or <span>\(\kappa_p\)</span>), a novel metric
     for model similarity which adjusts for chance agreement due to accuracy. Using CAPA, we find: (1) LLM-as-a-judge scores are \
     biased towards more similar models controlling for the model's capability. (2) Gain from training strong models on annotations \
     of weak supervisors (weak-to-strong generalization) is higher when the two models are more different. (3) Concerningly, model \
-    errors are getting more correlated as capabilities increase
+    errors are getting more correlated as capabilities increase.</p>
+    """)
     with gr.Row():
         gr.Image(value="data/table_capa.png", label="Comparison of different similarity metrics for multiple-choice questions", elem_classes="image_container", interactive=False)
         gr.Markdown("""
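The change above moves the MathJax loader and the description text into a single HTML string. A minimal sketch of assembling such a fragment, assuming a hypothetical helper `build_info_html` and constant `MATHJAX_SRC` (neither appears in app.py); only the script URL and the overall markup shape are taken from the diff:

```python
# Hypothetical helper, not part of app.py: builds an HTML fragment of the
# same shape as the string the commit passes to gr.HTML.
MATHJAX_SRC = (
    "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/"
    "MathJax.js?config=TeX-MML-AM_CHTML"
)


def build_info_html(body: str, mathjax_src: str = MATHJAX_SRC) -> str:
    """Return a MathJax <script> tag followed by `body` wrapped in <p>."""
    script = (
        f'<script type="text/javascript" async src="{mathjax_src}"></script>'
    )
    return f"{script}\n<p>{body}</p>"


# Raw strings keep the TeX backslashes literal, so sequences such as \( and
# \k are not treated as (invalid) Python escape sequences.
snippet = build_info_html(
    r"We propose <span>\(\operatorname{CAPA}\)</span>, "
    r"or <span>\(\kappa_p\)</span>."
)
print(snippet)
```

In the app, the resulting string would be passed as `gr.HTML(snippet)`. Whether a `<script>` tag placed inside the component actually runs depends on how the Gradio version in use injects the markup, so this sketch covers only the string assembly.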