fix
app.py
CHANGED
|
@@ -107,56 +107,17 @@ def goto(page: str):
|
|
| 107 |
page = st.query_params.get("page", "demo")
|
| 108 |
|
| 109 |
if page == "info":
|
| 110 |
-
st.title("about this demo")
|
| 111 |
st.write("""
|
| 112 |
-
|
| 113 |
-
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
-
|
| 118 |
-
-
|
| 119 |
-
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
---
|
| 123 |
-
|
| 124 |
-
## 📌 What are Vector Embeddings?
|
| 125 |
-
A **vector embedding** is a way of representing text (words, sentences, or documents) as a list of numbers — a point in a high-dimensional space.
|
| 126 |
-
These numbers are produced by a trained **language model** that captures semantic meaning.
|
| 127 |
-
|
| 128 |
-
In this space:
|
| 129 |
-
- Words with **similar meanings** end up **near each other**
|
| 130 |
-
- Dissimilar words are placed **far apart**
|
| 131 |
-
- The model can detect relationships and groupings that aren’t obvious from spelling or grammar alone
|
| 132 |
-
|
| 133 |
-
Example:
|
| 134 |
-
`"cat"` and `"dog"` will likely be closer to each other than to `"table"`, because the model “knows” they are both animals.
|
| 135 |
-
|
| 136 |
-
---
|
| 137 |
-
|
| 138 |
-
## 🔍 How the Demo Works
|
| 139 |
-
1. **Embedding step** – Each word is converted into a high-dimensional vector (e.g., 384, 768, or 1024 dimensions depending on the model).
|
| 140 |
-
2. **Dimensionality reduction** – Since humans can’t visualize hundreds of dimensions, the vectors are projected to 2D or 3D using **PCA** (Principal Component Analysis).
|
| 141 |
-
3. **Visualization** – The projected points are plotted, with labels showing the original words.
|
| 142 |
-
You can rotate the 3D view to explore groupings.
|
| 143 |
-
|
| 144 |
-
---
|
| 145 |
-
|
| 146 |
-
## 💡 Typical Applications of Embeddings
|
| 147 |
-
- **Semantic search** – Find relevant results even if exact keywords don’t match
|
| 148 |
-
- **Clustering & topic discovery** – Group related items automatically
|
| 149 |
-
- **Recommendations** – Suggest similar products, movies, or articles
|
| 150 |
-
- **Deduplication** – Detect near-duplicate content
|
| 151 |
-
- **Analogies** – Explore relationships like *"king" – "man" + "woman" ≈ "queen"*
|
| 152 |
-
|
| 153 |
-
---
|
| 154 |
-
|
| 155 |
-
## 🚀 Try it Yourself
|
| 156 |
-
- Pick a dataset or create your own by editing the list
|
| 157 |
-
- Switch models to compare how the embedding space changes
|
| 158 |
-
- Toggle between 2D and 3D to explore patterns
|
| 159 |
-
|
| 160 |
""".strip())
|
| 161 |
if st.button("⬅ back to demo"):
|
| 162 |
goto("demo")
|
|
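Reviewer note: the removed "How the Demo Works" text describes projecting high-dimensional vectors to 2D or 3D with PCA. The following is a minimal sketch of that projection step, not the code from `app.py` (which is not shown in this diff). It assumes NumPy is available. The helper name `pca_project` and the random toy vectors are hypothetical stand-ins for real model embeddings.

```python
import numpy as np

def pca_project(vectors, n_components=2):
    """Project row vectors onto their top principal components via SVD."""
    X = np.asarray(vectors, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA operates on mean-centered data
    # Rows of Vt are the principal directions, ordered by decreasing variance
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

# Toy stand-in for model output: 6 "words" in 8 dimensions (made-up numbers)
rng = np.random.default_rng(0)
toy_embeddings = rng.normal(size=(6, 8))

points_2d = pca_project(toy_embeddings, n_components=2)  # shape (6, 2)
points_3d = pca_project(toy_embeddings, n_components=3)  # shape (6, 3)
```

Because the components are ordered by explained variance, the 2D projection is simply the first two columns of the 3D one, which is why toggling between the two views keeps the overall layout recognizable.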
@@ -184,10 +145,10 @@ with c2:
|
|
| 184 |
st.session_state.model_name = MODELS[chosen_label]
|
| 185 |
|
| 186 |
with c3:
|
| 187 |
-
#
|
| 188 |
radio_kwargs = dict(options=["2D", "3D"], horizontal=True, key="proj_mode")
|
| 189 |
if "proj_mode" not in st.session_state:
|
| 190 |
-
radio_kwargs["index"] = 1 #
|
| 191 |
st.radio("projection", **radio_kwargs)
|
| 192 |
|
| 193 |
with c4:
|
|
@@ -311,4 +272,4 @@ with right:
|
|
| 311 |
)]
|
| 312 |
)
|
| 313 |
|
| 314 |
-
st.plotly_chart(fig, use_container_width=True)
|
|
|
|
| 107 |
page = st.query_params.get("page", "demo")
|
| 108 |
|
| 109 |
if page == "info":
|
| 110 |
+
st.title("ℹ about this demo")
|
| 111 |
st.write("""
|
| 112 |
+
**embeddings** turn words (or longer text) into numerical vectors.
|
| 113 |
+
in this vector space, **semantically related** items end up **near** each other.
|
| 114 |
+
use cases:
|
| 115 |
+
- semantic search & retrieval
|
| 116 |
+
- clustering & topic discovery
|
| 117 |
+
- recommendations & deduplication
|
| 118 |
+
- measuring similarity and analogies
|
| 119 |
+
this demo embeds single words with a selectable model, reduces to 2d/3d with pca,
|
| 120 |
+
and shows how related words appear near each other in the projected space.
|
| 121 |
""".strip())
|
| 122 |
if st.button("⬅ back to demo"):
|
| 123 |
goto("demo")
|
|
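Reviewer note: the info text says semantically related items end up near each other and lists "measuring similarity" as a use case. One common way to measure that nearness is cosine similarity. The sketch below uses hand-made 3-dimensional toy vectors, not real model output, and the `cosine_similarity` helper is illustrative rather than taken from `app.py`.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors (assumption): real embeddings have hundreds of dimensions
toy = {
    "cat":   [0.9, 0.8, 0.1],
    "dog":   [0.8, 0.9, 0.2],
    "table": [0.1, 0.2, 0.9],
}

sim_cat_dog = cosine_similarity(toy["cat"], toy["dog"])
sim_cat_table = cosine_similarity(toy["cat"], toy["table"])
```

With these toy values, `sim_cat_dog` is larger than `sim_cat_table`, mirroring the intuition that related words sit closer together in the embedding space.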
|
|
| 145 |
st.session_state.model_name = MODELS[chosen_label]
|
| 146 |
|
| 147 |
with c3:
|
| 148 |
+
# Default to 3D on first render; single-click thereafter
|
| 149 |
radio_kwargs = dict(options=["2D", "3D"], horizontal=True, key="proj_mode")
|
| 150 |
if "proj_mode" not in st.session_state:
|
| 151 |
+
radio_kwargs["index"] = 1 # 3D default
|
| 152 |
st.radio("projection", **radio_kwargs)
|
| 153 |
|
| 154 |
with c4:
|
|
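Reviewer note: the `c3` hunk passes `index` to `st.radio` only when `proj_mode` is not yet in session state, presumably so the stored selection rather than the default drives later reruns. The sketch below isolates that logic so it can be checked without running Streamlit. The function `build_radio_kwargs` is hypothetical, and a plain dict stands in for `st.session_state`.

```python
def build_radio_kwargs(session_state, key="proj_mode", default_index=1):
    """Build keyword arguments for a 2D/3D radio widget.

    `index` is included only on the first render, i.e. when the widget key
    has no stored value yet; afterwards the stored value is left in charge.
    """
    kwargs = dict(options=["2D", "3D"], horizontal=True, key=key)
    if key not in session_state:
        kwargs["index"] = default_index  # 1 selects "3D"
    return kwargs

# A plain dict stands in for st.session_state (assumption for illustration)
first_render = build_radio_kwargs({})
later_render = build_radio_kwargs({"proj_mode": "2D"})
```

In the app itself the result would be unpacked as `st.radio("projection", **kwargs)`, as the hunk above does.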
|
|
| 272 |
)]
|
| 273 |
)
|
| 274 |
|
| 275 |
+
st.plotly_chart(fig, use_container_width=True)
|