ping98k
---
title: Embedding Playground
emoji: ๐
colorFrom: blue
colorTo: yellow
sdk: static
pinned: false
license: apache-2.0
short_description: 'Exploring text embeddings and group similarity'
models:
  - onnx-community/Qwen3-0.6B-ONNX
  - onnx-community/Qwen3-Embedding-0.6B-ONNX
---

# Embedding WebGPU Playground

This is a browser-based playground for exploring text embeddings, group similarity, and clustering with WebGPU and ONNX models.

## Features

- **Text search**: Use your browser's find function (Ctrl+F, or Cmd+F on macOS) to find and highlight text in the textarea or the results.
- **Text input**: Enter text in the textarea. Separate lines within a group with a single newline (`\n`), and separate groups with three newlines (`\n\n\n`, i.e. two blank lines).
- **Group similarity heatmap**: Click **Show Similarity Heatmap** to compute the cosine similarity between every pair of group embeddings and display the result as a heatmap.
- **Search cluster reordering**: If a group header contains the word `search`, the **Search Cluster Sort Mode** dropdown controls how the other groups and their lines are ordered relative to that search group:
  - **By Group Similarity**: Orders groups by their similarity to the search group, and the lines within each group by their similarity to the search group embedding.
  - **By Max Search Line**: Orders the lines within each group by their maximum similarity to any line in the search group.
- **K-Means & Balanced K-Means clustering**: Set the number of clusters and the clustering type, then click **Clustering** to assign all lines to clusters. The textarea is rewritten to reflect the new clusters.
- **UMAP scatter plot**: Click **Cluster Plot** to project the clusters to 2D with UMAP and plot them. Cluster names appear in the legend.
- **Cluster naming**: Click **Naming Cluster** to generate a descriptive name for each cluster with a text generation model. The names are updated in both the textarea and the scatter plot legend.
- **Progress bar**: All major actions show a progress bar while they run.
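
The following sketch shows one way to parse the text-input format described above. It is a minimal TypeScript illustration and not the app's actual code. `parseGroups` is a hypothetical name. The choices to trim whitespace and to drop blank lines and empty groups are also assumptions.

```typescript
// Hypothetical helper (not the app's code): split textarea contents into groups of lines.
// Assumptions: "\n\n\n" separates groups, "\n" separates lines, and blank lines are dropped.
function parseGroups(text: string): string[][] {
  return text
    .replace(/\r\n/g, "\n") // normalize Windows line endings first
    .split("\n\n\n") // three newlines (two blank lines) start a new group
    .map((group) =>
      group
        .split("\n") // a single newline separates lines within a group
        .map((line) => line.trim())
        .filter((line) => line.length > 0), // ignore blank lines inside a group
    )
    .filter((lines) => lines.length > 0); // ignore empty groups
}

const sample = "fruit\napple\nbanana\n\n\nvehicles\ncar\ntruck";
const groups = parseGroups(sample);
console.log(groups);
// groups[0] is ["fruit", "apple", "banana"]; groups[1] is ["vehicles", "car", "truck"]
```

A single blank line (two newlines) therefore stays inside the current group rather than starting a new one.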
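
Behind the group similarity heatmap is a square matrix of pairwise cosine similarities. The sketch below is illustrative only. It assumes each group embedding is the mean of its line embeddings, because the README does not say how the app pools lines into a group vector. It also takes embeddings as plain number arrays instead of calling the embedding model. `groupSimilarityMatrix` and its helpers are hypothetical names.

```typescript
// Hypothetical sketch: the matrix behind a group-similarity heatmap.
// Assumption: a group embedding is the element-wise mean of its line embeddings.
type Vector = number[];

// Cosine similarity; returns 0 if either vector has zero length.
function cosineSimilarity(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Element-wise mean of a non-empty list of equal-length vectors.
function meanVector(vectors: Vector[]): Vector {
  const mean = vectors[0].map(() => 0);
  for (const v of vectors) {
    for (let i = 0; i < mean.length; i++) mean[i] += v[i] / vectors.length;
  }
  return mean;
}

// groupLineEmbeddings[g][l] is the embedding of line l in group g.
function groupSimilarityMatrix(groupLineEmbeddings: Vector[][]): number[][] {
  const groupEmbeddings = groupLineEmbeddings.map((lines) => meanVector(lines));
  return groupEmbeddings.map((a) => groupEmbeddings.map((b) => cosineSimilarity(a, b)));
}

// Toy 2-D "embeddings": two groups with two lines each.
const matrix = groupSimilarityMatrix([
  [[1, 0], [1, 0.1]],
  [[0, 1], [0.1, 1]],
]);
console.log(matrix.map((row) => row.map((x) => x.toFixed(2))));
// diagonal entries are 1.00; both off-diagonal entries are about 0.10
```

The result has one row and one column per group, which is the 2D `z` array that Plotly's `heatmap` trace takes.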
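
The next sketch shows how the two **Search Cluster Sort Mode** options could work. It is an illustrative TypeScript version, not the app's implementation. `reorderAroundSearch`, `EmbeddedGroup`, and the helpers are hypothetical names, and embeddings are passed in as plain arrays. The README leaves three details open, and the sketch fills each with a labeled assumption:

- The group header is stored separately from its lines.
- The search group embedding is the mean of the search group's line embeddings.
- The search group stays first in the output.

```typescript
// Hypothetical sketch of the two "Search Cluster Sort Mode" options.
// Assumptions (not from the README): headers are stored apart from lines, every group has
// at least one line, and the "search group embedding" is the mean of its line embeddings.
interface EmbeddedGroup {
  header: string; // assumed: the group's header line in the textarea
  lines: string[];
  embeddings: number[][]; // one embedding per line, same order as `lines`
}

type SortMode = "groupSimilarity" | "maxSearchLine";

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

function mean(vectors: number[][]): number[] {
  return vectors[0].map((_, d) => vectors.reduce((s, v) => s + v[d], 0) / vectors.length);
}

// Sort a group's lines (and their embeddings) by descending score.
function sortLines(group: EmbeddedGroup, score: (e: number[]) => number): EmbeddedGroup {
  const scores = group.embeddings.map(score);
  const order = scores.map((_, i) => i).sort((i, j) => scores[j] - scores[i]);
  return {
    ...group,
    lines: order.map((i) => group.lines[i]),
    embeddings: order.map((i) => group.embeddings[i]),
  };
}

function reorderAroundSearch(groups: EmbeddedGroup[], mode: SortMode): EmbeddedGroup[] {
  const searchIndex = groups.findIndex((g) => g.header.includes("search"));
  if (searchIndex === -1) return groups; // no search group: keep the original order
  const search = groups[searchIndex];
  const others = groups.filter((_, i) => i !== searchIndex);

  if (mode === "groupSimilarity") {
    const searchVec = mean(search.embeddings);
    const byLine = (e: number[]) => cosine(e, searchVec);
    return [
      search, // assumed: the search group stays on top
      ...others
        .map((g) => ({ group: sortLines(g, byLine), sim: cosine(mean(g.embeddings), searchVec) }))
        .sort((a, b) => b.sim - a.sim) // most similar groups first
        .map((x) => x.group),
    ];
  }

  // "maxSearchLine": score each line by its best match among the search lines.
  // The README only describes line order here, so group order is left unchanged.
  const maxSim = (e: number[]) => Math.max(...search.embeddings.map((s) => cosine(e, s)));
  return [search, ...others.map((g) => sortLines(g, maxSim))];
}
```

The two modes differ in practice. In the max-line mode, a line that closely matches any single search line rises to the top even if it is far from the average of the search group.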

## Tech stack

- [@huggingface/transformers](https://www.npmjs.com/package/@huggingface/transformers) (ESM, WebGPU)
- [Qwen3-Embedding-0.6B-ONNX](https://huggingface.co/onnx-community/Qwen3-Embedding-0.6B-ONNX) (ONNX embedding model)
- [Plotly.js](https://plotly.com/javascript/) (UMD)
- [umap-js](https://github.com/PAIR-code/umap-js) (for 2D projection)

## Usage

1. Enter or paste your text into the textarea.
2. If you want to compare group similarity, separate groups with three newlines (two blank lines).
3. (Optional) If a group header contains `search`, use the **Search Cluster Sort Mode** dropdown to choose how the other groups and lines are ordered relative to it.
4. Click **Show Similarity Heatmap** to compute and visualize group similarities.
5. To cluster all lines, set the number of clusters and the clustering type, then click **Clustering**. The textarea and heatmap update to reflect the new clusters.
6. Click **Cluster Plot** to visualize the clusters in 2D.
7. Click **Naming Cluster** to generate a descriptive name for each cluster.
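
The sketch below covers both clustering modes in a few dozen lines. It is an illustration, not the app's implementation, and it makes three simplifying assumptions:

- Distance is squared Euclidean distance.
- Initialization is deterministic, using evenly spaced input points. Real implementations usually use random or k-means++ seeding.
- The balanced variant uses one simple greedy scheme that caps every cluster at `Math.ceil(n / k)` members.

`kMeans`, `assignNearest`, and `assignBalanced` are hypothetical names.

```typescript
// Hypothetical sketch of k-means plus a greedy "balanced" variant over line embeddings.
// Simplifications (assumptions, not the app's code): squared Euclidean distance,
// deterministic initialization, greedy capacity rule. Requires 1 <= k <= points.length.
type Point = number[];

function sqDist(a: Point, b: Point): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) ** 2;
  return s;
}

// Standard assignment step: each point goes to its nearest centroid.
function assignNearest(points: Point[], centroids: Point[]): number[] {
  return points.map((p) => {
    let best = 0;
    for (let c = 1; c < centroids.length; c++) {
      if (sqDist(p, centroids[c]) < sqDist(p, centroids[best])) best = c;
    }
    return best;
  });
}

// Balanced assignment: visit (point, centroid) pairs from closest to farthest and accept a
// pair only if the point is unassigned and the cluster has fewer than ceil(n / k) members.
function assignBalanced(points: Point[], centroids: Point[]): number[] {
  const capacity = Math.ceil(points.length / centroids.length);
  const pairs: [number, number, number][] = [];
  points.forEach((p, i) => centroids.forEach((c, j) => pairs.push([sqDist(p, c), i, j])));
  pairs.sort((x, y) => x[0] - y[0]);
  const labels: number[] = points.map(() => -1);
  const sizes: number[] = centroids.map(() => 0);
  for (const [, i, j] of pairs) {
    if (labels[i] === -1 && sizes[j] < capacity) {
      labels[i] = j;
      sizes[j] += 1;
    }
  }
  return labels;
}

function kMeans(points: Point[], k: number, balanced = false, maxIter = 50): number[] {
  // Deterministic init from k evenly spaced input points (k-means++ is the usual choice).
  let centroids: Point[] = [];
  for (let c = 0; c < k; c++) centroids.push([...points[Math.floor((c * points.length) / k)]]);

  let labels: number[] = [];
  for (let iter = 0; iter < maxIter; iter++) {
    const next = balanced ? assignBalanced(points, centroids) : assignNearest(points, centroids);
    const converged = next.every((label, i) => label === labels[i]);
    labels = next;
    if (converged) break;
    // Update step: move each centroid to the mean of its members.
    centroids = centroids.map((old, c) => {
      const members = points.filter((_, i) => labels[i] === c);
      if (members.length === 0) return old; // leave an empty cluster's centroid in place
      return old.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  return labels;
}

const toy: Point[] = [[0], [1], [3], [10]];
console.log(kMeans(toy, 2)); // labels 0,0,0,1: cluster sizes 3 and 1
console.log(kMeans(toy, 2, true)); // labels 0,0,1,1: cluster sizes 2 and 2
```

On the toy input, plain k-means settles on clusters of sizes 3 and 1. The balanced variant allows at most two members per cluster and returns a 2/2 split. To turn the labels back into textarea content, collect lines by label and join the resulting groups with `\n\n\n`, which matches the text format described above.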
| --- |