# Parametric UMAP: English-German Cross-Lingual Embeddings (768D → 2D)
A Parametric UMAP model that projects 768-dimensional semantic embeddings from google/embeddinggemma-300m into a shared 2D cross-lingual space for English and German.
## Architecture
- Input: 768-dimensional embeddings (from embeddinggemma-300m)
- Encoder:
  - Dense(768 → 256) + ReLU
  - Dense(256 → 128) + ReLU
  - Dense(128 → 2) (linear output)
- Output: 2D coordinates
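The encoder above is a small MLP, so its forward pass is easy to sketch. The snippet below uses plain NumPy with random stand-in weights purely to illustrate the layer shapes — it is not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in weights; the real encoder loads trained parameters.
W1, b1 = rng.normal(size=(768, 256)), np.zeros(256)
W2, b2 = rng.normal(size=(256, 128)), np.zeros(128)
W3, b3 = rng.normal(size=(128, 2)), np.zeros(2)

def encode(x: np.ndarray) -> np.ndarray:
    """Dense(768→256)+ReLU, Dense(256→128)+ReLU, Dense(128→2) linear."""
    h = np.maximum(x @ W1 + b1, 0.0)  # ReLU
    h = np.maximum(h @ W2 + b2, 0.0)  # ReLU
    return h @ W3 + b3                # linear 2D output

batch = rng.normal(size=(4, 768))     # four fake 768D embeddings
print(encode(batch).shape)            # (4, 2)
```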
## Training Details
- Base embeddings: google/embeddinggemma-300m
- Training data: 200,000 English-German sentence pairs from Helsinki-NLP/tatoeba_mt
- Method: Parametric UMAP with cosine metric
- Framework: TensorFlow 2.14 + umap-learn 0.5.5
- Epochs: 10 full epochs over UMAP graph
- Batch size: 1000 edges
## Primary Use Cases

- **Visualization:** Plot bilingual text data in 2D for exploration
- **Similarity analysis:** Find semantically similar texts across languages
- **Cross-lingual clustering:** Group related content in EN/DE
- **Semantic search:** Fast nearest-neighbor search in 2D space
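For the semantic-search use case, nearest-neighbor lookup in the projected 2D space is cheap. A minimal NumPy sketch, with made-up coordinates standing in for projected corpus points:

```python
import numpy as np

# Hypothetical 2D projections of a small corpus (illustration only).
corpus_2d = np.array([[0.1, 7.8], [0.3, 7.7], [5.2, 1.1], [5.0, 1.3]])
query_2d = np.array([0.15, 7.78])

# Euclidean nearest neighbor in the projected space.
dists = np.linalg.norm(corpus_2d - query_2d, axis=1)
nearest = int(np.argmin(dists))
print(nearest)  # -> 0
```

For large corpora, the same lookup scales well with a spatial index (e.g. a k-d tree) since the space is only two-dimensional.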
## How to Use

### Installation

```bash
pip install sentence-transformers tensorflow umap-learn numpy
```
### Basic Usage

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from tensorflow import keras

# Load models
embedding_model = SentenceTransformer("google/embeddinggemma-300m")
umap_encoder = keras.models.load_model("path/to/encoder")

# Your sentences
sentences = [
    "Hello world",
    "Hallo Welt",
]

# Generate 768D embeddings
embeddings_768d = embedding_model.encode(
    sentences,
    prompt_name="Clustering",
    convert_to_numpy=True,
)

# Project to 2D
coords_2d = umap_encoder.predict(embeddings_768d)
print(coords_2d)
# [[0.11, 7.84],
#  [0.28, 7.70]]
```
### Visualization Example

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 8))
plt.scatter(coords_2d[:, 0], coords_2d[:, 1])
for i, sent in enumerate(sentences):
    plt.annotate(sent, (coords_2d[i, 0], coords_2d[i, 1]))
plt.xlabel("UMAP Dimension 1")
plt.ylabel("UMAP Dimension 2")
plt.title("2D Semantic Space")
plt.show()
```
## Model

- Model: tudi2d/parametric-umap-embeddinggemma-en-de-2d
- Base model: google/embeddinggemma-300m