Commit 113ad6b, committed by Trent
1 parent: 6a49bc1
Contributions
app.py
CHANGED
@@ -18,10 +18,15 @@ Hi! This is the demo for the [flax sentence embeddings](https://huggingface.co/f
 We trained three general-purpose flax-sentence-embeddings models: a **distilroberta base**, a **mpnet base** and a **minilm-l6**.
 The models were trained on a dataset comprising of [1 Billion+ training corpus](https://huggingface.co/flax-sentence-embeddings/all_datasets_v4_MiniLM-L6#training-data) with the v3 setup.
 
-In addition, we trained [20 models](https://huggingface.co/flax-sentence-embeddings) focused on general-purpose, QuestionAnswering and Code search.
+In addition, we trained [20 models](https://huggingface.co/flax-sentence-embeddings) focused on general-purpose, QuestionAnswering and Code search and achieved SOTA on multiple benchmarks.
 We also uploaded [8 datasets](https://huggingface.co/flax-sentence-embeddings) specialized for Question Answering, Sentence-Similiarity and Gender Evaluation.
 You can view our models and datasets [here](https://huggingface.co/flax-sentence-embeddings).
 
+| Model | [FullEvaluation](https://docs.google.com/spreadsheets/d/1vXJrIg38cEaKjOG5y4I4PQwAQFUmCkohbViJ9zj_Emg/edit#gid=1809754143) Average | 20Newsgroups Clustering | StackOverflow DupQuestions | Twitter SemEval2015 |
+|-----------|---------------------------------------|-------|-------|-------|
+| paraphrase-mpnet-base-v2 (previous SOTA) | 67.97 | 47.79 | 49.03 | 72.36 |
+| all_datasets_v3_roberta-large (400k steps) | **70.22** | 50.12 | 52.18 | 75.28 |
+
 ## Contributions
 
 - 20 performant Sentence Embedding models that can be utilized for Sentence Simliarity / Asymmetric QA / Search & Clustering.