Feature Extraction
sentence-transformers
Safetensors
xlm-roberta
datadreamer
datadreamer-0.35.0
Synthetic
sentence-similarity
text-embeddings-inference
Instructions to use StyleDistance/mstyledistance with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use StyleDistance/mstyledistance with sentence-transformers:
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("StyleDistance/mstyledistance")

    sentences = [
        "彼は技術的な複雑さと格闘し、彼の作品は驚くべき視覚的緊張を生み出した。",
        "Serviste mariscos frescos en el condado de Middlesex y áreas circundantes.",
        "Él sirvió mariscos frescos en el condado de Middlesex y áreas circundantes."
    ]
    embeddings = model.encode(sentences)

    similarities = model.similarity(embeddings, embeddings)
    print(similarities.shape)
    # [3, 3]
- Notebooks
- Google Colab
- Kaggle
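The sentence-transformers snippet above compares every sentence with every other one, which is why the result has shape [3, 3]. As a rough illustration of what such a pairwise comparison computes, here is a minimal pure-Python sketch that assumes the similarity function is cosine similarity (the library default unless the model configuration overrides it). The helper names and the toy 4-dimensional vectors are hypothetical stand-ins; real embeddings from the model have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings):
    """Pairwise cosine similarities: entry [i][j] compares item i with item j."""
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]


# Hypothetical toy "embeddings", NOT real model output.
toy_embeddings = [
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 3.0, 0.0, 1.0],
    [2.0, 0.0, 4.0, 0.0],  # same direction as the first vector
]

matrix = similarity_matrix(toy_embeddings)
print(len(matrix), len(matrix[0]))  # three rows, three columns
```

In this toy data the first and third vectors point in the same direction, so their similarity is close to 1, while the first and second are orthogonal and score 0. With a style embedding model, the intent is that texts written in a similar style score higher than texts that merely share content.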
Update README.md
README.md
CHANGED
@@ -31,7 +31,13 @@ widget:
 ---
 # Model Card

-
 ## Example Usage

@@ -39,12 +45,19 @@ widget:
 from sentence_transformers import SentenceTransformer
 from sentence_transformers.util import cos_sim

-model = SentenceTransformer('StyleDistance/

-input = model.encode('
-others = model.encode(['
 print(cos_sim(input, others))
 ```

 ---
-
 ---
 # Model Card

+mStyleDistance is a **multilingual style embedding model** that aims to embed texts with similar writing styles close together and texts with different styles far apart, regardless of content and regardless of language. You may find this model useful for stylistic analysis of multilingual text, clustering, authorship identification and verification tasks, and automatic style transfer evaluation.
+
+This model is a multilingual version of the English-only [StyleDistance](https://huggingface.co/StyleDistance/styledistance) model.
+
+## Training Data and Variants of StyleDistance
+
+mStyleDistance was contrastively trained on [mSynthSTEL](https://huggingface.co/datasets/StyleDistance/msynthstel), a synthetically generated dataset of positive and negative examples of ~40 style features being used in text in 9 non-English languages. By utilizing this synthetic dataset, mStyleDistance is able to achieve stronger content-independence than other style embedding models currently available and is able to operate on multilingual text.

 ## Example Usage

 from sentence_transformers import SentenceTransformer
 from sentence_transformers.util import cos_sim

+model = SentenceTransformer('StyleDistance/mstyledistance') # Load model

+input = model.encode("Did you hear about the Wales wing? He'll h8 2 withdraw due 2 injuries from future competitions.")
+others = model.encode(["We're raising funds 2 improve our school's storage facilities and add new playground equipment!", "Did you hear about the Wales wing? He'll hate to withdraw due to injuries from future competitions."])
 print(cos_sim(input, others))
 ```

 ---
+## Trained with DataDreamer
+
+This model was trained on a synthetic dataset using [DataDreamer 🤖💤](https://datadreamer.dev). The synthetic dataset card and model card can be found [here](datadreamer.json). The training arguments can be found [here](training_args.json).
+
+---
+#### Funding Acknowledgements
+
+<small> This research is supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via the HIATUS Program contract #2022-22072200005. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein. </small>
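The model card text in this diff says the model was contrastively trained on positive and negative examples of style features. The card does not state which loss function was used, so the following is only a sketch of one common contrastive objective, a triplet margin loss, written in plain Python. The function names, the Euclidean distance, the margin of 1.0, and the 2-dimensional toy vectors are all illustrative assumptions, not details taken from the model's training setup.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    gap = euclidean_distance(anchor, positive) - euclidean_distance(anchor, negative)
    return max(0.0, gap + margin)


# Hypothetical toy embeddings, NOT real model output.
anchor = [0.0, 0.0]      # a text exhibiting some style feature
same_style = [0.0, 1.0]  # positive example: distance 1 from the anchor
other_style = [3.0, 4.0] # negative example: distance 5 from the anchor

# Positive is already much closer than the negative, so nothing is left to learn.
well_separated = triplet_loss(anchor, same_style, other_style)

# Roles swapped: the "positive" is farther away than the "negative", so the loss is large.
violating = triplet_loss(anchor, other_style, same_style)

print(well_separated, violating)
```

Minimizing a loss of this kind pulls same-style texts together and pushes different-style texts apart in the embedding space, which matches the behavior the model card describes.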