Update README.md
README.md
CHANGED
|
@@ -11,37 +11,48 @@ pipeline_tag: sentence-similarity
|
|
| 11 |
library_name: sentence-transformers
|
| 12 |
---
|
| 13 |
|
| 14 |
-
# FronyAI
|
| 15 |
-
|
| 16 |
-
This is a [sentence-transformers](https://www.SBERT.net) model trained. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for Retrieval.
|
| 17 |
|
| 18 |
## Model Details
|
| 19 |
|
| 20 |
### Model Description
|
| 21 |
- **Model Type:** Sentence Transformer
|
|
|
|
| 22 |
<!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
|
| 23 |
- **Maximum Sequence Length:** 512 tokens
|
| 24 |
-
- **Output Dimensionality:** 384 dimensions
|
| 25 |
- **Similarity Function:** Cosine Similarity
|
| 26 |
<!-- - **Training Dataset:** Unknown -->
|
| 27 |
- **Languages:** ko, en
|
| 28 |
- **License:** apache-2.0
|
| 29 |
|
| 30 |
-
###
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
|
| 46 |
## Usage
|
| 47 |
|
|
|
|
| 11 |
library_name: sentence-transformers
|
| 12 |
---
|
| 13 |
|
| 14 |
+
# FronyAI Embedding (tiny)
|
|
|
|
|
|
|
| 15 |
|
| 16 |
## Model Details
|
| 17 |
|
| 18 |
### Model Description
|
| 19 |
- **Model Type:** Sentence Transformer
|
| 20 |
+
- **Base Model:** microsoft/Multilingual-MiniLM-L12-H384
|
| 21 |
<!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
|
| 22 |
- **Maximum Sequence Length:** 512 tokens
|
| 23 |
+
- **Output Dimensionality:** 384 dimensions, or 192 (half dim)
|
| 24 |
- **Similarity Function:** Cosine Similarity
|
| 25 |
<!-- - **Training Dataset:** Unknown -->
|
| 26 |
- **Languages:** ko, en
|
| 27 |
- **License:** apache-2.0
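
To illustrate the similarity function and the two output sizes listed above, here is a minimal sketch in plain Python. It assumes that the "half dim" variant is obtained by keeping the first 192 components of the 384-dimensional vector and re-normalizing; the README does not confirm this, so treat it as an assumption. The helper names (`l2_normalize`, `cosine_similarity`, `truncate_embedding`) and the toy vectors are hypothetical and stand in for real model outputs.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product of the two unit-normalized vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """ASSUMPTION: 'half dim' means keeping the first `dim` components.

    The truncated vector is re-normalized so cosine scores stay comparable.
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    return l2_normalize(vec[:dim])


# Toy 384-dimensional vectors standing in for real model outputs.
query = [math.sin(i) for i in range(384)]
doc = [math.sin(i + 0.1) for i in range(384)]

full_score = cosine_similarity(query, doc)
half_score = cosine_similarity(
    truncate_embedding(query, 192), truncate_embedding(doc, 192)
)
print(f"full (384): {full_score:.4f}  half (192): {half_score:.4f}")
```

With a real model, `query` and `doc` would be the vectors returned by the encoder rather than these synthetic values.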
|
| 28 |
|
| 29 |
+
### Datasets
|
| 30 |
+
This model was trained on data from many sources, including **AI 허브**.
|
| 31 |
+
In total, it was trained on 100,000 query-document pairs.
|
| 32 |
+
|
| 33 |
+
### Evaluation
|
| 34 |
+
The evaluation consists of five dataset groups.
|
| 35 |
+
Three groups are subsets extracted from **AI 허브** datasets.
|
| 36 |
+
One group is based on a specific sports regulation PDF, for which synthetic query and **markdown-style passage** pairs were generated using GPT-4o-mini.
|
| 37 |
+
The final group is a concatenation of all four aforementioned groups, providing a comprehensive mixed set.
|
| 38 |
+
The following table reports retrieval accuracy (Accuracy@1, @3, @5, and @10), averaged across the five dataset groups.
|
| 39 |
+
|
| 40 |
+
| Models | Open/Closed | Size | Accuracy@1 | Accuracy@3 | Accuracy@5 | Accuracy@10 |
|
| 41 |
+
|--------------|-----------|-----------|-----------|------------|------------|-------------|
|
| 42 |
+
| frony-embed-medium | Open | 337M | 0.6649 | 0.8040 | 0.8458 | 0.8876 |
|
| 43 |
+
| frony-embed-medium (half dim) | Open | 337M | 0.6520 | 0.7923 | 0.8361 | 0.8796 |
|
| 44 |
+
| frony-embed-small | Open | 111M | 0.6152 | 0.7616 | 0.8056 | 0.8559 |
|
| 45 |
+
| frony-embed-small (half dim) | Open | 111M | 0.5988 | 0.7478 | 0.7984 | 0.8461 |
|
| 46 |
+
| frony-embed-tiny | **Open** | - | 0.5084 | **0.6757** | 0.7278 | 0.7845 |
|
| 47 |
+
| frony-embed-tiny (half dim) | Open | - | 0.4710 | 0.6390 | 0.6933 | 0.7596 |
|
| 48 |
+
| bge-m3 | **Open** | - | 0.5852 | **0.7763** | 0.8418 | 0.8987 |
|
| 49 |
+
| multilingual-e5-large | Open | - | 0.5764 | 0.7630 | 0.8267 | 0.8891 |
|
| 50 |
+
| snowflake-arctic-embed-l-v2.0 | Open | - | 0.5726 | 0.7591 | 0.8232 | 0.8917 |
|
| 51 |
+
| jina-embeddings-v3 | Open | - | 0.5270 | 0.7246 | 0.7953 | 0.8649 |
|
| 52 |
+
| upstage-large | **Closed** | - | 0.6334 | **0.8527** | 0.9065 | 0.9478 |
|
| 53 |
+
| openai-text-embedding-3-large | Closed | - | 0.4907 | 0.6617 | 0.7311 | 0.8148 |
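
For readers unfamiliar with the metric, the Accuracy@k columns above can be sketched as follows. This is a minimal illustration, not the evaluation script used for the table. It assumes each query has exactly one relevant passage and that the reported figure is an unweighted mean over the dataset groups; neither detail is stated in the README. The function names, group names, and document ids are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy_at_k(
    ranked_ids: Sequence[Sequence[str]], gold_ids: Sequence[str], k: int
) -> float:
    """Fraction of queries whose gold passage id appears in the top-k results.

    ASSUMPTION: exactly one relevant passage per query.
    """
    if len(ranked_ids) != len(gold_ids):
        raise ValueError("ranked_ids and gold_ids must have the same length")
    if not gold_ids:
        return 0.0
    hits = sum(1 for ranked, gold in zip(ranked_ids, gold_ids) if gold in ranked[:k])
    return hits / len(gold_ids)


def mean_over_groups(per_group: Dict[str, float]) -> float:
    """ASSUMPTION: unweighted (macro) average across dataset groups."""
    return sum(per_group.values()) / len(per_group)


# Toy retrieval results: for each group, (ranked ids per query, gold id per query).
groups: Dict[str, Tuple[List[List[str]], List[str]]] = {
    "group_a": ([["d1", "d2", "d3"], ["d9", "d4", "d5"]], ["d1", "d4"]),
    "group_b": ([["d7", "d8"], ["d6", "d2"]], ["d8", "d0"]),
}

for k in (1, 3):
    per_group = {
        name: accuracy_at_k(ranked, gold, k) for name, (ranked, gold) in groups.items()
    }
    print(f"Accuracy@{k}: {mean_over_groups(per_group):.4f}")
```

Under these assumptions, a query counts as a hit at k as soon as its gold passage appears anywhere in the first k results, so Accuracy@k can only grow as k increases, which matches the pattern in every row of the table.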
|
| 54 |
+
|
| 55 |
+
## Training
|
| 56 |
|
| 57 |
## Usage
|
| 58 |
|