FronyAI committed (verified)
Commit 10ef023 · Parent(s): 2b8c73c

Update README.md

Files changed (1): README.md (+30 −19)
README.md CHANGED
@@ -11,37 +11,48 @@ pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 ---
 
-# FronyAI/frony-embed-tiny-ko-v1
-
-This is a [sentence-transformers](https://www.SBERT.net) model trained. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for Retrieval.
+# FronyAI Embedding (tiny)
 
 ## Model Details
 
 ### Model Description
 - **Model Type:** Sentence Transformer
+- **Base Model:** microsoft/Multilingual-MiniLM-L12-H384
 <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
 - **Maximum Sequence Length:** 512 tokens
-- **Output Dimensionality:** 384 dimensions
+- **Output Dimensionality:** 384 / 192 dimensions
 - **Similarity Function:** Cosine Similarity
 <!-- - **Training Dataset:** Unknown -->
 - **Languages:** ko, en
 - **License:** apache-2.0
 
-### Model Sources
-
-- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
-- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
-- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
-
-### Full Model Architecture
-
-```
-SentenceTransformer(
-  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
-  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
-  (2): Normalize()
-)
-```
+### Datasets
+This model was trained on data from multiple sources, including **AI 허브** (AI Hub).
+In total, 100,000 query–document pairs were used for training.
+
+### Evaluation
+The evaluation consists of five dataset groups; the table below reports the average retrieval performance across them.
+Three groups are subsets extracted from **AI 허브** datasets.
+One group is based on a specific sports regulation PDF, for which synthetic query and **markdown-style passage** pairs were generated with GPT-4o-mini.
+The final group is a concatenation of the four groups above, providing a comprehensive mixed set.
+
+| Models | Open/Closed | Size | Accuracy@1 | Accuracy@3 | Accuracy@5 | Accuracy@10 |
+|--------|-------------|------|------------|------------|------------|-------------|
+| frony-embed-medium | Open | 337M | 0.6649 | 0.8040 | 0.8458 | 0.8876 |
+| frony-embed-medium (half dim) | Open | 337M | 0.6520 | 0.7923 | 0.8361 | 0.8796 |
+| frony-embed-small | Open | 111M | 0.6152 | 0.7616 | 0.8056 | 0.8559 |
+| frony-embed-small (half dim) | Open | 111M | 0.5988 | 0.7478 | 0.7984 | 0.8461 |
+| frony-embed-tiny | Open | – | 0.5084 | 0.6757 | 0.7278 | 0.7845 |
+| frony-embed-tiny (half dim) | Open | – | 0.4710 | 0.6390 | 0.6933 | 0.7596 |
+| bge-m3 | Open | – | 0.5852 | 0.7763 | 0.8418 | 0.8987 |
+| multilingual-e5-large | Open | – | 0.5764 | 0.7630 | 0.8267 | 0.8891 |
+| snowflake-arctic-embed-l-v2.0 | Open | – | 0.5726 | 0.7591 | 0.8232 | 0.8917 |
+| jina-embeddings-v3 | Open | – | 0.5270 | 0.7246 | 0.7953 | 0.8649 |
+| upstage-large | Closed | – | 0.6334 | 0.8527 | 0.9065 | 0.9478 |
+| openai-text-embedding-3-large | Closed | – | 0.4907 | 0.6617 | 0.7311 | 0.8148 |
+
+## Training
 
 ## Usage
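The updated card lists a 384 / 192 output dimensionality, which suggests the half-dim variant is obtained by truncating the full embedding. Below is a minimal numpy sketch of that pattern (truncate, re-normalize, then score with cosine similarity). The truncate-and-renormalize step is an assumption about how the 192-dim variant is meant to be consumed, and the random vectors are stand-ins for real encoder outputs:

```python
import numpy as np

def truncate_and_normalize(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates and re-normalize to unit length,
    so cosine similarity stays well-defined for the half-dim variant."""
    cut = emb[..., :dim]
    return cut / np.linalg.norm(cut, axis=-1, keepdims=True)

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for 384-dim sentence embeddings; in real use these would
# come from the model's encode() method.
rng = np.random.default_rng(0)
q, d = rng.normal(size=384), rng.normal(size=384)

full = cosine_sim(q, d)
half = cosine_sim(truncate_and_normalize(q, 192),
                  truncate_and_normalize(d, 192))
print(f"full-dim: {full:.4f}  half-dim: {half:.4f}")
```

With real embeddings you would load the model via sentence-transformers (the previous card heading names it `FronyAI/frony-embed-tiny-ko-v1`) and pass its outputs through the same helper.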
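The Accuracy@k columns in the new evaluation table measure whether a relevant document appears among the top-k retrieved candidates. A small sketch of that metric, assuming one gold document per query (the card does not spell out the exact protocol):

```python
import numpy as np

def accuracy_at_k(scores: np.ndarray, gold: np.ndarray, k: int) -> float:
    """Fraction of queries whose gold document index appears among the
    k highest-scoring documents. `scores` is (n_queries, n_docs),
    `gold` holds one gold document index per query."""
    topk = np.argsort(-scores, axis=1)[:, :k]
    hits = (topk == gold[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: 3 queries scored against 4 documents.
scores = np.array([
    [0.9, 0.1, 0.3, 0.2],   # gold doc 0 is ranked 1st
    [0.2, 0.8, 0.7, 0.1],   # gold doc 2 is ranked 2nd
    [0.5, 0.4, 0.3, 0.6],   # gold doc 1 is ranked 3rd
])
gold = np.array([0, 2, 1])
print(accuracy_at_k(scores, gold, 1))  # only the first query hits at k=1
```

Each reported number in the table is this quantity averaged over the five dataset groups.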