FronyAI committed (verified)
Commit 10ef023 · Parent(s): 2b8c73c

Update README.md

Files changed (1): README.md (+30 −19)
README.md CHANGED
@@ -11,37 +11,48 @@ pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 ---
 
-# FronyAI/frony-embed-tiny-ko-v1
-
-This is a [sentence-transformers](https://www.SBERT.net) model trained. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for Retrieval.
+# FronyAI Embedding (tiny)
 
 ## Model Details
 
 ### Model Description
 - **Model Type:** Sentence Transformer
+- **Base Model:** microsoft/Multilingual-MiniLM-L12-H384
 <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
 - **Maximum Sequence Length:** 512 tokens
-- **Output Dimensionality:** 384 dimensions
+- **Output Dimensionality:** 384 / 192 dimensions
 - **Similarity Function:** Cosine Similarity
 <!-- - **Training Dataset:** Unknown -->
 - **Languages:** ko, en
 - **License:** apache-2.0
 
-### Model Sources
-
-- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
-- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
-- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
-
-### Full Model Architecture
-
-```
-SentenceTransformer(
-  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
-  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
-  (2): Normalize()
-)
-```
+### Datasets
+This model was trained on data from multiple sources, including **AI 허브** (AI Hub).
+In total, 100,000 query–document pairs were used for training.
+
+### Evaluation
+The evaluation consists of five dataset groups; the table below reports the average retrieval performance across them.
+Three groups are subsets extracted from **AI 허브** datasets.
+One group is based on a specific sports regulation PDF, for which synthetic query and **markdown-style passage** pairs were generated with GPT-4o-mini.
+The final group is a concatenation of the four groups above, providing a comprehensive mixed set.
+
+| Models | Open/Closed | Size | Accuracy@1 | Accuracy@3 | Accuracy@5 | Accuracy@10 |
+|--------|-------------|------|------------|------------|------------|-------------|
+| frony-embed-medium | Open | 337M | 0.6649 | 0.8040 | 0.8458 | 0.8876 |
+| frony-embed-medium (half dim) | Open | 337M | 0.6520 | 0.7923 | 0.8361 | 0.8796 |
+| frony-embed-small | Open | 111M | 0.6152 | 0.7616 | 0.8056 | 0.8559 |
+| frony-embed-small (half dim) | Open | 111M | 0.5988 | 0.7478 | 0.7984 | 0.8461 |
+| frony-embed-tiny | Open | – | 0.5084 | 0.6757 | 0.7278 | 0.7845 |
+| frony-embed-tiny (half dim) | Open | – | 0.4710 | 0.6390 | 0.6933 | 0.7596 |
+| bge-m3 | Open | – | 0.5852 | 0.7763 | 0.8418 | 0.8987 |
+| multilingual-e5-large | Open | – | 0.5764 | 0.7630 | 0.8267 | 0.8891 |
+| snowflake-arctic-embed-l-v2.0 | Open | – | 0.5726 | 0.7591 | 0.8232 | 0.8917 |
+| jina-embeddings-v3 | Open | – | 0.5270 | 0.7246 | 0.7953 | 0.8649 |
+| upstage-large | Closed | – | 0.6334 | 0.8527 | 0.9065 | 0.9478 |
+| openai-text-embedding-3-large | Closed | – | 0.4907 | 0.6617 | 0.7311 | 0.8148 |
+
+## Training
 
 ## Usage
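The updated card lists a 384 / 192 output dimensionality, which suggests the half-dim variant is obtained by truncating the full embedding. Below is a minimal numpy sketch of that pattern (truncate, re-normalize, then score with cosine similarity). The truncate-and-renormalize step is an assumption about how the 192-dim variant is meant to be consumed, and the random vectors are stand-ins for real encoder outputs:

```python
import numpy as np

def truncate_and_normalize(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates and re-normalize to unit length,
    so cosine similarity stays well-defined for the half-dim variant."""
    cut = emb[..., :dim]
    return cut / np.linalg.norm(cut, axis=-1, keepdims=True)

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for 384-dim sentence embeddings; in real use these would
# come from the model's encode() method.
rng = np.random.default_rng(0)
q, d = rng.normal(size=384), rng.normal(size=384)

full = cosine_sim(q, d)
half = cosine_sim(truncate_and_normalize(q, 192),
                  truncate_and_normalize(d, 192))
print(f"full-dim: {full:.4f}  half-dim: {half:.4f}")
```

With real embeddings you would load the model via sentence-transformers (the previous card heading names it `FronyAI/frony-embed-tiny-ko-v1`) and pass its outputs through the same helper.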
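The Accuracy@k columns in the new evaluation table measure whether a relevant document appears among the top-k retrieved candidates. A small sketch of that metric, assuming one gold document per query (the card does not spell out the exact protocol):

```python
import numpy as np

def accuracy_at_k(scores: np.ndarray, gold: np.ndarray, k: int) -> float:
    """Fraction of queries whose gold document index appears among the
    k highest-scoring documents. `scores` is (n_queries, n_docs),
    `gold` holds one gold document index per query."""
    topk = np.argsort(-scores, axis=1)[:, :k]
    hits = (topk == gold[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: 3 queries scored against 4 documents.
scores = np.array([
    [0.9, 0.1, 0.3, 0.2],   # gold doc 0 is ranked 1st
    [0.2, 0.8, 0.7, 0.1],   # gold doc 2 is ranked 2nd
    [0.5, 0.4, 0.3, 0.6],   # gold doc 1 is ranked 3rd
])
gold = np.array([0, 2, 1])
print(accuracy_at_k(scores, gold, 1))  # only the first query hits at k=1
```

Each reported number in the table is this quantity averaged over the five dataset groups.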