---
# FronyAI Embedding (tiny)

This is a lightweight and efficient embedding model designed specifically for the Korean language.
It has been trained on a diverse set of data sources, including AI Hub, to ensure robust performance in a wide range of retrieval tasks.
The model demonstrates strong retrieval capabilities across:<br>

* Korean–Korean
* Korean–English
* English–Korean

To support resource-constrained environments, the model is also compatible with Matryoshka Embeddings, enabling retrieval even at reduced dimensions **(e.g., half of the original size)** without significant performance loss.
All training and data preprocessing were performed on **a single GPU (46GB VRAM)**, showcasing not only the model's effectiveness but also its efficiency.
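
Below is a minimal sketch of Matryoshka-style truncation with sentence-transformers; the model ID is a placeholder (substitute the actual repository name), and the half-dimension cut follows the example above.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Placeholder repository ID; substitute the actual model name.
model = SentenceTransformer("frony-ai/frony-embedding-tiny")

docs = ["Seoul is the capital of Korea.", "Matryoshka embeddings can be truncated."]
full = model.encode(docs, normalize_embeddings=True)

# Keep the first half of each embedding, then re-normalize so cosine
# similarity remains meaningful at the reduced dimensionality.
half = full[:, : full.shape[1] // 2]
half = half / np.linalg.norm(half, axis=1, keepdims=True)
```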

## Model Details

The total number of trained query–document pairs is 100,000.<br>
### Training Details
The overall training process was conducted with reference to **snowflake-arctic-2.0**.<br>
Training was divided into two stages: pre-training and post-training.

* In the pre-training stage, the model was trained using in-batch negatives.
* In the post-training stage, we used the multilingual-e5-large model to mine hard negatives: specifically, the top 4 candidates with a similarity score below a **99% threshold** (see the sketch after this list).
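
A minimal sketch of this mining step, assuming the **99% threshold** is a positive-aware margin (a candidate counts as a hard negative only if its score is below 99% of the positive passage's score); the helper below is illustrative, not the actual training code:

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# multilingual-e5-large expects "query: " / "passage: " input prefixes.
teacher = SentenceTransformer("intfloat/multilingual-e5-large")

def mine_hard_negatives(query, positive, candidates, k=4, margin=0.99):
    embs = teacher.encode(
        [f"query: {query}", f"passage: {positive}"]
        + [f"passage: {c}" for c in candidates],
        normalize_embeddings=True,
    )
    q, pos, cands = embs[0], embs[1], embs[2:]
    scores = cands @ q                  # cosine scores against the query
    order = np.argsort(-scores)         # best-scoring candidates first
    limit = margin * float(q @ pos)     # 99% of the positive's score
    picked = [i for i in order if scores[i] < limit][:k]
    return [candidates[i] for i in picked]
```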

Given the increasing prevalence of LLM-generated content, we also converted existing data into Markdown-style passages to improve retrieval performance on such formats.<br>
The types of data augmentation applied are as follows:

| Augmentation* | Description |
|-----------|-----------|
| Pair concatenation | Multi-query & Multi-passage |

\*Augmentation was carried out using **Gemma-3-12B**.
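
As an illustration of the pair-concatenation row above, here is a hypothetical helper that merges two (query, passage) pairs into a single multi-query, multi-passage pair; the exact joining format is an assumption:

```python
# Hypothetical sketch: merge two training pairs into one multi-query,
# multi-passage example. The separators are illustrative assumptions.
def concat_pairs(pair_a, pair_b):
    (query_a, passage_a), (query_b, passage_b) = pair_a, pair_b
    return (query_a + " " + query_b, passage_a + "\n\n" + passage_b)
```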

### Evaluation
The evaluation consists of five dataset groups.
Three groups are subsets extracted from AI Hub datasets.
One group is based on a specific sports regulation PDF, for which synthetic query and **markdown-style passage** pairs were generated using GPT-4o-mini.
The final group is a concatenation of all four aforementioned groups, providing a comprehensive mixed set.<br>
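
For reference, a minimal sketch of the Accuracy@k metric reported in the table below (the fraction of queries whose relevant passage appears among the top-k retrieved results); the function and argument names are illustrative:

```python
import numpy as np

def accuracy_at_k(query_embs, doc_embs, relevant_idx, k):
    # Cosine scores, assuming both embedding matrices are L2-normalized.
    scores = query_embs @ doc_embs.T
    topk = np.argsort(-scores, axis=1)[:, :k]   # top-k doc indices per query
    hits = [rel in set(row) for rel, row in zip(relevant_idx, topk)]
    return float(np.mean(hits))
```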
The following table presents the average retrieval performance across the five dataset groups.

| Models | Open/Closed | Size | Accuracy@1 | Accuracy@3 | Accuracy@5 | Accuracy@10 |
|--------------|-----------|-----------|-----------|------------|------------|-------------|