Update README.md

README.md

---
tags:
- sentence-transformers

results:
- task:
    type: semantic-similarity
    name: Semantic Similarity
  metrics:
  - type: pearson_cosine
    value: 0.4639747212598005
    name: Pearson Correlation (Cosine Similarity)
  - type: spearman_cosine
    value: 0.4595105448711385
    name: Spearman Correlation (Cosine Similarity)
---

# SentenceTransformer

This is a trained [sentence-transformers](https://www.SBERT.net) model. It maps sentences and paragraphs to a 256-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Maximum Sequence Length:** 2048 tokens
- **Output Dimensionality:** 256 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
)
```
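
Once the model is loaded, the properties listed under Model Description can be read directly off the model object, and `print(model)` reproduces the full architecture string (only its outer shell appears above). A minimal sketch, using the same placeholder model id as the usage example below:

```python
from sentence_transformers import SentenceTransformer

# Placeholder model id, as in the usage example below.
model = SentenceTransformer("sentence_transformers_model_id")

print(model)                                     # full SentenceTransformer(...) module list
print(model.max_seq_length)                      # expected: 2048
print(model.get_sentence_embedding_dimension())  # expected: 256
print(model.similarity_fn_name)                  # expected: "cosine"
```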

## Usage

### Direct Usage (Sentence Transformers)

First, install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'Comcast Class A shares were up 8 cents at $30.50 in morning trading on the Nasdaq Stock Market.',
    'The stock rose 48 cents to $30 yesterday in Nasdaq Stock Market trading.',
    # (third example sentence not shown in this excerpt)
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 256]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5752, 0.2980],
#         [0.5752, 1.0000, 0.2161],
#         [0.2980, 0.2161, 1.0000]])
```
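
Because `model.similarity` accepts two different batches of embeddings, the same two calls also cover a basic semantic-search pattern. A minimal sketch under the same placeholder model id; the query and corpus strings are illustrative, not from the card:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id

# Illustrative query and corpus (not part of the model card).
query_embedding = model.encode(["How did Comcast shares trade on the Nasdaq?"])
corpus_embeddings = model.encode([
    "The stock rose 48 cents to $30 yesterday in Nasdaq Stock Market trading.",
    "Three dogs running on a racetrack.",
])

# One row per query, one column per corpus sentence (cosine similarity).
scores = model.similarity(query_embedding, corpus_embeddings)
best = int(scores.argmax())
print(best, float(scores[0, best]))
```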

## Evaluation

### Metrics

#### Semantic Similarity

* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.464      |
| **spearman_cosine** | **0.4595** |
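
The table above comes from `EmbeddingSimilarityEvaluator`, which encodes gold-scored sentence pairs and reports the Pearson and Spearman correlations between the cosine similarities and the gold scores. A minimal sketch of that wiring; the pairs are illustrative placeholders, since the card does not name the evaluation split:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id

# Illustrative gold-scored pairs; the actual evaluation split is not identified.
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["Three dogs running on a racetrack.",
                "Death toll in Lebanon bombings rises to 47"],
    sentences2=["Three dogs round a bend at a racetrack.",
                "1 suspect arrested after Lebanon car bombings kill 45"],
    scores=[0.96, 0.56],
    name="dev",
)
print(evaluator(model))  # dict with dev_pearson_cosine, dev_spearman_cosine, ...
```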

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 5,749 training samples
* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence_0                                                                         | sentence_1                                                                         | label                                                          |
  |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------|
  | type    | string                                                                             | string                                                                             | float                                                          |
  | details | <ul><li>min: 6 tokens</li><li>mean: 14.76 tokens</li><li>max: 55 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 14.73 tokens</li><li>max: 57 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.55</li><li>max: 1.0</li></ul> |
* Samples:
  | sentence_0                                                                   | sentence_1                                                                            | label                           |
  |:------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|:----------------------------------|
  | <code>Forecasters said warnings might go up for Cuba later Thursday.</code> | <code>Watches or warnings could be issued for eastern Cuba later on Thursday.</code> | <code>0.8</code>                |
  | <code>Death toll in Lebanon bombings rises to 47</code>                      | <code>1 suspect arrested after Lebanon car bombings kill 45</code>                   | <code>0.5599999904632569</code> |
  | <code>Three dogs running on a racetrack.</code>                              | <code>Three dogs round a bend at a racetrack.</code>                                 | <code>0.9600000381469727</code> |
* Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
  ```json
  {
      "loss_fct": "torch.nn.modules.loss.MSELoss"
  }
  ```
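
In code, this configuration builds the loss so that the cosine similarity of each `(sentence_0, sentence_1)` pair is regressed against its float label with mean-squared error. A minimal sketch, reusing the placeholder model id from above:

```python
import torch
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id

# MSE between cosine(sentence_0, sentence_1) and the gold float label,
# matching the "loss_fct" shown in the parameters above.
loss = losses.CosineSimilarityLoss(model, loss_fct=torch.nn.MSELoss())
```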

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `multi_dataset_batch_sampler`: round_robin
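
These values map one-to-one onto `SentenceTransformerTrainingArguments`. A minimal sketch with a placeholder output directory; all other arguments keep their defaults:

```python
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # placeholder; the card does not state the output path
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    multi_dataset_batch_sampler="round_robin",
)
```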

#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False

</details>

### Training Logs

| Epoch  | Step | Training Loss | spearman_cosine |
|:------:|:----:|:-------------:|:---------------:|
| 1.0    | 360  | -             | 0.2967          |
| 1.3889 | 500  | 0.11          | 0.3338          |
| 3.0    | 1080 | -             | 0.4595          |
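
Putting the dataset schema, loss, and arguments together, a run like the one logged above would be wired roughly as follows; the rows shown are illustrative stand-ins for the unnamed 5,749-sample dataset:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    losses,
)

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id

# Illustrative rows in the card's (sentence_0, sentence_1, label) schema.
train_dataset = Dataset.from_dict({
    "sentence_0": ["Three dogs running on a racetrack."],
    "sentence_1": ["Three dogs round a bend at a racetrack."],
    "label": [0.96],
})

trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    loss=losses.CosineSimilarityLoss(model),
)
trainer.train()
```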

### Framework Versions
- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.1
- Datasets: 4.0.0
- Tokenizers: 0.22.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```