Update README.md
## SentenceTransformer based on Shuu12121/CodeModernBERT-Owl🦉

This model is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [Shuu12121/CodeModernBERT-Owl](https://huggingface.co/Shuu12121/CodeModernBERT-Owl).
**It is specifically designed for code search and efficiently calculates semantic similarity between code snippets and documentation.**

---

このモデルは、[Shuu12121/CodeModernBERT-Owl](https://huggingface.co/Shuu12121/CodeModernBERT-Owl) をベースにファインチューニングされた [sentence-transformers](https://www.SBERT.net) モデルです。
**特にコードサーチに特化しており、コード片やドキュメントから効果的に意味的類似性を計算できる** ように設計されています。

---
### Model Evaluation / モデル評価

#### CoIR Evaluation Results / CoIRにおける評価結果

Despite being a relatively small model with around **150M parameters**, this model achieved an impressive **76.89** on the **CodeSearchNet** benchmark, demonstrating its high performance in code search tasks.
Since this model is specialized for code search, it does not support other tasks, and thus evaluation scores for other tasks are not provided.
In the CodeSearchNet task, this model outperforms many well-known models, as shown in the comparison table below.

このモデルは、**150M程度と比較的小さいモデル**ながら、**コードサーチタスクにおける評価指標である CodeSearchNet で 76.89** を達成しました。
他のタスクには対応していないため、評価値は提供されていません。
CodeSearchNetタスクにおける評価値としては、他の有名なモデルと比較しても高いパフォーマンスを示しています。
| Model Name                                    | CodeSearchNet Score  |
|-----------------------------------------------|----------------------|
| **Shuu12121/CodeModernBERT-Owl**              | **76.89**            |
| Salesforce/SFR-Embedding-Code-2B_R            | 73.5                 |
| CodeSage-large-v2                             | 94.26                |
| Salesforce/SFR-Embedding-Code-400M_R          | 72.53                |
| CodeSage-large                                | 90.58                |
| Voyage-Code-002                               | 81.79                |
| E5-Mistral                                    | 54.25                |
| E5-Base-v2                                    | 67.99                |
| OpenAI-Ada-002                                | 74.21                |
| BGE-Base-en-v1.5                              | 69.6                 |
| BGE-M3                                        | 43.23                |
| UniXcoder                                     | 60.2                 |
| GTE-Base-en-v1.5                              | 43.35                |
| Contriever                                    | 34.72                |
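For context on what these scores measure, retrieval benchmarks such as CoIR conventionally report NDCG@10. The sketch below implements the standard NDCG formula on a toy ranking; it is illustrative only, not this benchmark's own evaluation code:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k ranked items."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG of the actual ranking divided by DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Toy query: the single relevant snippet was retrieved at rank 2 of 3.
print(round(ndcg_at_k([0, 1, 0]), 4))  # 0.6309
```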
---

### Model Details / モデル詳細

- **Model Type / モデルタイプ:** Sentence Transformer
- **Base Model / ベースモデル:** [Shuu12121/CodeModernBERT-Owl](https://huggingface.co/Shuu12121/CodeModernBERT-Owl)
- **Maximum Sequence Length / 最大シーケンス長:** 2048 tokens
- **Output Dimensions / 出力次元:** 768 dimensions
- **Similarity Function / 類似度関数:** Cosine Similarity
- **License / ライセンス:** Apache-2.0
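The cosine similarity listed above can be reproduced directly from raw embedding vectors. A minimal offline sketch with random placeholder vectors standing in for real model output (the 768 dimension matches the output size listed above):

```python
import numpy as np

# Placeholder embeddings in place of real model output: 3 vectors of dim 768.
rng = np.random.default_rng(0)
emb = rng.standard_normal((3, 768))

# Cosine similarity = dot product of L2-normalized vectors.
normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
scores = normed @ normed.T

print(scores.shape)       # (3, 3)
print(scores.diagonal())  # each vector is maximally similar to itself (≈ 1.0)
```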
---
### Usage / 使用方法

#### Installation / インストール

To install Sentence Transformers, run the following command:
Sentence Transformers をインストールするには、以下のコマンドを実行します。

```bash
pip install -U sentence-transformers
```
#### Model Loading and Inference / モデルのロードと推論

```python
from sentence_transformers import SentenceTransformer

# Load the model / モデルをダウンロードしてロード
model = SentenceTransformer("Shuu12121/CodeSearch-ModernBERT-Owl")

# Example sentences for inference / 推論用の文リスト
sentences = [
    'Encrypts the zip file',
    'def freeze_encrypt(dest_dir, zip_filename, config, opt):\n    \n    pgp_keys = grok_keys(config)\n    icefile_prefix = "aomi-%s" % \\\n        os.path.basename(os.path.dirname(opt.secretfile))\n    if opt.icefile_prefix:\n        icefile_prefix = opt.icefile_prefix\n\n    timestamp = time.strftime("%H%M%S-%m-%d-%Y",\n                              datetime.datetime.now().timetuple())\n    ice_file = "%s/%s-%s.ice" % (dest_dir, icefile_prefix, timestamp)\n    if not encrypt(zip_filename, ice_file, pgp_keys):\n        raise aomi.exceptions.GPG("Unable to encrypt zipfile")\n\n    return ice_file',
    'def transform(self, sents):\n    \n\n    def convert(tokens):\n        return torch.tensor([self.vocab.stoi[t] for t in tokens], dtype=torch.long)\n\n    if self.vocab is None:\n        raise Exception(\n            "Must run .fit() for .fit_transform() before "\n            "calling .transform()."\n        )\n\n    seqs = sorted([convert(s) for s in sents], key=lambda x: -len(x))\n    X = torch.LongTensor(pad_sequence(seqs, batch_first=True))\n    return X',
]

# Generate embeddings / 埋め込みベクトルの生成
embeddings = model.encode(sentences)
print(embeddings.shape)  # Output: [3, 768]

# Calculate similarity scores / 類似度スコアの計算
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # Output: [3, 3]
```
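Ranking code candidates for a docstring query then reduces to sorting one row of that similarity matrix. A self-contained sketch using placeholder embeddings in place of `model.encode` output, so it runs without downloading the model:

```python
import numpy as np

# Placeholder embeddings standing in for model.encode(sentences):
# row 0 = the docstring query, rows 1-2 = the two code candidates.
rng = np.random.default_rng(42)
embeddings = rng.standard_normal((3, 768))

# Cosine similarity of the query against each candidate.
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
query_scores = normed[0] @ normed[1:].T

# Candidates ordered from most to least similar to the query.
ranking = np.argsort(-query_scores)
for rank, idx in enumerate(ranking, start=1):
    print(f"rank {rank}: candidate {idx} (score {query_scores[idx]:.4f})")
```

With real embeddings, the candidate whose code actually implements the query's description would surface at rank 1.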
---

### Library Versions / ライブラリバージョン

- Python: 3.11.11
- Sentence Transformers: 3.4.1
---

### Citation / 引用情報

#### Sentence Transformers

```bibtex
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```