jslin09/gemma2-2b-ner

@@ -16,10 +16,45 @@ widget:
 <!-- Provide a quick summary of what the model is/does. -->
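 This model is fine-tuned from [Gemma2:2b](https://huggingface.co/google/gemma-2-2b) to annotate "criminal facts" (犯罪事實) paragraphs of fraud cases generated by large language models. It follows the "three-tier theory of criminal law" (刑法三階理論) commonly used in Taiwanese criminal law scholarship, and tags the constituent elements defined by the statutory provisions on fraud. For models that generate fraud "criminal facts", see the BLOOM 560M-based [BLOOM 560M Fraud](https://huggingface.co/jslin09/bloom-560m-finetuned-fraud) fine-tuned model or the Gemma2-based [Gemma2:2b Fraud](https://huggingface.co/jslin09/gemma2-2b-fraud) fine-tuned model.
 The model currently recognizes the following seven named-entity tags. For constituent elements it cannot identify, it returns None.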
 <pre>
-<code>
 from colorama import Fore, Back, Style
 elements = {'LEO_SOC': ('犯罪主體', 'Subject of Crime'),
@@ -38,7 +73,7 @@ tag_color = {'LEO_SOC': Fore.BLACK + Back.RED,
              'LEO_ROH': Fore.BLACK + Back.BLUE,
              'LEO_ATP': Fore.WHITE + Back.BLACK,
             }
-</code>
 </pre>
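 To make the model's annotations stand out, you can use the code below. Pass the annotation output generated by the model, together with its tag, to the following function, and it will highlight the result in color with colorama.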
@@ -89,42 +124,6 @@ def tag_in_color(response_content, tag):
   </code>
 </pre>
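The body of the author's `tag_in_color(response_content, tag)` is elided in this diff, so what follows is only a minimal sketch of the same idea, not the author's function. It assumes, hypothetically, that the model's annotation for a tag has already been extracted as a plain text span, or `None` when the element was not identified. The helper name `highlight` is hypothetical. The three colors are the `tag_color` entries visible above, and the fallback escape codes are the ANSI sequences those colorama constants stand for.

```python
from typing import Optional

try:
    # colorama's Fore/Back/Style constants are plain ANSI escape strings.
    from colorama import Back, Fore, Style

    # Only the three tag colors visible in this diff are reproduced here.
    tag_color = {
        "LEO_SOC": Fore.BLACK + Back.RED,
        "LEO_ROH": Fore.BLACK + Back.BLUE,
        "LEO_ATP": Fore.WHITE + Back.BLACK,
    }
    RESET = Style.RESET_ALL
except ImportError:
    # Fallback when colorama is not installed: the same ANSI sequences by hand.
    tag_color = {
        "LEO_SOC": "\x1b[30m\x1b[41m",  # black text on red
        "LEO_ROH": "\x1b[30m\x1b[44m",  # black text on blue
        "LEO_ATP": "\x1b[37m\x1b[40m",  # white text on black
    }
    RESET = "\x1b[0m"


def highlight(text: str, span: Optional[str], tag: str) -> str:
    """Color every occurrence of `span` in `text` with the color assigned to `tag`.

    Hypothetical helper, not the author's tag_in_color(). Assumes the model's
    output for a tag has been reduced to a plain text span (or None).
    """
    if not span:
        return text  # None (element not identified) or empty span: nothing to color
    color = tag_color.get(tag)
    if color is None:
        return text  # no color assigned to this tag: leave the text unchanged
    return text.replace(span, f"{color}{span}{RESET}")


if __name__ == "__main__":
    # Hypothetical sentence; "被告甲" stands in for a Subject of Crime span.
    fact = "被告甲意圖為自己不法之所有，向乙施用詐術。"
    print(highlight(fact, "被告甲", "LEO_SOC"))
```

Because `str.replace` colors every occurrence, a span that also appears elsewhere in the paragraph is highlighted there too. Position-based spans would avoid that, if the model's output provides them.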
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
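-The model currently achieves an average precision of 0.98 and a recall of 0.75 when identifying the constituent elements in fraud "criminal facts". Training began with an initial corpus of 979 records and followed a reinforcement-learning workflow. In each iteration, the annotations generated by the model were corrected through manual alignment and then fed back into the corpus for further training. The final training corpus contains 2,577 records, and the model was completed after 3 rounds of fine-tuning. The table below shows how precision and recall changed across versions during training.
-|Version|Records|Precision|Recall|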
-|---|---|---|---|
-|v1|979|0.272727273|0.218623482|
-|v2|1538|0.725888325|0.581300813|
-|v3|1886|0.717277487|0.465986395|
-|v4|2173|0.826086957|0.550724638|
-|v5|2577|0.983606557|0.75|
-- **Developed by:** [Chun-Hsien Lin](https://huggingface.co/jslin09)
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** Traditional Chinese
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [Gemma2-2b](https://huggingface.co/google/gemma-2-2b)
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 ### Direct Use
 <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
@@ -140,6 +139,7 @@ def tag_in_color(response_content, tag):
 ### Out-of-Scope Use
 <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
 [More Information Needed]

 <!-- Provide a quick summary of what the model is/does. -->
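 This model is fine-tuned from [Gemma2:2b](https://huggingface.co/google/gemma-2-2b) to annotate "criminal facts" (犯罪事實) paragraphs of fraud cases generated by large language models. It follows the "three-tier theory of criminal law" (刑法三階理論) commonly used in Taiwanese criminal law scholarship, and tags the constituent elements defined by the statutory provisions on fraud. For models that generate fraud "criminal facts", see the BLOOM 560M-based [BLOOM 560M Fraud](https://huggingface.co/jslin09/bloom-560m-finetuned-fraud) fine-tuned model or the Gemma2-based [Gemma2:2b Fraud](https://huggingface.co/jslin09/gemma2-2b-fraud) fine-tuned model.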
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
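+The model currently achieves an average precision of 0.98 and a recall of 0.75 when identifying the constituent elements in fraud "criminal facts". Training began with an initial corpus of 979 records and followed a reinforcement-learning workflow. In each iteration, the annotations generated by the model were corrected through manual alignment and then fed back into the corpus for further training. The final training corpus contains 2,577 records, and the model was completed after 3 rounds of fine-tuning. The table below shows how precision and recall changed across versions during training.
+|Version|Records|Precision|Recall|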
+|---|---|---|---|
+|v1|979|0.272727273|0.218623482|
+|v2|1538|0.725888325|0.581300813|
+|v3|1886|0.717277487|0.465986395|
+|v4|2173|0.826086957|0.550724638|
+|v5|2577|0.983606557|0.75|
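The card does not state how precision and recall are counted: per entity, per tag, or per paragraph, and with exact or partial matching. The following is a minimal sketch that assumes exact matching on hypothetical `(tag, text)` pairs. It applies the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). It also derives the F1 score (their harmonic mean) from each row of the table above. The card itself does not report F1.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation of one annotated element: (tag, extracted text).
Entity = Tuple[str, str]


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(
    gold: Iterable[Entity], pred: Iterable[Entity]
) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over sets of (tag, text) pairs."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)                        # predicted and correct
    precision = tp / len(pred_set) if pred_set else 0.0  # TP / (TP + FP)
    recall = tp / len(gold_set) if gold_set else 0.0     # TP / (TP + FN)
    return precision, recall, f1(precision, recall)


# (version, records, precision, recall), copied from the table above.
REPORTED: List[Tuple[str, int, float, float]] = [
    ("v1", 979, 0.272727273, 0.218623482),
    ("v2", 1538, 0.725888325, 0.581300813),
    ("v3", 1886, 0.717277487, 0.465986395),
    ("v4", 2173, 0.826086957, 0.550724638),
    ("v5", 2577, 0.983606557, 0.75),
]

if __name__ == "__main__":
    for version, records, p, r in REPORTED:
        print(f"{version}: {records} records, F1 = {f1(p, r):.3f}")
```

Using these formulas, the reported v5 figures (precision 0.983606557, recall 0.75) correspond to an F1 of about 0.851.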
+- **Developed by:** [Chun-Hsien Lin](https://huggingface.co/jslin09)
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** Traditional Chinese
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [Gemma2-2b](https://huggingface.co/google/gemma-2-2b)
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 The model currently recognizes the following seven named-entity tags. For constituent elements it cannot identify, it returns None.
 <pre>
+  <code>
 from colorama import Fore, Back, Style
 elements = {'LEO_SOC': ('犯罪主體', 'Subject of Crime'),
              'LEO_ROH': Fore.BLACK + Back.BLUE,
              'LEO_ATP': Fore.WHITE + Back.BLACK,
             }
+  </code>
 </pre>
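 To make the model's annotations stand out, you can use the code below. Pass the annotation output generated by the model, together with its tag, to the following function, and it will highlight the result in color with colorama.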
   </code>
 </pre>
 ### Direct Use
 <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
 ### Out-of-Scope Use
 <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
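+This model can currently annotate only the constituent elements in "criminal facts" drafted (or generated by a language model) for the offense of fraud under the Criminal Code of the Republic of China (Taiwan). Extending it to annotate the constituent elements of other offenses is a direction for future development and corpus expansion.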
 [More Information Needed]