Update README.md
README.md
CHANGED
---
library_name: transformers
license: mit
datasets:
- gary109/captcha-synth-v3
---

# Model Card for Model ID

### Model Description

This model combines a convolutional neural network (CNN) as a **visual feature extractor** with a Transformer Encoder as a **sequence decoder**, targeting CAPTCHA recognition as an optical character recognition (OCR) task.

The CNN backbone extracts rich spatial features from the input grayscale CAPTCHA image, while the Transformer Encoder uses self-attention to model the sequential relationships and contextual information among those features, finally producing, at each time step, a probability distribution over the characters (including the CTC blank token).

The model is trained with CTC loss, so it can learn sequence prediction without knowing the exact alignment of each character.

At the end of training, the model reaches 91.14% accuracy on the validation set provided by the dataset author.
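
To make the architecture concrete, here is a minimal PyTorch sketch of a CNN backbone feeding a Transformer Encoder with a CTC output head. The module name, layer sizes, and sequence length are illustrative assumptions, not the exact configuration of this checkpoint.

```python
import torch
import torch.nn as nn

class CaptchaModel(nn.Module):
    """Illustrative CNN + Transformer Encoder + CTC head; not the exact released architecture."""

    def __init__(self, num_classes: int, d_model: int = 256, nhead: int = 4, num_layers: int = 4):
        super().__init__()
        # CNN backbone: (B, 1, 50, 200) grayscale image -> spatial feature map
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, d_model, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 50)),  # collapse height, keep 50 time steps
        )
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, num_classes + 1)  # +1 for the CTC blank token

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)                 # (B, d_model, 1, T)
        seq = feats.squeeze(2).permute(0, 2, 1)  # (B, T, d_model)
        seq = self.encoder(seq)                  # self-attention over the time steps
        return self.head(seq)                    # (B, T, num_classes + 1) per-step logits
```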

- **Developed by:** [me]
- Any field left unfilled is one the author did not understand how to fill in.

## Uses

### Direct Use

This model can be used directly to recognize CAPTCHA images similar in style to those in the gary109/captcha-synth-v3 dataset.

### Downstream Use [optional]

This model can be used as a component of a larger system, such as an automated-testing pipeline or an assistive tool. It can also serve as a base for further fine-tuning on CAPTCHAs of a specific style (for example with LoRA).

### Out-of-Scope Use

* This model is **not suitable** for general-purpose OCR (e.g., scanned documents) or handwriting recognition.
* Performance may degrade significantly on CAPTCHAs whose style differs greatly from the training data (e.g., completely different fonts, noise patterns, or backgrounds).
* **Ethical considerations**: this model **must not** be used to maliciously bypass website security mechanisms or for any other form of abuse. Developing and using this kind of technology should comply with applicable laws, regulations, and ethical guidelines.

## Bias, Risks, and Limitations

* **Performance bias**: model performance depends heavily on how closely the input images resemble the training data. It may perform poorly on character styles or noise types that are rare or absent in the training set.
* **Dataset bias**: the way the gary109/captcha-synth-v3 dataset was generated may introduce biases (for example, some character combinations appearing more often than others).
* **Security risk**: if used for offensive purposes, the model could bypass CAPTCHA-based human verification, creating a security risk.
* **Robustness limits**: despite data augmentation, the model may remain fragile under extreme image distortion, occlusion, or adversarial attacks.

### Recommendations

Users are strongly advised to understand the model's capability boundaries and potential risks before using it. For any security-sensitive application, the model should not be relied on as the only safeguard. When using or fine-tuning the model, evaluate the target data thoroughly and perform error analysis.

## How to Get Started with the Model

The code files used for training will be uploaded later.
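
Until those scripts are published, the sketch below shows how inference with a CTC-trained model of this kind typically looks: preprocess the image, run the network, then greedy-decode by collapsing repeats and dropping blanks. The checkpoint file name, character set, and blank index are assumptions, not the released interface.

```python
import torch
from PIL import Image
from torchvision import transforms

CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz"  # assumed character set
BLANK_ID = len(CHARSET)                            # assumed index of the CTC blank token

preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((50, 200)),  # the card uses aspect-preserving pad-and-resize; plain resize shown for brevity
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5]),  # map pixel values to [-1, 1]
])

def greedy_ctc_decode(step_logits: torch.Tensor) -> str:
    """Collapse repeated predictions and drop blanks from per-step argmax indices."""
    out, prev = [], None
    for idx in step_logits.argmax(dim=-1).tolist():
        if idx != prev and idx != BLANK_ID:
            out.append(CHARSET[idx])
        prev = idx
    return "".join(out)

model = torch.load("captcha_model.pt", map_location="cpu")  # hypothetical checkpoint file
model.eval()

image = preprocess(Image.open("captcha.png")).unsqueeze(0)  # (1, 1, 50, 200)
with torch.no_grad():
    logits = model(image)[0]  # (T, num_classes + 1)
print(greedy_ctc_decode(logits))
```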

## Training Details

### Training Data

The model was trained primarily on the train split (about 1.2 million images) of the [gary109/captcha-synth-v3](https://huggingface.co/datasets/gary109/captcha-synth-v3) dataset, which contains labeled synthetic CAPTCHA images.
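
The split can be loaded with the 🤗 Datasets library; a minimal sketch (the column names are not documented on this card, so inspect them rather than assuming):

```python
from datasets import load_dataset

# Stream the ~1.2M-image train split instead of downloading it all up front.
ds = load_dataset("gary109/captcha-synth-v3", split="train", streaming=True)

sample = next(iter(ds))
print(sample.keys())  # check the actual column names (image, label text, ...) before use
```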

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

Both the training and validation data go through the following preprocessing steps:

1. **Grayscale conversion**: convert the image to a single-channel grayscale image.
2. **Aspect-preserving resize with padding (PadAndResize)**: scale the image to 50x200 while keeping the original aspect ratio, filling the remainder with black (0).
3. **Conversion to a tensor**.
4. **Normalization**: normalize pixel values to the [-1, 1] range.

During the fine-tuning stage, the training set additionally uses **data augmentation** (both stages are sketched in the code below), including:

* RandomAffine: random rotation (±8°), translation (±10%), scaling (±10%), and shear (±5°).
* RandomPerspective: random perspective transform.
* ColorJitter: random brightness and contrast adjustment.
* RandomErasing: random erasure of a small region of the image.
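
A minimal torchvision sketch of this pipeline, assuming the PadAndResize behaviour described above; the magnitudes for RandomPerspective, ColorJitter, and RandomErasing are illustrative, since the card only gives ranges for RandomAffine.

```python
import torchvision.transforms as T
import torchvision.transforms.functional as TF

class PadAndResize:
    """Resize to (height, width) while keeping aspect ratio, padding the rest with black."""

    def __init__(self, height: int = 50, width: int = 200):
        self.height, self.width = height, width

    def __call__(self, img):
        w, h = img.size
        scale = min(self.width / w, self.height / h)
        new_w, new_h = int(w * scale), int(h * scale)
        img = TF.resize(img, [new_h, new_w])
        return TF.pad(img, [0, 0, self.width - new_w, self.height - new_h], fill=0)

eval_transform = T.Compose([
    T.Grayscale(num_output_channels=1),
    PadAndResize(50, 200),
    T.ToTensor(),
    T.Normalize(mean=[0.5], std=[0.5]),  # [0, 1] -> [-1, 1]
])

train_transform = T.Compose([
    T.Grayscale(num_output_channels=1),
    T.RandomAffine(degrees=8, translate=(0.1, 0.1), scale=(0.9, 1.1), shear=5),
    T.RandomPerspective(distortion_scale=0.2, p=0.5),  # illustrative magnitude
    T.ColorJitter(brightness=0.2, contrast=0.2),       # illustrative magnitude
    PadAndResize(50, 200),
    T.ToTensor(),
    T.RandomErasing(p=0.25, scale=(0.02, 0.08)),       # operates on the tensor
    T.Normalize(mean=[0.5], std=[0.5]),
])
```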

#### Training Hyperparameters

See the config in this repository.

## Evaluation

#### Testing Data

[More Information Needed]

#### Factors

No evaluation disaggregated by specific subpopulations or domains was performed.

#### Metrics

The main evaluation metric is **exact-match accuracy**: the proportion of samples for which the predicted character sequence matches the ground-truth label exactly. The analysis also looks at error types (length mismatches, substitution errors, and more complex errors) and a character-substitution confusion matrix.
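
As a reference, exact-match accuracy reduces to a simple comparison of decoded strings against labels; a minimal sketch:

```python
def exact_match_accuracy(predictions: list[str], labels: list[str]) -> float:
    """Fraction of samples whose decoded string equals the ground-truth label exactly."""
    assert len(predictions) == len(labels) and labels
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# exact_match_accuracy(["ab3x", "k9q2"], ["ab3x", "k9qz"]) -> 0.5
```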

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** RTX 5070 Ti
- **Hours used:** 5

## Technical Specifications [optional]

### Model Architecture and Objective

The model uses a CNN as the visual feature extractor, followed by a multi-layer Transformer Encoder for sequence modeling. The objective is to minimize the discrepancy between the predicted sequence and the ground-truth label via CTCLoss.
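
A minimal sketch of this objective with PyTorch's `nn.CTCLoss` (shapes follow the architecture sketch above; the blank index and batch layout are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

BLANK_ID = 36  # assumed index of the CTC blank token (e.g., 36 character classes before it)
ctc_loss = nn.CTCLoss(blank=BLANK_ID, zero_infinity=True)

def training_step(model, images, targets, target_lengths):
    """images: (B, 1, 50, 200); targets: concatenated label indices; target_lengths: (B,)."""
    logits = model(images)                                      # (B, T, num_classes + 1)
    log_probs = F.log_softmax(logits, dim=-1).permute(1, 0, 2)  # CTCLoss expects (T, B, C)
    input_lengths = torch.full((images.size(0),), logits.size(1), dtype=torch.long)
    return ctc_loss(log_probs, targets, input_lengths, target_lengths)
```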

#### Software

* Python 3.13.6
* PyTorch 2.8.0+cu129
* Transformers 4.57.0
* Datasets 4.3.0
* CUDA 12.9