Text Classification
Transformers
Safetensors
English
bert
finance
ai-detector
sequence-classification
text-embeddings-inference
How to use msperlin/finbert-ai-detector with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="msperlin/finbert-ai-detector")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("msperlin/finbert-ai-detector")
model = AutoModelForSequenceClassification.from_pretrained("msperlin/finbert-ai-detector")
```
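The loading snippet above stops before any text is scored. A minimal sketch of the next step follows, assuming the standard `text-classification` pipeline output (one dict with `label` and `score` per input). The helper names `load_detector` and `summarize` are hypothetical, and the label strings the model returns are not stated on this page, so the sketch passes them through unchanged rather than guessing which one means "AI".

```python
from typing import Callable, Dict, List, Tuple

# Pipeline-style result for one text: {"label": str, "score": float}
Prediction = Dict[str, object]
Classifier = Callable[[List[str]], List[Prediction]]


def load_detector() -> Classifier:
    """Build a classifier from the published checkpoint (downloads weights on first use)."""
    # Imported lazily so summarize() can be used and tested without transformers installed.
    from transformers import pipeline

    pipe = pipeline("text-classification", model="msperlin/finbert-ai-detector")
    # truncation=True clips long inputs to the model's maximum sequence length.
    return lambda texts: pipe(texts, truncation=True)


def summarize(texts: List[str], classify: Classifier) -> List[Tuple[str, str, float]]:
    """Pair each input text with the predicted label and its score."""
    predictions = classify(texts)
    return [
        (text, str(pred["label"]), round(float(pred["score"]), 4))
        for text, pred in zip(texts, predictions)
    ]
```

With network access, `summarize(["Net revenue increased 12% year over year."], load_detector())` would return one `(text, label, score)` tuple; the sample sentence is illustrative only. Check `model.config.id2label` to see which label corresponds to AI-generated text before acting on the output.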
Refactor code structure for improved readability and maintainability
- README.md +19 -0
- figs/confusion_matrix.png +0 -0
README.md
CHANGED
@@ -36,6 +36,25 @@ The model was trained on a custom dataset compiled from human-written financial
 - **Data Generation:** Actual human texts from corporate annual reports were compiled. State-of-the-art Large Language Models (LLMs), including OpenAI's ChatGPT, Google's Gemini, and Anthropic's Claude, were then prompted to rewrite these sections or generate similar artificial financial texts.
 - **Training Method:** The base `finbert-pretrain` model, already pre-trained on a large corpus of financial text, was fine-tuned on this mixed dataset to classify whether a given segment of text is human-written or generated by an AI.

+## Performance
+
+Total cases (AI & Human): 6000
+Total cases (AI): 3000
+
+Estimation cases: 4200
+Test cases: 1800
+
+| Metric | Value |
+|-----------|---------|
+| accuracy | 89.16% |
+| f1 | 88.57% |
+| precision | 92.64% |
+| recall | 84.84% |
+
+### Confusion Matrix
+
+[Figure: confusion matrix (figs/confusion_matrix.png)]
+
 ## Uses

 This model is intended for researchers, financial analysts, and auditors who want to verify the authenticity of corporate disclosures and determine if a financial text (like an annual report or an earnings call transcript) was written by an AI or a human.
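The figures added in this hunk can be cross-checked with a few lines of arithmetic. The sketch below assumes the reported F1 is the usual harmonic mean of precision and recall, and that "estimation" and "test" cases partition the full dataset; neither assumption is stated in the README. The function name `f1_from` is hypothetical.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values copied from the Performance section of the diff.
total_cases = 6000
ai_cases = 3000
estimation_cases = 4200
test_cases = 1800
precision, recall = 0.9264, 0.8484

f1 = f1_from(precision, recall)

print(f"F1 from precision/recall: {f1:.2%}")              # 88.57%
print(f"AI share of dataset: {ai_cases / total_cases:.0%}")        # 50%
print(f"test share of dataset: {test_cases / total_cases:.0%}")    # 30%
print(f"splits sum to total: {estimation_cases + test_cases == total_cases}")  # True
```

Under these assumptions the reported F1 of 88.57% agrees with the stated precision and recall, the dataset is balanced between AI and human texts, and the split is 70/30.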
figs/confusion_matrix.png
ADDED