NikG100 commited on
Commit
314a702
·
verified ·
1 Parent(s): 18b0a51

Upload 7 files

Browse files
README.md ADDED
@@ -0,0 +1,89 @@
# RoBERTa-Base Quantized Model for Intent Classification in Banking Systems

This repository contains a fine-tuned RoBERTa-Base model for **intent classification** on the **Banking77** dataset. The model identifies user intent from natural-language queries in the context of banking services.

## Model Details

- **Model Architecture:** RoBERTa Base
- **Task:** Intent Classification
- **Dataset:** Banking77
- **Use Case:** Detecting user intents in banking conversations
- **Fine-tuning Framework:** Hugging Face Transformers

## Usage

### Installation

```bash
pip install transformers torch datasets
```

### Loading the Model

```python
from transformers import RobertaTokenizerFast, RobertaForSequenceClassification
import torch
from datasets import load_dataset

# Load tokenizer and model from the fine-tuned checkpoint
# (the checkpoint directory ships its own tokenizer files)
model_path = "path_to_your_fine_tuned_model"
tokenizer = RobertaTokenizerFast.from_pretrained(model_path)
model = RobertaForSequenceClassification.from_pretrained(model_path)
model.eval()

# Sample input
text = "I am still waiting on my card?"

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = torch.argmax(outputs.logits, dim=1).item()

# Map the class index back to an intent name via the dataset's label feature
label_map = load_dataset("PolyAI/banking77")["train"].features["label"].int2str
predicted_label = label_map(predicted_class)

print(f"Predicted Intent: {predicted_label}")
```

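If a confidence score is needed alongside the predicted intent, a softmax over the logits yields class probabilities. A minimal, framework-free sketch with stand-in logit values (a real run would use the 77 values in `outputs.logits` above):

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Stand-in logits for a 4-class example; the Banking77 model emits 77 values
logits = [0.5, 2.3, -1.0, 0.1]
probs = softmax(logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
confidence = probs[predicted_class]
print(predicted_class, round(confidence, 3))
```

The same argmax index is returned either way; the softmax only adds a calibrated-looking score, which is useful for thresholding low-confidence predictions.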
## Performance Metrics

- **Accuracy:** 0.927922
- **Precision:** 0.931764
- **Recall:** 0.927922
- **F1 Score:** 0.927976

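These precision/recall/F1 numbers appear to be support-weighted averages over the 77 classes (weighted recall equals accuracy, consistent with the figures above). A pure-Python sketch of the weighted-F1 computation on toy labels, not the actual Banking77 evaluation:

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted F1 across classes (sklearn's average='weighted')."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for c in set(y_true):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (support[c] / total) * f1  # weight per-class F1 by class frequency
    return score

# Toy example with 3 classes
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(round(weighted_f1(y_true, y_pred), 4))
```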
## Fine-Tuning Details

### Dataset

The Banking77 dataset contains 13,083 labeled queries across 77 banking-related intents, including tasks like checking balances, transferring money, and reporting fraud.

### Training Configuration

- Number of epochs: 5
- Batch size: 16
- Evaluation strategy: epoch
- Learning rate: 2e-5

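The hyperparameters above map onto Hugging Face `TrainingArguments` roughly as follows. This is a sketch of the configuration, not the exact training script: dataset preparation and the metric function are omitted, `train_ds`/`eval_ds` stand in for the tokenized Banking77 splits, and older `transformers` releases name the evaluation flag `evaluation_strategy`.

```python
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./results",          # checkpoints and logs
    num_train_epochs=5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    eval_strategy="epoch",           # evaluate at the end of every epoch
)

# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```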
## Repository Structure

```
.
├── config.json
├── merges.txt                 # BPE merge rules
├── special_tokens_map.json
├── tokenizer.json
├── tokenizer_config.json
├── vocab.json                 # BPE vocabulary
├── model.safetensors          # Fine-tuned RoBERTa model
└── README.md                  # Documentation
```

## Limitations

- The model may not generalize well to domains outside the fine-tuning dataset.
- Quantization may result in minor accuracy degradation compared to full-precision models.

## Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:678fb0af2cc1a0e85a5f865b893f867a4898ab1230ee6da817c4b61307999eda
size 249433778
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "cls_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "<mask>",
    "lstrip": true,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "50264": {
      "content": "<mask>",
      "lstrip": true,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "cls_token": "<s>",
  "eos_token": "</s>",
  "errors": "replace",
  "extra_special_tokens": {},
  "mask_token": "<mask>",
  "max_length": 128,
  "model_max_length": 512,
  "pad_to_multiple_of": null,
  "pad_token": "<pad>",
  "pad_token_type_id": 0,
  "padding_side": "right",
  "sep_token": "</s>",
  "stride": 0,
  "tokenizer_class": "RobertaTokenizer",
  "trim_offsets": true,
  "truncation_side": "right",
  "truncation_strategy": "longest_first",
  "unk_token": "<unk>"
}
vocab.json ADDED
The diff for this file is too large to render. See raw diff