Upload 7 files

Files changed (7)

README.md +103 -0
config.json +155 -0
model.safetensors +3 -0
special_tokens_map.json +37 -0
tokenizer.json +0 -0
tokenizer_config.json +63 -0
vocab.txt +0 -0

README.md ADDED

	@@ -0,0 +1,103 @@

+---
+library_name: transformers
+tags:
+- intent-classification
+- text-classification
+license: apache-2.0
+language:
+- en
+base_model:
+- google-bert/bert-base-uncased
+pipeline_tag: text-classification
+---
+# Model Card for yeniguno/bert-uncased-intent-classification
+This is a fine-tuned BERT-based model for intent classification that assigns each input text to one of 62 intent labels. It was trained on a dataset consolidated from several multilingual intent datasets.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+```python
+from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
+model = AutoModelForSequenceClassification.from_pretrained("yeniguno/bert-uncased-intent-classification")
+tokenizer = AutoTokenizer.from_pretrained("yeniguno/bert-uncased-intent-classification")
+pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
+text = "Play the song, Sam."
+prediction = pipe(text)
+print(prediction)
+# [{'label': 'play_music', 'score': 0.9997674822807312}]
+```
+## Uses
+This model is intended for natural language understanding (NLU) tasks, specifically classifying user intents in applications such as:
+- Voice assistants
+- Chatbots
+- Customer support automation
+- Conversational AI systems
+## Bias, Risks, and Limitations
+The model's performance may degrade on intents that are underrepresented in the training data. It is not optimized for languages other than English. Domain-specific intents that are not covered by the training data may require additional fine-tuning.
+## Training Details
+### Training Data
+This model was trained on a combination of intent datasets from various sources.
+Datasets Used:
+- mteb/amazon_massive_intent
+- mteb/mtop_intent
+- sonos-nlu-benchmark/snips_built_in_intents
+- Mozilla/smart_intent_dataset
+- Bhuvaneshwari/intent_classification
+- clinc/clinc_oos
+Each dataset was preprocessed, and intent labels were consolidated into 62 unique classes.
+Dataset Sizes:
+- Train size: 138228
+- Validation size: 17279
+- Test size: 17278
+### Training Procedure
+The model was fine-tuned with the following hyperparameters:
+- Base Model: bert-base-uncased
+- Learning Rate: 3e-5
+- Batch Size: 32
+- Epochs: 4
+- Weight Decay: 0.01
+- Evaluation Strategy: Per epoch
+- Mixed Precision: FP32
+- Hardware: A100
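Review note: the hyperparameters above, together with the train size given in the card, imply a rough count of optimizer steps. The arithmetic below is a minimal sketch, assuming a single device, no gradient accumulation, and that the final partial batch of each epoch is kept. None of these assumptions is stated in the card.

```python
import math

# Values copied from the model card.
train_examples = 138_228
batch_size = 32
epochs = 4

# Assumptions (not stated in the card): one device, no gradient
# accumulation, and the last partial batch of each epoch is kept.
steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)
```

Under those assumptions, a per-epoch evaluation would run once every `steps_per_epoch` optimizer steps.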
+## Evaluation
+### Results
+#### Training and Validation:
+| Epoch | Training Loss | Validation Loss | Accuracy | F1 Score | Precision | Recall |
+|-------|---------------|-----------------|----------|----------|-----------|--------|
+| 1     | 0.1143        | 0.1014          | 97.38%   | 97.33%   | 97.36%    | 97.38% |
+| 2     | 0.0638        | 0.0833          | 97.78%   | 97.79%   | 97.83%    | 97.78% |
+| 3     | 0.0391        | 0.0946          | 97.98%   | 97.98%   | 97.99%    | 97.98% |
+| 4     | 0.0122        | 0.1013          | 98.04%   | 98.04%   | 98.05%    | 98.04% |
+#### Test Results:
+| Metric      | Value    |
+|-------------|----------|
+| **Loss**    | 0.0814   |
+| **Accuracy**| 98.37%   |
+| **F1 Score**| 98.37%   |
+| **Precision**| 98.38%  |
+| **Recall**  | 98.37%   |
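Review note: the card does not say how precision, recall and F1 were averaged across classes. The Accuracy and Recall columns are identical in every row, which is what support-weighted averaging would produce, since weighted recall always equals accuracy for single-label classification. The sketch below illustrates that property in plain Python. The function name `weighted_metrics` and the toy labels are invented for illustration and are not taken from the training code.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)        # true examples per class
    predicted = Counter(y_pred)      # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = count / n           # class share of the evaluation set
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels, invented for illustration only.
y_true = ["play_music", "play_music", "get_weather", "create_alarm"]
y_pred = ["play_music", "get_weather", "get_weather", "create_alarm"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

If the reported numbers were instead macro-averaged, recall would generally differ from accuracy on an imbalanced label set, so the averaging mode is worth confirming with the author.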

config.json ADDED

	@@ -0,0 +1,155 @@

+{
+  "_name_or_path": "./fine_tuned_bert",
+  "architectures": [
+    "BertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "id2label": {
+    "0": "get_message",
+    "1": "get_weather",
+    "2": "alarm_query",
+    "3": "send_message",
+    "4": "get_recipe",
+    "5": "set_availability",
+    "6": "delete_reminder",
+    "7": "get_news",
+    "8": "create_alarm",
+    "9": "get_reminder",
+    "10": "create_reminder",
+    "11": "event_query",
+    "12": "play_music",
+    "13": "create_call",
+    "14": "end_call",
+    "15": "create_timer",
+    "16": "update_call",
+    "17": "update_reminder",
+    "18": "get_contact",
+    "19": "get_timer",
+    "20": "delete_alarm",
+    "21": "add_timer",
+    "22": "get_location",
+    "23": "delete_timer",
+    "24": "music_query",
+    "25": "update_alarm",
+    "26": "information_query",
+    "27": "help_query",
+    "28": "weather_query",
+    "29": "navigation_query",
+    "30": "purchase_intent",
+    "31": "translation_intent",
+    "32": "travel_intent",
+    "33": "add_to_playlist",
+    "34": "rate_book",
+    "35": "search_event",
+    "36": "book_restaurant",
+    "37": "set_meeting",
+    "38": "search_creative_work",
+    "39": "cancellation",
+    "40": "affirmation",
+    "41": "excitement_emotion",
+    "42": "insurance_query",
+    "43": "fun_fact",
+    "44": "time_query",
+    "45": "update_cart",
+    "46": "oil_change_query",
+    "47": "set_reservation",
+    "48": "confirm_reservation",
+    "49": "next_song",
+    "50": "suggest_restaurant",
+    "51": "pay_bill",
+    "52": "reminder_query",
+    "53": "check_availability",
+    "54": "accept_reservations",
+    "55": "order_query",
+    "56": "meeting_query",
+    "57": "book_hotel",
+    "58": "cart_query",
+    "59": "location_query",
+    "60": "cancel_reservation",
+    "61": "traffic_query"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "label2id": {
+    "accept_reservations": 54,
+    "add_timer": 21,
+    "add_to_playlist": 33,
+    "affirmation": 40,
+    "alarm_query": 2,
+    "book_hotel": 57,
+    "book_restaurant": 36,
+    "cancel_reservation": 60,
+    "cancellation": 39,
+    "cart_query": 58,
+    "check_availability": 53,
+    "confirm_reservation": 48,
+    "create_alarm": 8,
+    "create_call": 13,
+    "create_reminder": 10,
+    "create_timer": 15,
+    "delete_alarm": 20,
+    "delete_reminder": 6,
+    "delete_timer": 23,
+    "end_call": 14,
+    "event_query": 11,
+    "excitement_emotion": 41,
+    "fun_fact": 43,
+    "get_contact": 18,
+    "get_location": 22,
+    "get_message": 0,
+    "get_news": 7,
+    "get_recipe": 4,
+    "get_reminder": 9,
+    "get_timer": 19,
+    "get_weather": 1,
+    "help_query": 27,
+    "information_query": 26,
+    "insurance_query": 42,
+    "location_query": 59,
+    "meeting_query": 56,
+    "music_query": 24,
+    "navigation_query": 29,
+    "next_song": 49,
+    "oil_change_query": 46,
+    "order_query": 55,
+    "pay_bill": 51,
+    "play_music": 12,
+    "purchase_intent": 30,
+    "rate_book": 34,
+    "reminder_query": 52,
+    "search_creative_work": 38,
+    "search_event": 35,
+    "send_message": 3,
+    "set_availability": 5,
+    "set_meeting": 37,
+    "set_reservation": 47,
+    "suggest_restaurant": 50,
+    "time_query": 44,
+    "traffic_query": 61,
+    "translation_intent": 31,
+    "travel_intent": 32,
+    "update_alarm": 25,
+    "update_call": 16,
+    "update_cart": 45,
+    "update_reminder": 17,
+    "weather_query": 28
+  },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "problem_type": "single_label_classification",
+  "torch_dtype": "float32",
+  "transformers_version": "4.47.1",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}
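Review note: `id2label` and `label2id` above should be exact inverses of each other, and the class ids in the JSON file are string keys because JSON object keys must be strings. The sketch below checks the inverse property on a three-entry excerpt of the map and shows how a vector of logits would turn into a label and score. The helper names and the logit values are hypothetical, and the softmax here runs over the excerpt only, not over all 62 classes.

```python
import math

# Excerpt of the label maps from config.json (three of the 62 entries).
config = {
    "id2label": {"1": "get_weather", "8": "create_alarm", "12": "play_music"},
    "label2id": {"create_alarm": 8, "get_weather": 1, "play_music": 12},
}

def check_label_maps(cfg):
    """Return True if label2id is the exact inverse of id2label."""
    inverse = {label: int(idx) for idx, label in cfg["id2label"].items()}
    return inverse == cfg["label2id"]

def decode(logits_by_id, id2label):
    """Softmax over logits keyed by class id, then pick the top label."""
    peak = max(logits_by_id.values())            # subtract max for stability
    exps = {i: math.exp(v - peak) for i, v in logits_by_id.items()}
    total = sum(exps.values())
    best = max(exps, key=exps.get)
    return {"label": id2label[str(best)], "score": exps[best] / total}

# Hypothetical logits for the three classes in the excerpt.
result = decode({1: 0.2, 8: -1.0, 12: 4.0}, config["id2label"])
print(check_label_maps(config), result["label"])
```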

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:da778faff069f13a103e5cc828ddef95883522b48eac44742abf37b6a83ac8da
+size 438143208
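Review note: the three lines above are a Git LFS pointer, not the weights themselves. As a quick sanity check, the sketch below parses the pointer and divides the byte size by four, on the assumption that every tensor is stored as float32 (consistent with `"torch_dtype": "float32"` in config.json). The result is an upper bound on the number of stored values, because the safetensors file also carries a header. The function name `parse_lfs_pointer` is illustrative.

```python
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:da778faff069f13a103e5cc828ddef95883522b48eac44742abf37b6a83ac8da
size 438143208
"""

def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

pointer = parse_lfs_pointer(POINTER)

# Assumption: all tensors are float32, i.e. 4 bytes per stored value.
BYTES_PER_FLOAT32 = 4
approx_values = pointer["size"] // BYTES_PER_FLOAT32
print(pointer["algo"], len(pointer["digest"]), approx_values)
```

Roughly 109.5 million values is in line with a BERT-base encoder plus a small classification head.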

special_tokens_map.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,63 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "[CLS]",
+  "do_lower_case": true,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "max_length": 128,
+  "model_max_length": 512,
+  "pad_to_multiple_of": null,
+  "pad_token": "[PAD]",
+  "pad_token_type_id": 0,
+  "padding_side": "right",
+  "sep_token": "[SEP]",
+  "stride": 0,
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
+  "unk_token": "[UNK]"
+}
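Review note: the settings above describe how a single input is laid out: `[CLS]` (id 101) first, `[SEP]` (id 102) last, and `[PAD]` (id 0) added on the right, with truncation also on the right. The sketch below mimics that layout in plain Python. It is not the tokenizer's own implementation, the function name `build_inputs` is hypothetical, and whether padding to `max_length` 128 actually happens depends on the arguments passed when the tokenizer is called.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # ids from added_tokens_decoder

def build_inputs(token_ids, max_length=128):
    """Wrap one sequence as [CLS] ... [SEP], truncating and padding on the right."""
    body = token_ids[: max_length - 2]   # leave room for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    padding = max_length - len(input_ids)
    input_ids += [PAD_ID] * padding      # padding_side = "right"
    attention_mask += [0] * padding      # padded positions are masked out
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Placeholder wordpiece ids, not real lookups in vocab.txt.
encoded = build_inputs([2001, 2002, 2003], max_length=8)
print(encoded)

# A sequence longer than the limit is cut on the right before [SEP] is added.
truncated = build_inputs(list(range(1000, 1010)), max_length=8)
print(truncated["input_ids"])
```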

vocab.txt ADDED

The diff for this file is too large to render. See raw diff