---
tags:
- audio-classification
- sound-event-detection
- wav2vec2
- urban-acoustics
- deep-learning
datasets:
- UrbanSoundscape_EventDetection_Metadata
license: apache-2.0
model-index:
- name: UrbanSound_EventDetection_Wav2Vec2
  results:
  - task:
      name: Audio Classification
      type: audio-classification
    metrics:
    - type: accuracy
      value: 0.945
      name: Event Detection Accuracy
    - type: f1_macro
      value: 0.938
      name: Macro F1 Score
---

# UrbanSound_EventDetection_Wav2Vec2

## 👂 Overview

**UrbanSound_EventDetection_Wav2Vec2** is built on the pre-trained **Wav2Vec2** architecture and fine-tuned to classify momentary and continuous sound events in urban environments. It operates on raw audio waveforms and assigns each clip one of eight high-priority urban sound classes, with a focus on high-impact and potentially anomalous events.

## 🧠 Model Architecture

This model uses the standard Wav2Vec2 pipeline, which operates directly on raw audio without manual feature extraction (such as MFCCs).

* **Base Model:** `facebook/wav2vec2-base`
* **Feature Extractor:** A stack of 1D convolutional layers extracts local features from the raw waveform.
* **Transformer Encoder:** 12 Transformer layers capture long-range dependencies and global context within the audio clip.
* **Classification Head:** A task-specific linear layer on top of the contextualized representations predicts one of the 8 event labels.
* **Target Classes:** `Car_Horn`, `Children_Playing`, `Dog_Barking`, `Machinery_Hum`, `Siren_Emergency`, `Train_Whistle`, `Tire_Screech`, and `Glass_Shattering`.

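As a rough intuition for the feature extractor above: `facebook/wav2vec2-base` uses seven 1D convolutions with kernel sizes (10, 3, 3, 3, 3, 2, 2) and strides (5, 2, 2, 2, 2, 2, 2), so one second of 16 kHz audio is downsampled to 49 frames (~20 ms each) before reaching the Transformer. A minimal sketch of that calculation (the conv configuration is the standard wav2vec2-base one, not anything specific to this checkpoint):

```python
# Downsampling performed by the wav2vec2-base convolutional feature extractor.
# Kernel/stride values are the standard facebook/wav2vec2-base configuration.
CONV_KERNELS = [10, 3, 3, 3, 3, 2, 2]
CONV_STRIDES = [5, 2, 2, 2, 2, 2, 2]

def num_feature_frames(num_samples: int) -> int:
    """Number of frames the Transformer encoder sees for a raw waveform."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Standard 1D conv output length (no padding, dilation 1).
        length = (length - kernel) // stride + 1
    return length

print(num_feature_frames(16_000))  # 1 s at 16 kHz -> 49 frames
```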
## 🎯 Intended Use

This model is intended for smart-city, safety, and acoustic-monitoring systems:

1. **Acoustic Surveillance:** Real-time detection of emergency sounds (Siren, Glass Shattering, Tire Screech) for public-safety alerting.
2. **Noise Pollution Monitoring:** Quantifying the occurrence and frequency of specific noise sources (Car Horn, Machinery Hum) across city zones.
3. **Urban Planning:** Analyzing soundscape composition to inform zoning policy and noise-mitigation strategies.

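For the noise-monitoring use case, downstream aggregation can be as simple as counting detections per zone. A minimal sketch (the zone names and detection tuples are illustrative, not produced by the model):

```python
from collections import Counter

# Hypothetical stream of (zone, predicted_event) pairs from the classifier.
detections = [
    ("zone_a", "Car_Horn"),
    ("zone_a", "Car_Horn"),
    ("zone_a", "Machinery_Hum"),
    ("zone_b", "Siren_Emergency"),
]

# Occurrence counts per (zone, event) pair for reporting or dashboards.
counts = Counter(detections)
print(counts[("zone_a", "Car_Horn")])  # 2
```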
## ⚠️ Limitations

1. **Event Overlap:** The model is trained for single-label classification. When multiple sounds occur simultaneously (e.g., Siren + Dog Barking), it outputs only the single most probable event and may miss the others.
2. **Domain Shift:** Performance may degrade in environments whose background-noise profiles differ significantly from the training data (e.g., quiet suburbs vs. dense, loud street markets).
3. **Localization:** This model performs *event detection*; it does not provide *sound localization* (Direction of Arrival, DOA), which would require specialized input (such as ambisonic audio) and a different model head.

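The single-label limitation above stems from the softmax output layer: class probabilities compete, so only the argmax event is reported. A multi-label variant would instead use independent sigmoids with a threshold (and would need to be trained with a binary cross-entropy objective). A minimal sketch of the output-layer difference on hypothetical logits (not the actual model head):

```python
import math

# Hypothetical logits for the 8 classes; two events are loud simultaneously.
labels = ["Car_Horn", "Children_Playing", "Dog_Barking", "Machinery_Hum",
          "Siren_Emergency", "Train_Whistle", "Tire_Screech", "Glass_Shattering"]
logits = [-0.1, -2.0, 3.0, -1.0, 3.5, -2.5, -0.5, -1.5]

# Single-label (current setup): softmax + argmax reports exactly one event.
exps = [math.exp(z) for z in logits]
probs = [e / sum(exps) for e in exps]
top1 = labels[probs.index(max(probs))]
print(top1)  # Siren_Emergency -- the co-occurring Dog_Barking is discarded

# Multi-label alternative: independent sigmoids, each thresholded at 0.5.
sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
detected = [lab for lab, z in zip(labels, logits) if sigmoid(z) > 0.5]
print(detected)  # ['Dog_Barking', 'Siren_Emergency']
```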
---

### MODEL 2: **MedicalChatbot_IntentClassifier_RoBERTa**

A RoBERTa-based model for multi-class classification of user intent in medical dialogue transcripts.

#### config.json

```json
{
  "_name_or_path": "roberta-base",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "hidden_size": 768,
  "model_type": "roberta",
  "num_hidden_layers": 12,
  "vocab_size": 50265,
  "id2label": {
    "0": "Symptom_Reporting",
    "1": "Advice_Seeking",
    "2": "Medication_Query",
    "3": "Appointment_Scheduling",
    "4": "Billing_Query",
    "5": "Causal_Query",
    "6": "Record_Retrieval",
    "7": "Urgency_Assessment"
  },
  "label2id": {
    "Symptom_Reporting": 0,
    "Advice_Seeking": 1,
    "Medication_Query": 2,
    "Appointment_Scheduling": 3,
    "Billing_Query": 4,
    "Causal_Query": 5,
    "Record_Retrieval": 6,
    "Urgency_Assessment": 7
  },
  "num_labels": 8,
  "problem_type": "single_label_classification",
  "transformers_version": "4.36.0"
}
```
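A quick sanity check when editing a config like the one above is that `id2label` and `label2id` remain exact inverses and agree with `num_labels` (note that `id2label` keys are strings in the JSON, as Transformers serializes them). A small standalone sketch reusing the mapping from the config (the helper name is ours, not a Transformers API):

```python
import json

# The label mappings from the config.json above.
config = json.loads("""
{
  "id2label": {"0": "Symptom_Reporting", "1": "Advice_Seeking",
               "2": "Medication_Query", "3": "Appointment_Scheduling",
               "4": "Billing_Query", "5": "Causal_Query",
               "6": "Record_Retrieval", "7": "Urgency_Assessment"},
  "label2id": {"Symptom_Reporting": 0, "Advice_Seeking": 1,
               "Medication_Query": 2, "Appointment_Scheduling": 3,
               "Billing_Query": 4, "Causal_Query": 5,
               "Record_Retrieval": 6, "Urgency_Assessment": 7},
  "num_labels": 8
}
""")

def labels_consistent(cfg: dict) -> bool:
    """True if id2label and label2id are inverses and match num_labels."""
    id2label = {int(k): v for k, v in cfg["id2label"].items()}  # JSON keys are strings
    label2id = cfg["label2id"]
    return (
        len(id2label) == cfg["num_labels"]
        and all(label2id.get(label) == idx for idx, label in id2label.items())
    )

print(labels_consistent(config))  # True
```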