---
tags:
- audio-classification
- sound-event-detection
- wav2vec2
- urban-acoustics
- deep-learning
datasets:
- UrbanSoundscape_EventDetection_Metadata
license: apache-2.0
model-index:
- name: UrbanSound_EventDetection_Wav2Vec2
  results:
  - task:
      name: Audio Classification
      type: audio-classification
    metrics:
    - type: accuracy
      value: 0.945
      name: Event Detection Accuracy
    - type: f1_macro
      value: 0.938
      name: Macro F1 Score
---

# UrbanSound_EventDetection_Wav2Vec2

## 👂 Overview

**UrbanSound_EventDetection_Wav2Vec2** is a model built on the pre-trained **Wav2Vec2** architecture and fine-tuned to classify momentary and continuous sound events in urban environments. It processes raw audio waveforms and assigns each clip one of eight high-priority urban sound classes, with an emphasis on high-impact and potentially anomalous events.

## 🧠 Model Architecture

This model uses the standard Wav2Vec2 pipeline, which operates directly on raw audio and needs no hand-crafted feature extraction (such as MFCCs).

* **Base Model:** `facebook/wav2vec2-base`
* **Feature Extractor:** A stack of 1D convolutional layers extracts local features from the raw waveform.
* **Transformer Encoder:** 12 Transformer layers capture long-range dependencies and global context within the audio clip.
* **Classification Head:** A task-specific linear layer sits on top of the contextualized representations, pooled over the clip's time frames, and predicts one of the 8 event labels.
* **Target Classes:** Car\_Horn, Children\_Playing, Dog\_Barking, Machinery\_Hum, Siren\_Emergency, Train\_Whistle, Tire\_Screech, and Glass\_Shattering.

## 🎯 Intended Use

This model is intended for smart-city, public-safety, and acoustic monitoring systems:

1. **Acoustic Surveillance:** Real-time detection of emergency sounds (Siren, Glass Shattering, Tire Screech) to trigger public-safety alerts.
2. **Noise Pollution Monitoring:** Quantifying how often specific noise sources (Car Horn, Machinery Hum) occur in different city zones.
3. **Urban Planning:** Analyzing soundscape composition to inform zoning policy and noise mitigation strategies.

## ⚠️ Limitations

1. **Event Overlap:** The model is trained for single-label classification. If multiple sounds occur simultaneously (e.g., Siren + Dog Barking), it outputs only the single most probable event and may miss the others.
2. **Domain Shift:** Performance may degrade in environments whose background noise differs significantly from the training data (e.g., very quiet suburbs vs. extremely loud, crowded street markets).
3. **Localization:** The model performs *event detection* but does not provide *sound localization* (direction of arrival, DOA). Localization would require specialized input features (such as ambisonic audio) and a different model head.

---

### MODEL 2: **MedicalChatbot_IntentClassifier_RoBERTa**

A RoBERTa-based model for multi-class classification of user intent in medical dialogue transcripts.

#### config.json

```json
{
  "_name_or_path": "roberta-base",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "hidden_size": 768,
  "model_type": "roberta",
  "num_hidden_layers": 12,
  "vocab_size": 50265,
  "id2label": {
    "0": "Symptom_Reporting",
    "1": "Advice_Seeking",
    "2": "Medication_Query",
    "3": "Appointment_Scheduling",
    "4": "Billing_Query",
    "5": "Causal_Query",
    "6": "Record_Retrieval",
    "7": "Urgency_Assessment"
  },
  "label2id": {
    "Symptom_Reporting": 0,
    "Advice_Seeking": 1,
    "Medication_Query": 2,
    "Appointment_Scheduling": 3,
    "Billing_Query": 4,
    "Causal_Query": 5,
    "Record_Retrieval": 6,
    "Urgency_Assessment": 7
  },
  "num_labels": 8,
  "problem_type": "single_label_classification",
  "transformers_version": "4.36.0"
}