	---
	tags:
	- audio-classification
	- sound-event-detection
	- wav2vec2
	- urban-acoustics
	- deep-learning
	datasets:
	- UrbanSoundscape_EventDetection_Metadata
	license: apache-2.0
	model-index:
	- name: UrbanSound_EventDetection_Wav2Vec2
	  results:
	  - task:
	      name: Audio Classification
	      type: audio-classification
	    metrics:
	    - type: accuracy
	      value: 0.945
	      name: Event Detection Accuracy
	    - type: f1_macro
	      value: 0.938
	      name: Macro F1 Score
	---

	# UrbanSound_EventDetection_Wav2Vec2

	## 👂 Overview

	UrbanSound_EventDetection_Wav2Vec2 is an audio classifier built on the pre-trained Wav2Vec2 architecture and fine-tuned to classify momentary and continuous sound events in urban environments. It takes raw audio waveforms as input and assigns each clip to one of eight urban sound classes, with an emphasis on high-impact and potentially anomalous events.

	## 🧠 Model Architecture

	This model uses the standard Wav2Vec2 pipeline, which operates directly on raw audio and needs no hand-crafted features such as MFCCs.

	* Base Model: `facebook/wav2vec2-base`
	* Feature Extractor: A stack of 1D convolutional layers extracts local features from the raw waveform.
	* Transformer Encoder: 12 layers of Transformer blocks capture long-range dependencies and global context within the audio clip.
	* Classification Head: A task-specific linear layer is placed on top of the contextualized representations to predict one of the 8 event labels.
	* Target Classes: Car\_Horn, Children\_Playing, Dog\_Barking, Machinery\_Hum, Siren\_Emergency, Train\_Whistle, Tire\_Screech, and Glass\_Shattering.
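
To give a sense of the temporal resolution the Transformer encoder works at, the sketch below computes how many feature frames the convolutional stack emits for a clip of a given length. The kernel sizes and strides are the defaults of the `facebook/wav2vec2-base` configuration. The 16 kHz sample rate and the helper name `num_feature_frames` are assumptions for illustration, since this card does not state the input sample rate.

```python
# Default conv feature-extractor layout of facebook/wav2vec2-base:
# seven 1D conv layers, written as (kernel_size, stride) pairs.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]

# Assumption: 16 kHz input, the rate wav2vec2-base was pre-trained on.
SAMPLE_RATE = 16_000


def num_feature_frames(num_samples: int) -> int:
    """Return how many frames the conv stack emits for a raw waveform.

    Each unpadded conv layer maps a length L to floor((L - kernel) / stride) + 1.
    Clips too short for a layer's kernel yield zero frames.
    """
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        if length < kernel:
            return 0
        length = (length - kernel) // stride + 1
    return length


if __name__ == "__main__":
    for seconds in (1, 4):
        frames = num_feature_frames(seconds * SAMPLE_RATE)
        print(f"{seconds} s clip -> {frames} frames")
```

Under these assumptions a one-second clip yields 49 frames, roughly one every 20 ms, and the classification head sees a summary of those frames.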

	## 🎯 Intended Use

	This model is intended for smart city, safety, and acoustic monitoring systems:

	1. Acoustic Surveillance: Real-time detection of emergency sounds (Siren, Glass Shattering, Tire Screech) for public safety alerting.
	2. Noise Pollution Monitoring: Quantifying the occurrence and frequency of specific noise sources (Car Horn, Machinery Hum) in different city zones.
	3. Urban Planning: Analyzing soundscape composition to inform policy on zoning and noise mitigation strategies.
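
The first two use cases can be sketched as simple post-processing over the model's predictions. The `Detection` record, the function names, and both thresholds below are hypothetical choices made for illustration; they are not part of the model or of any library, and real deployments would tune the thresholds on their own data.

```python
from collections import Counter
from dataclasses import dataclass

# Emergency-related classes named in the acoustic surveillance use case.
EMERGENCY_LABELS = frozenset({"Siren_Emergency", "Glass_Shattering", "Tire_Screech"})


@dataclass(frozen=True)
class Detection:
    """One classified clip (hypothetical record layout)."""
    zone: str     # where the microphone is located
    label: str    # predicted class name
    score: float  # predicted probability in [0, 1]


def should_alert(detection: Detection, threshold: float = 0.8) -> bool:
    """Raise an alert only for confident emergency-class predictions."""
    return detection.label in EMERGENCY_LABELS and detection.score >= threshold


def counts_per_zone(detections, min_score: float = 0.5) -> dict:
    """Count confident detections per zone and class for noise monitoring."""
    counts = {}
    for d in detections:
        if d.score >= min_score:
            counts.setdefault(d.zone, Counter())[d.label] += 1
    return counts
```

Keeping the alert threshold separate from the counting threshold lets a deployment be strict about safety alerts while still logging lower-confidence events for aggregate statistics.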

	## ⚠️ Limitations

	1. Event Overlap: The current setup is trained for single-label classification. If multiple sounds occur simultaneously (e.g., Siren + Dog Barking), the model will only output the single most probable event, potentially ignoring others.
	2. Domain Shift: The model's performance may degrade when it is deployed in environments whose background noise differs substantially from the training data (e.g., quiet suburbs vs. loud, crowded street markets).
	3. Localization: This model performs event detection but does not inherently provide sound localization (Direction-of-Arrival or DOA), which would require specialized input features (like ambisonic audio) and a different model head.
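
The event-overlap limitation follows from how a single-label head is decoded: a softmax over the eight logits, then an argmax. The sketch below illustrates this with made-up logits for a clip containing both a siren and a barking dog. The label order is an assumption (the card lists the classes but not their indices), and `top_k` is only a possible workaround for inspecting runner-up classes, not a substitute for multi-label training.

```python
import math

# Assumed index order; the card lists these classes but not their ids.
LABELS = [
    "Car_Horn", "Children_Playing", "Dog_Barking", "Machinery_Hum",
    "Siren_Emergency", "Train_Whistle", "Tire_Screech", "Glass_Shattering",
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits):
    """Single-label decoding: return only the most probable class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def top_k(logits, k=2):
    """Return the k most probable (label, probability) pairs."""
    ranked = sorted(zip(LABELS, softmax(logits)), key=lambda p: p[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical logits: siren and dog bark are both strongly activated.
    overlapping = [0.0, 0.0, 4.0, 0.0, 4.2, 0.0, 0.0, 0.0]
    print(top_label(overlapping))  # only the siren is reported
    print(top_k(overlapping, k=2))  # the dog bark appears as runner-up
```

Because softmax probabilities sum to one, two simultaneous events split the probability mass between them, so the winning score for an overlapping clip is typically lower than for a clean one.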

	---

	### MODEL 2: MedicalChatbot_IntentClassifier_RoBERTa

	This is a RoBERTa-based model for multi-class classification of user intent in medical dialogue transcripts.
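
Since the classifier's output index is only meaningful through its label mapping, it can help to confirm that the `id2label` and `label2id` tables in this model's `config.json` agree with each other and with `num_labels`. The following is a minimal sketch of such a check; the function name `label_map_problems` is hypothetical, and the dictionary simply copies the relevant fields from the configuration.

```python
# Relevant fields copied from the model's config.json.
# JSON object keys are strings, so id2label keys arrive as "0".."7".
CONFIG = {
    "id2label": {
        "0": "Symptom_Reporting", "1": "Advice_Seeking",
        "2": "Medication_Query", "3": "Appointment_Scheduling",
        "4": "Billing_Query", "5": "Causal_Query",
        "6": "Record_Retrieval", "7": "Urgency_Assessment",
    },
    "label2id": {
        "Symptom_Reporting": 0, "Advice_Seeking": 1,
        "Medication_Query": 2, "Appointment_Scheduling": 3,
        "Billing_Query": 4, "Causal_Query": 5,
        "Record_Retrieval": 6, "Urgency_Assessment": 7,
    },
    "num_labels": 8,
}


def label_map_problems(config: dict) -> list:
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    id2label = {int(k): v for k, v in config["id2label"].items()}
    label2id = config["label2id"]

    if len(id2label) != config["num_labels"]:
        problems.append("num_labels does not match the size of id2label")
    if sorted(id2label) != list(range(len(id2label))):
        problems.append("ids are not contiguous starting at 0")
    if set(label2id) != set(id2label.values()):
        problems.append("label2id and id2label name different labels")
    for idx, label in id2label.items():
        if label in label2id and label2id[label] != idx:
            problems.append(f"{label!r} maps to {label2id[label]}, expected {idx}")
    return problems


if __name__ == "__main__":
    print(label_map_problems(CONFIG) or "label maps are consistent")
```

A check like this is cheap to run before deployment and catches the common failure where one mapping is edited and the other is left stale.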

	#### config.json

	```json
	{
	  "_name_or_path": "roberta-base",
	  "architectures": [
	    "RobertaForSequenceClassification"
	  ],
	  "hidden_size": 768,
	  "model_type": "roberta",
	  "num_hidden_layers": 12,
	  "vocab_size": 50265,
	  "id2label": {
	    "0": "Symptom_Reporting",
	    "1": "Advice_Seeking",
	    "2": "Medication_Query",
	    "3": "Appointment_Scheduling",
	    "4": "Billing_Query",
	    "5": "Causal_Query",
	    "6": "Record_Retrieval",
	    "7": "Urgency_Assessment"
	  },
	  "label2id": {
	    "Symptom_Reporting": 0,
	    "Advice_Seeking": 1,
	    "Medication_Query": 2,
	    "Appointment_Scheduling": 3,
	    "Billing_Query": 4,
	    "Causal_Query": 5,
	    "Record_Retrieval": 6,
	    "Urgency_Assessment": 7
	  },
	  "num_labels": 8,
	  "problem_type": "single_label_classification",
	  "transformers_version": "4.36.0"
	}