Felladrin committed
Commit 573cbe4 · verified · 1 Parent(s): 3e60440

Upload folder using huggingface_hub
README.md ADDED
@@ -0,0 +1,227 @@
---
language: en
license: mit
tags:
- audio
- audio-classification
- musical-instruments
- wav2vec2
- transformers
- pytorch
datasets:
- custom
metrics:
- accuracy
- roc_auc
model-index:
- name: epoch_musical_instruments_identification_2
  results:
  - task:
      type: audio-classification
      name: Musical Instrument Classification
    metrics:
    - type: accuracy
      value: 0.9333
      name: Accuracy
    - type: roc_auc
      value: 0.9859
      name: ROC AUC (Macro)
    - type: loss
      value: 1.0639
      name: Validation Loss
base_model:
- Bhaveen/Musical-Instrument-Classification
library_name: transformers.js
pipeline_tag: audio-classification
---

# Musical-Instrument-Classification (ONNX)

This is an ONNX version of [Bhaveen/Musical-Instrument-Classification](https://huggingface.co/Bhaveen/Musical-Instrument-Classification). It was automatically converted and uploaded using [this Hugging Face Space](https://huggingface.co/spaces/onnx-community/convert-to-onnx).

## Usage with Transformers.js

See the pipeline documentation for `audio-classification`: https://huggingface.co/docs/transformers.js/api/pipelines#module_pipelines.AudioClassificationPipeline

---

# Musical Instrument Classification Model

This model is a fine-tuned version of [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h) for musical instrument classification. It identifies 9 different musical instruments in audio recordings with high accuracy.

## Model Description

- **Model type:** Audio Classification
- **Base model:** facebook/wav2vec2-base-960h
- **Language:** Audio (no specific language)
- **License:** MIT
- **Fine-tuned on:** Custom musical instrument dataset (200 samples per class)

## Performance

The model reaches the following results on the evaluation set after 5 epochs of training:

- **Final Accuracy:** 93.33%
- **Final ROC AUC (Macro):** 98.59%
- **Final Validation Loss:** 1.064
- **Evaluation Runtime:** 14.18 seconds
- **Evaluation Speed:** 25.39 samples/second

### Training Progress

| Epoch | Training Loss | Validation Loss | ROC AUC | Accuracy |
|-------|---------------|-----------------|---------|----------|
| 1     | 1.9872        | 1.8875          | 0.9248  | 0.6639   |
| 2     | 1.8652        | 1.4793          | 0.9799  | 0.8000   |
| 3     | 1.3868        | 1.2311          | 0.9861  | 0.8194   |
| 4     | 1.3242        | 1.1121          | 0.9827  | 0.9250   |
| 5     | 1.1869        | 1.0639          | 0.9859  | 0.9333   |

## Supported Instruments

The model classifies the following 9 musical instruments:

1. **Acoustic Guitar**
2. **Bass Guitar**
3. **Drum Set**
4. **Electric Guitar**
5. **Flute**
6. **Hi-Hats**
7. **Keyboard**
8. **Trumpet**
9. **Violin**

## Usage

### Quick Start with Pipeline

```python
from transformers import pipeline
import torchaudio

# Load the classification pipeline
classifier = pipeline("audio-classification", model="Bhaveen/epoch_musical_instruments_identification_2")

# Load the audio, downmix to mono, resample to 16 kHz, and truncate to 3 seconds
audio, rate = torchaudio.load("your_audio_file.wav")
audio = torchaudio.transforms.Resample(rate, 16000)(audio)
audio = audio.mean(dim=0).numpy()[:48000]

# Classify the audio
result = classifier(audio)
print(result)
```

### Using Transformers Directly

```python
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification
import torchaudio
import torch

# Load model and feature extractor
model_name = "Bhaveen/epoch_musical_instruments_identification_2"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
model = AutoModelForAudioClassification.from_pretrained(model_name)

# Load the audio, downmix to mono, resample to 16 kHz, and truncate to 3 seconds
audio, rate = torchaudio.load("your_audio_file.wav")
audio = torchaudio.transforms.Resample(rate, 16000)(audio)
audio = audio.mean(dim=0).numpy()[:48000]

# Extract features and make a prediction
inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1)

print(f"Predicted instrument: {model.config.id2label[predicted_class.item()]}")
```

## Training Details

### Dataset and Preprocessing

- **Custom dataset** with audio recordings of 9 musical instruments
- **Train/Test Split:** 80/20 using file numbering (files numbered < 160 used for training)
- **Data Balancing:** Random oversampling applied to minority classes
- **Audio Preprocessing:**
  - Resampling to 16,000 Hz
  - Fixed length of 48,000 samples (3 seconds)
  - Truncation of longer audio files
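The fixed-length step above can be sketched in plain Python, with no audio libraries. `fix_length` and `TARGET_LEN` are names introduced here for illustration, and zero-padding of short clips is an assumption (the pipeline above only documents truncation):

```python
# Sketch of the fixed-length preprocessing described above.
# Assumes a mono waveform already resampled to 16,000 Hz.
TARGET_LEN = 48_000  # 3 seconds at 16 kHz

def fix_length(samples, target_len=TARGET_LEN):
    """Truncate long clips; zero-pad short ones (padding is an assumption here)."""
    if len(samples) >= target_len:
        return samples[:target_len]
    return samples + [0.0] * (target_len - len(samples))

long_clip = [0.1] * 80_000   # 5 s of dummy audio
short_clip = [0.1] * 16_000  # 1 s of dummy audio
print(len(fix_length(long_clip)), len(fix_length(short_clip)))  # → 48000 48000
```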

### Training Configuration

```python
# Training hyperparameters
batch_size = 1
gradient_accumulation_steps = 4
learning_rate = 5e-6
num_train_epochs = 5
warmup_steps = 50
weight_decay = 0.02
```
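With these settings, gradients from several size-1 batches are accumulated before each optimizer step, so the effective batch size is the product of the two values:

```python
# Effective batch size under gradient accumulation: the optimizer steps once
# per `gradient_accumulation_steps` forward/backward passes.
batch_size = 1
gradient_accumulation_steps = 4

effective_batch_size = batch_size * gradient_accumulation_steps
print(effective_batch_size)  # → 4
```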

### Model Architecture

- **Base Model:** facebook/wav2vec2-base-960h
- **Classification Head:** Added for 9-class classification
- **Parameters:** ~95M trainable parameters
- **Features:** Wav2Vec2 audio representations with a fine-tuned classification layer

## Technical Specifications

- **Audio Format:** WAV files
- **Sample Rate:** 16,000 Hz
- **Input Length:** 3 seconds (48,000 samples)
- **Model Framework:** PyTorch + Transformers
- **Inference Device:** GPU recommended (CUDA)

## Evaluation Metrics

The model uses the following evaluation metrics:

- **Accuracy:** Standard classification accuracy
- **ROC AUC:** Macro-averaged ROC AUC with a one-vs-rest approach
- **Multi-class Classification:** Softmax probabilities for all 9 instrument classes
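The macro one-vs-rest ROC AUC can be sketched in pure Python using the rank-statistic (Mann-Whitney) form of AUC. The helper names and the toy 3-class data below are illustrative only; in practice the same quantity comes from scikit-learn's `roc_auc_score(..., multi_class="ovr", average="macro")`:

```python
# Macro one-vs-rest ROC AUC, sketched with the Mann-Whitney form of AUC.
def binary_auc(scores, labels):
    """AUC for one class: P(score of a positive > score of a negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_ovr_auc(probs, targets, n_classes):
    """Average the one-vs-rest AUC over all classes."""
    aucs = []
    for c in range(n_classes):
        scores = [p[c] for p in probs]
        labels = [1 if t == c else 0 for t in targets]
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / n_classes

# Toy 3-class example: softmax rows and true class indices (illustrative data)
probs = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.3, 0.3, 0.4], [0.6, 0.2, 0.2]]
targets = [0, 1, 0, 2]
print(round(macro_ovr_auc(probs, targets, 3), 4))  # → 0.8056
```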

## Limitations and Considerations

1. **Audio Duration:** The model expects 3-second audio clips; longer clips are truncated, and shorter clips may not classify well
2. **Single Instrument Focus:** Optimized for single-instrument classification; mixed instruments may produce uncertain results
3. **Audio Quality:** Performance depends on audio quality and recording conditions
4. **Sample Rate:** Input must be resampled to 16 kHz for optimal performance
5. **Domain Specificity:** Trained on specific instrument recordings; may not generalize to all variants or playing styles

## Training Environment

- **Platform:** Google Colab
- **GPU:** CUDA-enabled device
- **Libraries:**
  - transformers==4.28.1
  - torchaudio==0.12
  - datasets
  - evaluate
  - imblearn

## Model Files

The repository contains:

- Model weights and configuration
- Feature extractor configuration
- Training logs and metrics
- Label mappings (id2label, label2id)

---

*Model trained as part of a hackathon project*
config.json ADDED
@@ -0,0 +1,133 @@
{
  "_attn_implementation_autoset": true,
  "_name_or_path": "Bhaveen/Musical-Instrument-Classification",
  "activation_dropout": 0.1,
  "adapter_attn_dim": null,
  "adapter_kernel_size": 3,
  "adapter_stride": 2,
  "add_adapter": false,
  "apply_spec_augment": true,
  "architectures": [
    "Wav2Vec2ForSequenceClassification"
  ],
  "attention_dropout": 0.1,
  "bos_token_id": 1,
  "classifier_proj_size": 256,
  "codevector_dim": 256,
  "contrastive_logits_temperature": 0.1,
  "conv_bias": false,
  "conv_dim": [
    512,
    512,
    512,
    512,
    512,
    512,
    512
  ],
  "conv_kernel": [
    10,
    3,
    3,
    3,
    3,
    2,
    2
  ],
  "conv_stride": [
    5,
    2,
    2,
    2,
    2,
    2,
    2
  ],
  "ctc_loss_reduction": "sum",
  "ctc_zero_infinity": false,
  "diversity_loss_weight": 0.1,
  "do_stable_layer_norm": false,
  "eos_token_id": 2,
  "feat_extract_activation": "gelu",
  "feat_extract_dropout": 0.0,
  "feat_extract_norm": "group",
  "feat_proj_dropout": 0.1,
  "feat_quantizer_dropout": 0.0,
  "final_dropout": 0.1,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout": 0.1,
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "Acoustic_Guitar",
    "1": "Bass_Guitar",
    "2": "Drum_set",
    "3": "Electro_Guitar",
    "4": "flute",
    "5": "Hi_Hats",
    "6": "Keyboard",
    "7": "Trumpet",
    "8": "Violin"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4,
    "LABEL_5": 5,
    "LABEL_6": 6,
    "LABEL_7": 7,
    "LABEL_8": 8
  },
  "layer_norm_eps": 1e-05,
  "layerdrop": 0.1,
  "mask_feature_length": 10,
  "mask_feature_min_masks": 0,
  "mask_feature_prob": 0.0,
  "mask_time_length": 10,
  "mask_time_min_masks": 2,
  "mask_time_prob": 0.05,
  "model_type": "wav2vec2",
  "num_adapter_layers": 3,
  "num_attention_heads": 12,
  "num_codevector_groups": 2,
  "num_codevectors_per_group": 320,
  "num_conv_pos_embedding_groups": 16,
  "num_conv_pos_embeddings": 128,
  "num_feat_extract_layers": 7,
  "num_hidden_layers": 12,
  "num_negatives": 100,
  "output_hidden_size": 768,
  "pad_token_id": 0,
  "proj_codevector_dim": 256,
  "tdnn_dilation": [
    1,
    2,
    3,
    1,
    1
  ],
  "tdnn_dim": [
    512,
    512,
    512,
    512,
    1500
  ],
  "tdnn_kernel": [
    5,
    3,
    3,
    1,
    1
  ],
  "torch_dtype": "float32",
  "transformers_version": "4.49.0",
  "use_weighted_layer_sum": false,
  "vocab_size": 32,
  "xvector_output_dim": 512
}
onnx/model.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3adf7396021cb5f8a0cb45fce04aedba32a78ae3bc3a19226f9385608f8e1832
size 378610509
onnx/model_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2aaa291bdfa7c367620e43e5f810258f017d17db3254b03b1239c48db47557e0
size 84631623
onnx/model_fp16.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a5da1ced37d21aaf885edf4f7ee33fc5f28b8469211a251f316964bb9837ad6d
size 189468524
onnx/model_int8.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a826b4e026bde44dd279e220cf4b3281b270eda1c7f16e2027c9bea7d7a325f1
size 95389281
onnx/model_q4.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:db6034aaf2f9d28177d0d42311dab016440609a93d7afaa563f2c5ae71cc8fc8
size 89976361
onnx/model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:345b9a2c85a0a59d51ad9058f24e1a322fadc4d18a642a5b2aadd63ee29d13ef
size 66538151
onnx/model_quantized.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:82415992d0636e8ba045584a26b26717ff68aaa22114f73805edd308a7b39e6e
size 95389322
onnx/model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:82415992d0636e8ba045584a26b26717ff68aaa22114f73805edd308a7b39e6e
size 95389322
preprocessor_config.json ADDED
@@ -0,0 +1,9 @@
{
  "do_normalize": true,
  "feature_extractor_type": "Wav2Vec2FeatureExtractor",
  "feature_size": 1,
  "padding_side": "right",
  "padding_value": 0.0,
  "return_attention_mask": false,
  "sampling_rate": 16000
}
quantize_config.json ADDED
@@ -0,0 +1,18 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "modes": [
3
+ "fp16",
4
+ "q8",
5
+ "int8",
6
+ "uint8",
7
+ "q4",
8
+ "q4f16",
9
+ "bnb4"
10
+ ],
11
+ "per_channel": false,
12
+ "reduce_range": false,
13
+ "block_size": null,
14
+ "is_symmetric": true,
15
+ "accuracy_level": null,
16
+ "quant_type": 1,
17
+ "op_block_list": null
18
+ }