 ---
+language: en
+license: other
+library_name: tensorflow
+tags:
+- computer-vision
+- video-processing
+- siamese-network
+- match-cut-detection
+datasets:
+- custom
+metrics:
+- accuracy
+model-index:
+- name: siamese_model
+  results:
+  - task:
+      type: image-similarity
+      subtype: match-cut-detection
+    metrics:
+      - type: accuracy
+        value: 0.956
+        name: Test Accuracy
+---
+# Model Card for samanthajmichael/siamese_model.h5
+This Siamese neural network model detects match cuts in video sequences by analyzing the visual similarity between frame pairs using optical flow features.
+## Model Details
+### Model Description
+The model uses a Siamese architecture to compare pairs of video frames and determine whether they constitute a match cut, a film editing technique in which visually similar frames create a seamless transition between scenes. It processes optical flow representations of the frames, so it focuses on motion patterns rather than raw pixel values.
+- **Developed by:** samanthajmichael
+- **Model type:** Siamese Neural Network
+- **Language(s):** Not applicable (Computer Vision)
+- **License:** Not specified (the card metadata lists `other`)
+- **Finetuned from model:** EfficientNetB0 (used for initial feature extraction)
+### Model Sources
+- **Repository:** https://github.com/lasyaEd/ml_project
+- **Demo:** Available as a Streamlit application for analyzing YouTube videos
+## Uses
+### Direct Use
+The model can be used to:
+1. Detect match cuts in video sequences
+2. Find visually similar sections within videos
+3. Analyze motion patterns between frame pairs
+4. Support video editing and content analysis tasks
+### Downstream Use
+The model can be integrated into:
+- Video editing software for automated transition detection
+- Content analysis tools for finding visual patterns
+- YouTube video analysis applications (as demonstrated in the provided Streamlit app)
+- Film studies tools for analyzing editing techniques
+### Out-of-Scope Use
+This model is not designed for:
+- Real-time video processing
+- General object detection or recognition
+- Scene classification without motion analysis
+- Processing single frames in isolation
+## Bias, Risks, and Limitations
+- The model's performance depends on the quality of optical flow extraction
+- May be sensitive to video resolution and frame rate
+- Performance may vary based on video content type and editing style
+- Not optimized for real-time processing of high-resolution videos
+### Recommendations
+Users should:
+- Ensure input frames are properly preprocessed to 224x224 resolution
+- Use high-quality video sources for best results
+- Consider the model's confidence scores when making final decisions
+- Validate results in the context of their specific use case
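One way to act on the confidence-score recommendation is to treat the sigmoid output as a probability and leave a band around 0.5 for manual review. The sketch below is only illustrative: the function name `classify_pair` and the 0.3/0.7 cut-offs are assumptions, not values from the model's training or evaluation, and it assumes that a higher score means "match".

```python
def classify_pair(score, low=0.3, high=0.7):
    """Map a sigmoid output in [0, 1] to a coarse decision.

    The thresholds are hypothetical; tune them on your own validation data.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score >= high:
        return "match"
    if score <= low:
        return "no-match"
    return "review"


for s in (0.92, 0.55, 0.08):
    print(s, classify_pair(s))
```

Widening the review band trades fewer automatic decisions for fewer confident mistakes.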
+## How to Get Started with the Model
+```python
+from huggingface_hub import from_pretrained_keras
+import numpy as np
+import tensorflow as tf  # TensorFlow must be installed for from_pretrained_keras
+# Load the model
+model = from_pretrained_keras("samanthajmichael/siamese_model.h5")
+# preprocess_frame is a placeholder for your own helper: it should return
+# a float array of shape (224, 224, 3) with values normalized to [0, 1]
+frame1 = preprocess_frame(frame1)
+frame2 = preprocess_frame(frame2)
+# Add a batch dimension to each frame, then get the similarity prediction
+batch1 = np.expand_dims(frame1, axis=0)  # Shape: (1, 224, 224, 3)
+batch2 = np.expand_dims(frame2, axis=0)  # Shape: (1, 224, 224, 3)
+prediction = model.predict([batch1, batch2])
+```
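The snippet above calls `preprocess_frame` without defining it. A minimal sketch of such a helper, assuming the input is an `(H, W, 3)` `uint8` array, could look like the following. The nearest-neighbour resize is a stand-in chosen to keep the example dependency-free; it is not necessarily what the author used, and it does not compute the optical flow representation the model was trained on.

```python
import numpy as np


def preprocess_frame(frame, size=224):
    """Resize an HxWx3 uint8 frame to size x size and scale it to [0, 1].

    Hypothetical helper: nearest-neighbour resizing keeps the sketch
    dependency-free; it does not compute optical flow.
    """
    frame = np.asarray(frame)
    if frame.ndim != 3 or frame.shape[2] != 3:
        raise ValueError("expected an array of shape (H, W, 3)")
    h, w = frame.shape[:2]
    # Pick the source row/column index for every output pixel.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = frame[rows[:, None], cols[None, :]]
    return resized.astype(np.float32) / 255.0


# Example with a synthetic 360x640 frame.
dummy = np.random.randint(0, 256, size=(360, 640, 3), dtype=np.uint8)
out = preprocess_frame(dummy)
print(out.shape, out.dtype)
```

In practice an image library's interpolating resize would likely give smoother inputs than nearest-neighbour sampling.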
+## Training Details
+### Training Data
+- Training set: 14,264 frame pairs
+- Test set: 3,566 frame pairs (an 80/20 split between the two sets)
+- Data derived from video frames with optical flow features
+- Labels generated based on visual similarity thresholds
+### Training Procedure
+#### Training Hyperparameters
+- **Training regime:** fp32
+- Optimizer: Adam
+- Loss function: Binary Cross-Entropy
+- Batch size: 64
+- Early stopping patience: 3
+- Input shape: (224, 224, 3)
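To illustrate what an early stopping patience of 3 means, the sketch below stops once the monitored loss has failed to improve for three consecutive epochs. It assumes the monitored quantity is a validation loss, which the card does not state, and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop.

    Training stops once the monitored loss has failed to improve for
    `patience` consecutive epochs; if that never happens, the last
    epoch is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


# Made-up loss curve for illustration only (not the model's training log).
losses = [0.40, 0.42, 0.41, 0.43, 0.39]
print(early_stopping_epoch(losses))
```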
+### Model Architecture
+- Base network:
+  - Conv2D (32 filters) + ReLU + MaxPooling2D
+  - Conv2D (64 filters) + ReLU + MaxPooling2D
+  - Conv2D (128 filters) + ReLU + MaxPooling2D
+  - Flatten
+  - Dense (128 units)
+- Similarity computed using absolute difference
+- Final dense layer with sigmoid activation
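The comparison step listed above can be sketched in NumPy. Both frames pass through the same base network, which yields one 128-dimensional embedding per frame; the head then takes the element-wise absolute difference and feeds it to a single sigmoid unit. The function name `siamese_head` and the random embeddings and weights are hypothetical placeholders, not the trained parameters.

```python
import numpy as np


def siamese_head(emb_a, emb_b, weights, bias):
    """Score two embeddings: sigmoid(w . |a - b| + bias)."""
    diff = np.abs(emb_a - emb_b)          # element-wise absolute difference
    logit = diff @ weights + bias         # single-unit dense layer
    return 1.0 / (1.0 + np.exp(-logit))   # sigmoid squashes to (0, 1)


rng = np.random.default_rng(0)
# Hypothetical 128-d embeddings and randomly initialised head weights.
emb_a = rng.normal(size=128)
emb_b = rng.normal(size=128)
weights = rng.normal(scale=0.1, size=128)
bias = 0.0

print(siamese_head(emb_a, emb_b, weights, bias))
# Identical embeddings give a zero difference vector, so the score is sigmoid(bias).
print(siamese_head(emb_a, emb_a, weights, bias))
```

Because the absolute difference is symmetric, swapping the two frames does not change the score.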
+## Evaluation
+### Testing Data, Factors & Metrics
+- Evaluation performed on 3,566 frame pairs
+- Balanced dataset of match and non-match pairs
+- Primary metric: Binary classification accuracy
+### Results
+- Test accuracy: 95.60%
+- Test loss: 0.1675
+- Accuracy is well above the 50% chance level expected on a balanced test set; precision and recall are not reported
+## Environmental Impact
+- Trained on Google Colab
+- Training completed in 4 epochs with early stopping
+- Relatively lightweight model with 12.9M parameters
+## Technical Specifications
+### Compute Infrastructure
+- Training platform: Google Colab
+- GPU requirements: Standard GPU runtime
+- Inference can be performed on CPU for smaller workloads
+### Model Architecture and Objective
+- Total parameters: 12,938,561 (49.36 MB)
+- All parameters are trainable
+- Model objective: Binary classification of frame pair similarity
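The reported total can be checked by hand. The card does not state kernel sizes, padding, or pool sizes; the sketch below assumes 3x3 kernels, "same" padding, and 2x2 max pooling, which happens to reproduce the figures above for a (224, 224, 3) input with one shared base network.

```python
def conv2d_params(kernel, in_ch, out_ch):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    """Weights plus one bias per unit for a Dense layer."""
    return in_units * out_units + out_units


# Assumed: 3x3 kernels, "same" padding, 2x2 max pooling after each conv.
side, channels = 224, 3
total = 0
for filters in (32, 64, 128):
    total += conv2d_params(3, channels, filters)
    channels = filters
    side //= 2  # "same" padding keeps the size; pooling halves it

flat = side * side * channels          # 28 * 28 * 128 = 100352
total += dense_params(flat, 128)       # shared embedding layer
total += dense_params(128, 1)          # sigmoid output head

print(total)                           # 12938561
print(round(total * 4 / 1024**2, 2))   # 49.36 (fp32, 4 bytes per parameter)
```

Almost all of the parameters sit in the first dense layer after `Flatten`, so the convolutional stack itself is small.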
+## Model Card Contact
+For questions about the model, please contact samanthajmichael through GitHub or Hugging Face.
+---
 language:
 - en
 tags: