Upload README.md with huggingface_hub
README.md (CHANGED)

model-index:
  ...
    metrics:
    - name: Test RMSE
      type: rmse
      value: 0.0144
    - name: Test R²
      type: r2
      value: 0.8666
    - name: Test Loss
      type: loss
      value: 0.0002

# Topic Drift Detector Model

## Version: v20241225_184257

This model detects topic drift in conversations using an enhanced hierarchical attention-based architecture. Trained on the [leonvanbokhorst/topic-drift-v2](https://huggingface.co/datasets/leonvanbokhorst/topic-drift-v2) dataset.

## Model Architecture
- Multi-head attention mechanism (4 heads, head dimension 128)
- Hierarchical pattern detection with multi-scale analysis
- Explicit transition point detection with linguistic markers
- Pattern-aware self-attention mechanism
- Dynamic window augmentation
- Contrastive learning with pattern-aware sampling
- Adversarial training with pattern-aware perturbations

### Key Components:
1. **Embedding Processor**:
   - Input dimension: 1024
   - Hidden dimension: 512
   - Dropout rate: 0.35
   - PreNorm layers with residual connections

2. **Attention Blocks**:
   - 3 layers of attention
   - 4 attention heads
   - Feed-forward dimension: 2048
   - Learned position encodings

3. **Pattern Detection**:
   - Hierarchical LSTM layers
   - Bidirectional processing
   - Multi-scale pattern analysis
   - Pattern classification with 7 types

4. **Transition Detection**:
   - Linguistic marker attention
   - Explicit transition scoring
   - Marker-based context integration
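
For orientation, the sketch below shows one way these components could fit together in PyTorch, consistent with the dimensions above (a 1024 to 512 projection, 4 heads of dimension 128, 3 pre-norm attention layers, feed-forward dimension 2048, a bidirectional LSTM, and a 7-way pattern head). The wiring, names, and pooling are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class TopicDriftSketch(nn.Module):
    """Skeleton matching the card's stated dimensions; the wiring is assumed."""
    def __init__(self, input_dim=1024, hidden_dim=512, heads=4, ff_dim=2048,
                 num_layers=3, window=8, num_patterns=7, dropout=0.35):
        super().__init__()
        # Embedding processor: 1024 -> 512 with a PreNorm-style projection
        self.proj = nn.Sequential(
            nn.LayerNorm(input_dim),
            nn.Linear(input_dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
        )
        # Learned position encodings over the 8-turn window
        self.pos = nn.Parameter(torch.zeros(1, window, hidden_dim))
        # 3 pre-norm attention blocks: 4 heads x head dim 128 = 512
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=heads, dim_feedforward=ff_dim,
            dropout=dropout, batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Pattern detection: bidirectional LSTM + 7-way pattern classifier
        self.lstm = nn.LSTM(hidden_dim, hidden_dim // 2,
                            bidirectional=True, batch_first=True)
        self.pattern_head = nn.Linear(hidden_dim, num_patterns)
        # Drift regression head (sigmoid keeps scores in [0, 1]; an assumption)
        self.drift_head = nn.Linear(hidden_dim, 1)

    def forward(self, x):                 # x: (batch, 8, 1024) turn embeddings
        h = self.proj(x) + self.pos       # (batch, 8, 512)
        h = self.encoder(h)
        h, _ = self.lstm(h)
        pooled = h.mean(dim=1)            # pool over the 8 turns
        drift = torch.sigmoid(self.drift_head(pooled)).squeeze(-1)
        return drift, self.pattern_head(pooled)

# example: drift, pattern_logits = TopicDriftSketch()(torch.randn(2, 8, 1024))
```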

## Performance Metrics
```txt
=== Full Training Results ===
Best Validation RMSE: 0.0142
Best Validation R²: 0.8711

=== Test Set Results ===
Loss: 0.0002
RMSE: 0.0144
R²: 0.8666
```

## Training Details
- Dataset: 6400 conversations (5120 train, 640 val, 640 test)
- Window size: 8 turns
- Batch size: 32
- Learning rate: 0.0001 with cosine decay
- Warmup steps: 100
- Early stopping patience: 15
- Max gradient norm: 1.0
- Mixed precision training (AMP)
- Base embeddings: BAAI/bge-m3
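
These hyperparameters map onto a fairly standard PyTorch loop. Below is a minimal sketch under stated assumptions: AdamW as the optimizer, MSE as the loss, a dummy loader, and an illustrative total step count. Only the numeric hyperparameters come from the card.

```python
import math
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import LambdaLR

device = 'cuda' if torch.cuda.is_available() else 'cpu'
use_amp = device == 'cuda'
model = nn.Linear(1024, 1).to(device)               # stand-in for the detector
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # lr 0.0001

warmup_steps, total_steps = 100, 10_000             # total_steps is illustrative
def lr_lambda(step):
    if step < warmup_steps:                         # 100 warmup steps
        return step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay
scheduler = LambdaLR(optimizer, lr_lambda)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)  # mixed precision (AMP)

# dummy batches of 32: (pooled window embedding, drift target)
loader = [(torch.randn(32, 1024, device=device), torch.rand(32, device=device))
          for _ in range(5)]
for x, y in loader:
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = nn.functional.mse_loss(model(x).squeeze(-1), y)
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # max grad norm 1.0
    scaler.step(optimizer)
    scaler.update()
    scheduler.step()
```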

### Training Enhancements:
1. **Dynamic Window Augmentation**:
   - Adaptive window sizes
   - Interpolation-based resizing
   - Maintains temporal consistency

2. **Contrastive Learning**:
   - Pattern-aware positive/negative sampling
   - Temperature-scaled similarities
   - Weighted combination of embeddings

3. **Adversarial Training**:
   - Pattern-aware perturbations
   - Self-distillation loss
   - Epsilon ball projection
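
As a concrete illustration of the contrastive enhancement, the sketch below treats windows that share a pattern label as positives and scales similarities by a temperature, InfoNCE style. The function name, temperature value, and exact loss form are assumptions, not the card's code.

```python
import torch
import torch.nn.functional as F

def pattern_contrastive_loss(z, pattern_ids, temperature=0.1):
    """InfoNCE-style sketch; windows with the same pattern label are positives."""
    z = F.normalize(z, dim=-1)                    # (batch, dim)
    sim = z @ z.t() / temperature                 # temperature-scaled similarities
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos = (pattern_ids.unsqueeze(0) == pattern_ids.unsqueeze(1)) & ~eye
    sim = sim.masked_fill(eye, float('-inf'))     # never contrast a window with itself
    log_prob = sim.log_softmax(dim=-1)
    per_row = -(log_prob * pos).sum(dim=-1) / pos.sum(dim=-1).clamp(min=1)
    return per_row[pos.any(dim=-1)].mean()        # skip rows with no positive pair

# dummy usage: 32 window embeddings, labels drawn from the 7 pattern types
z = torch.randn(32, 512)
labels = torch.randint(0, 7, (32,))
print(pattern_contrastive_loss(z, labels))
```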

## Usage Example
```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load base embedding model
base_model = AutoModel.from_pretrained('BAAI/bge-m3')
tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-m3')

# Load topic drift detector
model = torch.load('models/v20241225_184257/topic_drift_model.pt')
model.eval()

# Prepare conversation window (8 turns)
# ... (middle of the example unchanged and omitted in this diff view) ...

print(f"Topic drift score: {drift_scores.item():.4f}")
# Higher scores indicate more topic drift
```
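
The diff omits the middle of the example (turning the 8 turns into the window the detector consumes). A hypothetical completion is sketched below; the CLS-token pooling, variable names, and flattened (1, 8 * 1024) input shape are guesses rather than the card's code.

```python
# Hypothetical completion of the omitted steps above.
conversation = [f"example turn {i}" for i in range(1, 9)]   # exactly 8 turns

with torch.no_grad():
    turn_embeddings = []
    for turn in conversation:
        inputs = tokenizer(turn, return_tensors='pt',
                           truncation=True, max_length=512)
        output = base_model(**inputs)
        # CLS-token pooling (an assumption; mean pooling is equally plausible)
        turn_embeddings.append(output.last_hidden_state[:, 0])
    # flatten the window to (1, 8 * 1024); the detector may expect (1, 8, 1024)
    window = torch.cat(turn_embeddings, dim=1)
    drift_scores = model(window)
```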

## Pattern Types
The model detects 7 distinct pattern types:
1. "maintain" - No significant drift
2. "gentle_wave" - Subtle topic evolution
3. "single_peak" - One clear transition
4. "multi_peak" - Multiple transitions
5. "ascending" - Gradually increasing drift
6. "descending" - Gradually decreasing drift
7. "abrupt" - Sudden topic change

## Limitations
- Works best with English conversations
- Requires exactly 8 turns of conversation
- Each turn should be between 1 and 512 tokens
- Relies on BAAI/bge-m3 embeddings
- May be sensitive to conversation style variations

## Training Curves
![Training Curves](plots/v20241225_184257/training_curves.png)