---
library_name: pytorch
tags:
- time-series
- anomaly-detection
- transformer
datasets:
- MSL
- SMAP
- SWaT
- WADI
pipeline_tag: other
---

# AnomalyBERT

Pre-trained checkpoints for **AnomalyBERT**, a self-supervised Transformer model for time series anomaly detection based on a data degradation scheme.

> Paper: [AnomalyBERT: Self-Supervised Transformer for Time Series Anomaly Detection using Data Degradation Scheme](https://arxiv.org/abs/2305.04468)
>
> Original code: [Jhryu30/AnomalyBERT](https://github.com/Jhryu30/AnomalyBERT)

## Model Architecture

AnomalyBERT uses a Transformer encoder architecture with:

- Linear patch embedding
- Relative position embedding
- Pre-norm encoder layers (LayerNorm → Attention/FFN)
- MLP head for reconstruction

The model learns normal patterns via masked data degradation during training, and detects anomalies by measuring reconstruction error at inference time.

## Checkpoints

Each dataset directory contains `config.json` (hyperparameters) and `model.safetensors` (weights).
| Dataset | input_d_data | patch_size | d_embed | n_layer | n_head | max_seq_len | Parameters |
|---------|--------------|------------|---------|---------|--------|-------------|------------|
| MSL     | 55           | 2          | 512     | 6       | 8      | 512         | ~19M       |
| SMAP    | 25           | 4          | 512     | 6       | 8      | 512         | ~19M       |
| SWaT    | 50           | 14         | 512     | 6       | 8      | 512         | ~19M       |
| WADI    | 122          | 8          | 512     | 6       | 8      | 512         | ~19M       |

## Usage

```python
import json
from pathlib import Path

import torch
from safetensors.torch import load_file

from models.anomaly_transformer import get_anomaly_transformer


def load_model(dataset_dir: str) -> torch.nn.Module:
    """Load an AnomalyBERT model from config + safetensors."""
    dataset_path = Path(dataset_dir)
    with open(dataset_path / 'config.json') as f:
        config = json.load(f)

    model = get_anomaly_transformer(
        input_d_data=config['input_d_data'],
        output_d_data=config['output_d_data'],
        patch_size=config['patch_size'],
        d_embed=config['d_embed'],
        hidden_dim_rate=config['hidden_dim_rate'],
        max_seq_len=config['max_seq_len'],
        positional_encoding=config['positional_encoding'],
        relative_position_embedding=config['relative_position_embedding'],
        transformer_n_layer=config['transformer_n_layer'],
        transformer_n_head=config['transformer_n_head'],
        dropout=config['dropout'],
    )

    state_dict = load_file(str(dataset_path / 'model.safetensors'))
    model.load_state_dict(state_dict)
    model.eval()
    return model


# Example: load the MSL model
model = load_model('MSL')

# Inference
# x shape: (batch, patch_size * max_seq_len, input_d_data)
x = torch.randn(1, 1024, 55)
with torch.no_grad():
    output = model(x)
# output shape: (batch, patch_size * max_seq_len, output_d_data)
```

## File Structure

```
├── MSL/
│   ├── config.json
│   └── model.safetensors
├── SMAP/
│   ├── config.json
│   └── model.safetensors
├── SWaT/
│   ├── config.json
│   └── model.safetensors
├── WADI/
│   ├── config.json
│   └── model.safetensors
├── convert_to_hf.py       # Conversion script (.pt -> safetensors)
├── inspect_pt.py          # Checkpoint inspection script
└── verify_conversion.py   # Conversion verification script
```

## Citation

```bibtex
@article{jeong2023anomalybert,
  title={AnomalyBERT: Self-Supervised Transformer for Time Series Anomaly Detection using Data Degradation Scheme},
  author={Jeong, Yungi and Yang, Eunseok and Ryu, Jung Hyun and Park, Imseong and Kang, Myungjoo},
  journal={arXiv preprint arXiv:2305.04468},
  year={2023}
}
```
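
## Scoring Sketch

The card notes that anomalies are detected by measuring reconstruction error at inference time. Below is a minimal, hypothetical sketch of turning a reconstruction into per-timestep anomaly scores; `anomaly_scores`, `flag_anomalies`, and the threshold value are illustrative assumptions, not the repository's evaluation code, which should be consulted for the actual scoring and thresholding procedure.

```python
import numpy as np


def anomaly_scores(x: np.ndarray, output: np.ndarray) -> np.ndarray:
    """Per-timestep anomaly score: squared reconstruction error
    averaged over channels (illustrative choice of score).

    x, output: arrays of shape (batch, seq_len, channels).
    Returns scores of shape (batch, seq_len).
    """
    return ((x - output) ** 2).mean(axis=-1)


def flag_anomalies(scores: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean mask of timesteps whose score exceeds the threshold."""
    return scores > threshold


# Toy example: perfect reconstruction except one corrupted timestep.
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 1024, 55))   # matches the MSL input shape in Usage
output = x.copy()
output[0, 500] += 5.0                # inject reconstruction error at t=500
scores = anomaly_scores(x, output)
mask = flag_anomalies(scores, threshold=1.0)
print(int(scores[0].argmax()))       # -> 500
```

In practice the threshold is tuned on held-out data (the paper reports F1 after such tuning); the fixed value above is only for the toy example.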