saracandu committed · Commit 707df35 (verified) · 1 Parent(s): 89682b5

model card

Files changed (1): README.md (+59 lines)

README.md (added):
---
language:
- en
library_name: transformers
tags:
- stl
- formal-methods
- signal-temporal-logic
- encoder
- pytorch
- kernel-methods
license: mit
base_model: saracandu/stlenc
model_type: stl_encoder
pipeline_tag: feature-extraction
---

# STL Encoder (Neural Backbone)

This repository contains the neural encoder architecture for the **STLEnc** project. The model is designed to map **Signal Temporal Logic (STL)** formulae into a 1024-dimensional latent embedding space.

## Model Description

This model is a neural approximation of the kernel-based framework introduced by **Gallo et al.** in [*"A Kernel-Based Approach to Signal Temporal Logic"* (2020)](https://arxiv.org/abs/2009.05484).

In the original framework, STL formulae are embedded into a Reproducing Kernel Hilbert Space (RKHS) using a recursive kernel that accounts for the syntax and temporal intervals of the logic. Our approach replaces the traditional kernel-based projection with a **Transformer-based encoder**.

By using a fixed **anchor set** of formulae (as suggested in kernel approximation methods), the Transformer is trained to learn a mapping that mimics the kernel's distance properties. This allows for:
- **Scalability**: Faster computation compared to recursive kernel evaluations.
- **Continuity**: A smooth latent space suitable for optimization and deep learning tasks.

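The anchor-set construction can be illustrated with a minimal sketch. The kernel function and anchor formulae below are placeholders invented for the example; they are not the project's actual recursive STL kernel or anchor set:

```python
import numpy as np

def stl_kernel(phi: str, psi: str) -> float:
    # Placeholder: stands in for the recursive STL kernel of the original framework.
    return float(phi == psi)

# Hypothetical fixed anchor set of formulae
anchors = ["always[0, 5] (x > 0)", "eventually[0, 10] (y < 1)"]

def kernel_features(phi: str) -> np.ndarray:
    # The target representation of `phi` is its vector of kernel evaluations against
    # the anchors; the Transformer encoder is trained to reproduce the geometry these
    # features induce, without evaluating the kernel at inference time.
    return np.array([stl_kernel(phi, a) for a in anchors])

print(kernel_features("always[0, 5] (x > 0)"))  # [1. 0.] with this toy kernel
```
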
## Model Details

- **Architecture**: Custom Transformer Encoder (12 layers, 16 attention heads).
- **Tokenizer**: Custom *longest-match* tokenizer optimized for STL symbols, temporal intervals, and numeric predicates (a toy version is sketched below).
- **Output**: 1024-dimensional embeddings via `[CLS]` token pooling.

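As an illustration of the *longest-match* idea only (not the tokenizer shipped with this repository; the vocabulary below is invented for the example), a greedy tokenizer might look like this:

```python
import re

# Illustrative vocabulary only; the real tokenizer's vocabulary and special tokens differ.
VOCAB = ["always", "eventually", "until", "and", "or", "not",
         "<=", ">=", "<", ">", "[", "]", "(", ")", ","]

def tokenize(formula: str) -> list[str]:
    tokens, i = [], 0
    while i < len(formula):
        if formula[i].isspace():          # skip whitespace
            i += 1
            continue
        num = re.match(r"-?\d+(\.\d+)?", formula[i:])
        if num:                           # numeric predicates and interval bounds
            tokens.append(num.group())
            i += len(num.group())
            continue
        # Longest match: among all vocabulary entries starting here, take the longest.
        match = max((v for v in VOCAB if formula.startswith(v, i)), key=len, default=None)
        if match is None:                 # fall back to single characters (e.g. variable names)
            tokens.append(formula[i])
            i += 1
        else:
            tokens.append(match)
            i += len(match)
    return tokens

print(tokenize("always[0, 10] (x > 0)"))
# ['always', '[', '0', ',', '10', ']', '(', 'x', '>', '0', ')']
```
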
## Training Data

The model is designed to be trained on the [saracandu/stl_formulae](https://huggingface.co/datasets/saracandu/stl_formulae) dataset, which contains a large-scale collection of STL expressions and their corresponding kernel-derived embeddings.

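A minimal sketch of how such training could look, assuming each dataset row exposes a formula string and its kernel-derived target vector (the column names `"formula"` and `"embedding"`, the MSE objective, and the hyperparameters are assumptions for illustration; check the dataset card for the actual schema and recipe):

```python
import torch
from datasets import load_dataset
from transformers import AutoModel, AutoTokenizer

repo_id = "saracandu/stlenc"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

dataset = load_dataset("saracandu/stl_formulae", split="train")

model.train()
for row in dataset.select(range(8)):                       # tiny demo loop
    inputs = tokenizer(row["formula"], return_tensors="pt")  # assumed column name
    target = torch.tensor(row["embedding"]).unsqueeze(0)     # kernel-derived target (assumed)
    pred = model(**inputs)                                    # [1, 1024] embedding
    loss = torch.nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
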
## Usage

```python
from transformers import AutoModel, AutoTokenizer
import torch

repo_id = "saracandu/stlenc"

# Load tokenizer & model
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

# Example: encode an STL formula
formula = "always[0, 10] (x > 0) and eventually[5, 20] (y < -1)"
inputs = tokenizer(formula, return_tensors="pt")

with torch.no_grad():
    embedding = model(**inputs)

print(embedding.shape)  # [1, 1024]
```
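
Because the latent space is meant to preserve the kernel's notion of similarity, a natural follow-up (continuing from the snippet above) is to compare two formulae by the cosine similarity of their embeddings:

```python
import torch.nn.functional as F

phi = "always[0, 10] (x > 0)"
psi = "always[0, 12] (x > 0.1)"

with torch.no_grad():
    e_phi = model(**tokenizer(phi, return_tensors="pt"))
    e_psi = model(**tokenizer(psi, return_tensors="pt"))

# Cosine similarity in the latent space as a proxy for kernel similarity
print(F.cosine_similarity(e_phi, e_psi).item())
```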