saracandu committed · Commit 707df35 (verified) · 1 Parent(s): 89682b5

model card

Files changed (1): README.md (+59 lines)

README.md (added):
---
language:
- en
library_name: transformers
tags:
- stl
- formal-methods
- signal-temporal-logic
- encoder
- pytorch
- kernel-methods
license: mit
base_model: saracandu/stlenc
model_type: stl_encoder
pipeline_tag: feature-extraction
---

# STL Encoder (Neural Backbone)

This repository contains the neural encoder architecture for the **STLEnc** project. The model is designed to map **Signal Temporal Logic (STL)** formulae into a 1024-dimensional latent embedding space.

## Model Description

This model is a neural approximation of the kernel-based framework introduced by **Gallo et al.** in [*"A Kernel-Based Approach to Signal Temporal Logic"* (2020)](https://arxiv.org/abs/2009.05484).

In the original framework, STL formulae are embedded into a Reproducing Kernel Hilbert Space (RKHS) using a recursive kernel that accounts for the syntax and temporal intervals of the logic. Our approach replaces the traditional kernel-based projection with a **Transformer-based encoder**.

By using a fixed **anchor set** of formulae (as suggested in kernel approximation methods), the Transformer is trained to learn a mapping that mimics the kernel's distance properties. This allows for:
- **Scalability**: Faster computation compared to recursive kernel evaluations.
- **Continuity**: A smooth latent space suitable for optimization and deep learning tasks.

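The anchor-set construction can be illustrated with a minimal sketch. The kernel function and anchor formulae below are placeholders invented for the example; they are not the project's actual recursive STL kernel or anchor set:

```python
import numpy as np

def stl_kernel(phi: str, psi: str) -> float:
    # Placeholder: stands in for the recursive STL kernel of the original framework.
    return float(phi == psi)

# Hypothetical fixed anchor set of formulae
anchors = ["always[0, 5] (x > 0)", "eventually[0, 10] (y < 1)"]

def kernel_features(phi: str) -> np.ndarray:
    # The target representation of `phi` is its vector of kernel evaluations against
    # the anchors; the Transformer encoder is trained to reproduce the geometry these
    # features induce, without evaluating the kernel at inference time.
    return np.array([stl_kernel(phi, a) for a in anchors])

print(kernel_features("always[0, 5] (x > 0)"))  # [1. 0.] with this toy kernel
```
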
## Model Details

- **Architecture**: Custom Transformer Encoder (12 layers, 16 attention heads).
- **Tokenizer**: Custom *longest-match* tokenizer optimized for STL symbols, temporal intervals, and numeric predicates (a toy version is sketched below).
- **Output**: 1024-dimensional embeddings via `[CLS]` token pooling.

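As an illustration of the *longest-match* idea only (not the tokenizer shipped with this repository; the vocabulary below is invented for the example), a greedy tokenizer might look like this:

```python
import re

# Illustrative vocabulary only; the real tokenizer's vocabulary and special tokens differ.
VOCAB = ["always", "eventually", "until", "and", "or", "not",
         "<=", ">=", "<", ">", "[", "]", "(", ")", ","]

def tokenize(formula: str) -> list[str]:
    tokens, i = [], 0
    while i < len(formula):
        if formula[i].isspace():          # skip whitespace
            i += 1
            continue
        num = re.match(r"-?\d+(\.\d+)?", formula[i:])
        if num:                           # numeric predicates and interval bounds
            tokens.append(num.group())
            i += len(num.group())
            continue
        # Longest match: among all vocabulary entries starting here, take the longest.
        match = max((v for v in VOCAB if formula.startswith(v, i)), key=len, default=None)
        if match is None:                 # fall back to single characters (e.g. variable names)
            tokens.append(formula[i])
            i += 1
        else:
            tokens.append(match)
            i += len(match)
    return tokens

print(tokenize("always[0, 10] (x > 0)"))
# ['always', '[', '0', ',', '10', ']', '(', 'x', '>', '0', ')']
```
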
## Training Data

The model is designed to be trained on the [saracandu/stl_formulae](https://huggingface.co/datasets/saracandu/stl_formulae) dataset, which contains a large-scale collection of STL expressions and their corresponding kernel-derived embeddings.

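A minimal sketch of how such training could look, assuming each dataset row exposes a formula string and its kernel-derived target vector (the column names `"formula"` and `"embedding"`, the MSE objective, and the hyperparameters are assumptions for illustration; check the dataset card for the actual schema and recipe):

```python
import torch
from datasets import load_dataset
from transformers import AutoModel, AutoTokenizer

repo_id = "saracandu/stlenc"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

dataset = load_dataset("saracandu/stl_formulae", split="train")

model.train()
for row in dataset.select(range(8)):                       # tiny demo loop
    inputs = tokenizer(row["formula"], return_tensors="pt")  # assumed column name
    target = torch.tensor(row["embedding"]).unsqueeze(0)     # kernel-derived target (assumed)
    pred = model(**inputs)                                    # [1, 1024] embedding
    loss = torch.nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
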
## Usage

```python
from transformers import AutoModel, AutoTokenizer
import torch

repo_id = "saracandu/stlenc"

# Load tokenizer & model
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

# Example: encode an STL formula
formula = "always[0, 10] (x > 0) and eventually[5, 20] (y < -1)"
inputs = tokenizer(formula, return_tensors="pt")

with torch.no_grad():
    embedding = model(**inputs)

print(embedding.shape)  # [1, 1024]
```
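
Because the latent space is meant to preserve the kernel's notion of similarity, a natural follow-up (continuing from the snippet above) is to compare two formulae by the cosine similarity of their embeddings:

```python
import torch.nn.functional as F

phi = "always[0, 10] (x > 0)"
psi = "always[0, 12] (x > 0.1)"

with torch.no_grad():
    e_phi = model(**tokenizer(phi, return_tensors="pt"))
    e_psi = model(**tokenizer(psi, return_tensors="pt"))

# Cosine similarity in the latent space as a proxy for kernel similarity
print(F.cosine_similarity(e_phi, e_psi).item())
```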