---
pipeline_tag: text-classification
tags:
- memory
- text-classification
- roberta
- cognitive-nlp
- noetiv
license: mit
library_name: transformers
language:
- en
metrics:
- accuracy
---

### 🧠 About NOETIV

This project is part of the **NOETIV** initiative, a modular AI platform for healthcare professionals.

🔗 Visit us at [noetiv.com](https://www.noetiv.com)

# 🧠 MemoryBERT

A RoBERTa-based transformer model for **Cognitive Memory Recognition (CMR)**: it classifies natural-language text into six categories (five memory types plus non-memory) inspired by cognitive science.

---

## 🧭 Overview

MemoryBERT is fine-tuned to classify user-generated text into:

- **Episodic memory**
- **Semantic memory**
- **Spatial memory**
- **Emotional memory**
- **Associative memory**
- **Non-memory**

This model supports research into memory-type classification, schema formation, and personalized AI interaction systems.

## 🧪 Model Details

- **Base model**: `roberta-base`
- **Task**: Multi-class sequence classification
- **Classes**: 6
- **Max sequence length**: 128 tokens
- **Training epochs**: 1.5
- **Label smoothing**: 0.1
- **Loss function**: CrossEntropyLoss
- **Optimizer**: AdamW
- **Batch size**: 8
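
To make the loss settings above concrete, here is a minimal pure-Python sketch of cross-entropy with label smoothing 0.1 over six classes. It assumes the PyTorch-style convention, in which the one-hot target is mixed with a uniform distribution; the function name `label_smoothed_cross_entropy` and the example logits are illustrative and are not taken from the MemoryBERT training code.

```python
import math

def label_smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy for one example against a label-smoothed target.

    Assumed convention (PyTorch-style): the true class gets
    1 - smoothing + smoothing / K, every other class gets smoothing / K.
    """
    num_classes = len(logits)

    # Numerically stable log-softmax.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    log_probs = [z - log_norm for z in logits]

    loss = 0.0
    for i, log_p in enumerate(log_probs):
        weight = smoothing / num_classes
        if i == target:
            weight += 1.0 - smoothing
        loss -= weight * log_p
    return loss

# Six classes; the true class index (2) and the logits are made up for illustration.
confident = label_smoothed_cross_entropy([0.0, 0.0, 4.0, 0.0, 0.0, 0.0], target=2)
uniform = label_smoothed_cross_entropy([0.0] * 6, target=2)
print(f"confident prediction: {confident:.3f}, uniform prediction: {uniform:.3f}")
```

With smoothing enabled, the loss never reaches zero even for a correct and confident prediction, because a small share of the target mass is always assigned to the other classes.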

---

## 📊 Evaluation Results

On a synthetic 400-example test set (200 memory examples spread across the five memory types and 200 non-memory examples):

| Class         | Precision | Recall | F1-score | Support |
|---------------|-----------|--------|----------|---------|
| Associative   | 1.00      | 1.00   | 1.00     | 39      |
| Emotional     | 1.00      | 1.00   | 1.00     | 40      |
| Episodic      | 1.00      | 1.00   | 1.00     | 39      |
| Non-memory    | 1.00      | 1.00   | 1.00     | 200     |
| Semantic      | 1.00      | 1.00   | 1.00     | 40      |
| Spatial       | 1.00      | 1.00   | 1.00     | 42      |

- **Macro F1**: 1.00
- **Eval loss**: 0.423
- **Epochs**: 1.5
- **Accuracy**: 100%
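
As a quick sanity check on the figures above, the sketch below recomputes the total support, macro F1, and weighted F1 from the table. It also estimates the lowest loss a perfect classifier could reach if the evaluation loss were computed with label smoothing 0.1 over six classes. That last part is an assumption: the card does not say how the eval loss was computed. Under that assumption the floor is about 0.42, which would be consistent with an eval loss of 0.423 alongside 100% accuracy.

```python
import math

# Per-class figures copied from the evaluation table: (precision, recall, support).
report = {
    "Associative": (1.00, 1.00, 39),
    "Emotional": (1.00, 1.00, 40),
    "Episodic": (1.00, 1.00, 39),
    "Non-memory": (1.00, 1.00, 200),
    "Semantic": (1.00, 1.00, 40),
    "Spatial": (1.00, 1.00, 42),
}

def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

per_class_f1 = {name: f1(p, r) for name, (p, r, _) in report.items()}
total_support = sum(support for _, _, support in report.values())

# Macro F1 weights every class equally; weighted F1 weights by support.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
weighted_f1 = sum(per_class_f1[name] * s for name, (_, _, s) in report.items()) / total_support

# ASSUMPTION: eval loss uses label smoothing 0.1 over 6 classes.
# The minimum cross-entropy is then the entropy of the smoothed target.
smoothing, num_classes = 0.1, 6
on_target = 1 - smoothing + smoothing / num_classes
off_target = smoothing / num_classes
loss_floor = -(on_target * math.log(on_target)
               + (num_classes - 1) * off_target * math.log(off_target))

print(f"support={total_support}, macro F1={macro_f1:.2f}, weighted F1={weighted_f1:.2f}")
print(f"loss floor under the smoothing assumption: {loss_floor:.3f}")
```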

> ⚠️ Note: These results come from a synthetic dataset. Real-world validation and expansion of the baseline dataset used for version 1 of MemoryBERT are ongoing.

---

## 🧠 Dataset

MemoryBERT was trained on a synthetic dataset of 4,000 curated examples (2,000 memory and 2,000 non-memory).

Each entry is labeled with one of the six categories (five memory types or non-memory) and tagged by domain and span group.

---

## 🚀 Usage

```python
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub.
model = RobertaForSequenceClassification.from_pretrained("DimitriosPanagoulias/MemoryBERT")
tokenizer = RobertaTokenizer.from_pretrained("DimitriosPanagoulias/MemoryBERT")

def predict_memory_type(text):
    # Truncate to the model's 128-token maximum sequence length.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    with torch.no_grad():  # gradients are not needed for inference
        outputs = model(**inputs)
    predicted_id = outputs.logits.argmax(dim=-1).item()
    return model.config.id2label[predicted_id]

print(predict_memory_type("Without a map, I navigated the winding back roads to reach my childhood home."))
```

Alternatively, use the Hugging Face `pipeline` helper:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline
import torch

device = 0 if torch.cuda.is_available() else -1  # 0 = first GPU, -1 = CPU
pipe = pipeline("text-classification", model="DimitriosPanagoulias/MemoryBERT", device=device)
pipe("I remember the long walk to my childhood school.")
```

Example output:

```text
[{'label': 'episodic', 'score': 0.9272529482841492}]
```
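
The `score` field can be read as the softmax probability of the top class, which, to our understanding, is the default post-processing of the `text-classification` pipeline for a single-label multi-class model. The pure-Python sketch below shows that step in isolation. The helper `to_pipeline_style`, the three-class mapping, and the logits are hypothetical and serve only as illustration; the real mapping is `model.config.id2label`.

```python
import math

def to_pipeline_style(logits, id2label):
    """Turn raw logits into a {'label', 'score'} dict via softmax + argmax."""
    # Numerically stable softmax.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Pick the most probable class and report its probability as the score.
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}

# Hypothetical mapping and logits, for illustration only.
example_id2label = {0: "class_a", 1: "class_b", 2: "class_c"}
result = to_pipeline_style([0.5, 2.5, -1.0], example_id2label)
print(result)
```

Because the score is a probability over all classes, a low value can be used as a signal that the input does not fit any category cleanly and may need manual review.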

## Authors

- **Dimitrios P. Panagoulias**, Department of Informatics, University of Piraeus
- **Persephone Papatheodosiou**, Sleep Research Unit, Department of Psychiatry, National and Kapodistrian University of Athens
- **Anastasios Bonakis**, Second Department of Neurology, National and Kapodistrian University of Athens
- **Dimitris Dikeos**, Sleep Research Unit, Department of Psychiatry, National and Kapodistrian University of Athens
- **Maria Virvou**, Lab of Software Engineering, Department of Informatics, University of Piraeus
- **George A. Tsihrintzis**, Lab of Pattern Recognition and Machine Learning – Multimedia Systems, Department of Informatics, University of Piraeus

## Citation

You can cite either or both of the following related works:

- Panagoulias, D.P. et al. “Memory and Schema in Human–Generative Artificial Intelligence Interactions.” 2024 IEEE ICTAI Conference (in press). Available at: https://ieeexplore.ieee.org/document/10849404
- Panagoulias, D.P. et al. “Mathematical representation of memory and schema for improving human-generative AI interactions.” 2024 IEEE IISA Conference (in press). Available at: https://ieeexplore.ieee.org/document/10786703