	---
	pipeline_tag: text-classification
	tags:
	- memory
	- text-classification
	- roberta
	- cognitive-nlp
	- noetiv
	license: mit
	library_name: transformers
	language:
	- en
	metrics:
	- accuracy
	---

	### 🧠 About NOETIV

	This project is part of the NOETIV initiative, a modular AI platform for healthcare professionals.
	🔗 Visit us at [noetiv.com](https://www.noetiv.com)

	# 🧠 MemoryBERT

	A RoBERTa-based transformer model for Cognitive Memory Recognition (CMR) – classifying natural language into six memory categories inspired by cognitive science.

	---

	## 🧭 Overview

	MemoryBERT is fine-tuned to classify user-generated text into:
	- Episodic memory
	- Semantic memory
	- Spatial memory
	- Emotional memory
	- Associative memory
	- Non-memory

	This model supports research into memory-type classification, schema formation, and personalized AI interaction systems.

	## 🧪 Model Details

	- Base model: `roberta-base`
	- Task: Multi-class sequence classification
	- Classes: 6
	- Max sequence length: 128 tokens
	- Training epochs: 1.5
	- Label smoothing: 0.1
	- Loss function: CrossEntropyLoss
	- Optimizer: AdamW
	- Batch size: 8
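
To make the label-smoothing setting concrete, here is a minimal pure-Python sketch of a six-class cross-entropy with smoothing 0.1. It assumes the PyTorch-style formulation (the smoothing mass is spread uniformly over all classes, as in `torch.nn.CrossEntropyLoss(label_smoothing=0.1)`); the function names are illustrative and are not taken from the training code.

```python
import math

def smoothed_cross_entropy(logits, target, num_classes=6, eps=0.1):
    """Cross-entropy against a label-smoothed target for one example."""
    # Smoothed target: eps is spread uniformly over all classes,
    # and the remaining 1 - eps is added to the true class.
    smooth = [eps / num_classes] * num_classes
    smooth[target] += 1.0 - eps
    # Numerically stable log-softmax over the logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    return -sum(t * lp for t, lp in zip(smooth, log_probs))

def smoothed_loss_floor(num_classes=6, eps=0.1):
    """Lowest reachable loss: the entropy of the smoothed target."""
    off = eps / num_classes
    on = 1.0 - eps + off
    return -(on * math.log(on) + (num_classes - 1) * off * math.log(off))

uniform = smoothed_cross_entropy([0.0] * 6, target=2)  # equals log(6)
floor = smoothed_loss_floor()
print(f"uniform logits: {uniform:.3f}, floor: {floor:.3f}")
# prints: uniform logits: 1.792, floor: 0.421
```

With smoothing, even a perfectly calibrated correct prediction cannot push this loss to zero: for six classes and smoothing 0.1 the floor is about 0.421. If the reported eval loss was computed with the same smoothing (an assumption), a value close to that floor is what near-perfect classification would look like.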

	---

	## 📊 Evaluation Results

	On a synthetic 400-example test set (200 non-memory examples and 200 memory examples split roughly evenly across the five memory classes):

	| Class       | Precision | Recall | F1-score | Support |
	|-------------|-----------|--------|----------|---------|
	| Associative | 1.00      | 1.00   | 1.00     | 39      |
	| Emotional   | 1.00      | 1.00   | 1.00     | 40      |
	| Episodic    | 1.00      | 1.00   | 1.00     | 39      |
	| Non-memory  | 1.00      | 1.00   | 1.00     | 200     |
	| Semantic    | 1.00      | 1.00   | 1.00     | 40      |
	| Spatial     | 1.00      | 1.00   | 1.00     | 42      |

	- Macro F1: 1.00
	- Eval loss: 0.423
	- Epochs: 1.5
	- Accuracy: 100%

	> ⚠️ Note: These results come from a synthetic dataset. Real-world validation is ongoing, along with expansion of the baseline dataset used for version 1 of MemoryBERT.
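
For anyone re-running the evaluation on their own data, the per-class metrics and macro F1 reported above can be recomputed from two label lists. The sketch below is plain Python; the function names and the four toy labels are made up for illustration and are not drawn from the MemoryBERT test set.

```python
from collections import Counter

def per_class_report(y_true, y_pred):
    """Precision, recall, F1 and support for each label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label gets a false positive
            fn[t] += 1  # true label gets a false negative
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pred_n = tp[label] + fp[label]
        true_n = tp[label] + fn[label]
        prec = tp[label] / pred_n if pred_n else 0.0
        rec = tp[label] / true_n if true_n else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        report[label] = {"precision": prec, "recall": rec,
                         "f1": f1, "support": true_n}
    return report

def macro_f1(report):
    """Unweighted mean of per-class F1, so small classes count equally."""
    return sum(r["f1"] for r in report.values()) / len(report)

# Toy labels (hypothetical), with one episodic example misread as spatial.
y_true = ["episodic", "episodic", "spatial", "non-memory"]
y_pred = ["episodic", "spatial", "spatial", "non-memory"]
report = per_class_report(y_true, y_pred)
print(f"macro F1: {macro_f1(report):.3f}")  # prints: macro F1: 0.778
```

Macro F1 weights every class equally, which matters here because non-memory accounts for half of the test set while each memory class has only about 40 examples.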

	---

	## 🧠 Dataset

	MemoryBERT was trained on a synthetic dataset of 4,000 curated examples (2,000 memory and 2,000 non-memory).

	Each entry is labeled with one of the six categories (five memory types plus non-memory) and tagged by domain and span group.

	---

	## 🚀 Usage

	```python
	import torch
	from transformers import RobertaTokenizer, RobertaForSequenceClassification

	# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
	model = RobertaForSequenceClassification.from_pretrained("DimitriosPanagoulias/MemoryBERT")
	tokenizer = RobertaTokenizer.from_pretrained("DimitriosPanagoulias/MemoryBERT")

	def predict_memory_type(text):
	    # Truncate to the 128-token maximum sequence length of the model
	    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
	    with torch.no_grad():  # gradients are not needed for inference
	        outputs = model(**inputs)
	    # Pick the highest-scoring class and map its id to a label name
	    predicted_id = outputs.logits.argmax(dim=-1).item()
	    return model.config.id2label[predicted_id]

	print(predict_memory_type("Without a map, I navigated the winding back roads to reach my childhood home."))
	```
	Or via the Hugging Face pipeline:
	```python
	# Use a pipeline as a high-level helper
	from transformers import pipeline
	import torch
	device = 0 if torch.cuda.is_available() else -1 # 0 = GPU, -1 = CPU
	pipe = pipeline("text-classification", model="DimitriosPanagoulias/MemoryBERT", device=device)
	pipe("I remember the long walk to my childhood school.")
	```
	Output:
	```text
	[{'label': 'episodic', 'score': 0.9272529482841492}]
	```
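
The `score` in the output above is a probability-like value for the top class. As a rough sketch of how such scores relate to raw logits, the helper below applies a softmax and ranks all six classes. It assumes the pipeline uses softmax for this single-label model; the label order in `id2label` and the logit values are hypothetical, so check `model.config.id2label` for the real mapping.

```python
import math

def scores_from_logits(logits, id2label):
    """Softmax the logits and return all labels ranked by score."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    scored = [{"label": id2label[i], "score": e / total}
              for i, e in enumerate(exps)]
    return sorted(scored, key=lambda d: d["score"], reverse=True)

# Hypothetical label order and logits, for illustration only.
id2label = {0: "associative", 1: "emotional", 2: "episodic",
            3: "non-memory", 4: "semantic", 5: "spatial"}
logits = [0.1, 0.3, 3.2, -0.5, 0.2, 0.4]

ranked = scores_from_logits(logits, id2label)
for entry in ranked[:3]:
    print(entry["label"], round(entry["score"], 3))
```

Looking at the runner-up classes can help when a sentence mixes memory types, for example an episodic recollection with a strong spatial component.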

	## Authors

	- Dimitrios P. Panagoulias, Department of Informatics, University of Piraeus
	- Persephone Papatheodosiou, Sleep Research Unit, Department of Psychiatry, National and Kapodistrian University of Athens
	- Anastasios Bonakis, Second Department of Neurology, National and Kapodistrian University of Athens
	- Dimitris Dikeos, Sleep Research Unit, Department of Psychiatry, National and Kapodistrian University of Athens
	- Maria Virvou, Lab of Software Engineering, Department of Informatics, University of Piraeus
	- George A. Tsihrintzis, Lab of Pattern Recognition and Machine Learning – Multimedia Systems, Department of Informatics, University of Piraeus

	## Citation

	You can cite either or both of the following related works:

	- Panagoulias, D.P. et al. “Memory and Schema in Human–Generative Artificial Intelligence Interactions.” 2024 IEEE ICTAI Conference (in press). Available at: https://ieeexplore.ieee.org/document/10849404

	- Panagoulias, D.P. et al. “Mathematical representation of memory and schema for improving human-generative AI interactions.” 2024 IEEE IISA Conference (in press). Available at: https://ieeexplore.ieee.org/document/10786703