You will need a 16-channel [OpenBCI Cyton](https://shop.openbci.com/products/the-complete-headset-eeg?variant=44401726128368) device. The model may be compatible with other EEG devices; the autoencoder that forms the front end of the model does work that should enhance cross-device capability, but the pipeline remains untested on non-OpenBCI devices.

**Dependencies**: `pip install torch numpy faiss-cpu hnswlib transformers tqdm pyserial paramiko`

Download the model weights and this repository, then create a retrieval index with the `morphism.py` CLI. All subcommands support `--help` for full option details.

You will see a real-time retrieval of documents from your index that the model judges relevant to you. Because this model is a research preview, its output may be somewhat noisy. Relevance varies between individuals, and the model performs more strongly on people who resemble the patterns represented in the training data. The retrieved text is associative, not transcriptive: the model tracks coarse cognitive state rather than sentence-level content, surfacing thematically related passages from your corpus.
## How was this created?

```
raw EEG (16ch, 1kHz)
→ windowed segments (~0.6s)
→ convolutional autoencoder (64-dim bottleneck)
→ semantic VAE (→ 1024-dim text embedding space)
→ FAISS nearest-neighbor search
→ matching text from your corpus
```
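
The first step of the pipeline, framing the raw stream into segments, can be sketched in NumPy. The sample rate and window length are taken from the diagram; the function name is ours:

```python
import numpy as np

FS = 1000          # samples/s, per the "16ch, 1kHz" stage
WINDOW_S = 0.6     # ~0.6 s segments

def window_eeg(raw: np.ndarray, fs: int = FS, window_s: float = WINDOW_S) -> np.ndarray:
    """Split a (channels, samples) recording into non-overlapping (n, channels, step) windows."""
    step = int(fs * window_s)                      # 600 samples per window
    n = raw.shape[1] // step                       # drop the ragged tail
    trimmed = raw[:, : n * step]
    return trimmed.reshape(raw.shape[0], n, step).transpose(1, 0, 2)

windows = window_eeg(np.zeros((16, 3000)))         # 3 s of 16-channel EEG
# windows.shape == (5, 16, 600)
```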

This model works in two stages. The first stage is an autoencoder that represents the neural data in a latent space. The second is a semantic mapper, which predicts a semantic vector from the neural vector. This relatively simple architecture is surprisingly effective and lays the groundwork for future development of this technology.
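
A minimal PyTorch sketch of the two stages. Only the 64-dim bottleneck and 1024-dim output are taken from the diagram; every layer size in between is a guess, the released weights will differ, and the plain MLP here stands in for the semantic VAE:

```python
import torch
import torch.nn as nn

class EEGEncoder(nn.Module):
    """Stage 1: the encoder half of the convolutional autoencoder, 64-dim bottleneck."""
    def __init__(self, channels: int = 16, latent: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, 32, kernel_size=7, stride=2, padding=3), nn.GELU(),
            nn.Conv1d(32, 64, kernel_size=7, stride=2, padding=3), nn.GELU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, latent),
        )
    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, channels, samples)
        return self.net(x)

class SemanticMapper(nn.Module):
    """Stage 2: map the 64-dim neural latent into the 1024-dim text embedding space."""
    def __init__(self, latent: int = 64, embed: int = 1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent, 512), nn.GELU(), nn.Linear(512, embed))
    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

window = torch.randn(2, 16, 600)      # two ~0.6 s windows at 1 kHz
z = EEGEncoder()(window)              # (2, 64) neural latent
e = SemanticMapper()(z)               # (2, 1024) predicted text embedding
```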

The underlying dataset is a large collection of paired neural measurements and text stimuli, collected by Eve Labs over 20 months from approximately forty subjects. Training data was gathered naturalistically: subjects chatted with LLMs while wearing the headset, with the conversation text serving as the paired stimuli.

On a practical level, new models such as this can be evaluated and understood much faster if the weights are released. Given the complexity of the task at hand, we hope that people will report their experiences and help advance the field.
## Project structure

```
morphism
├── morphism.py   # CLI entrypoint
├── cyton.py      # OpenBCI Cyton+Daisy recording
├── embed.py      # Text embedding + SQLite storage
├── decode.py     # EEG → semantic embedding → FAISS search
├── eegembed.py   # EEG autoencoder streaming
└── README.md
```
## How can I get involved?

Join the [Discord](TODO), follow [@evelabsai](https://twitter.com/evelabsai) on Twitter, or send an email to [hello@evelabs.info](mailto:hello@evelabs.info).