---
license: mit
---
# JEPA-Style LLM Prototypes

Making decoder-only transformers predict state consequences instead of tokens.

## What's This?

Three approaches to convert a standard LLM into a world model that predicts "what happens next" given a state and action — like JEPA but for language models.

## Files

| File | Description | GPU Time |
|------|-------------|----------|
| `jepa_llm_prototypes.ipynb` | **All three options in one notebook** — best for comparing | ~30 min |
| `jepa_option1_sentence_encoder.ipynb` | Simplest approach using pre-trained sentence embeddings | ~10 min |
| `jepa_option2_llm_hidden_states.ipynb` | Uses GPT-2 hidden states as state space | ~15 min |

## Quick Start

1. Open any notebook in [Google Colab](https://colab.research.google.com/)
2. Set the runtime to **GPU** (Runtime → Change runtime type → select a GPU)
3. Run all cells
4. Watch the model learn to predict state transitions

## The Core Idea

```
Normal LLM: tokens → transformer → next token
JEPA-style: (state, action) → transformer → next state embedding
```

Instead of predicting words, the model predicts what the world looks like after an action.
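
The mapping above can be sketched as a small predictor that takes a (state, action) embedding pair and outputs a predicted next-state embedding. This is an illustrative minimal sketch, not the notebooks' actual code; the 384-d size assumes MiniLM-style embeddings, and the class name and layer sizes are made up for the example:

```python
import torch
import torch.nn as nn

EMB_DIM = 384  # assumed embedding size (e.g. all-MiniLM-L6-v2)

class StatePredictor(nn.Module):
    """Maps (state_emb, action_emb) -> predicted next-state embedding."""
    def __init__(self, emb_dim: int = EMB_DIM, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, emb_dim),
        )

    def forward(self, state_emb: torch.Tensor, action_emb: torch.Tensor) -> torch.Tensor:
        # Concatenate state and action, predict the next state's embedding
        return self.net(torch.cat([state_emb, action_emb], dim=-1))

predictor = StatePredictor()
state_emb = torch.randn(8, EMB_DIM)   # batch of 8 toy state embeddings
action_emb = torch.randn(8, EMB_DIM)  # batch of 8 toy action embeddings
pred = predictor(state_emb, action_emb)

# Training pulls the prediction toward the true next-state embedding,
# e.g. with a cosine-distance loss instead of token-level cross-entropy.
target = torch.randn(8, EMB_DIM)
loss = 1 - nn.functional.cosine_similarity(pred, target, dim=-1).mean()
```

The key difference from a normal LLM head: the output is a continuous vector compared in embedding space, not a distribution over tokens.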

## Three Approaches

**Option 1: Sentence Encoder** (Simplest)
- Uses `all-MiniLM-L6-v2` for embeddings
- Trains only a small predictor network
- Best for: quick testing, limited GPU

**Option 2: LLM Hidden States** (Medium)
- Uses GPT-2's internal representations
- Trains projection + predictor heads
- Best for: better accuracy, still fast

**Option 3: Autoencoder** (Most Powerful)
- Learns domain-specific state embeddings
- Trains encoder + decoder + predictor
- Best for: production, domain adaptation
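
Option 3's three trainable pieces fit together roughly as follows. This is a hedged sketch with illustrative dimensions and linear layers standing in for whatever the notebook actually uses; none of these names or sizes come from the repo:

```python
import torch
import torch.nn as nn

FEAT_DIM, STATE_DIM = 768, 128  # illustrative: input features, learned state size

class AutoencoderWorldModel(nn.Module):
    """Encoder + decoder learn a compact state space; predictor models transitions in it."""
    def __init__(self):
        super().__init__()
        # Encoder: text features (e.g. pooled LLM outputs) -> compact state embedding
        self.encoder = nn.Sequential(nn.Linear(FEAT_DIM, STATE_DIM), nn.Tanh())
        # Decoder: state embedding -> reconstruction, keeps the state space grounded
        self.decoder = nn.Linear(STATE_DIM, FEAT_DIM)
        # Predictor: (state, action) -> next state, the JEPA-style piece
        self.predictor = nn.Linear(2 * STATE_DIM, STATE_DIM)

    def forward(self, state_feats: torch.Tensor, action_feats: torch.Tensor):
        z = self.encoder(state_feats)
        z_action = self.encoder(action_feats)
        recon = self.decoder(z)                                   # reconstruction target
        z_next = self.predictor(torch.cat([z, z_action], dim=-1)) # predicted next state
        return z_next, recon

model = AutoencoderWorldModel()
z_next, recon = model(torch.randn(4, FEAT_DIM), torch.randn(4, FEAT_DIM))
```

Training would combine a reconstruction loss on `recon` (so the learned states stay meaningful) with a prediction loss on `z_next`, which is what makes this option domain-adaptable.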

## Example

```python
# Input
state = "Document is in draft status with 2 sections"
action = "User submits for review"

# Model predicts
next_state = "Document is pending review"  # via embedding similarity
```
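
The "via embedding similarity" step decodes the predicted vector by nearest-neighbor lookup over candidate state descriptions. A dependency-free toy sketch, where the 3-d "embeddings" are fabricated for illustration (real code would get them from the Option 1 sentence encoder):

```python
import math

def cosine(a, b):
    """Cosine similarity between two plain-Python vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical embeddings for candidate next-state descriptions
candidates = {
    "Document is pending review": [0.9, 0.1, 0.0],
    "Document is in draft status": [0.1, 0.9, 0.0],
    "Document is published":      [0.0, 0.1, 0.9],
}

predicted_embedding = [0.85, 0.15, 0.05]  # toy predictor output

# Decode: pick the candidate whose embedding is closest to the prediction
next_state = max(candidates, key=lambda s: cosine(predicted_embedding, candidates[s]))
print(next_state)  # → Document is pending review
```

Because decoding is a lookup rather than generation, the model can only ever answer with states you gave it candidates for, which is a deliberate trade-off of this design.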

## Requirements

- Python 3.8+
- PyTorch
- Transformers
- Sentence-Transformers (Option 1)
- GPU recommended (runs on CPU, but slowly)

All dependencies install automatically in the notebooks.

## Next Steps

- Swap synthetic data for real enterprise workflow logs
- Scale up the base model (Llama, Mistral)
- Add multi-step trajectory prediction
- Integrate with planning/search algorithms

---

*Experimental code — have fun breaking it.*