---
license: apache-2.0
library_name: mlx
tags:
  - mlx
  - memory-augmented
  - code-generation
  - retrieval-augmented
  - python
  - code-search
pipeline_tag: text-generation
datasets:
  - codeparrot/codeparrot-clean
---

# MALM-165M: Memory-Augmented Language Model

A 165M-parameter Memory-Augmented Language Model (MALM) for semantic code search, trained on CodeParrot.

## Quick Start

```bash
# Install dependencies
pip install mlx huggingface_hub numpy

# Download model
huggingface-cli download codelion/malm-165m --local-dir ./malm-165m

# Run semantic search
python malm-165m/inference.py --query "function that sorts a list"
```

**Example output:**

```text
Query: function that sorts a list
------------------------------------------------------------

1. array_sort (score: 0.9526)
   Signature: array_sort(col)
   Docstring: Collection function: sorts the input array in ascending order...

2. sort_array (score: 0.7707)
   Signature: sort_array(col, asc)
   Docstring: Collection function: sorts the input array in ascending or descending order...
```

## Python API

```python
from huggingface_hub import snapshot_download
from pathlib import Path
import sys

# Download and import
model_path = snapshot_download("codelion/malm-165m")
sys.path.insert(0, model_path)

from inference import load_model, search_functions

# Load model
model, tokenizer, functions, config = load_model(Path(model_path))
print(f"Loaded {len(functions)} functions")

# Search
results = search_functions(
    model, tokenizer, functions,
    query="connect to database",
    top_k=5
)

for name, signature, docstring, score in results:
    print(f"{name}: {score:.4f}")
```

## Model Description

MALM combines a transformer with learned memory retrieval for semantic code search:

1. **Query encoder** - Encodes natural language queries into embeddings
2. **Value encoder** - Encodes function signatures/docstrings
3. **Retrieval** - Attention-based lookup from query to memory
4. **Memory bank** - 2000 Python functions from CodeParrot
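The retrieval step can be sketched as scaled dot-product attention from the query embedding over the memory bank. This is a minimal illustration with toy data, not the model's actual code; the function name and shapes are hypothetical:

```python
import numpy as np

def retrieve(query_emb, memory_keys, top_k=5):
    """Score every memory entry against the query and return the top-k.

    query_emb:   (d,)   embedding from the query encoder
    memory_keys: (N, d) embeddings from the value encoder, one per function
    """
    d = query_emb.shape[0]
    # Scaled dot-product scores, softmax-normalized into attention weights
    scores = memory_keys @ query_emb / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Indices of the top-k entries, best first
    top = np.argsort(weights)[::-1][:top_k]
    return top, weights[top]

# Toy example: 4 memory slots, 32-dim embeddings; the query is a noisy
# copy of slot 2, so slot 2 should come back first
rng = np.random.default_rng(0)
keys = rng.normal(size=(4, 32))
query = keys[2] + 0.1 * rng.normal(size=32)
idx, w = retrieve(query, keys, top_k=2)
```

In the real model the scores come from learned query/value encoders rather than raw embeddings, but the lookup itself has this shape.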

### Why not mlx-lm?

MALM uses a **memory-augmented** architecture different from standard LLMs:
- Separate query and value encoders for retrieval
- Requires a memory bank of functions
- Inference is retrieval-based, not autoregressive generation

This architecture doesn't fit `mlx-lm generate`, so we provide a custom inference script.

## Architecture

| Component | Parameters |
|-----------|------------|
| Embedding | 11.1M |
| Position Embedding | 0.1M |
| Query Encoder (4 layers) | 28.4M |
| Value Encoder (4 layers) | 28.4M |
| Decoder (12 layers) | 85.1M |
| Output Projection | 11.1M |
| **Total** | **~165M** |

### Configuration

```json
{
  "vocab_size": 14407,
  "d_model": 768,
  "n_heads": 12,
  "n_layers": 12,
  "n_query_layers": 4,
  "max_seq_len": 128,
  "num_parameters": 165123656,
  "num_functions": 2000
}
```
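As a sanity check, the per-component counts in the table can be roughly reproduced from the config above. This is a back-of-the-envelope sketch: it assumes a standard transformer layer with a 4×`d_model` feed-forward hidden size (not stated in the config) and ignores biases and layer norms, so it undercounts slightly:

```python
# Rough weight-only parameter estimate from config.json
vocab_size, d_model, max_seq_len = 14407, 768, 128
ffn_hidden = 4 * d_model  # assumed; not stated in the config

embedding = vocab_size * d_model                        # ~11.1M
positions = max_seq_len * d_model                       # ~0.1M
per_layer = 4 * d_model**2 + 2 * d_model * ffn_hidden   # attention + FFN
query_enc = 4 * per_layer                               # ~28.3M
value_enc = 4 * per_layer                               # ~28.3M
decoder   = 12 * per_layer                              # ~84.9M
out_proj  = vocab_size * d_model                        # ~11.1M

total = embedding + positions + query_enc + value_enc + decoder + out_proj
# ~163.8M; biases and layer norms account for the gap to the
# reported 165,123,656
```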

## Files

| File | Description |
|------|-------------|
| `model.npz` | Model weights (MLX-compatible NumPy format) |
| `config.json` | Model configuration |
| `tokenizer.json` | Tokenizer vocabulary |
| `functions.json` | Memory bank of 2000 Python functions |
| `inference.py` | Standalone inference script |

## Training

Trained on CodeParrot with a focus on Python function retrieval:
- Encodes natural language queries into embedding space
- Learns semantic similarity between queries and function signatures
- Uses attention-based retrieval over a memory bank
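Retrieval objectives of this kind are commonly trained with an in-batch contrastive (InfoNCE-style) loss, where each query is pulled toward its paired function and pushed away from the rest of the batch. The model card does not spell out the exact objective, so the sketch below is illustrative only:

```python
import numpy as np

def info_nce_loss(q, v, temperature=0.07):
    """In-batch contrastive loss: q[i] should match v[i] and no other.

    q: (B, d) query embeddings; v: (B, d) function embeddings.
    """
    # Cosine similarity matrix between every query and every value
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    logits = (q @ v.T) / temperature
    # Cross-entropy with the diagonal as the correct class
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Matched query/value pairs should score far better than random ones
rng = np.random.default_rng(0)
v = rng.normal(size=(8, 16))
loss_matched = info_nce_loss(v + 0.01 * rng.normal(size=(8, 16)), v)
loss_random = info_nce_loss(rng.normal(size=(8, 16)), v)
```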

## Citation

```bibtex
@article{sharma2026malm,
  title={Reverse Engineering a $500M Mystery: From HashHop to Memory-Augmented Language Models},
  author={Sharma, Asankhaya},
  year={2026},
  url={https://huggingface.co/blog/codelion/reverse-engineering-magic-hashhop}
}
```

## Related Work

Part of the [HashHop](https://github.com/codelion/hash-hop) project exploring long-context evaluation and memory-augmented architectures.

## License

Apache 2.0