---
license: apache-2.0
library_name: mlx
tags:
- mlx
- memory-augmented
- code-generation
- retrieval-augmented
- python
- code-search
pipeline_tag: text-generation
datasets:
- codeparrot/codeparrot-clean
---
# MALM-165M: Memory-Augmented Language Model
A 165M parameter Memory-Augmented Language Model (MALM) for semantic code search, trained on CodeParrot.
## Quick Start
```bash
# Install dependencies
pip install mlx huggingface_hub numpy
# Download model
huggingface-cli download codelion/malm-165m --local-dir ./malm-165m
# Run semantic search
python malm-165m/inference.py --query "function that sorts a list"
```
**Example output:**
```text
Query: function that sorts a list
------------------------------------------------------------
1. array_sort (score: 0.9526)
   Signature: array_sort(col)
   Docstring: Collection function: sorts the input array in ascending order...
2. sort_array (score: 0.7707)
   Signature: sort_array(col, asc)
   Docstring: Collection function: sorts the input array in ascending or descending order...
```
## Python API
```python
from huggingface_hub import snapshot_download
from pathlib import Path
import sys
# Download and import
model_path = snapshot_download("codelion/malm-165m")
sys.path.insert(0, model_path)
from inference import load_model, search_functions
# Load model
model, tokenizer, functions, config = load_model(Path(model_path))
print(f"Loaded {len(functions)} functions")
# Search
results = search_functions(
    model, tokenizer, functions,
    query="connect to database",
    top_k=5,
)
for name, signature, docstring, score in results:
    print(f"{name}: {score:.4f}")
```
## Model Description
MALM combines a transformer with learned memory retrieval for semantic code search:
1. **Query encoder** - Encodes natural language queries into embeddings
2. **Value encoder** - Encodes function signatures/docstrings
3. **Retrieval** - Attention-based lookup from query to memory
4. **Memory bank** - 2000 Python functions from CodeParrot
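The retrieval step (3) can be illustrated as scaled dot-product attention from a query embedding over the memory bank. The following is a toy NumPy sketch, not the model's actual code; the embeddings, dimensions, and `retrieve` helper are made up for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def retrieve(query_emb, memory_embs, top_k=2):
    """Score each memory entry by scaled dot-product attention
    and return the top_k (index, weight) pairs."""
    d = query_emb.shape[-1]
    scores = memory_embs @ query_emb / np.sqrt(d)  # one score per function
    weights = softmax(scores)                      # attention distribution
    idx = np.argsort(weights)[::-1][:top_k]
    return [(int(i), float(weights[i])) for i in idx]

# Toy memory bank: 4 "function" embeddings with d_model=4
memory = np.eye(4)
query = np.array([0.1, 0.2, 0.9, 0.1])  # most similar to entry 2
results = retrieve(query, memory, top_k=2)
print(results)  # entry 2 ranks first
```

In the real model the memory embeddings come from the value encoder applied to the 2000 function signatures/docstrings, and the query embedding from the query encoder.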
### Why not mlx-lm?
MALM uses a **memory-augmented** architecture different from standard LLMs:
- Separate query and value encoders for retrieval
- Requires a memory bank of functions
- Inference is retrieval-based, not autoregressive generation
This architecture doesn't fit `mlx-lm generate`, so we provide a custom inference script.
## Architecture
| Component | Parameters |
|-----------|------------|
| Embedding | 11.1M |
| Position Embedding | 0.1M |
| Query Encoder (4 layers) | 28.4M |
| Value Encoder (4 layers) | 28.4M |
| Decoder (12 layers) | 85.1M |
| Output Projection | 11.1M |
| **Total** | **~165M** |
### Configuration
```json
{
  "vocab_size": 14407,
  "d_model": 768,
  "n_heads": 12,
  "n_layers": 12,
  "n_query_layers": 4,
  "max_seq_len": 128,
  "num_parameters": 165123656,
  "num_functions": 2000
}
```
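The per-component counts in the architecture table can be cross-checked against this configuration. For example, the token embedding matrix is `vocab_size × d_model`, and the position embedding is `max_seq_len × d_model`:

```python
# Sanity-check the parameter table against config.json values.
vocab_size = 14407
d_model = 768
max_seq_len = 128

embedding = vocab_size * d_model        # token embedding matrix
pos_embedding = max_seq_len * d_model   # learned position embeddings
output_proj = d_model * vocab_size      # output projection (untied)

print(f"Embedding: {embedding / 1e6:.1f}M")       # matches the 11.1M row
print(f"Position:  {pos_embedding / 1e6:.2f}M")   # matches the 0.1M row
print(f"Output:    {output_proj / 1e6:.1f}M")     # matches the 11.1M row
```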
## Files
| File | Description |
|------|-------------|
| `model.npz` | Model weights (MLX-compatible NumPy format) |
| `config.json` | Model configuration |
| `tokenizer.json` | Tokenizer vocabulary |
| `functions.json` | Memory bank of 2000 Python functions |
| `inference.py` | Standalone inference script |
## Training
Trained on CodeParrot with a focus on Python function retrieval:
- Encodes natural language queries into embedding space
- Learns semantic similarity between queries and function signatures
- Uses attention-based retrieval over a memory bank
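The card does not specify the training objective. A common formulation for learning query-to-function similarity is an in-batch contrastive (InfoNCE-style) loss, where each query's matching function is the positive and the rest of the batch serves as negatives; the sketch below shows that idea purely as an illustrative assumption, not as MALM's documented loss:

```python
import numpy as np

def contrastive_loss(query_embs, value_embs, temperature=0.07):
    """In-batch contrastive loss: the positive for query i is the
    value at index i; all other values in the batch are negatives.
    (Hypothetical objective -- the model card does not state the loss.)"""
    # L2-normalize so dot products are cosine similarities
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    v = value_embs / np.linalg.norm(value_embs, axis=1, keepdims=True)
    logits = q @ v.T / temperature                   # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # cross-entropy on matches

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16))
loss = contrastive_loss(q, q)  # identical query/value pairs -> near-zero loss
```

Under such an objective, queries are pulled toward the embeddings of their matching signatures, which is what makes the attention-based lookup at inference time return semantically relevant functions.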
## Citation
```bibtex
@article{sharma2026malm,
  title={Reverse Engineering a $500M Mystery: From HashHop to Memory-Augmented Language Models},
  author={Sharma, Asankhaya},
  year={2026},
  url={https://huggingface.co/blog/codelion/reverse-engineering-magic-hashhop}
}
```
## Related Work
Part of the [HashHop](https://github.com/codelion/hash-hop) project exploring long-context evaluation and memory-augmented architectures.
## License
Apache 2.0