---
license: apache-2.0
library_name: mlx
tags:
- mlx
- memory-augmented
- code-generation
- retrieval-augmented
- python
- code-search
pipeline_tag: text-generation
datasets:
- codeparrot/codeparrot-clean
---

# MALM-165M: Memory-Augmented Language Model

A 165M-parameter Memory-Augmented Language Model (MALM) for semantic code search, trained on CodeParrot.

## Quick Start

```bash
# Install dependencies
pip install mlx huggingface_hub numpy

# Download model
huggingface-cli download codelion/malm-165m --local-dir ./malm-165m

# Run semantic search
python malm-165m/inference.py --query "function that sorts a list"
```

**Example output:**

```text
Query: function that sorts a list
------------------------------------------------------------

1. array_sort (score: 0.9526)
   Signature: array_sort(col)
   Docstring: Collection function: sorts the input array in ascending order...

2. sort_array (score: 0.7707)
   Signature: sort_array(col, asc)
   Docstring: Collection function: sorts the input array in ascending or descending order...
```

## Python API

```python
import sys
from pathlib import Path

from huggingface_hub import snapshot_download

# Download the model repository and make inference.py importable
model_path = snapshot_download("codelion/malm-165m")
sys.path.insert(0, model_path)

from inference import load_model, search_functions

# Load model, tokenizer, memory bank, and config
model, tokenizer, functions, config = load_model(Path(model_path))
print(f"Loaded {len(functions)} functions")

# Search
results = search_functions(
    model, tokenizer, functions,
    query="connect to database",
    top_k=5,
)

for name, signature, docstring, score in results:
    print(f"{name}: {score:.4f}")
```

## Model Description

MALM combines a transformer with learned memory retrieval for semantic code search:

1. **Query encoder** - Encodes natural language queries into embeddings
2. **Value encoder** - Encodes function signatures/docstrings
3. **Retrieval** - Attention-based lookup from query to memory
4. **Memory bank** - 2000 Python functions from CodeParrot
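
The retrieval step above can be sketched in plain NumPy. This is an illustrative toy, not the model's actual code: the array names, toy dimensions, and random stand-in embeddings are all hypothetical, standing in for the real encoder outputs (MALM's `d_model` is 768 with a 2000-function memory bank).

```python
import numpy as np

# Toy sizes; the real model uses d_model=768 and 2000 memory entries
d_model, num_functions = 8, 5
rng = np.random.default_rng(0)

# Stand-ins for encoder outputs: one query embedding, and one
# value embedding per function in the memory bank
query_emb = rng.standard_normal(d_model)
memory = rng.standard_normal((num_functions, d_model))

# Attention-based lookup: scaled dot-product scores over the memory
# bank, softmax-normalized, then ranked to produce top-k matches
scores = memory @ query_emb / np.sqrt(d_model)
weights = np.exp(scores - scores.max())
weights /= weights.sum()

top_k = 3
ranked = np.argsort(weights)[::-1][:top_k]
for idx in ranked:
    print(f"function {idx}: weight {weights[idx]:.4f}")
```

The softmax weights play the role of attention over the memory bank; ranking them recovers the top-k scored functions that `search_functions` returns.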

### Why not mlx-lm?

MALM uses a **memory-augmented** architecture different from standard LLMs:

- Separate query and value encoders for retrieval
- Requires a memory bank of functions
- Inference is retrieval-based, not autoregressive generation

This architecture doesn't fit `mlx-lm generate`, so we provide a custom inference script.

## Architecture

| Component | Parameters |
|-----------|------------|
| Embedding | 11.1M |
| Position Embedding | 0.1M |
| Query Encoder (4 layers) | 28.4M |
| Value Encoder (4 layers) | 28.4M |
| Decoder (12 layers) | 85.1M |
| Output Projection | 11.1M |
| **Total** | **~165M** |

### Configuration

```json
{
  "vocab_size": 14407,
  "d_model": 768,
  "n_heads": 12,
  "n_layers": 12,
  "n_query_layers": 4,
  "max_seq_len": 128,
  "num_parameters": 165123656,
  "num_functions": 2000
}
```
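
As a quick sanity check on the configuration, the fields are internally consistent: `d_model` divides evenly across `n_heads`. A minimal sketch using the JSON above inlined as a string (no model files needed):

```python
import json

# The configuration block from this card, inlined for a standalone check
config = json.loads("""{
  "vocab_size": 14407,
  "d_model": 768,
  "n_heads": 12,
  "n_layers": 12,
  "n_query_layers": 4,
  "max_seq_len": 128,
  "num_parameters": 165123656,
  "num_functions": 2000
}""")

# d_model must split evenly across attention heads: 768 / 12 = 64
assert config["d_model"] % config["n_heads"] == 0
head_dim = config["d_model"] // config["n_heads"]
print(f"per-head dimension: {head_dim}")  # prints 64
```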

## Files

| File | Description |
|------|-------------|
| `model.npz` | Model weights (MLX-compatible NumPy format) |
| `config.json` | Model configuration |
| `tokenizer.json` | Tokenizer vocabulary |
| `functions.json` | Memory bank of 2000 Python functions |
| `inference.py` | Standalone inference script |

## Training

Trained on CodeParrot with a focus on Python function retrieval:

- Encodes natural language queries into embedding space
- Learns semantic similarity between queries and function signatures
- Uses attention-based retrieval over a memory bank
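
This card does not state the exact training objective. A common choice for learning query-to-function similarity is an in-batch contrastive (InfoNCE-style) loss; the sketch below is a hypothetical illustration in NumPy with random stand-in embeddings, not the actual MALM training code.

```python
import numpy as np

rng = np.random.default_rng(1)
batch, d_model = 4, 8

# Stand-ins for encoder outputs: query i is paired with function i,
# so positives are constructed near their queries
q = rng.standard_normal((batch, d_model))
v = q + 0.1 * rng.standard_normal((batch, d_model))

# Cosine similarities with a temperature: each query should score its
# own function higher than the other in-batch (negative) functions
q /= np.linalg.norm(q, axis=1, keepdims=True)
v /= np.linalg.norm(v, axis=1, keepdims=True)
logits = q @ v.T / 0.07

# InfoNCE loss: negative mean log-probability of the matched pairs
logits -= logits.max(axis=1, keepdims=True)  # numerical stability
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))
print(f"contrastive loss: {loss:.4f}")
```

Minimizing such a loss pulls each query embedding toward its paired function embedding and pushes it away from the rest of the batch, which is what the attention-based retrieval at inference time relies on.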

## Citation

```bibtex
@article{sharma2026malm,
  title={Reverse Engineering a \$500M Mystery: From HashHop to Memory-Augmented Language Models},
  author={Sharma, Asankhaya},
  year={2026},
  url={https://huggingface.co/blog/codelion/reverse-engineering-magic-hashhop}
}
```

## Related Work

Part of the [HashHop](https://github.com/codelion/hash-hop) project exploring long-context evaluation and memory-augmented architectures.

## License

Apache 2.0