---
license: apache-2.0
library_name: mlx
tags:
- mlx
- memory-augmented
- code-generation
- retrieval-augmented
- python
- code-search
pipeline_tag: text-generation
datasets:
- codeparrot/codeparrot-clean
---

# MALM-165M: Memory-Augmented Language Model

A 165M-parameter Memory-Augmented Language Model (MALM) for semantic code search, trained on CodeParrot.

## Quick Start

```bash
# Install dependencies
pip install mlx huggingface_hub numpy

# Download model
huggingface-cli download codelion/malm-165m --local-dir ./malm-165m

# Run semantic search
python malm-165m/inference.py --query "function that sorts a list"
```

**Example output:**

```text
Query: function that sorts a list
------------------------------------------------------------
1. array_sort (score: 0.9526)
   Signature: array_sort(col)
   Docstring: Collection function: sorts the input array in ascending order...

2. sort_array (score: 0.7707)
   Signature: sort_array(col, asc)
   Docstring: Collection function: sorts the input array in ascending or descending order...
```

## Python API

```python
from huggingface_hub import snapshot_download
from pathlib import Path
import sys

# Download and import
model_path = snapshot_download("codelion/malm-165m")
sys.path.insert(0, model_path)
from inference import load_model, search_functions

# Load model
model, tokenizer, functions, config = load_model(Path(model_path))
print(f"Loaded {len(functions)} functions")

# Search
results = search_functions(
    model, tokenizer, functions,
    query="connect to database", top_k=5
)
for name, signature, docstring, score in results:
    print(f"{name}: {score:.4f}")
```

## Model Description

MALM combines a transformer with learned memory retrieval for semantic code search:

1. **Query encoder** - Encodes natural language queries into embeddings
2. **Value encoder** - Encodes function signatures/docstrings
3. **Retrieval** - Attention-based lookup from query to memory
4. **Memory bank** - 2000 Python functions from CodeParrot

### Why not mlx-lm?
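`mlx-lm generate` assumes autoregressive decoding, while MALM's inference step is a retrieval. A minimal NumPy sketch of an attention-based lookup over a memory bank — toy embeddings and a hypothetical `retrieve` helper, not the model's actual internals (the real query/value encoders produce these vectors):

```python
import numpy as np

def retrieve(query_emb, memory_embs, top_k=2):
    """Attention-style lookup over a memory bank (illustrative sketch only)."""
    d = query_emb.shape[-1]
    # Scaled dot-product score between the query and every memory embedding
    scores = memory_embs @ query_emb / np.sqrt(d)
    # Softmax turns scores into attention weights over the memory entries
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Highest-weight entries are the retrieved functions
    order = np.argsort(weights)[::-1][:top_k]
    return order, weights[order]

# Toy memory bank: unit embeddings standing in for three functions
memory = np.array([
    [1.0, 0.0, 0.0],  # e.g. "connect_db"
    [0.0, 1.0, 0.0],  # e.g. "array_sort"
    [0.0, 0.0, 1.0],  # e.g. "read_file"
])
query = np.array([0.1, 0.9, 0.1])  # query embedding closest to "array_sort"

idx, w = retrieve(query, memory)
print(idx[0])  # -> 1, i.e. "array_sort" ranks first
```

There is no token-by-token generation loop anywhere in this flow, which is why the standard `mlx-lm` entry points do not apply.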
MALM uses a **memory-augmented** architecture different from standard LLMs:

- Separate query and value encoders for retrieval
- Requires a memory bank of functions
- Inference is retrieval-based, not autoregressive generation

This architecture doesn't fit `mlx-lm generate`, so we provide a custom inference script.

## Architecture

| Component | Parameters |
|-----------|------------|
| Embedding | 11.1M |
| Position Embedding | 0.1M |
| Query Encoder (4 layers) | 28.4M |
| Value Encoder (4 layers) | 28.4M |
| Decoder (12 layers) | 85.1M |
| Output Projection | 11.1M |
| **Total** | **~165M** |

### Configuration

```json
{
  "vocab_size": 14407,
  "d_model": 768,
  "n_heads": 12,
  "n_layers": 12,
  "n_query_layers": 4,
  "max_seq_len": 128,
  "num_parameters": 165123656,
  "num_functions": 2000
}
```

## Files

| File | Description |
|------|-------------|
| `model.npz` | Model weights (MLX-compatible NumPy format) |
| `config.json` | Model configuration |
| `tokenizer.json` | Tokenizer vocabulary |
| `functions.json` | Memory bank of 2000 Python functions |
| `inference.py` | Standalone inference script |

## Training

Trained on CodeParrot with a focus on Python function retrieval:

- Encodes natural language queries into embedding space
- Learns semantic similarity between queries and function signatures
- Uses attention-based retrieval over a memory bank

## Citation

```bibtex
@article{sharma2026malm,
  title={Reverse Engineering a $500M Mystery: From HashHop to Memory-Augmented Language Models},
  author={Sharma, Asankhaya},
  year={2026},
  url={https://huggingface.co/blog/codelion/reverse-engineering-magic-hashhop}
}
```

## Related Work

Part of the [HashHop](https://github.com/codelion/hash-hop) project exploring long-context evaluation and memory-augmented architectures.

## License

Apache 2.0