---
license: apache-2.0
library_name: mlx
tags:
- mlx
- memory-augmented
- code-generation
- retrieval-augmented
- python
- code-search
pipeline_tag: text-generation
datasets:
- codeparrot/codeparrot-clean
---

# MALM-165M: Memory-Augmented Language Model

A 165M-parameter Memory-Augmented Language Model (MALM) for semantic code search, trained on CodeParrot.

## Quick Start

```bash
# Install dependencies
pip install mlx huggingface_hub numpy

# Download model
huggingface-cli download codelion/malm-165m --local-dir ./malm-165m

# Run semantic search
python malm-165m/inference.py --query "function that sorts a list"
```

**Example output:**

```text
Query: function that sorts a list
------------------------------------------------------------

1. array_sort (score: 0.9526)
   Signature: array_sort(col)
   Docstring: Collection function: sorts the input array in ascending order...

2. sort_array (score: 0.7707)
   Signature: sort_array(col, asc)
   Docstring: Collection function: sorts the input array in ascending or descending order...
```

## Python API

```python
import sys
from pathlib import Path

from huggingface_hub import snapshot_download

# Download the model repository and make inference.py importable
model_path = snapshot_download("codelion/malm-165m")
sys.path.insert(0, model_path)

from inference import load_model, search_functions

# Load model, tokenizer, memory bank, and config
model, tokenizer, functions, config = load_model(Path(model_path))
print(f"Loaded {len(functions)} functions")

# Search
results = search_functions(
    model, tokenizer, functions,
    query="connect to database",
    top_k=5,
)

for name, signature, docstring, score in results:
    print(f"{name}: {score:.4f}")
```

## Model Description

MALM combines a transformer with learned memory retrieval for semantic code search:

1. **Query encoder** - Encodes natural language queries into embeddings
2. **Value encoder** - Encodes function signatures/docstrings
3. **Retrieval** - Attention-based lookup from query to memory
4. **Memory bank** - 2000 Python functions from CodeParrot
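
The retrieval step above can be sketched in plain NumPy. This is an illustrative toy, not the model's actual code: the array names, toy dimensions, and random stand-in embeddings are all hypothetical, standing in for the real encoder outputs (MALM's `d_model` is 768 with a 2000-function memory bank).

```python
import numpy as np

# Toy sizes; the real model uses d_model=768 and 2000 memory entries
d_model, num_functions = 8, 5
rng = np.random.default_rng(0)

# Stand-ins for encoder outputs: one query embedding, and one
# value embedding per function in the memory bank
query_emb = rng.standard_normal(d_model)
memory = rng.standard_normal((num_functions, d_model))

# Attention-based lookup: scaled dot-product scores over the memory
# bank, softmax-normalized, then ranked to produce top-k matches
scores = memory @ query_emb / np.sqrt(d_model)
weights = np.exp(scores - scores.max())
weights /= weights.sum()

top_k = 3
ranked = np.argsort(weights)[::-1][:top_k]
for idx in ranked:
    print(f"function {idx}: weight {weights[idx]:.4f}")
```

The softmax weights play the role of attention over the memory bank; ranking them recovers the top-k scored functions that `search_functions` returns.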

### Why not mlx-lm?

MALM uses a **memory-augmented** architecture different from standard LLMs:

- Separate query and value encoders for retrieval
- Requires a memory bank of functions
- Inference is retrieval-based, not autoregressive generation

This architecture doesn't fit `mlx-lm generate`, so we provide a custom inference script.

## Architecture

| Component | Parameters |
|-----------|------------|
| Embedding | 11.1M |
| Position Embedding | 0.1M |
| Query Encoder (4 layers) | 28.4M |
| Value Encoder (4 layers) | 28.4M |
| Decoder (12 layers) | 85.1M |
| Output Projection | 11.1M |
| **Total** | **~165M** |

### Configuration

```json
{
  "vocab_size": 14407,
  "d_model": 768,
  "n_heads": 12,
  "n_layers": 12,
  "n_query_layers": 4,
  "max_seq_len": 128,
  "num_parameters": 165123656,
  "num_functions": 2000
}
```
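
As a quick sanity check on the configuration, the fields are internally consistent: `d_model` divides evenly across `n_heads`. A minimal sketch using the JSON above inlined as a string (no model files needed):

```python
import json

# The configuration block from this card, inlined for a standalone check
config = json.loads("""{
  "vocab_size": 14407,
  "d_model": 768,
  "n_heads": 12,
  "n_layers": 12,
  "n_query_layers": 4,
  "max_seq_len": 128,
  "num_parameters": 165123656,
  "num_functions": 2000
}""")

# d_model must split evenly across attention heads: 768 / 12 = 64
assert config["d_model"] % config["n_heads"] == 0
head_dim = config["d_model"] // config["n_heads"]
print(f"per-head dimension: {head_dim}")  # prints 64
```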

## Files

| File | Description |
|------|-------------|
| `model.npz` | Model weights (MLX-compatible NumPy format) |
| `config.json` | Model configuration |
| `tokenizer.json` | Tokenizer vocabulary |
| `functions.json` | Memory bank of 2000 Python functions |
| `inference.py` | Standalone inference script |

## Training

Trained on CodeParrot with a focus on Python function retrieval:

- Encodes natural language queries into embedding space
- Learns semantic similarity between queries and function signatures
- Uses attention-based retrieval over a memory bank
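
This card does not state the exact training objective. A common choice for learning query-to-function similarity is an in-batch contrastive (InfoNCE-style) loss; the sketch below is a hypothetical illustration in NumPy with random stand-in embeddings, not the actual MALM training code.

```python
import numpy as np

rng = np.random.default_rng(1)
batch, d_model = 4, 8

# Stand-ins for encoder outputs: query i is paired with function i,
# so positives are constructed near their queries
q = rng.standard_normal((batch, d_model))
v = q + 0.1 * rng.standard_normal((batch, d_model))

# Cosine similarities with a temperature: each query should score its
# own function higher than the other in-batch (negative) functions
q /= np.linalg.norm(q, axis=1, keepdims=True)
v /= np.linalg.norm(v, axis=1, keepdims=True)
logits = q @ v.T / 0.07

# InfoNCE loss: negative mean log-probability of the matched pairs
logits -= logits.max(axis=1, keepdims=True)  # numerical stability
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))
print(f"contrastive loss: {loss:.4f}")
```

Minimizing such a loss pulls each query embedding toward its paired function embedding and pushes it away from the rest of the batch, which is what the attention-based retrieval at inference time relies on.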

## Citation

```bibtex
@article{sharma2026malm,
  title={Reverse Engineering a \$500M Mystery: From HashHop to Memory-Augmented Language Models},
  author={Sharma, Asankhaya},
  year={2026},
  url={https://huggingface.co/blog/codelion/reverse-engineering-magic-hashhop}
}
```

## Related Work

Part of the [HashHop](https://github.com/codelion/hash-hop) project exploring long-context evaluation and memory-augmented architectures.

## License

Apache 2.0