---
license: apache-2.0
library_name: mlx
tags:
- mlx
- memory-augmented
- code-generation
- retrieval-augmented
- python
- code-search
pipeline_tag: text-generation
datasets:
- codeparrot/codeparrot-clean
---

# MALM-165M: Memory-Augmented Language Model

A 165M-parameter Memory-Augmented Language Model (MALM) for semantic code search, trained on CodeParrot.

## Quick Start

```bash
# Install dependencies
pip install mlx huggingface_hub numpy

# Download model
huggingface-cli download codelion/malm-165m --local-dir ./malm-165m

# Run semantic search
python malm-165m/inference.py --query "function that sorts a list"
```

**Example output:**

```text
Query: function that sorts a list
------------------------------------------------------------
1. array_sort (score: 0.9526)
   Signature: array_sort(col)
   Docstring: Collection function: sorts the input array in ascending order...

2. sort_array (score: 0.7707)
   Signature: sort_array(col, asc)
   Docstring: Collection function: sorts the input array in ascending or descending order...
```

## Python API

```python
from huggingface_hub import snapshot_download
from pathlib import Path
import sys

# Download and import
model_path = snapshot_download("codelion/malm-165m")
sys.path.insert(0, model_path)
from inference import load_model, search_functions

# Load model
model, tokenizer, functions, config = load_model(Path(model_path))
print(f"Loaded {len(functions)} functions")

# Search
results = search_functions(
    model, tokenizer, functions,
    query="connect to database", top_k=5
)
for name, signature, docstring, score in results:
    print(f"{name}: {score:.4f}")
```

## Model Description

MALM combines a transformer with learned memory retrieval for semantic code search:

1. **Query encoder** - Encodes natural language queries into embeddings
2. **Value encoder** - Encodes function signatures/docstrings
3. **Retrieval** - Attention-based lookup from query to memory
4. **Memory bank** - 2000 Python functions from CodeParrot

### Why not mlx-lm?
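`mlx-lm generate` assumes autoregressive decoding, while MALM's inference step is a retrieval. A minimal NumPy sketch of an attention-based lookup over a memory bank — toy embeddings and a hypothetical `retrieve` helper, not the model's actual internals (the real query/value encoders produce these vectors):

```python
import numpy as np

def retrieve(query_emb, memory_embs, top_k=2):
    """Attention-style lookup over a memory bank (illustrative sketch only)."""
    d = query_emb.shape[-1]
    # Scaled dot-product score between the query and every memory embedding
    scores = memory_embs @ query_emb / np.sqrt(d)
    # Softmax turns scores into attention weights over the memory entries
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Highest-weight entries are the retrieved functions
    order = np.argsort(weights)[::-1][:top_k]
    return order, weights[order]

# Toy memory bank: unit embeddings standing in for three functions
memory = np.array([
    [1.0, 0.0, 0.0],  # e.g. "connect_db"
    [0.0, 1.0, 0.0],  # e.g. "array_sort"
    [0.0, 0.0, 1.0],  # e.g. "read_file"
])
query = np.array([0.1, 0.9, 0.1])  # query embedding closest to "array_sort"

idx, w = retrieve(query, memory)
print(idx[0])  # -> 1, i.e. "array_sort" ranks first
```

There is no token-by-token generation loop anywhere in this flow, which is why the standard `mlx-lm` entry points do not apply.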
MALM uses a **memory-augmented** architecture different from standard LLMs:

- Separate query and value encoders for retrieval
- Requires a memory bank of functions
- Inference is retrieval-based, not autoregressive generation

This architecture doesn't fit `mlx-lm generate`, so we provide a custom inference script.

## Architecture

| Component | Parameters |
|-----------|------------|
| Embedding | 11.1M |
| Position Embedding | 0.1M |
| Query Encoder (4 layers) | 28.4M |
| Value Encoder (4 layers) | 28.4M |
| Decoder (12 layers) | 85.1M |
| Output Projection | 11.1M |
| **Total** | **~165M** |

### Configuration

```json
{
  "vocab_size": 14407,
  "d_model": 768,
  "n_heads": 12,
  "n_layers": 12,
  "n_query_layers": 4,
  "max_seq_len": 128,
  "num_parameters": 165123656,
  "num_functions": 2000
}
```

## Files

| File | Description |
|------|-------------|
| `model.npz` | Model weights (MLX-compatible NumPy format) |
| `config.json` | Model configuration |
| `tokenizer.json` | Tokenizer vocabulary |
| `functions.json` | Memory bank of 2000 Python functions |
| `inference.py` | Standalone inference script |

## Training

Trained on CodeParrot with a focus on Python function retrieval:

- Encodes natural language queries into embedding space
- Learns semantic similarity between queries and function signatures
- Uses attention-based retrieval over a memory bank

## Citation

```bibtex
@article{sharma2026malm,
  title={Reverse Engineering a $500M Mystery: From HashHop to Memory-Augmented Language Models},
  author={Sharma, Asankhaya},
  year={2026},
  url={https://huggingface.co/blog/codelion/reverse-engineering-magic-hashhop}
}
```

## Related Work

Part of the [HashHop](https://github.com/codelion/hash-hop) project exploring long-context evaluation and memory-augmented architectures.

## License

Apache 2.0