---
license: bsd-2-clause
task_categories:
- text-generation
language:
- en
tags:
- common-lisp
- macros
- code-generation
- program-transformation
pretty_name: Common Lisp Macro Transformations
size_categories:
- n<1K
---

# Common Lisp Macro Transformations

A fine-tuning dataset for training models to generate Common Lisp macros. Each example is a **(before-code) → (macro-definition) → (after-expansion)** triple.

## Idea

Instead of fine-tuning a model to "write code", fine-tune it to generate **CL macros**: code that writes code. The model learns to recognize AST patterns and generate transformations, not final output.

## Sources

- **Let Over Lambda**: Doug Hoyte's production macro collection (thephoeron/let-over-lambda)
- **On Lisp**: Paul Graham's classic Common Lisp macro utilities

## Dataset Structure

Each record contains:

- `instruction`: Task description with the code pattern to address
- `input`: The "before" code showing the pattern that needs a macro
- `output`: The `defmacro` form that solves it
- `category`: Macro category (capture-management, anaphoric, dispatch, control-flow, DSL, compiler-macro, efficiency, scope)
- `technique`: Comma-separated techniques used (gensym, nested-backquote, dlambda, anaphor, code-walking, symbol-macrolet, defsetf, tagbody-go, once-only, macrolet, compiler-macro, recursive-expansion)
- `complexity`: basic, intermediate, or advanced
- `quality_score`: Classifier score from 0.0 to 1.0

## Categories

| Category | Description | Examples |
|---|---|---|
| capture-management | Hygienic macro writing utilities | defmacro/g!, defmacro!, with-gensyms |
| anaphoric | Deliberate variable capture for conciseness | aif, alambda, alet, aand |
| dispatch | Keyword-based dispatch and inter-closure protocols | dlambda, pandoriclet, with-pandoric |
| control-flow | New evaluation semantics via macros | nlet-tail, condlet, if-match, choose |
| DSL | Domain-specific embedded languages | defunits, _f (generalized setf), dbind |
| compiler-macro | Compile-time optimization of function calls | fformat compiler macro |
| efficiency | Performance-oriented macro techniques | sortf (sorting networks) |
| scope | Lexical scope manipulation | pandoric-eval |

## Use for Fine-tuning

The data is in instruction-input-output JSONL format, ready for fine-tuning:

```python
from datasets import load_dataset

ds = load_dataset("j14i/cl-macros", split="train")
```

Target model size: ≤ 30B parameters (the domain is narrow, pattern matching on ASTs and transformations, so a smaller model suffices).
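
To make the record schema under "Dataset Structure" concrete, here is a minimal sketch that checks a record against the documented fields and turns it into a prompt. The sample record is hypothetical: it was written for illustration in the style of On Lisp's `aif` and is not copied from the dataset. The helper names (`validate`, `techniques`, `to_prompt`) and the prompt template are assumptions too, not part of the dataset or of any library.

```python
import json

# Field names and allowed values come from the "Dataset Structure" section.
REQUIRED_FIELDS = ("instruction", "input", "output", "category",
                   "technique", "complexity", "quality_score")
CATEGORIES = {"capture-management", "anaphoric", "dispatch", "control-flow",
              "DSL", "compiler-macro", "efficiency", "scope"}
COMPLEXITIES = {"basic", "intermediate", "advanced"}

# Hypothetical record for illustration only; not taken from the dataset.
SAMPLE_JSONL = json.dumps({
    "instruction": "Write a macro that binds the test result to `it` "
                   "so the branches can reuse it.",
    "input": "(let ((result (find-user id)))\n"
             "  (if result\n"
             "      (greet result)\n"
             "      (signal-missing id)))",
    "output": "(defmacro aif (test-form then-form &optional else-form)\n"
              "  `(let ((it ,test-form))\n"
              "     (if it ,then-form ,else-form)))",
    "category": "anaphoric",
    "technique": "anaphor",
    "complexity": "basic",
    "quality_score": 0.9,
})


def validate(record):
    """Return a list of problems in one record; an empty list means it passes."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in record]
    if problems:
        return problems
    if record["category"] not in CATEGORIES:
        problems.append(f"unknown category: {record['category']}")
    if record["complexity"] not in COMPLEXITIES:
        problems.append(f"unknown complexity: {record['complexity']}")
    score = record["quality_score"]
    if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
        problems.append(f"quality_score out of range: {score!r}")
    return problems


def techniques(record):
    """Split the comma-separated `technique` field into a clean list."""
    return [t.strip() for t in record["technique"].split(",") if t.strip()]


def to_prompt(record):
    """Build a prompt from `instruction` and `input` (template is an assumption)."""
    return (f"### Instruction\n{record['instruction']}\n\n"
            f"### Input\n{record['input']}\n\n"
            f"### Output\n")


if __name__ == "__main__":
    record = json.loads(SAMPLE_JSONL)
    print(validate(record))
    print(techniques(record))
    print(to_prompt(record) + record["output"])
```

The same functions can be applied to rows loaded with `load_dataset`, for example to drop records below a chosen `quality_score` threshold before training. The threshold itself is a judgment call that the dataset does not prescribe.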