| --- |
| license: bsd-2-clause |
| task_categories: |
| - text-generation |
| language: |
| - en |
| tags: |
| - common-lisp |
| - macros |
| - code-generation |
| - program-transformation |
| pretty_name: Common Lisp Macro Transformations |
| size_categories: |
| - n<1K |
| --- |
| |
| # Common Lisp Macro Transformations |
|
|
| A fine-tuning dataset for training models to generate Common Lisp macros. Each example is a **(before-code) → (macro-definition) → (after-expansion)** triple. |
|
|
| ## Idea |
|
|
| Instead of fine-tuning a model to "write code", fine-tune it to generate **CL macros**: code that writes code. The model learns to recognize AST patterns and generate transformations, not final output. |
|
|
| ## Sources |
|
|
| - **Let Over Lambda**: Doug Hoyte's production macro collection (thephoeron/let-over-lambda) |
| - **On Lisp**: Paul Graham's classic Common Lisp macro utilities |
|
|
| ## Dataset Structure |
|
|
| Each record contains: |
| - `instruction`: Task description with the code pattern to address |
| - `input`: The "before" code showing the pattern that needs a macro |
| - `output`: The `defmacro` form that solves it |
| - `category`: Macro category (capture-management, anaphoric, dispatch, control-flow, DSL, compiler-macro, efficiency, scope) |
| - `technique`: Comma-separated techniques used (gensym, nested-backquote, dlambda, anaphor, code-walking, symbol-macrolet, defsetf, tagbody-go, once-only, macrolet, compiler-macro, recursive-expansion) |
| - `complexity`: basic, intermediate, or advanced |
| - `quality_score`: Classifier score from 0.0 to 1.0 |
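
The field list above can be turned into a small sanity check. The following is a minimal sketch, assuming each record is a plain Python dict with exactly the fields listed; `validate_record`, `split_techniques`, and the `example` record are hypothetical names and data written for illustration, not part of the dataset or any library.

```python
# Allowed values are copied from the field descriptions above.
CATEGORIES = {
    "capture-management", "anaphoric", "dispatch", "control-flow",
    "DSL", "compiler-macro", "efficiency", "scope",
}
COMPLEXITIES = {"basic", "intermediate", "advanced"}
REQUIRED_FIELDS = (
    "instruction", "input", "output", "category",
    "technique", "complexity", "quality_score",
)


def validate_record(record):
    """Return a list of problems found in one record (empty if it looks valid)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    if problems:
        return problems
    if record["category"] not in CATEGORIES:
        problems.append(f"unknown category: {record['category']}")
    if record["complexity"] not in COMPLEXITIES:
        problems.append(f"unknown complexity: {record['complexity']}")
    score = record["quality_score"]
    if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
        problems.append("quality_score outside [0.0, 1.0]")
    return problems


def split_techniques(record):
    """Split the comma-separated `technique` field into a clean list."""
    return [t.strip() for t in record["technique"].split(",") if t.strip()]


# Hypothetical record for illustration only (not copied from the dataset).
example = {
    "instruction": "Write a macro that binds the test result to `it` inside an if.",
    "input": '(let ((it (find-user id)))\n  (if it (greet it) (warn "no user")))',
    "output": "(defmacro aif (test then &optional else)\n"
              "  `(let ((it ,test))\n"
              "     (if it ,then ,else)))",
    "category": "anaphoric",
    "technique": "anaphor",
    "complexity": "basic",
    "quality_score": 0.9,
}

problems = validate_record(example)
techniques = split_techniques(example)
```

Running a check like this over every row before training is one way to catch typos in `category` or `complexity`; whether the real data needs it is not something the card states.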
|
|
| ## Categories |
|
|
| | Category | Description | Examples | |
| |---|---|---| |
| | capture-management | Hygienic macro writing utilities | defmacro/g!, defmacro!, with-gensyms | |
| | anaphoric | Deliberate variable capture for conciseness | aif, alambda, alet, aand | |
| | dispatch | Keyword-based dispatch and inter-closure protocols | dlambda, pandoriclet, with-pandoric | |
| | control-flow | New evaluation semantics via macros | nlet-tail, condlet, if-match, choose | |
| | DSL | Domain-specific embedded languages | defunits, _f (generalized setf), dbind | |
| | compiler-macro | Compile-time optimization of function calls | fformat compiler macro | |
| | efficiency | Performance-oriented macro techniques | sortf (sorting networks) | |
| | scope | Lexical scope manipulation | pandoric-eval | |
| |
| ## Use for Fine-tuning |
| |
| The data is in instruction-input-output JSONL format, ready for fine-tuning: |
| |
| ```python |
| from datasets import load_dataset |
| ds = load_dataset("j14i/cl-macros", split="train") |
| ``` |
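
Once loaded, records usually need to be filtered and rendered into prompt/completion pairs. The sketch below shows one possible way to do that on plain dicts; the prompt template, the `0.8` quality threshold, the function names, and the two sample records are all assumptions made for illustration, not a format the dataset prescribes.

```python
def format_example(record):
    """Render one record as a prompt/completion pair (template is an assumption)."""
    prompt = (
        f"### Instruction\n{record['instruction']}\n\n"
        f"### Input\n{record['input']}\n\n"
        "### Macro\n"
    )
    return {"prompt": prompt, "completion": record["output"]}


def select_records(records, min_quality=0.8,
                   complexities=("basic", "intermediate", "advanced")):
    """Keep records at or above a quality threshold and within the given complexity levels."""
    return [
        r for r in records
        if r["quality_score"] >= min_quality and r["complexity"] in complexities
    ]


# Hypothetical records for illustration only (not copied from the dataset).
records = [
    {
        "instruction": "Bind the test value to `it` inside a conditional.",
        "input": "(let ((it (lookup key))) (if it (use it)))",
        "output": "(defmacro aif (test then &optional else)\n"
                  "  `(let ((it ,test))\n"
                  "     (if it ,then ,else)))",
        "complexity": "basic",
        "quality_score": 0.9,
    },
    {
        "instruction": "Low-scoring placeholder record.",
        "input": "(foo)",
        "output": "(defmacro noop () nil)",
        "complexity": "basic",
        "quality_score": 0.4,
    },
]

kept = select_records(records)
pairs = [format_example(r) for r in kept]
```

With the `datasets` library, the same steps could be expressed as `ds.filter(...)` followed by `ds.map(format_example)`, since both methods accept a per-record function.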
| |
| Target model size: ≤ 30B parameters. The domain is narrow (pattern matching on ASTs and transformations), so a smaller model suffices. |
| |