---
license: apache-2.0
language:
- en
library_name: mlx
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- cron
- systemd
- devops
- schedule
- text-generation
- mlx
- lora
datasets:
- Shpigford/cron-schedule-conversion
---

# Shpigford/cron-mini

A small fine-tuned language model that converts natural-language schedules into cron expressions and systemd `OnCalendar` strings.

## What it does

```
Input: every Tuesday at 3am except December

Output: {"cron": "0 3 * 1-11 2",
         "systemd": "Tue *-01..11-* 03:00:00",
         "note": "Months 1-11 only excludes December."}
```
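
Because the reply is plain JSON, it can be consumed directly. The following is a minimal sketch that assumes the reply contains a single JSON object with the three keys shown above. `parse_schedule` is a hypothetical helper. The model is not documented to emit extra text, so trimming anything outside the outermost braces is only a defensive choice.

```python
import json

REQUIRED_KEYS = ("cron", "systemd", "note")


def parse_schedule(raw: str) -> dict:
    """Parse the model's JSON reply and check that the expected keys exist.

    Hypothetical helper: trims stray text around the outermost braces,
    decodes the object, and raises ValueError if anything is missing.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    # json.JSONDecodeError is a subclass of ValueError.
    data = json.loads(raw[start : end + 1])
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```

Callers can catch `ValueError` in one place to cover both malformed JSON and missing keys.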

It handles:

- Standard schedules (daily, weekly, monthly, every N minutes/hours)
- Holidays (Christmas, Thanksgiving, Black Friday, Halloween, etc.)
- Casual time references ("lunchtime", "before bed", "first thing in the morning")
- Ordinal weekdays ("second Tuesday of the month", "last Friday")
- Negative specifications ("every day except Sunday", "all months except December")
- Sub-minute intervals (cron can't express them but systemd can; the model notes the limitation)
- Intervals that don't fit cron's step syntax (e.g., every 90 minutes, which cron can't express directly, so the model expands it across the day)
- Compound schedules requiring multiple cron lines
- systemd timer options beyond `OnCalendar=` (`OnBootSec=`, `Persistent=`, `RandomizedDelaySec=`)
- Time zones (sets `TZ=` for cron; uses IANA zone names such as `Asia/Tokyo` for systemd)
- Typos and informal phrasings ("evry tues @ 3am")
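
One way to expand a fixed interval into cron lines is to list every start time from midnight and group the times by minute value. Each distinct minute needs its own cron line. This is a hypothetical sketch: `expand_interval` is not part of this repo and does not describe how the model works internally. The sequence restarts at midnight, so an interval that doesn't divide 1440 minutes evenly leaves a shorter gap across the day boundary.

```python
from collections import defaultdict


def expand_interval(every_minutes: int) -> list[str]:
    """Expand a fixed interval into cron lines, restarting at midnight.

    Hypothetical helper: start times are grouped by their minute value
    so that each group fits into a single cron line.
    """
    hours_by_minute: dict[int, list[int]] = defaultdict(list)
    for start in range(0, 24 * 60, every_minutes):
        hour, minute = divmod(start, 60)
        hours_by_minute[minute].append(hour)
    return [
        f"{minute} {','.join(map(str, hours))} * * *"
        for minute, hours in sorted(hours_by_minute.items())
    ]
```

`expand_interval(90)` returns two lines: `0 0,3,6,9,12,15,18,21 * * *` and `30 1,4,7,10,13,16,19,22 * * *`. Listing hours explicitly keeps each line easy to audit. `0 0-21/3 * * *` is an equivalent form of the first line.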

## Usage

### MLX (Apple Silicon)

```python
from mlx_lm import load, generate

model, tokenizer = load("Shpigford/cron-mini")

SYSTEM = (
    "You convert natural-language schedules into cron expressions and "
    "systemd OnCalendar strings. Output JSON with keys: cron, systemd, "
    "note. If cron cannot exactly express the schedule, put the closest "
    "valid cron and explain in note. Do not output anything else."
)

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Convert this schedule to cron and systemd OnCalendar: every weekday at 9am"},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
# Decoding is greedy (temperature 0) by default. Recent mlx-lm releases
# configure sampling through a `sampler` argument rather than `temp=`.
print(generate(model, tokenizer, prompt=prompt, max_tokens=200))
```

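### Transformers (any platform)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" requires the `accelerate` package.
model = AutoModelForCausalLM.from_pretrained("Shpigford/cron-mini", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Shpigford/cron-mini")

SYSTEM = (
    "You convert natural-language schedules into cron expressions and "
    "systemd OnCalendar strings. Output JSON with keys: cron, systemd, "
    "note. If cron cannot exactly express the schedule, put the closest "
    "valid cron and explain in note. Do not output anything else."
)
messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Convert this schedule to cron and systemd OnCalendar: every weekday at 9am"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=200, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```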

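### llama.cpp / Ollama (GGUF)

A GGUF version is available; see the Files tab for the `.gguf` files. Load one with llama.cpp, or import it into Ollama with a `Modelfile` whose `FROM` line points at the downloaded `.gguf` file. Set the system prompt from the examples above with the Modelfile's `SYSTEM` instruction.

```bash
ollama create cron-mini -f Modelfile
```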

## Evaluation

The held-out test set contains 91 cases and covers all the harder categories listed above:

- Overall (cron and systemd both correct): 63/91 (69.2%)
- Cron exact match: 73/91 (80.2%)
- Cron syntactically valid: 87/91 (95.6%)
- systemd exact match: 71/91 (78.0%)

See `eval_results.json` in this repo for per-case results.
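
The following sketch shows what a syntactic check like the "Cron syntactically valid" metric might look like. It is an assumption and not the evaluation code behind these numbers. `is_valid_cron` is a hypothetical name. The check accepts only numeric five-field expressions built from `*`, values, ranges, comma lists, and `/` steps after `*` or a range. It rejects month and weekday names and `@daily`-style macros, even though many cron implementations accept them.

```python
import re

# (min, max) for minute, hour, day-of-month, month, day-of-week.
# Day-of-week accepts 0-7, where both 0 and 7 mean Sunday.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]
ATOM = re.compile(r"^(\*|\d+(?:-\d+)?)(?:/(\d+))?$")


def is_valid_cron(expr: str) -> bool:
    """Return True if expr is a syntactically valid numeric 5-field cron line."""
    fields = expr.split()
    if len(fields) != 5:
        return False
    for field, (lo, hi) in zip(fields, FIELD_RANGES):
        for part in field.split(","):
            match = ATOM.match(part)
            if not match:
                return False
            base, step = match.group(1), match.group(2)
            if step is not None:
                # Steps must be positive and follow `*` or a range.
                if int(step) == 0 or (base != "*" and "-" not in base):
                    return False
            if base == "*":
                continue
            start, _, end = base.partition("-")
            start_n = int(start)
            end_n = int(end) if end else start_n
            if not (lo <= start_n <= end_n <= hi):
                return False
    return True
```

For example, `is_valid_cron("0 3 * 1-11 2")` is true, while `is_valid_cron("0 3 * 13 2")` is false because 13 is not a month.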

## Training

- Base model: `Qwen/Qwen2.5-1.5B-Instruct` (Apache 2.0)
- Method: LoRA fine-tune via [mlx-lm](https://github.com/ml-explore/mlx-examples/tree/main/llms)
- Hardware: M4 Mac mini, 16 GB unified memory
- Dataset: ~3,000 examples: hand-crafted hard cases, templated generation, and paraphrases plus novel synthetic cases generated with the Claude API (verified with a self-check pass)
- Dataset on HF: [Shpigford/cron-schedule-conversion](https://huggingface.co/datasets/Shpigford/cron-schedule-conversion)
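
mlx-lm's LoRA trainer accepts JSONL files in a chat format, where each line holds a `messages` list. The following is a minimal sketch of turning one schedule into such a record, reusing the system prompt and user prefix from the usage examples. `to_chat_record` and its arguments are hypothetical, and the published dataset may be laid out differently.

```python
import json

SYSTEM = (
    "You convert natural-language schedules into cron expressions and "
    "systemd OnCalendar strings. Output JSON with keys: cron, systemd, "
    "note. If cron cannot exactly express the schedule, put the closest "
    "valid cron and explain in note. Do not output anything else."
)
USER_PREFIX = "Convert this schedule to cron and systemd OnCalendar: "


def to_chat_record(schedule: str, cron: str, systemd: str, note: str = "") -> str:
    """Hypothetical helper: serialize one example as a single JSONL line."""
    # The assistant turn is itself a JSON string with the three output keys.
    answer = json.dumps({"cron": cron, "systemd": systemd, "note": note})
    record = {
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": USER_PREFIX + schedule},
            {"role": "assistant", "content": answer},
        ]
    }
    return json.dumps(record)
```

Lines like these go into `train.jsonl` and `valid.jsonl` inside a directory that is passed to `mlx_lm.lora --train --data <dir>`.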

## Limitations

- The model emits a single best guess for ambiguous fuzzy times (e.g., "morning" → 7am). It will not ask clarifying questions.
- For "every other Monday" / "biweekly" / "fortnightly" patterns, cron has no native expression; the model emits "every Monday" and notes the limitation. Gate the job in your script with a week-of-year check.
- For "last day of month" / "last Friday", cron has no native expression; the model approximates with day-of-month ranges and flags the limitation.
- POSIX cron and common implementations such as Vixie cron and cronie OR-match day-of-month and day-of-week when both are restricted, but the model emits expressions that assume AND-matching. Under OR-matching, a "last Friday" approximation such as `0 9 25-31 * 5` fires on every day from the 25th through the 31st *and* on every Friday. Verify the behavior on your specific cron implementation.
- Time zone handling: cron has no built-in TZ field; the model emits the schedule in the system's local time and notes when a `TZ=` env var is needed.
- Trained on English. Other languages will likely degrade significantly.
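
The gating advice above can be sketched in Python. Cron fires on every candidate day, and the job exits early unless a check like the ones below passes. All helper names and the anchor date are hypothetical.

`is_alternate_week` counts whole weeks from a chosen anchor instead of relying on ISO week-number parity. Parity breaks after a year with 53 ISO weeks, because week 53 is followed by week 1, giving two odd weeks in a row.

`matching_days` illustrates the day-of-month/day-of-week caveat. It assumes both fields are restricted, which is the only case where the OR rule applies.

```python
from datetime import date, timedelta


def is_alternate_week(today: date, anchor: date) -> bool:
    """True when today falls an even number of weeks after anchor's week."""
    # Compare the Mondays that start each week, so any weekday works.
    anchor_monday = anchor - timedelta(days=anchor.weekday())
    this_monday = today - timedelta(days=today.weekday())
    return ((this_monday - anchor_monday).days // 7) % 2 == 0


def is_last_day_of_month(today: date) -> bool:
    """True when tomorrow is in a different month."""
    return (today + timedelta(days=1)).month != today.month


def is_last_weekday_of_month(today: date, weekday: int) -> bool:
    """True on the last given weekday of the month (Python: Monday=0, Sunday=6)."""
    return today.weekday() == weekday and (today + timedelta(days=7)).month != today.month


def matching_days(year: int, month: int, dom: set[int], dow: set[int], use_or: bool) -> list[int]:
    """Days a cron entry fires on when both DOM and DOW are restricted.

    dow uses cron numbering (Sunday=0). use_or=True models Vixie cron/cronie.
    """
    days = []
    day = date(year, month, 1)
    while day.month == month:
        cron_dow = (day.weekday() + 1) % 7  # convert Monday=0 to cron's Sunday=0
        hit_dom, hit_dow = day.day in dom, cron_dow in dow
        if (hit_dom or hit_dow) if use_or else (hit_dom and hit_dow):
            days.append(day.day)
        day += timedelta(days=1)
    return days
```

For January 2025, `matching_days(2025, 1, set(range(25, 32)), {5}, use_or=False)` gives `[31]`. With `use_or=True`, it gives `[3, 10, 17, 24, 25, 26, 27, 28, 29, 30, 31]`. In a cron-triggered script, call a gate such as `is_last_weekday_of_month(date.today(), 4)` first and exit when it returns `False`.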

## License

Apache 2.0, same as the base model.

## Citation

If you find this useful:

```bibtex
@misc{cron-mini,
  author = {Pigford, Josh},
  title = {Cron-Mini: A Small Model for Schedule Conversion},
  year = {2026},
  howpublished = {Hugging Face},
  url = {https://huggingface.co/Shpigford/cron-mini}
}
```