NL to Bash β€” Qwen2.5-Coder-0.5B Fine-tuned

A fine-tuned version of Qwen2.5-Coder-0.5B-Instruct, trained on 40,639 natural-language → Bash command pairs from the NL2SH-ALFA dataset.

Try it live: πŸš€ Gradio Demo


Results

Metric                          Score
Exact Match                     13.67%
Semantic Match (cosine ≥ 0.8)   60.33%
Avg. Similarity                 0.776

Evaluated on 300 held-out test examples from NL2SH-ALFA. Semantic similarity is computed using all-MiniLM-L6-v2 embeddings and is a better indicator of real-world quality than exact match alone, since multiple Bash commands can be functionally equivalent.
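The semantic-match criterion reduces to a cosine-similarity threshold over the two commands' embedding vectors. A minimal sketch of that check (the real evaluation embeds commands with all-MiniLM-L6-v2 via sentence-transformers; the vectors and helper names here are illustrative only):

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_semantic_match(pred_emb, ref_emb, threshold=0.8):
    # A prediction counts as a semantic match when cosine similarity >= 0.8.
    return cosine_similarity(pred_emb, ref_emb) >= threshold
```

Identical embeddings score 1.0 and always match; orthogonal embeddings score 0.0 and never do, which is how functionally equivalent but textually different commands can still be credited.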


Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("dhwanichande29/nl-to-bash")
tokenizer = AutoTokenizer.from_pretrained("dhwanichande29/nl-to-bash")

system_prompt = "Your task is to translate a natural language instruction to a Bash command. You will receive an instruction in English and output a Bash command that can be run in a Linux terminal."

def translate(instruction):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": instruction}
    ]
    formatted = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    response = outputs[0][inputs.input_ids.shape[-1]:]
    return tokenizer.decode(response, skip_special_tokens=True).strip()

print(translate("list all files in current directory"))
# find . -type f
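Depending on decoding settings, instruct-tuned models sometimes wrap their answer in a markdown code fence rather than emitting the bare command. A small post-processing helper (hypothetical, not part of the released code) can normalize such output:

```python
import re

FENCE = "`" * 3  # a markdown triple-backtick code fence

def extract_command(text):
    # Strip an optional fenced block (with or without a bash/sh language tag);
    # if no fence is present, return the text unchanged.
    pattern = FENCE + r"(?:bash|sh)?\s*(.+?)\s*" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip() if match else text.strip()

wrapped = FENCE + "bash\nfind . -type f\n" + FENCE
print(extract_command(wrapped))   # find . -type f
print(extract_command("ls -la"))  # ls -la
```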

Example Outputs

Natural Language                      Generated Bash
list all files in current directory   find . -type f
find all python files                 find . -name "*.py"
count lines in a text file            wc -l path/to/file
remove all .tmp files                 find . -name "*.tmp" -exec rm {} \;
show disk usage                       du -h /

Training Details

  • Base model: Qwen/Qwen2.5-Coder-0.5B-Instruct (494M parameters)
  • Dataset: westenfelder/NL2SH-ALFA
  • Train split: 40,639 examples
  • Test split: 300 examples
  • Epochs: 10
  • Batch size: 15 per device (effective 75 with 5 gradient accumulation steps)
  • Precision: bfloat16
  • Max token length: 150
  • Hardware: NVIDIA A100-SXM4-80GB
  • Training time: ~2.09 hours
  • Experiment tracking: Weights & Biases (nl2sh project)
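The hyperparameters above map onto a Hugging Face TrainingArguments configuration roughly as follows. This is a sketch only: output_dir and report_to are assumptions, and the actual training script (including max sequence length handling at 150 tokens, which lives in the tokenizer/collator rather than here) is in the linked GitHub repo.

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters; names not stated in the card
# (output_dir, report_to) are assumptions, not taken from the real script.
args = TrainingArguments(
    output_dir="nl-to-bash",
    num_train_epochs=10,
    per_device_train_batch_size=15,
    gradient_accumulation_steps=5,  # effective batch size: 15 * 5 = 75
    bf16=True,                      # bfloat16 precision (A100-SXM4-80GB)
    report_to="wandb",              # Weights & Biases, "nl2sh" project
)
```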

Dataset

westenfelder/NL2SH-ALFA β€” a dataset of natural language instructions paired with corresponding Bash commands.


GitHub

Full training code, evaluation notebooks, and FastAPI deployment: πŸ‘‰ github.com/Dhwani-Chande/Natural-Language-to-Bash-Translation-using-LLMs
