NL to Bash β€” Qwen2.5-Coder-0.5B Fine-tuned

A fine-tuned version of Qwen2.5-Coder-0.5B-Instruct, trained on 40,639 natural-language → Bash command pairs from the NL2SH-ALFA dataset.

Try it live: πŸš€ Gradio Demo


Results

Metric                          Score
Exact Match                     13.67%
Semantic Match (cosine ≥ 0.8)   60.33%
Avg. Similarity                 0.776

Evaluated on 300 held-out test examples from NL2SH-ALFA. Semantic similarity is computed using all-MiniLM-L6-v2 embeddings and is a better indicator of real-world quality than exact match alone, since multiple Bash commands can be functionally equivalent.
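The semantic-match criterion reduces to a cosine-similarity threshold over the two commands' embedding vectors. A minimal sketch of that check (the real evaluation embeds commands with all-MiniLM-L6-v2 via sentence-transformers; the vectors and helper names here are illustrative only):

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_semantic_match(pred_emb, ref_emb, threshold=0.8):
    # A prediction counts as a semantic match when cosine similarity >= 0.8.
    return cosine_similarity(pred_emb, ref_emb) >= threshold
```

Identical embeddings score 1.0 and always match; orthogonal embeddings score 0.0 and never do, which is how functionally equivalent but textually different commands can still be credited.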


Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("dhwanichande29/nl-to-bash")
tokenizer = AutoTokenizer.from_pretrained("dhwanichande29/nl-to-bash")

system_prompt = "Your task is to translate a natural language instruction to a Bash command. You will receive an instruction in English and output a Bash command that can be run in a Linux terminal."

def translate(instruction):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": instruction}
    ]
    formatted = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    response = outputs[0][inputs.input_ids.shape[-1]:]
    return tokenizer.decode(response, skip_special_tokens=True).strip()

print(translate("list all files in current directory"))
# find . -type f
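Depending on decoding settings, instruct-tuned models sometimes wrap their answer in a markdown code fence rather than emitting the bare command. A small post-processing helper (hypothetical, not part of the released code) can normalize such output:

```python
import re

FENCE = "`" * 3  # a markdown triple-backtick code fence

def extract_command(text):
    # Strip an optional fenced block (with or without a bash/sh language tag);
    # if no fence is present, return the text unchanged.
    pattern = FENCE + r"(?:bash|sh)?\s*(.+?)\s*" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip() if match else text.strip()

wrapped = FENCE + "bash\nfind . -type f\n" + FENCE
print(extract_command(wrapped))   # find . -type f
print(extract_command("ls -la"))  # ls -la
```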

Example Outputs

Natural Language                      Generated Bash
list all files in current directory   find . -type f
find all python files                 find . -name "*.py"
count lines in a text file            wc -l path/to/file
remove all .tmp files                 find . -name "*.tmp" -exec rm {} \;
show disk usage                       du -h /

Training Details

  • Base model: Qwen/Qwen2.5-Coder-0.5B-Instruct (494M parameters)
  • Dataset: westenfelder/NL2SH-ALFA
  • Train split: 40,639 examples
  • Test split: 300 examples
  • Epochs: 10
  • Batch size: 15 per device (effective 75 with 5 gradient accumulation steps)
  • Precision: bfloat16
  • Max token length: 150
  • Hardware: NVIDIA A100-SXM4-80GB
  • Training time: ~2.09 hours
  • Experiment tracking: Weights & Biases (nl2sh project)
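The hyperparameters above map onto a Hugging Face TrainingArguments configuration roughly as follows. This is a sketch only: output_dir and report_to are assumptions, and the actual training script (including max sequence length handling at 150 tokens, which lives in the tokenizer/collator rather than here) is in the linked GitHub repo.

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters; names not stated in the card
# (output_dir, report_to) are assumptions, not taken from the real script.
args = TrainingArguments(
    output_dir="nl-to-bash",
    num_train_epochs=10,
    per_device_train_batch_size=15,
    gradient_accumulation_steps=5,  # effective batch size: 15 * 5 = 75
    bf16=True,                      # bfloat16 precision (A100-SXM4-80GB)
    report_to="wandb",              # Weights & Biases, "nl2sh" project
)
```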

Dataset

westenfelder/NL2SH-ALFA β€” a dataset of natural language instructions paired with corresponding Bash commands.


GitHub

Full training code, evaluation notebooks, and FastAPI deployment: πŸ‘‰ github.com/Dhwani-Chande/Natural-Language-to-Bash-Translation-using-LLMs
