dhwanichande29/nl-to-bash

@@ -5,35 +5,101 @@ tags:
   - code-generation
   - nlp
   - qwen2
 base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct
 ---
 # NL to Bash — Qwen2.5-Coder-0.5B Fine-tuned
-Fine-tuned version of Qwen2.5-Coder-0.5B-Instruct on 40,639 NL→Bash pairs
-from the NL2SH-ALFA dataset.
 ## Results
 | Metric | Score |
 |---|---|
 | Exact Match | 13.67% |
-| Semantic Match (≥0.8) | 60.33% |
 | Avg Similarity | 0.776 |
 ## Usage
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model = AutoModelForCausalLM.from_pretrained("dhwanichande29/nl-to-bash")
 tokenizer = AutoTokenizer.from_pretrained("dhwanichande29/nl-to-bash")
 ```
 ## Dataset
-[westenfelder/NL2SH-ALFA](https://huggingface.co/datasets/westenfelder/NL2SH-ALFA)
-— 40,639 train / 300 test examples
-## Training
-- Hardware: NVIDIA A100-SXM4-80GB
-- Duration: ~2.09 hours
-- Epochs: 10
-- Precision: bfloat16

   - code-generation
   - nlp
   - qwen2
+  - shell
+  - natural-language-processing
+license: apache-2.0
 base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct
+datasets:
+  - westenfelder/NL2SH-ALFA
 ---
 # NL to Bash — Qwen2.5-Coder-0.5B Fine-tuned
+Fine-tuned version of [Qwen2.5-Coder-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct) on 40,639 natural language → Bash command pairs from the NL2SH-ALFA dataset.
+Try it live: 🚀 [Gradio Demo](https://huggingface.co/spaces/dhwanichande29/nl-to-bash)
+---
 ## Results
 | Metric | Score |
 |---|---|
 | Exact Match | 13.67% |
+| Semantic Match (cosine ≥ 0.8) | 60.33% |
 | Avg Similarity | 0.776 |
+> Evaluated on 300 held-out test examples from NL2SH-ALFA.
+> Semantic similarity is the cosine similarity between `all-MiniLM-L6-v2` embeddings of the generated and reference commands. It is a better indicator of real-world quality than exact match alone, since multiple Bash commands can be functionally equivalent.
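A minimal sketch of how the three scores in the table could be computed, assuming they are defined as exact string match after trimming whitespace, the share of pairs with cosine similarity at or above 0.8, and the mean cosine similarity. The names `score_predictions` and `embed` are hypothetical and chosen for illustration; the evaluation code in the linked repository may differ.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_predictions(
    predictions: Sequence[str],
    references: Sequence[str],
    embed: Callable[[str], Vector],  # hypothetical: maps a command to its embedding
    threshold: float = 0.8,
) -> Dict[str, float]:
    """Return exact match rate, semantic match rate, and average similarity."""
    if not predictions or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and equal in length")

    exact = 0
    semantic = 0
    total_similarity = 0.0
    for pred, ref in zip(predictions, references):
        # Exact match: identical strings once surrounding whitespace is removed.
        if pred.strip() == ref.strip():
            exact += 1
        # Semantic match: embedding similarity at or above the threshold.
        similarity = cosine_similarity(embed(pred), embed(ref))
        if similarity >= threshold:
            semantic += 1
        total_similarity += similarity

    n = len(predictions)
    return {
        "exact_match": exact / n,
        "semantic_match": semantic / n,
        "avg_similarity": total_similarity / n,
    }
```

With the `sentence-transformers` package installed, `embed` could wrap `SentenceTransformer("all-MiniLM-L6-v2").encode`. Passing the embedding function in as an argument keeps the scoring logic independent of any particular embedding model.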
+---
 ## Usage
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model = AutoModelForCausalLM.from_pretrained("dhwanichande29/nl-to-bash")
 tokenizer = AutoTokenizer.from_pretrained("dhwanichande29/nl-to-bash")
+# System prompt sent with every request.
+system_prompt = "Your task is to translate a natural language instruction to a Bash command. You will receive an instruction in English and output a Bash command that can be run in a Linux terminal."
+def translate(instruction):
+    """Translate an English instruction into a Bash command."""
+    messages = [
+        {"role": "system", "content": system_prompt},
+        {"role": "user", "content": instruction}
+    ]
+    # Render the chat template as text, ending with the assistant turn marker.
+    formatted = tokenizer.apply_chat_template(
+        messages,
+        tokenize=False,
+        add_generation_prompt=True
+    )
+    inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
+    # Greedy decoding (do_sample=False) keeps the output deterministic.
+    outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False)
+    # Drop the prompt tokens so only the newly generated text is decoded.
+    response = outputs[0][inputs.input_ids.shape[-1]:]
+    return tokenizer.decode(response, skip_special_tokens=True).strip()
+print(translate("list all files in current directory"))
+# Example output: find . -type f
 ```
+---
+## Example Outputs
+| Natural Language | Generated Bash |
+|---|---|
+| list all files in current directory | `find . -type f` |
+| find all python files | `find . -name "*.py"` |
+| count lines in a text file | `wc -l path/to/file` |
+| remove all .tmp files | `find . -name "*.tmp" -exec rm {} \;` |
+| show disk usage | `du -h /` |
+---
+## Training Details
+- **Base model:** Qwen/Qwen2.5-Coder-0.5B-Instruct (494M parameters)
+- **Dataset:** [westenfelder/NL2SH-ALFA](https://huggingface.co/datasets/westenfelder/NL2SH-ALFA)
+- **Train split:** 40,639 examples
+- **Test split:** 300 examples
+- **Epochs:** 10
+- **Batch size:** 15 per device (effective: 75 with gradient accumulation steps of 5)
+- **Precision:** bfloat16
+- **Max token length:** 150
+- **Hardware:** NVIDIA A100-SXM4-80GB
+- **Training time:** ~2.09 hours
+- **Experiment tracking:** Weights & Biases (`nl2sh` project)
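The batch-size figures above imply a rough count of optimizer updates. The short sketch below works through that arithmetic, assuming a single device (only one A100 is listed) and that the final partial batch of each epoch is kept. Both assumptions are mine, and the variable names are illustrative only.

```python
import math

# Values taken from the training details above.
train_examples = 40_639
per_device_batch_size = 15
gradient_accumulation_steps = 5
epochs = 10

# Assumption: training ran on a single device.
num_devices = 1

# Number of examples contributing to each optimizer update.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices

# Forward/backward passes per epoch, keeping the final partial batch (assumption).
micro_batches_per_epoch = math.ceil(train_examples / (per_device_batch_size * num_devices))

# Optimizer updates per epoch and over the whole run.
updates_per_epoch = math.ceil(micro_batches_per_epoch / gradient_accumulation_steps)
total_updates = updates_per_epoch * epochs

print(f"effective batch size: {effective_batch_size}")
print(f"updates per epoch:    {updates_per_epoch}")
print(f"total updates:        {total_updates}")
```

Under these assumptions the run comes to 542 updates per epoch and 5,420 in total. A different device count or a dropped final batch would change those numbers.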
+---
 ## Dataset
+[westenfelder/NL2SH-ALFA](https://huggingface.co/datasets/westenfelder/NL2SH-ALFA) — a dataset of natural language instructions paired with corresponding Bash commands.
+---
+## GitHub
+Full training code, evaluation notebooks, and FastAPI deployment:
+👉 [github.com/Dhwani-Chande/Natural-Language-to-Bash-Translation-using-LLMs](https://github.com/Dhwani-Chande/Natural-Language-to-Bash-Translation-using-LLMs)