---
license: gemma
datasets:
- nphearum/Code-Reasoning-4k
language:
- en
base_model:
- google/gemma-3-270m-it
pipeline_tag: text-generation
tags:
- text-generation-inference
- gemma
- code
- SLM
- chat
- reasoning
---
|
|
# GemCod-R-Sapphire (gemma-270m-it-code-reasoning v2.1.4)
|
|
GemCod is a lightweight code generation model fine-tuned with SFT on the base gemma-3-270m-it model (https://huggingface.co/google/gemma-3-270m-it). It offers accurate and quick(ish) generation of code snippets and long-form code in all major programming languages.
Its small size (270M parameters) allows it to run comfortably on laptop-grade GPUs.

GemCod-R is a family of code agents built on the GemCod architecture and extended with reasoning capabilities. This allows the models to produce higher-quality generations with far fewer hallucinations, for both snippets and long-form code.
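
To make the 270M-parameter footprint concrete, here is a rough back-of-the-envelope sketch of the memory needed for the weights alone. The bytes-per-parameter table and the `weight_memory_mib` helper are illustrative assumptions, not part of the model's tooling, and the estimate ignores activations, the KV cache, and framework overhead.

```python
# Rough estimate of weight memory for a ~270M-parameter model.
# Assumption: every parameter is stored in a single dtype.

PARAMS = 270_000_000  # approximate parameter count stated in this card

# Storage size per parameter for common dtypes (bytes)
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_mib(num_params: int, dtype: str) -> float:
    """Return the approximate size of the weights in MiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: ~{weight_memory_mib(PARAMS, dtype):,.0f} MiB")
```

Under these assumptions the weights take roughly 1 GiB in float32 and about half that in a 16-bit dtype, which is consistent with the model fitting on a small laptop GPU.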
|
|
The Sapphire model represents the next generation of GemCod agents, integrating chain-of-thought (COT) reasoning into the standard coding architecture. This almost completely removes the chance of hallucination and allows the model to give highly detailed, specialized explanations and instructions alongside its generations.
|
|
It serves as an upgrade over the previous Jade (https://huggingface.co/DireDreadlord/GemCod-Jade-270M) and Topaz (https://huggingface.co/DireDreadlord/GemCod-Topaz-270M) models while adding only minor overhead to inference time and space requirements.
This model also offers rudimentary Q&A and subject-matter-expert capabilities on code-related topics.
|
|
---
|
|
|
|
**Estimated parameters:** ~270M

**Architecture:** Gemma3

**Intended use:** Code snippet and long-form code generation from natural language, instruction generation, and COT explanations of code snippets
|
|
---
|
|
|
|
## Training data
- Source: Code-Reasoning-4k dataset (https://huggingface.co/datasets/nphearum/Code-Reasoning-4k)
- Rows: ~40,000, templated with a custom .jinja chat format
- Training: 3,000 steps on an RTX 3050 (4GB VRAM)
|
|
|
|
## Usage

Install requirements:

```bash
pip install -r requirements.txt
pip install torch transformers datasets accelerate safetensors
```
|
|
|
|
## Usage (Hugging Face Hub)
You can load the model directly from the Hugging Face Hub:
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Use a GPU when one is available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("DireDreadlord/GemCod-R-Sapphire-270M")
model = AutoModelForCausalLM.from_pretrained("DireDreadlord/GemCod-R-Sapphire-270M")
model.to(device)
model.eval()
model.resize_token_embeddings(len(tokenizer))

user_prompt = (
    "write a bubble sort algorithm in cpp. "
    "Please think step by step and show your chain-of-thought before the final code."  # <-- comment out this line to disable COT
)

messages = [{"role": "user", "content": user_prompt}]

# return_dict=True yields a dict (input_ids, attention_mask) that can be unpacked into generate()
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
)
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        do_sample=False,
        num_beams=1,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        use_cache=False,
    )

# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[1]
generated_ids = outputs[0, prompt_len:]
print(tokenizer.decode(generated_ids.tolist(), skip_special_tokens=True))
```
**For optimal long-form generation along with COT, set `max_new_tokens=2048`.**
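
If you switch COT on and off often, two small helpers can keep the prompt handling tidy. This is only a sketch: `build_messages` and `extract_code_blocks` are hypothetical names, not part of this model or of `transformers`, and the extraction step assumes the model wraps its final code in Markdown-style fences, which may not hold for every generation.

```python
import re

# COT instruction taken from the usage example in this card
COT_SUFFIX = "Please think step by step and show your chain-of-thought before the final code."

# Build the fence marker programmatically so this snippet can itself sit inside a fenced block
FENCE = "`" * 3
CODE_BLOCK_RE = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def build_messages(task: str, use_cot: bool = True) -> list[dict[str, str]]:
    """Return a single-turn chat message list, optionally appending the COT instruction."""
    prompt = task.strip()
    if use_cot:
        # Join with a space so the two sentences do not run together
        prompt = f"{prompt} {COT_SUFFIX}"
    return [{"role": "user", "content": prompt}]


def extract_code_blocks(generated_text: str) -> list[str]:
    """Return the contents of every fenced code block found in the generated text."""
    return [block.strip() for block in CODE_BLOCK_RE.findall(generated_text)]
```

The list returned by `build_messages` can be passed to `tokenizer.apply_chat_template` in place of the hand-written `messages`, and `extract_code_blocks` can be applied to the decoded output when only the final code is needed.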
|
|
## Limitations
- This model is for experimental use only and should be used as such, subject to the Gemma license.