---
datasets:
- nvidia/OpenCodeReasoning
pipeline_tag: image-text-to-text
---

# Hicoder-R1-Distill-Gemma-27B

Notably, this CoT-enabled model was trained on a single RTX 4090D. This was made possible by optimizations to GPU VRAM and system RAM management, together with specific techniques applied during the training steps.

### Model Overview

**Hicoder-R1-Distill-Gemma-27B** is a large language model fine-tuned from Google's **Gemma-3 27B** base model. It is specifically optimized for **Chain-of-Thought (CoT) reasoning** and **code generation** tasks.

* **Base Model:** google/gemma-3-27b
* **Fine-tuned by:** tonyli8623
* **Focus Areas:** Chain-of-Thought (CoT), Code Generation, Code Explanation, Debugging
* **Language:** Primarily English for prompts and reasoning; generates code in multiple programming languages.

### Key Features

* **Enhanced CoT Reasoning:** Explicitly trained to break complex problems down into intermediate steps before giving a final answer, which is particularly useful for complex coding or algorithmic tasks.
* **Strong Coding Capabilities:** Generates, explains, debugs, and translates code across various programming languages (e.g., Python, JavaScript, Java, C++, SQL).
* **Gemma-3 Foundation:** Built on the architecture of Google's Gemma-3 27B model.
* **Distillation Enhanced (Implied):** May benefit from knowledge distillation, improving performance on the target tasks relative to standard fine-tuning.

### How to Use

You can use this model with the Hugging Face `transformers` library.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Specify the path to your fine-tuned model (local or Hugging Face Hub ID)
model_id = "tonyli8623/Hicoder-R1-Distill-Gemma-27B"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Use bfloat16 for efficiency if supported
    device_map="auto",           # Automatically distribute across available GPUs
)

# --- Example 1: Simple Code Generation ---
prompt_simple = "Write a Python function to calculate the factorial of a number."

# apply_chat_template uses the chat template stored with the tokenizer
# (Gemma-3 format for this model; adjust if your checkpoint differs).
messages_simple = [
    {"role": "user", "content": prompt_simple}
]
input_ids_simple = tokenizer.apply_chat_template(
    messages_simple,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs_simple = model.generate(
    input_ids_simple,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)

# Decode only the newly generated tokens (skip the prompt)
response_simple = tokenizer.decode(
    outputs_simple[0][input_ids_simple.shape[1]:], skip_special_tokens=True
)
print("--- Simple Code Generation ---")
print(response_simple)

# --- Example 2: Code Generation with CoT ---
prompt_cot = """Think step-by-step to write a Python function that finds all prime numbers up to a given integer 'n' using the Sieve of Eratosthenes algorithm. Then, provide the function.

Let's break this down:
1. Understand the Sieve of Eratosthenes.
2. Outline the steps needed in the function.
3. Write the Python code based on the outline."""

messages_cot = [
    {"role": "user", "content": prompt_cot}
]
input_ids_cot = tokenizer.apply_chat_template(
    messages_cot,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs_cot = model.generate(
    input_ids_cot,
    max_new_tokens=500,  # Allow more tokens for CoT + code
    do_sample=True,
    temperature=0.6,
    top_k=50,
    top_p=0.95,
)

# Decode only the newly generated tokens (skip the prompt)
response_cot = tokenizer.decode(
    outputs_cot[0][input_ids_cot.shape[1]:], skip_special_tokens=True
)
print("\n--- Code Generation with CoT ---")
print(response_cot)
```

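
As a rough guide to why `torch_dtype=torch.bfloat16` and `device_map="auto"` matter in the loading call above, weight memory can be estimated as parameter count times bytes per parameter. The sketch below assumes a nominal 27 billion parameters (taken from the model name, not a measured count) and ignores activations, the KV cache, and framework overhead. `weight_memory_gib` is a hypothetical helper, not part of `transformers`.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumption: nominal parameter count implied by "27B" in the model name.
NOMINAL_PARAMS = 27e9

for dtype_name, bytes_per_param in (("float32", 4), ("bfloat16", 2)):
    gib = weight_memory_gib(NOMINAL_PARAMS, bytes_per_param)
    print(f"{dtype_name:>8}: ~{gib:.1f} GiB for weights")
```
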

**Prompting:** For best results, especially when seeking CoT reasoning, explicitly ask the model to "think step-by-step" or "provide your reasoning process before the code". In the system prompt, add: "You are a code engineer proficient in various programming languages. Before answering, please carefully consider the question and create a logically coherent thought process, starting with `<think>` and ending with `</think>`. After thinking, provide the answer."

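
The system prompt above can be wired into the chat messages, and the reasoning can then be separated from the final answer. This is a minimal sketch under two assumptions: that the tokenizer's chat template accepts a `system` role, and that the `<think>` and `</think>` tags survive decoding as plain text. `build_messages` and `split_reasoning` are hypothetical helper names.

```python
import re

SYSTEM_PROMPT = (
    "You are a code engineer proficient in various programming languages. "
    "Before answering, please carefully consider the question and create a "
    "logically coherent thought process, starting with <think> and ending "
    "with </think>. After thinking, provide the answer."
)

# Matches the first <think>...</think> span, including newlines inside it.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def build_messages(user_prompt: str) -> list:
    """Build a chat message list with the recommended system prompt."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]


def split_reasoning(response: str) -> tuple:
    """Return (reasoning, answer); reasoning is empty if no tags are found."""
    match = THINK_PATTERN.search(response)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer
```

The list returned by `build_messages` can be passed to `tokenizer.apply_chat_template` in place of `messages_cot`, and `split_reasoning` can be applied to the decoded response.
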

### Limitations and Bias

* This model is based on Gemma-3 and inherits its capabilities and limitations.
* While fine-tuned for coding, it may still generate incorrect, inefficient, or insecure code. **Always review and test generated code thoroughly.**
* The model's knowledge is limited to its training data cutoff.
* Like all LLMs, it may exhibit biases present in the underlying training data.
* Chain-of-Thought reasoning may not always be perfect or logical.

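
As a first, very limited step toward reviewing generated code, the sketch below pulls fenced Python blocks out of a response and checks that they at least parse. It assumes the model wraps code in Markdown fences, which is not guaranteed, and a clean parse says nothing about correctness, efficiency, or security. `extract_code_blocks` and `check_syntax` are hypothetical helper names.

```python
import ast
import re
from typing import List, Optional

# Build the fence marker programmatically so this snippet can itself
# live inside a Markdown code fence.
FENCE = "`" * 3
CODE_BLOCK = re.compile(
    FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL
)


def extract_code_blocks(response: str) -> List[str]:
    """Return the contents of every fenced (optionally Python-tagged) block."""
    return [match.group(1) for match in CODE_BLOCK.finditer(response)]


def check_syntax(source: str) -> Optional[str]:
    """Return None if the source parses as Python, else a short error message."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None
```

Blocks that pass `check_syntax` still need human review and real tests before use.
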

### License

The license for this model depends on the base Gemma-3 model's license and any additional terms added for the fine-tune. Gemma-3 models are typically governed by the "Gemma Terms of Use". Please consult the specific license file included with the model or the Gemma Terms of Use.

* **Gemma Terms of Use:** https://ai.google.dev/gemma/terms
* **Fine-tuning Specific License (if any):** [Specify if you add Apache 2.0, MIT, etc., or state it follows the base model license]

### Citation

If you use this model in your research or work, please consider citing:

```bibtex
@misc{hicoder_r1_distill_gemma_27b_[year],
  title={Hicoder-R1-Distill-Gemma-27B: A Chain-of-Thought and Code Generation Focused Model},
  author={tonyli8623},
  year={[Year of Release]},
  howpublished={\url{https://huggingface.co/tonyli8623/Hicoder-R1-Distill-Gemma-27B}}
}

@misc{gemma3_2025,
  title={Gemma 3 Technical Report},
  author={Gemma Team, Google},
  year={2025},
  howpublished={\url{https://ai.google.dev/gemma}}
}
```

### Contact

For questions, feedback, or issues, please contact tonyli288@gmail.com.

---

## Chinese Version

### Model Overview

**Hicoder-R1-Distill-Gemma-27B** is a large language model fine-tuned from Google's **Gemma-3 27B** base model. It is specifically optimized for **Chain-of-Thought (CoT) reasoning** and **code generation** tasks.

* **Base Model:** google/gemma-3-27b
* **Fine-tuned by:** tonyli8623
* **Focus Areas:** Chain-of-Thought (CoT), Code Generation, Code Explanation, Debugging
* **Language:** Primarily English for prompts and reasoning; generates code in multiple programming languages.

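### Key Features

* **Enhanced CoT Reasoning:** Explicitly trained to break complex problems down into intermediate steps before giving a final answer, which is particularly useful for complex coding or algorithmic tasks.
* **Strong Coding Capabilities:** Generates, explains, debugs, and translates code across various programming languages (e.g., Python, JavaScript, Java, C++, SQL).
* **Gemma-3 Foundation:** Built on the architecture of Google's Gemma-3 27B model.
* **Distillation Enhanced (Presumed):** May benefit from knowledge distillation, improving performance on the target tasks relative to standard fine-tuning.
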
### How to Use

You can use this model with the Hugging Face `transformers` library.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Specify the path to your fine-tuned model (local path or Hugging Face Hub ID)
model_id = "tonyli8623/Hicoder-R1-Distill-Gemma-27B"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Use bfloat16 for efficiency if the hardware supports it
    device_map="auto",           # Automatically distribute the model across available GPUs
)

# --- Example 1: Simple code generation (Chinese prompt) ---
# Prompt in English: "Write a Python function to calculate the factorial of a number."
prompt_simple = "编写一个 Python 函数来计算一个数的阶乘。"

# apply_chat_template uses the chat template stored with the tokenizer
# (Gemma-3 format for this model; adjust if your checkpoint differs).
messages_simple = [
    {"role": "user", "content": prompt_simple}
]
input_ids_simple = tokenizer.apply_chat_template(
    messages_simple,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs_simple = model.generate(
    input_ids_simple,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)

# Decode only the newly generated tokens (skip the prompt)
response_simple = tokenizer.decode(
    outputs_simple[0][input_ids_simple.shape[1]:], skip_special_tokens=True
)
print("--- Simple Code Generation ---")
print(response_simple)

# --- Example 2: Code generation with CoT (Chinese prompt) ---
# Prompt in English: "Think step-by-step about how to write a Python function
# that uses the Sieve of Eratosthenes to find all primes less than or equal to
# a given integer 'n'. Then, provide the function. Let's break down the steps:
# 1. Understand how the Sieve of Eratosthenes works. 2. Outline the steps
# needed in the function. 3. Write the Python code based on the outline."
prompt_cot = """请逐步思考如何编写一个 Python 函数,使用埃拉托斯特尼筛法 (Sieve of Eratosthenes) 找出小于等于给定整数 'n' 的所有素数。然后,提供该函数。

让我们分解一下步骤:
1. 理解埃拉托斯特尼筛法的原理。
2. 概述函数中需要的步骤。
3. 基于概述编写 Python 代码。"""

messages_cot = [
    {"role": "user", "content": prompt_cot}
]
input_ids_cot = tokenizer.apply_chat_template(
    messages_cot,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs_cot = model.generate(
    input_ids_cot,
    max_new_tokens=500,  # Allow more tokens for CoT + code
    do_sample=True,
    temperature=0.6,
    top_k=50,
    top_p=0.95,
)

# Decode only the newly generated tokens (skip the prompt)
response_cot = tokenizer.decode(
    outputs_cot[0][input_ids_cot.shape[1]:], skip_special_tokens=True
)
print("\n--- Code Generation with CoT ---")
print(response_cot)
```

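
**Prompting:** For best results, especially when CoT reasoning is needed, explicitly ask the model to "think step-by-step" or "provide your reasoning process before the code". For example, add a system prompt such as: "You are a code engineer proficient in various programming languages. Before answering, think carefully about the question and create a logically coherent thought process, starting with `<think>` and ending with `</think>`. After thinking, provide the answer."
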
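### Limitations and Bias

* This model is based on Gemma-3 and inherits its capabilities and limitations.
* While fine-tuned for coding, it may still generate incorrect, inefficient, or insecure code. **Always review and test generated code thoroughly.**
* The model's knowledge is limited to its training data cutoff.
* Like all large language models, it may exhibit biases present in the underlying training data.
* Chain-of-Thought reasoning may not always be perfect or logical.
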
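### License

The license for this model depends on the base Gemma-3 model's license and any additional terms added for the fine-tune. Gemma-3 models are typically governed by the "Gemma Terms of Use". Please consult the specific license file included with the model or the Gemma Terms of Use.

* **Gemma Terms of Use:** https://ai.google.dev/gemma/terms
* **Fine-tuning Specific License (if any):** [State here whether you add a license such as Apache 2.0 or MIT, or declare that it follows the base model license]
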
### Citation

If you use this model in your research or work, please consider citing:

```bibtex
@misc{hicoder_r1_distill_gemma_27b_[year],
  title={Hicoder-R1-Distill-Gemma-27B: A Model Focused on Chain-of-Thought and Code Generation},
  author={tonyli8623},
  year={[Year of Release]},
  howpublished={\url{https://huggingface.co/tonyli8623/Hicoder-R1-Distill-Gemma-27B}}
}

@misc{gemma3_2025,
  title={Gemma 3 Technical Report},
  author={Gemma Team, Google},
  year={2025},
}