---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- llama-factory
- full
- generated_from_trainer
- finance
- chinese
- sft
- reasoning
- financial-ai
model-index:
- name: Qwen2.5-7B-Instruct-R1-forfinance
  results: []
---

# Qwen2.5-7B-Instruct-R1-forfinance

## Model Description

**Qwen2.5-7B-Instruct-R1-forfinance** is a large language model fine-tuned specifically for the financial domain. It is a full-parameter fine-tune of Qwen2.5-7B-Instruct, trained on a combination of open-source financial Q&A datasets and high-quality chain-of-thought reasoning data.

## Training Data

### Data Sources

1. **Open-source financial Q&A datasets**
2. **Chain-of-thought data generated by DeepSeek-R1**
   - DeepSeek-R1 was run at inference time to generate chain-of-thought data
   - GPT-5 scored the quality of each generated response
   - Only high-quality responses were kept as training data

### Data Content

- **Basic financial knowledge Q&A**
- **Financial calculation problems**
- **Financial concept explanations**
- **Chain-of-thought reasoning**

Quality control: GPT-5 was used to score DeepSeek-R1's responses, and only high-quality answers were selected as SFT training data.

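The scoring-and-filtering step described above can be sketched roughly as follows. This is an illustrative sketch only: the field names, the 0–10 scale, and the cutoff of 8 are assumptions for illustration, not the actual pipeline.

```python
def filter_for_sft(records, min_score=8):
    """Keep judge-approved answers and emit them in a typical SFT chat format.

    records: dicts with 'question', 'answer' (the R1 output), and 'score'
    (the judge's quality rating). All field names are hypothetical.
    """
    kept = []
    for rec in records:
        if rec["score"] >= min_score:
            kept.append({
                "messages": [
                    {"role": "user", "content": rec["question"]},
                    {"role": "assistant", "content": rec["answer"]},
                ]
            })
    return kept

scored = [
    {"question": "What is the IS curve?", "answer": "Step by step: ...", "score": 9},
    {"question": "Define beta.", "answer": "...", "score": 5},
]
print(len(filter_for_sft(scored)))  # 1 — only the high-scoring example survives
```
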
## Training Details

### Base Model
- **Model**: Qwen2.5-7B-Instruct
- **Fine-tuning Method**: Full fine-tuning
- **Training Type**: Supervised Fine-Tuning (SFT)

### Training Environment
- **Hardware**: 8 × NVIDIA A100 GPUs
- **Distributed Training**: Multi-GPU parallel training

### Training Hyperparameters

- **Learning Rate**: 1e-05
- **Train Batch Size (per device)**: 1
- **Eval Batch Size (per device)**: 8
- **Seed**: 42
- **Distributed Type**: multi-GPU
- **Number of Devices**: 8
- **Gradient Accumulation Steps**: 16
- **Total Train Batch Size**: 128
- **Total Eval Batch Size**: 64
- **Optimizer**: AdamW (betas=(0.9, 0.999), epsilon=1e-08)
- **LR Scheduler**: Linear
- **Warmup Ratio**: 0.03
- **Epochs**: 2.0

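The total train batch size follows directly from the per-device batch size, the device count, and the gradient accumulation steps:

```python
# Effective (total) train batch size = per-device batch x devices x accumulation steps.
per_device_train_batch_size = 1
num_devices = 8
gradient_accumulation_steps = 16

total_train_batch_size = (per_device_train_batch_size
                          * num_devices
                          * gradient_accumulation_steps)
print(total_train_batch_size)  # 128, matching the value listed above
```
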
### Training Results

- **Final Training Loss**: 0.7332
- **Training Steps**: 312
- **Training Runtime**: 6450.97 seconds
- **Samples per Second**: 6.168
- **Steps per Second**: 0.048

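These figures are mutually consistent: 312 optimizer steps at an effective batch of 128 over 2 epochs imply a training set of roughly 20k examples. This is a back-of-the-envelope inference from the numbers above, not a stated dataset size:

```python
steps = 312
total_batch = 128     # effective train batch size
epochs = 2.0
runtime_s = 6450.97

samples_seen = steps * total_batch           # 39,936 samples across both epochs
approx_dataset_size = samples_seen / epochs  # ~19,968 examples (rough estimate)
print(approx_dataset_size)
print(samples_seen / runtime_s)  # ~6.19 samples/s, close to the reported 6.168
```
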
## Quick Start

### Model Inference

We provide a simple inference script, `inference.py`, for direct financial Q&A with the model.

#### Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use your local checkpoint path
model_path = "/root/Qwen2.5-7B-Instruct-R1-forfinance/"

# Load model and tokenizer
print("Loading model...")
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # matches torch_dtype in config.json
    device_map="auto",
    trust_remote_code=True  # if needed
)

tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True
)

print("Model loaded successfully!")

# Prepare the input. The prompt (in Chinese) asks: "Assume you are a financial-industry
# expert and answer the following question. In macroeconomic analysis, which curve
# describes product-market equilibrium at a given interest rate? Think step by step."
prompt = "假设你是一位金融行业专家,请回答下列问题。\n在宏观分析中,描述在既定利率水平下产品市场达到均衡状态的曲线是什么?\n请一步步思考。"

messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Apply the chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Encode the input
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a response
print("Generating response...")
with torch.no_grad():  # saves GPU memory
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=2048,
        do_sample=True,
        temperature=0.7,
        top_p=0.8,
        repetition_penalty=1.05,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens (strip the prompt)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print("Model Response:")
print(response)
```
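
The prompt-stripping slice at the end works because `generate` returns the prompt tokens followed by the continuation. A tiny self-contained illustration with fake token ids:

```python
# model.generate returns prompt + continuation, so we drop the
# first len(prompt) ids from each output sequence.
input_ids = [[101, 102, 103]]           # fake prompt token ids
generated = [[101, 102, 103, 7, 8, 9]]  # fake prompt + newly generated tokens

new_tokens = [out[len(inp):] for inp, out in zip(input_ids, generated)]
print(new_tokens)  # [[7, 8, 9]]
```
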

#### Run Inference Script

```bash
# Make sure the model path in the script is correct
python inference.py
```

### Requirements

- **Python**: ≥ 3.8
- **PyTorch**: ≥ 2.0
- **Transformers**: ≥ 4.55.0
- **GPU**: NVIDIA GPU with CUDA support recommended
- **GPU Memory**: ≥ 16 GB recommended (24 GB+ preferred)

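A quick way to confirm the environment meets these minimums is to compare installed versions against them. This is a small helper sketch; the parsing deliberately ignores local build suffixes such as `+cu124`:

```python
from importlib.metadata import version, PackageNotFoundError

def version_tuple(v):
    """Parse '2.6.0+cu124' -> (2, 6, 0), ignoring local build suffixes."""
    core = v.split("+")[0]
    parts = []
    for piece in core.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def meets_minimum(pkg, minimum):
    """True if the installed version of pkg is at least `minimum`."""
    try:
        return version_tuple(version(pkg)) >= version_tuple(minimum)
    except PackageNotFoundError:
        return False

# e.g. meets_minimum("transformers", "4.55.0")
print(version_tuple("2.6.0+cu124"))  # (2, 6, 0)
```
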
## Future Plans

**Reinforcement Learning Training**
- Plan to use GRPO (Group Relative Policy Optimization) for reinforcement learning training
- Further improve the model's performance and safety in the financial domain

## Use Cases

- **Financial knowledge Q&A**
- **Financial calculations and analysis**
- **Investment advice consultation**
- **Financial concept explanations**
- **Risk assessment**

## Limitations and Disclaimers

⚠️ **Important Notice**:
- This model is for educational and research purposes only and does not constitute investment advice
- Use it with caution in practical applications, and combine it with professional judgment
- The model may hallucinate or make errors; fact-check its outputs

## Framework Versions

- **Transformers**: 4.55.0
- **PyTorch**: 2.6.0+cu124
- **Datasets**: 3.6.0
- **Tokenizers**: 0.21.1