# Model Card: qwen2.5-coder-minimax-lora

## 1. Model Overview

**qwen2.5-coder-minimax-lora** is a Supervised Fine-Tuned (SFT) LoRA adapter built on top of the Qwen2.5-Coder-7B base model. The fine-tuning objective was to strengthen algorithmic reasoning and structured code generation, particularly for Minimax-based decision-making algorithms and recursive problem-solving tasks. The model uses Parameter-Efficient Fine-Tuning (PEFT) via LoRA, adapting the base model without updating all 7 billion parameters.

## 2. Base Model

- **Base model:** Qwen/Qwen2.5-Coder-7B
- **Architecture:** Decoder-only Transformer
- **Domain:** Code generation and programming assistance
- **Training type:** Instruction-following coding model

The base model was loaded in 4-bit precision during training to enable memory-efficient fine-tuning.

## 3. Fine-Tuning Methodology

### Supervised Fine-Tuning (SFT)

The model was fine-tuned with the TRL `SFTTrainer` framework. Training used a supervised instruction-response format:

```
Instruction: <task description>
Response: <model answer>
```

This format teaches the model structured completion behavior aligned with instruction-following tasks.
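As a minimal illustration, one training sample can be rendered into this template as below. The helper name, the exact delimiter, and end-of-sequence handling are assumptions for illustration; the real formatting lives in the training script.

```python
def format_example(instruction: str, response: str) -> str:
    # Hypothetical helper mirroring the Instruction/Response template above;
    # the actual training script defines the real formatting function.
    return f"Instruction: {instruction}\nResponse: {response}"

sample = format_example(
    "Write a Python function that returns the nth Fibonacci number.",
    "def fib(n):\n    return n if n < 2 else fib(n - 1) + fib(n - 2)",
)
print(sample)
```

The model then learns to complete everything after `Response:` given the instruction prefix.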
### LoRA Configuration

Low-Rank Adaptation (LoRA) was applied with the following configuration:

- **LoRA rank (r):** 16
- **LoRA alpha:** 16
- **LoRA dropout:** 0
- **Bias:** none
- **Target modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`

Only these projection layers were adapted; the base model weights remained frozen. This reduces memory usage and training cost while preserving the base model's knowledge.

### Quantization

- 4-bit quantization enabled during training (QLoRA-style setup)
- FP16 precision for the forward pass
- Gradient checkpointing enabled

This configuration enabled efficient fine-tuning on limited hardware.

## 4. Training Details

- **Dataset:** TeichAI/MiniMax-M2.1-Code-SFT
- **Samples used:** 200
- **Epochs:** 1
- **Learning rate:** 2e-4
- **Per-device batch size:** 1
- **Gradient accumulation steps:** 4
- **Optimizer:** AdamW
- **Save strategy:** disabled (manual save)

The dataset focuses primarily on algorithmic reasoning and Minimax-based structured code responses.

## 5. Model Capabilities

After fine-tuning, the model demonstrates improved:

- Recursive algorithm implementation
- Game-tree reasoning
- Minimax logic generation
- Backtracking-based solutions
- Structured Python code output
- Instruction-following consistency on algorithmic tasks

The model retains the general coding ability of the base Qwen2.5-Coder model while biasing slightly toward structured algorithmic reasoning.

## 6. Limitations

- Fine-tuned on a small subset (200 samples), so the behavioral shift is moderate.
- Performance improvements are most noticeable on algorithmic reasoning tasks.
- Not optimized for production-critical environments.
- Does not include reinforcement learning or preference alignment.

## 7. Intended Use

This model is suitable for:

- Educational demonstrations of LoRA fine-tuning
- Research in parameter-efficient adaptation
- Algorithmic code generation tasks
- Game AI and recursive logic generation

It is not intended for safety-critical or real-time production systems without additional evaluation.

## 8. How to Load the Model

This repository contains only the LoRA adapter.
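Concretely, the adapter stores only small low-rank factors for each target module. A toy, pure-Python sketch of how a LoRA update combines with a frozen base weight; dimensions are illustrative (only `r = 16` and `alpha = 16` come from the configuration above), and real adapters act on the 7B model's attention projections:

```python
# LoRA forward pass: y = x·W + (alpha/r)·x·A·B, with W frozen and only
# A, B trainable. Toy sizes for clarity; r=16, alpha=16 match this card.
r, alpha = 16, 16
d_in, d_out = 8, 8  # illustrative, not the real hidden size

def matmul(X, Y):
    # Minimal dense matrix multiply over lists of lists.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def add_scaled(X, Y, s):
    return [[a + s * b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

W = [[float(i == j) for j in range(d_out)] for i in range(d_in)]  # frozen base weight
A = [[0.01] * r for _ in range(d_in)]  # trainable low-rank factor
B = [[0.0] * d_out for _ in range(r)]  # zero-initialized: adapter starts as a no-op

def lora_forward(x):
    delta = matmul(matmul(x, A), B)
    return add_scaled(matmul(x, W), delta, alpha / r)

x = [[1.0] * d_in]
assert lora_forward(x) == matmul(x, W)  # with B = 0, base behavior is unchanged
print(r * (d_in + d_out))  # trainable parameters per adapted layer: 256
```

Because `B` starts at zero, the adapter initially leaves the base model's outputs untouched and only shifts them as training updates `A` and `B`.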
The base model must be loaded separately.

### Loading Instructions

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model = "Qwen/Qwen2.5-Coder-7B"
lora_repo = "your-username/qwen2.5-coder-minimax-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, lora_repo)
model.eval()
```

### Example Inference

```python
prompt = "Write a Python function to implement Minimax for Tic Tac Toe with recursion."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=300,
        temperature=0.2,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Author

**Priyanka Shilwant**
AI Full Stack Intern
Focus Areas: Generative AI, LLM Fine-Tuning, Applied Machine Learning