codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora
Progressive Context Extension to 2.0M Tokens
This is a progressive LoRA adapter that extends Qwen/Qwen2.5-Coder-0.5B-Instruct to handle 2.0 million-token contexts through curriculum learning.
Part of the Ellora project, Recipe #4: Progressive Long Context Extension.
Key Features
- Final Context: 2,000,000 tokens (62x base model)
- Training Method: Hybrid approach with vLLM + Unsloth optimizations
- Data Generation: vLLM for 10x+ faster task generation
- Training: Unsloth for memory-efficient progressive training
- Single Adapter: One LoRA handles all context lengths up to 2000K
- Use Cases (see the packing sketch after this list):
- Entire codebase analysis
- Multi-repository understanding
- Large-scale code generation
- Cross-file dependency analysis
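A quick way to check that a codebase actually fits in the 2M-token window is to pack the repository into a single prompt and count tokens before generating. A minimal sketch (the file filter and `### File:` separators are illustrative choices, not part of the Ellora recipe):

```python
from pathlib import Path
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")

def pack_repository(repo_root: str, suffixes=(".py", ".md", ".toml")) -> str:
    """Concatenate source files into one prompt, tagging each with its path."""
    parts = []
    for path in sorted(Path(repo_root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            parts.append(f"### File: {path}\n{path.read_text(errors='ignore')}")
    return "\n\n".join(parts)

context = pack_repository("./my-repo")  # hypothetical local checkout
n_tokens = len(tokenizer(context).input_ids)
print(f"{n_tokens:,} tokens (limit: 2,000,000)")
```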
Training Progression
The model was trained progressively through these stages:
- Stage 1: 32K tokens (loss: 0.4882)
- Stage 2: 128K tokens (loss: 0.0641)
- Stage 3: 512K tokens (loss: 0.1327)
- Stage 4: 2000K tokens (loss: 0.0484)
Performance Metrics
- Final Training Loss: 0.0484
- Total Training Time: 0.17 hours (about 10 minutes)
- Peak Memory Usage: 4.7 GB
- LoRA Rank: 64
- LoRA Alpha: 128
Usage with Unsloth
```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Load model with Unsloth (automatically handles 2M context!)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora",
    max_seq_length=2000000,
    dtype=None,  # Auto-detect
    load_in_4bit=True,
)

# Enable native fast generation
FastLanguageModel.for_inference(model)

# Example: Analyze a large codebase
prompt = """Repository Context:
[Your repository content up to 2000K tokens]
Question: Analyze the overall architecture and provide improvement suggestions.
Answer:"""

# Move inputs to the model's device before generating
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2000000).to(model.device)
streamer = TextStreamer(tokenizer)

outputs = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=1024,
    temperature=0.7,
    do_sample=True,
)
```
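The raw-string prompt above works, but Qwen2.5 instruct checkpoints are trained with a chat template, so wrapping the request as a chat message usually yields better-behaved answers. A hedged variant (`context` stands for your packed repository text, as in the sketch under Key Features; `model`, `tokenizer`, and `streamer` come from the block above):

```python
messages = [{
    "role": "user",
    "content": f"Repository Context:\n{context}\n\n"
               "Question: Analyze the overall architecture and provide improvement suggestions.",
}]
# apply_chat_template inserts Qwen's special tokens and the assistant prefix
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, streamer=streamer, max_new_tokens=1024,
                         temperature=0.7, do_sample=True)
```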
Usage with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")

# Load the progressive adapter
model = PeftModel.from_pretrained(model, "codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora")

# Now you can use contexts up to 2000K tokens!
```
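If you want a standalone checkpoint for serving, so inference stacks do not need peft at all, the adapter can be folded into the base weights with peft's merge_and_unload (the output directory name below is illustrative):

```python
# Fold the LoRA deltas into the base weights and drop the PeftModel wrapper
merged = model.merge_and_unload()
merged.save_pretrained("qwen2.5-coder-0.5b-progressive-2000k-merged")
tokenizer.save_pretrained("qwen2.5-coder-0.5b-progressive-2000k-merged")
```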
Progressive Training Details
This adapter was trained using a novel progressive curriculum approach with hybrid optimizations:
- Stage 1 (32K): Basic file-level understanding
- Stage 2 (128K): Multi-file repository comprehension
- Stage 3 (512K): Large repository analysis
- Stage 4 (2M): Massive codebase understanding
Each stage included data from all previous stages, allowing the model to maintain and build upon its learned capabilities.
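The Ellora training code itself is not reproduced here, but the cumulative curriculum can be sketched as a loop that reuses one adapter while replaying all shorter-context data at each stage. The stage lengths follow the list above; the two helpers are hypothetical placeholders for the data pipeline and trainer:

```python
STAGES = [32_000, 128_000, 512_000, 2_000_000]  # max context per stage

def build_stage_dataset(max_len):
    """Hypothetical: return examples whose contexts fit within max_len tokens."""
    ...

def train_stage(model, dataset, max_seq_length):
    """Hypothetical: one LoRA training pass at the given sequence length."""
    ...

seen = []  # earlier stages are replayed, never discarded
for max_len in STAGES:
    seen.append(build_stage_dataset(max_len))
    # Each stage trains on its own data plus everything from previous stages
    train_stage(model, dataset=[ex for d in seen for ex in d], max_seq_length=max_len)
```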
Training Configuration
- Progressive Stages: 32K → 128K → 512K → 2000K
- Final Context: 2000K tokens
- Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
- Data Generation: vLLM (fast batch inference)
- Training: Unsloth (memory-efficient training)
- LoRA Rank: 64
- LoRA Alpha: 128
- Learning Rate: 0.0002
- Batch Size: 1
- Gradient Accumulation: 4
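Expressed with the peft and transformers APIs, those hyperparameters map roughly onto the configuration below. The target modules are an assumption (the usual Qwen2.5 attention and MLP projections); the card does not list them:

```python
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    use_rslora=True,  # rank-stabilized LoRA, listed under Optimizations below
    target_modules=[  # assumed: standard Qwen2.5 projection layers
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="progressive-lora",
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    bf16=True,  # assumption, matching the bfloat16 inference example above
)
```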
Optimizations Used
Data Generation (vLLM)
- Batch Generation: Process multiple prompts simultaneously
- Optimized Memory: GPU memory utilization tuning
- Fast Inference: 10x+ faster than sequential generation (see the sketch below)
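This stage maps onto vLLM's offline batch API. A minimal sketch of batched task generation; the prompts and sampling settings are placeholders, not the Ellora generation config:

```python
from vllm import LLM, SamplingParams

# vLLM schedules the whole batch at once instead of looping prompt by prompt
llm = LLM(model="Qwen/Qwen2.5-Coder-0.5B-Instruct", gpu_memory_utilization=0.9)
params = SamplingParams(temperature=0.7, max_tokens=512)

prompts = [f"Write a code-analysis question about module #{i}." for i in range(64)]
for request_output in llm.generate(prompts, params):
    print(request_output.outputs[0].text[:80])
```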
Training (Unsloth)
- Custom CUDA Kernels: 2-5x training speedup
- Flash Attention 2: Efficient attention computation
- Gradient Checkpointing: Memory-efficient backprop
- 4-bit Quantization: Reduced memory footprint
- RSLoRA: Rank-stabilized LoRA for better convergence (see the training sketch below)
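On the training side, these options correspond to flags on Unsloth's FastLanguageModel; a hedged sketch (the module list and the 32K starting length are assumptions):

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-Coder-0.5B-Instruct",
    max_seq_length=32_768,  # first curriculum stage; raised at later stages
    load_in_4bit=True,      # 4-bit quantization
)
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_rslora=True,                       # rank-stabilized LoRA
    use_gradient_checkpointing="unsloth",  # memory-efficient backprop
)
```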
Evaluation Tasks
The model excels at:
- Complete repository architectural analysis
- Cross-file dependency tracing
- Large-scale refactoring suggestions
- Security vulnerability detection across entire codebases
- Test coverage analysis
- Documentation generation for entire projects
Achievements
- Successfully extended context from 32K → 2000K tokens
- Hybrid optimization: vLLM for generation + Unsloth for training
- Single adapter handles all context lengths
- Memory-efficient training on single H100 GPU
- Real repository understanding, not just synthetic data
Links
This model is part of the Ellora project: standardized recipes for enhancing LLM capabilities.