ai-code-analyzer / INTEGRATION_GUIDE.md
arun3676
Initial commit: AI Code Analyzer with fine-tuned model support
5c6379b
|
raw
history blame
9.02 kB

Integration Guide: Fine-tuned Model with Your Code Analyzer

This guide explains how to integrate your fine-tuned DeepSeek model with the existing code analyzer app.

πŸ“‹ What You Have Now

After completing the Colab training, you have:

  • βœ… Fine-tuned DeepSeek model adapters (~20MB)
  • βœ… Enhanced analyzer class supporting both models
  • βœ… Original CodeT5+ model still working
  • βœ… All existing UI features preserved

πŸ”„ Integration Steps

Step 1: Download Your Fine-tuned Model from Colab

In your final Colab cell, you saved the model to Google Drive. Now download it:

Option A: From Google Drive

  1. Go to Google Drive β†’ MyDrive/ai-code-analyzer/
  2. Download the fine-tuned-analyst folder
  3. Place it in your project root: C:\Users\arunk\professional\ai-code-analyzer\fine-tuned-analyst\

Option B: Download Directly from Colab

# Run this in Colab to create a downloadable ZIP
import shutil
shutil.make_archive('fine-tuned-analyst', 'zip', './fine-tuned-analyst')

from google.colab import files
files.download('fine-tuned-analyst.zip')

Then extract the ZIP in your project root.

Step 2: Install Required Dependencies

Update your requirements.txt to include PEFT:

# Add this line to requirements.txt
peft>=0.7.0

Install it:

pip install peft

Step 3: Test the Enhanced Analyzer Locally

Run the test script to verify everything works:

python optimized_code_analyzer_enhanced.py

You should see:

  • βœ… CodeT5+ analysis
  • βœ… Fine-tuned DeepSeek analysis
  • βœ… Model comparison

Step 4: Update Your Streamlit UI

Replace the analyzer import in matrix_final.py:

Find this (around line 8):

from optimized_code_analyzer import OptimizedCodeAnalyzer

Replace with:

from optimized_code_analyzer_enhanced import EnhancedCodeAnalyzer

Find this (around line 287):

@st.cache_resource
def get_local_analyzer():
    return OptimizedCodeAnalyzer(
        model_id="Salesforce/codet5p-220m",
        precision="fp16",
        quick_max_new_tokens=180,
        detailed_max_new_tokens=240,
    )

Replace with:

@st.cache_resource
def get_local_analyzer(model_type="codet5"):
    return EnhancedCodeAnalyzer(
        model_type=model_type,
        precision="fp16",
        quick_max_new_tokens=180,
        detailed_max_new_tokens=300,
    )

Step 5: Add Model Selector to Sidebar

Add this to your sidebar (around line 490, in the sidebar section):

# Model Selection
st.sidebar.markdown("---")
st.sidebar.markdown("### πŸ€– AI Model Selection")
model_choice = st.sidebar.radio(
    "Choose Analysis Model:",
    ["CodeT5+ (Fast)", "Fine-tuned DeepSeek (Accurate)"],
    help="CodeT5+ is faster, Fine-tuned model gives more detailed analysis"
)

model_type = "codet5" if "CodeT5+" in model_choice else "deepseek-finetuned"

Step 6: Update the Analysis Call

Find where the analyzer is called (around line 600+) and update it:

Find something like:

local_analyzer = get_local_analyzer()
result = local_analyzer.analyze_code_fast(code)

Replace with:

local_analyzer = get_local_analyzer(model_type)
result = local_analyzer.analyze_code_fast(code)

Step 7: Test Everything

Run your Streamlit app:

streamlit run matrix_final.py

Test both models:

  1. Select "CodeT5+ (Fast)" β†’ Run analysis β†’ Should work as before
  2. Select "Fine-tuned DeepSeek (Accurate)" β†’ Run analysis β†’ Should give detailed analysis with quality scores

πŸ“Š What Each Model Does

CodeT5+ (Base Model)

  • Speed: ⚑ Fast (2-3 seconds)
  • Memory: ~1GB
  • Analysis: General code analysis
  • Best for: Quick checks, batch processing
  • Quality: Good for basic issues

Fine-tuned DeepSeek (Your Model)

  • Speed: πŸš€ Moderate (3-5 seconds)
  • Memory: ~1.5GB
  • Analysis: Detailed with quality scores (1-100)
  • Best for: Deep analysis, learning, production code
  • Quality: Excellent - trained on your specific patterns
  • Output format:
    • Quality Score (1-100)
    • Bugs section
    • Performance issues
    • Security concerns
    • Improvement suggestions with code examples

🎯 Key Features of the Enhanced System

1. Dual Model Support

  • Seamlessly switch between models
  • Separate caching for each model
  • No breaking changes to existing code

2. Improved Analysis Quality

Your fine-tuned model provides:

  • Structured output: Quality score, bugs, performance, security
  • Code examples: Shows how to fix issues
  • Contextual understanding: Trained on your dataset patterns
  • Consistent formatting: Always includes all sections

3. Memory Efficient

  • LoRA adapters are tiny (~20MB vs 1GB+ full model)
  • Base model shared, adapters loaded on demand
  • Can deploy both models without doubling memory

πŸš€ Deployment Options

Option 1: Local Deployment (Current)

Pros:

  • Free
  • Fast
  • Full control
  • Easy testing

Cons:

  • Only you can use it
  • Needs your computer running

Setup: Already working! Just use Streamlit locally.

Option 2: Hugging Face Spaces (Recommended)

Pros:

  • FREE hosting
  • Automatic HTTPS
  • Share with anyone
  • GPU available (paid tier)

Setup:

  1. Create account on huggingface.co
  2. Create new Space (Streamlit)
  3. Upload files:
    • matrix_final.py
    • optimized_code_analyzer_enhanced.py
    • requirements.txt
    • fine-tuned-analyst/ folder
  4. Add app.py:
# app.py (for HF Spaces)
import subprocess
subprocess.run(["streamlit", "run", "matrix_final.py"])

Option 3: Railway.app

Cost: $5/month Memory: Up to 8GB Speed: Faster than HF Spaces

Setup:

  1. Connect GitHub repo
  2. Set start command: streamlit run matrix_final.py --server.port $PORT
  3. Deploy

Option 4: Render.com

Cost: FREE tier available Memory: 512MB (might be tight) Speed: Good

Setup:

  1. Connect repo
  2. Use Docker:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD streamlit run matrix_final.py --server.port $PORT

πŸ› Troubleshooting

Issue: "fine-tuned-analyst folder not found"

Solution: Make sure the folder is in your project root with these files:

fine-tuned-analyst/
β”œβ”€β”€ adapter_config.json
β”œβ”€β”€ adapter_model.bin (or adapter_model.safetensors)
β”œβ”€β”€ tokenizer_config.json
└── special_tokens_map.json

Issue: "PEFT not installed"

Solution:

pip install peft

Issue: "Model too slow"

Solution:

  • Use "quick" mode instead of "detailed"
  • Reduce max_new_tokens to 150
  • Use INT8 or INT4 quantization

Issue: "Out of memory"

Solution:

  • Close other applications
  • Use CodeT5+ instead (smaller)
  • Enable quantization: precision="int8"

πŸ“š Understanding the Libraries Used

Core Libraries

Transformers (transformers)

  • What: Hugging Face's library for AI models
  • Does: Loads models, tokenizers, handles generation
  • Used for: Loading DeepSeek and CodeT5+ models

PEFT (peft)

  • What: Parameter Efficient Fine-Tuning
  • Does: Loads LoRA adapters efficiently
  • Used for: Your fine-tuned model adapters

PyTorch (torch)

  • What: Deep learning framework
  • Does: Runs neural networks on GPU/CPU
  • Used for: Model inference, tensor operations

Streamlit (streamlit)

  • What: Web app framework for Python
  • Does: Creates interactive UI
  • Used for: Your code analyzer interface

How They Work Together

User Input (Streamlit)
    ↓
EnhancedCodeAnalyzer
    ↓
Transformers (loads base model)
    ↓
PEFT (loads adapters)
    ↓
PyTorch (runs inference)
    ↓
Result β†’ Streamlit UI

πŸŽ“ Next Steps

  1. Test both models with various code samples
  2. Compare quality - which model works better for your use cases?
  3. Expand dataset - Add more samples and retrain (only takes 20 minutes!)
  4. Deploy - Choose a hosting option and share with others
  5. Iterate - Collect feedback and improve

πŸ’‘ Tips for Best Results

When to Use CodeT5+

  • Quick syntax checks
  • Batch processing many files
  • Resource-constrained environments
  • Simple code reviews

When to Use Fine-tuned DeepSeek

  • Production code reviews
  • Learning/education
  • Complex analysis needed
  • When quality > speed
  • Security audits

πŸŽ‰ Congratulations!

You've successfully:

  • βœ… Fine-tuned a language model
  • βœ… Integrated it with your app
  • βœ… Created a dual-model system
  • βœ… Learned about model deployment
  • βœ… Built a production-ready tool

Your code analyzer now has:

  • 2 AI models to choose from
  • Professional quality analysis
  • Scalable architecture for future improvements
  • Production-ready code

πŸ“ž Support

If you need help:

  1. Check error messages carefully
  2. Review this guide
  3. Test with simple code first
  4. Compare with working examples
  5. Ask for help with specific errors

Happy coding! πŸš€