# Model Summary: Mify-Coder-2.5B

## Overview

Mify-Coder-2.5B-8K is a 2.5B-parameter code-focused language model. It delivers frontier-grade performance on code generation, reasoning, and function-calling tasks while maintaining compute efficiency and enterprise-grade safety. In contrast to scale-first approaches, Mify-Coder demonstrates that smaller models can achieve competitive results through principled data curation and optimized training strategies.

**Developed by:** Infosys Ltd.
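
For reference, a minimal generation sketch with the Hugging Face `transformers` library is shown below. The repository id used here is an assumption for illustration; check the model page for the actual identifier and recommended prompt format.

```python
# Minimal generation sketch. Assumptions: the repository id below is
# hypothetical, and plain code-completion prompting is used.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Infosys/Mify-Coder-2.5B-8K"  # hypothetical repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card reports BF16 mixed-precision training
    device_map="auto",
)

prompt = 'def fibonacci(n: int) -> int:\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```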


## Architecture & Training

- **Base Model:** Mify-2.5B
- **Training Phases:**
  - **Continual Pretraining (CPT):** next-token prediction with Fill-in-the-Middle (FIM) for structural infilling (see the FIM sketch after this list).
  - **Supervised Fine-Tuning (SFT):** instruction alignment for coding tasks, multi-turn dialogues, function calling, and safety.
- **Optimization:**
  - BF16 mixed precision, Grouped Query Attention (GQA), and the Distributed Fused Adam optimizer.
  - Specialized tokenization with syntax markers and reasoning tokens for advanced behaviors.
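
The FIM objective lets the model complete a span *between* a given prefix and suffix rather than only continuing left-to-right. The card does not publish Mify-Coder's FIM sentinel tokens, so the sketch below uses the `<fim_prefix>`/`<fim_suffix>`/`<fim_middle>` convention popularized by StarCoder-style models purely as a placeholder.

```python
# Illustrative Fill-in-the-Middle (FIM) prompt construction.
# The sentinel tokens are placeholders; Mify-Coder's actual FIM tokens
# are not documented in this card.
prefix = "def circle_area(radius):\n    "
suffix = "\n    return area\n"

fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

# Feeding fim_prompt to the model asks it to generate the missing span
# between prefix and suffix, e.g. "area = 3.14159 * radius ** 2".
print(fim_prompt)
```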

## Performance Highlights

| Category | Benchmark  | # Shots | Metric | Score  |
|----------|------------|---------|--------|--------|
| Code Gen | MBPP       | 0       | pass@1 | 89.23% |
| Code Gen | MBPP+      | 0       | pass@1 | 88.89% |
| Code Gen | HumanEval  | 0       | pass@1 | 53.05% |
| Code Gen | HumanEval+ | 0       | pass@1 | 46.95% |
| Code Gen | NumpyEval  | 0       | pass@1 | 56.44% |
| Code Gen | PandasEval | 0       | pass@1 | 53.47% |
| Tool Use | BFCL v1    | 0       | acc    | 79.19% |
| Tool Use | BFCL v2    | 0       | acc    | 55.26% |
- Outperforms larger models on algorithmic reasoning tasks while remaining competitive in general coding and security-oriented capabilities.
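
BFCL scores how reliably a model maps a natural-language request onto a declared function schema. The sketch below shows the general shape of such a request using the `tools` argument of `tokenizer.apply_chat_template` from recent `transformers` releases; whether Mify-Coder's chat template consumes this argument is an assumption.

```python
# Sketch of a BFCL-style function-calling request.
# Assumptions: hypothetical repository id, and a chat template that
# supports the `tools` argument of recent transformers releases.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Infosys/Mify-Coder-2.5B-8K")

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    ...

messages = [{"role": "user", "content": "What's the weather in Bengaluru?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],        # schema is derived from the signature and docstring
    add_generation_prompt=True,
    tokenize=False,
)
# The model is then expected to emit a structured call such as
# {"name": "get_weather", "arguments": {"city": "Bengaluru"}}.
```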

## Responsible AI & Safety

- Integrated safety objectives during SFT.
- A balanced harmful-to-general sample ratio (1:4) for secure code generation and ethical language use.
- Validated against the Stanford AirBench and CyberSecEval benchmarks.

## Deployment & Future Work

- **Quantization:** FP8 and AWQ for efficient inference; optimized with TensorRT-LLM.
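
As an illustration of the AWQ path, the sketch below follows the common AutoAWQ workflow. The quantization settings are typical library defaults, not values published for Mify-Coder, and the repository id is again hypothetical.

```python
# Illustrative 4-bit AWQ quantization with the AutoAWQ library.
# The config values are common defaults, not settings published for this model.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "Infosys/Mify-Coder-2.5B-8K"  # hypothetical repository id
quant_path = "mify-coder-2.5b-awq"

quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)  # runs activation-aware calibration
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```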