Sarthak committed
Commit · 6742590 · 1 parent: 12d70ca

docs: rename model to codemalt and update evaluation instructions

This commit updates the README to reflect the new model name and simplifies the evaluation instructions.

README.md (CHANGED)
|
```diff
@@ -4,7 +4,7 @@ library_name: distiller
 license: apache-2.0
 license_name: apache-2.0
 license_link: LICENSE
-model_name: codemalt
+model_name: codemalt
 tags:
 - code-search
 - code-embeddings
@@ -24,9 +24,9 @@ language:
 pipeline_tag: feature-extraction
 ---
 
-# CodeMalt
+# CodeMalt
 
-**CodeMalt
+**CodeMalt** is a high-performance, code-specialized static embedding model created through Model2Vec distillation of `sentence-transformers/all-mpnet-base-v2`. This model achieves **73.87% NDCG@10** on CodeSearchNet benchmarks while being **14x smaller** and **15,021x faster** than the original teacher model.
 
 ## 🏆 Performance Highlights
 
@@ -130,7 +130,7 @@ results = distill.run_local_distillation(
 
 # Evaluate on CodeSearchNet
 evaluation_results = evaluate.run_evaluation(
-    models=["
+    models=["."],
     max_queries=1000,
     languages=["python", "javascript", "java", "go", "php", "ruby"]
 )
@@ -152,8 +152,6 @@ analyze.main(
 - General-purpose: `sentence-transformers/all-mpnet-base-v2`, `BAAI/bge-m3`
 - Instruction-tuned: `Alibaba-NLP/gte-Qwen2-1.5B-instruct`
 
-- **CodeMalt Model Series**: Our flagship models follow the naming convention `codemalt-base-[N]m` where `[N]m` indicates millions of parameters (e.g., `codemalt-base-8m` has ~7.6 million parameters)
-
 - **Advanced Training Pipeline**: Optional tokenlearn-based training following the POTION approach:
   1. Model2Vec distillation (basic static embeddings)
   2. Feature extraction using sentence transformers
```