Commit 6742590 · Parent: 12d70ca
Sarthak committed

docs: rename model to codemalt and update evaluation instructions

This commit updates the README to reflect the new model name and simplifies the evaluation instructions.

Files changed (1):
  1. README.md +4 -6
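The README edited below reports model quality as NDCG@10 on CodeSearchNet. As context for the evaluation instructions this commit touches, here is a minimal, self-contained sketch of that metric (graded relevance discounted by log rank, normalized against the ideal ordering); the relevance lists are made up for illustration:

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k ranked results."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideal (descending-sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([1, 0, 0]))  # relevant doc ranked first -> 1.0
print(ndcg_at_k([0, 0, 1]))  # relevant doc at rank 3 -> 0.5
```

A score like the README's 73.87% NDCG@10 is this value averaged over all benchmark queries.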
README.md CHANGED

@@ -4,7 +4,7 @@ library_name: distiller
 license: apache-2.0
 license_name: apache-2.0
 license_link: LICENSE
-model_name: codemalt-base-8m
+model_name: codemalt
 tags:
 - code-search
 - code-embeddings
@@ -24,9 +24,9 @@ language:
 pipeline_tag: feature-extraction
 ---
 
-# CodeMalt-Base-8M
+# CodeMalt
 
-**CodeMalt-Base-8M** is a high-performance, code-specialized static embedding model created through Model2Vec distillation of `sentence-transformers/all-mpnet-base-v2`. This model achieves **73.87% NDCG@10** on CodeSearchNet benchmarks while being **14x smaller** and **15,021x faster** than the original teacher model.
+**CodeMalt** is a high-performance, code-specialized static embedding model created through Model2Vec distillation of `sentence-transformers/all-mpnet-base-v2`. This model achieves **73.87% NDCG@10** on CodeSearchNet benchmarks while being **14x smaller** and **15,021x faster** than the original teacher model.
 
 ## 🏆 Performance Highlights
 
@@ -130,7 +130,7 @@ results = distill.run_local_distillation(
 
 # Evaluate on CodeSearchNet
 evaluation_results = evaluate.run_evaluation(
-    models=["./code_model2vec/final/codemalt-base-8m"],
+    models=["."],
     max_queries=1000,
     languages=["python", "javascript", "java", "go", "php", "ruby"]
 )
@@ -152,8 +152,6 @@ analyze.main(
 - General-purpose: `sentence-transformers/all-mpnet-base-v2`, `BAAI/bge-m3`
 - Instruction-tuned: `Alibaba-NLP/gte-Qwen2-1.5B-instruct`
 
-- **CodeMalt Model Series**: Our flagship models follow the naming convention `codemalt-base-[N]m` where `[N]m` indicates millions of parameters (e.g., `codemalt-base-8m` has ~7.6 million parameters)
-
 - **Advanced Training Pipeline**: Optional tokenlearn-based training following the POTION approach:
   1. Model2Vec distillation (basic static embeddings)
   2. Feature extraction using sentence transformers
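The README describes CodeMalt as a *static* embedding model, which is why it can be orders of magnitude faster than its transformer teacher: encoding is a table lookup plus mean pooling, with no forward pass. A toy sketch of that idea — the vocabulary and vectors below are illustrative, not CodeMalt's real tokenizer or weights:

```python
# Toy stand-in for a static embedding model: one pre-computed vector per
# vocabulary token (a real Model2Vec model stores one row per tokenizer token;
# these names and numbers are made up for illustration).
TOKEN_VECTORS = {
    "def": [1.0, 0.0],
    "sort": [0.0, 1.0],
    "items": [0.5, 0.5],
}

def encode(tokens):
    """Encode by averaging token vectors: a dictionary lookup, no transformer pass."""
    vecs = [TOKEN_VECTORS[t] for t in tokens if t in TOKEN_VECTORS]
    if not vecs:
        return [0.0, 0.0]
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]

print(encode(["def", "sort"]))  # -> [0.5, 0.5]
```

Because the per-token vectors are fixed at distillation time, encoding cost scales only with the number of tokens, not with model depth — the source of the speedup the README quotes.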