---
language:
- en
- code
license: apache-2.0
library_name: transformers.js
tags:
- code
- embeddings
- onnx
- transformers.js
- semantic-search
- code-search
pipeline_tag: feature-extraction
base_model: microsoft/unixcoder-base
---

# UniXcoder ONNX for Code Search

**Converted by [VibeAtlas](https://vibeatlas.dev)** - AI Context Optimization for Developers

This is [Microsoft's UniXcoder](https://huggingface.co/microsoft/unixcoder-base) converted to ONNX format for use with **Transformers.js** in browser and Node.js environments.

## Why UniXcoder?

UniXcoder models code **semantically** rather than as plain text:

- Pre-trained on 6 programming languages (Python, Java, JavaScript, PHP, Ruby, Go)
- Uses AST structure and code comments during pre-training
- 20-30% better code search accuracy than generic embedding models

## Quick Start

### Transformers.js (Browser/Node.js)

```javascript
import { pipeline } from '@huggingface/transformers';

// Load the ONNX model (downloaded and cached on first use)
const embedder = await pipeline(
  'feature-extraction',
  'sailesh27/unixcoder-base-onnx'
);

const code = `function authenticate(user) {
  return user.isValid && user.hasPermission;
}`;

// Mean-pool the token embeddings and L2-normalize the result
const embedding = await embedder(code, {
  pooling: 'mean',
  normalize: true
});

console.log(embedding.dims); // [1, 768]
```

### Semantic Code Search

```javascript
import { pipeline, cos_sim } from '@huggingface/transformers';

const embedder = await pipeline('feature-extraction', 'sailesh27/unixcoder-base-onnx');

// Index your code
const codeSnippets = [
  'function login(user, pass) { ... }',
  'function formatDate(date) { ... }',
  'function validateEmail(email) { ... }'
];

const codeEmbeddings = await embedder(codeSnippets, { pooling: 'mean', normalize: true });

// Search with natural language
const query = 'user authentication';
const queryEmbedding = await embedder(query, { pooling: 'mean', normalize: true });

// Score every snippet against the query
const queryVector = queryEmbedding.tolist()[0];
const similarities = codeEmbeddings.tolist().map((emb, i) => ({
  code: codeSnippets[i],
  score: cos_sim(queryVector, emb)
}));

// Sort by descending similarity so the best match comes first
similarities.sort((a, b) => b.score - a.score);
console.log(similarities[0]);
```

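
Because the embeddings are requested with `normalize: true`, cosine similarity reduces to a plain dot product. The following is a minimal sketch of a top-k ranking step under that assumption, operating on plain arrays such as those returned by `.tolist()`. The helper names `dot` and `topK` are hypothetical and not part of Transformers.js, and the toy 2-dimensional vectors stand in for real 768-dimensional embeddings.

```javascript
// Hypothetical helpers for illustration; not part of Transformers.js.

// Dot product of two equal-length numeric arrays.
function dot(a, b) {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Rank documents against a query and return the k best matches.
// Assumes every vector is already L2-normalized, so the dot product
// equals the cosine similarity.
function topK(queryVec, docVecs, docs, k = 3) {
  return docVecs
    .map((vec, i) => ({ doc: docs[i], score: dot(queryVec, vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Toy unit vectors in place of real model output.
const results = topK(
  [1, 0],
  [[0.6, 0.8], [1, 0], [0, 1]],
  ['formatDate', 'login', 'validateEmail'],
  2
);
console.log(results.map((r) => r.doc)); // [ 'login', 'formatDate' ]
```

For a handful of snippets this linear scan is likely sufficient; a larger corpus would probably call for an approximate nearest-neighbor index instead.
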

## Technical Details

- **Architecture**: RoBERTa-based encoder
- **Hidden Size**: 768
- **Max Sequence Length**: 512 tokens
- **Output Dimensions**: 768
- **ONNX Opset**: 14

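
With a fixed output size of 768, embeddings can be packed into one contiguous `Float32Array` instead of nested arrays, at 768 × 4 = 3072 bytes per vector. The sketch below illustrates one possible layout; `FlatIndex` is a hypothetical class (not part of Transformers.js), and it assumes vectors are already L2-normalized so the dot product serves as the similarity score.

```javascript
// Hypothetical in-memory index for illustration; not part of Transformers.js.
const DIM = 768; // output dimensions of the model

class FlatIndex {
  constructor(capacity, dim = DIM) {
    this.dim = dim;
    this.size = 0;
    // One contiguous buffer: vector i occupies [i * dim, (i + 1) * dim).
    this.data = new Float32Array(capacity * dim);
  }

  // Store a vector and return its integer id.
  add(vec) {
    if (vec.length !== this.dim) throw new Error('dimension mismatch');
    if ((this.size + 1) * this.dim > this.data.length) throw new Error('index full');
    this.data.set(vec, this.size * this.dim);
    return this.size++;
  }

  // Dot product between a query and the stored vector with the given id.
  // Equals cosine similarity when both vectors are L2-normalized.
  score(queryVec, id) {
    const offset = id * this.dim;
    let sum = 0;
    for (let i = 0; i < this.dim; i++) sum += queryVec[i] * this.data[offset + i];
    return sum;
  }
}

const index = new FlatIndex(1000);
console.log(index.data.byteLength); // 1000 * 768 * 4 = 3072000
```
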

## About VibeAtlas

**VibeAtlas** is the reliability infrastructure for AI coding:

- Reduce AI token costs by 40-60%
- Improve code search accuracy with semantic understanding
- Add governance guardrails to AI workflows

**Links**:
- [Website](https://vibeatlas.dev)
- [VS Code Extension](https://marketplace.visualstudio.com/items?itemName=vibeatlas.vibeatlas)
- [GitHub](https://github.com/vibeatlas)

## Citation

```bibtex
@misc{unixcoder-onnx-2025,
  title={UniXcoder ONNX: Code Embeddings for JavaScript},
  author={VibeAtlas Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/sailesh27/unixcoder-base-onnx}
}
```

### Original UniXcoder Paper

```bibtex
@inproceedings{guo2022unixcoder,
  title={UniXcoder: Unified Cross-Modal Pre-training for Code Representation},
  author={Guo, Daya and Lu, Shuai and Duan, Nan and Wang, Yanlin and Zhou, Ming and Yin, Jian},
  booktitle={ACL},
  year={2022}
}
```

## License

Apache 2.0 (same as original UniXcoder)