---
language:
- en
- code
license: apache-2.0
library_name: transformers.js
tags:
- code
- embeddings
- onnx
- transformers.js
- semantic-search
- code-search
pipeline_tag: feature-extraction
base_model: microsoft/unixcoder-base
---
# UniXcoder ONNX for Code Search
**Converted by [VibeAtlas](https://vibeatlas.dev)** - AI Context Optimization for Developers
This is [Microsoft's UniXcoder](https://huggingface.co/microsoft/unixcoder-base) converted to ONNX format for use with **Transformers.js** in browser and Node.js environments.
## Why UniXcoder?
UniXcoder understands code **semantically**, not just as text:
- Pre-trained on six programming languages (Python, Java, JavaScript, PHP, Ruby, Go)
- Pre-trained with AST structure and code comments, so it captures more than surface-level tokens
- Roughly 20-30% better code-search accuracy than generic text-embedding models
## Quick Start
### Transformers.js (Browser/Node.js)
```javascript
import { pipeline } from '@huggingface/transformers';

const embedder = await pipeline(
  'feature-extraction',
  'sailesh27/unixcoder-base-onnx'
);

const code = `function authenticate(user) {
  return user.isValid && user.hasPermission;
}`;

const embedding = await embedder(code, {
  pooling: 'mean',
  normalize: true
});

console.log(embedding.dims); // [1, 768]
```
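The `pooling: 'mean'` and `normalize: true` options condense the encoder's per-token vectors into one fixed-size embedding. As a rough sketch of what those two steps do (a simplified illustration, not the library's actual implementation, which also respects the attention mask), assuming `tokenEmbeddings` is an array of per-token vectors of shape `[seqLen][hiddenSize]`:

```javascript
// Mean-pool per-token vectors into one sentence vector, then L2-normalize
// it so that a plain dot product between two embeddings equals their
// cosine similarity.
function meanPoolAndNormalize(tokenEmbeddings) {
  const hidden = tokenEmbeddings[0].length;
  const pooled = new Array(hidden).fill(0);
  for (const tok of tokenEmbeddings) {
    for (let i = 0; i < hidden; i++) pooled[i] += tok[i];
  }
  for (let i = 0; i < hidden; i++) pooled[i] /= tokenEmbeddings.length;
  // L2 normalization: divide by the vector's Euclidean length
  const norm = Math.sqrt(pooled.reduce((s, x) => s + x * x, 0));
  return pooled.map((x) => x / norm);
}
```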
### Semantic Code Search
```javascript
import { pipeline, cos_sim } from '@huggingface/transformers';

const embedder = await pipeline('feature-extraction', 'sailesh27/unixcoder-base-onnx');

// Index your code
const codeSnippets = [
  'function login(user, pass) { ... }',
  'function formatDate(date) { ... }',
  'function validateEmail(email) { ... }'
];
const codeEmbeddings = await embedder(codeSnippets, { pooling: 'mean', normalize: true });

// Search with natural language
const query = 'user authentication';
const queryEmbedding = await embedder(query, { pooling: 'mean', normalize: true });

// Rank snippets by similarity, best match first
const results = codeEmbeddings.tolist()
  .map((emb, i) => ({
    code: codeSnippets[i],
    score: cos_sim(queryEmbedding.tolist()[0], emb)
  }))
  .sort((a, b) => b.score - a.score);

console.log(results[0].code); // best match for the query
```
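`cos_sim` is provided by Transformers.js; for reference, this is the quantity it computes. A minimal standalone equivalent (the function name here is ours, not the library's):

```javascript
// Cosine similarity: dot product of two vectors divided by the product
// of their Euclidean norms. Ranges from -1 (opposite) to 1 (identical
// direction); for L2-normalized embeddings it reduces to a dot product.
function cosineSim(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```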
## Technical Details
- **Architecture**: RoBERTa-based encoder
- **Hidden Size**: 768
- **Max Sequence Length**: 512 tokens
- **Output Dimensions**: 768
- **ONNX Opset**: 14
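Inputs beyond the 512-token limit are truncated, so long source files should be embedded in chunks. A hypothetical line-based chunker (our helper, not part of this model or Transformers.js), using a rough ~4-characters-per-token heuristic; for exact limits, count tokens with the model's tokenizer instead:

```javascript
// Split source code into chunks that stay under an approximate token
// budget, breaking only at line boundaries so snippets stay readable.
// A single line longer than the budget is kept whole (sketch tradeoff).
function chunkSource(source, maxTokens = 512, charsPerToken = 4) {
  const maxChars = maxTokens * charsPerToken;
  const lines = source.split('\n');
  const chunks = [];
  let current = '';
  for (const line of lines) {
    if (current && current.length + line.length + 1 > maxChars) {
      chunks.push(current);
      current = '';
    }
    current += (current ? '\n' : '') + line;
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk can then be passed to the embedder separately and the results indexed alongside the snippet's file and line range.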
## About VibeAtlas
**VibeAtlas** is the reliability infrastructure for AI coding:
- Reduce AI token costs by 40-60%
- Improve code search accuracy with semantic understanding
- Add governance guardrails to AI workflows
**Links**:
- [Website](https://vibeatlas.dev)
- [VS Code Extension](https://marketplace.visualstudio.com/items?itemName=vibeatlas.vibeatlas)
- [GitHub](https://github.com/vibeatlas)
## Citation
```bibtex
@misc{unixcoder-onnx-2025,
  title={UniXcoder ONNX: Code Embeddings for JavaScript},
  author={VibeAtlas Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/sailesh27/unixcoder-base-onnx}
}
```
### Original UniXcoder Paper
```bibtex
@inproceedings{guo2022unixcoder,
  title={UniXcoder: Unified Cross-Modal Pre-training for Code Representation},
  author={Guo, Daya and Lu, Shuai and Duan, Nan and Wang, Yanlin and Zhou, Ming and Yin, Jian},
  booktitle={ACL},
  year={2022}
}
```
## License
Apache 2.0 (same as original UniXcoder)