---
language:
- en
- code
license: apache-2.0
library_name: transformers.js
tags:
- code
- embeddings
- onnx
- transformers.js
- semantic-search
- code-search
pipeline_tag: feature-extraction
base_model: microsoft/unixcoder-base
---
# UniXcoder ONNX for Code Search
**Converted by [VibeAtlas](https://vibeatlas.dev)** - AI Context Optimization for Developers
This is [Microsoft's UniXcoder](https://huggingface.co/microsoft/unixcoder-base) converted to ONNX format for use with **Transformers.js** in browser and Node.js environments.
## Why UniXcoder?
UniXcoder understands code **semantically**, not just as text:
- Trained on 6 programming languages (Python, Java, JavaScript, PHP, Ruby, Go)
- Understands AST structure and data flow
- 20-30% better code search accuracy than generic text embedding models
## Quick Start
### Transformers.js (Browser/Node.js)
```javascript
import { pipeline } from '@huggingface/transformers';

const embedder = await pipeline(
  'feature-extraction',
  'sailesh27/unixcoder-base-onnx'
);

const code = `function authenticate(user) {
  return user.isValid && user.hasPermission;
}`;

const embedding = await embedder(code, {
  pooling: 'mean',
  normalize: true
});

console.log(embedding.dims); // [1, 768]
```
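The `pooling: 'mean'` and `normalize: true` options can be sketched with plain arrays. This is a simplified illustration, not the pipeline's internals: the toy 3-dimensional token vectors below are made-up values standing in for the model's 768-dimensional per-token hidden states.

```javascript
// Mean pooling: average the per-token vectors into one embedding.
function meanPool(tokenVectors) {
  const dim = tokenVectors[0].length;
  const pooled = new Array(dim).fill(0);
  for (const vec of tokenVectors) {
    for (let i = 0; i < dim; i++) pooled[i] += vec[i];
  }
  return pooled.map((x) => x / tokenVectors.length);
}

// L2 normalization: scale to unit length, so a dot product between
// two embeddings equals their cosine similarity.
function l2Normalize(vec) {
  const norm = Math.hypot(...vec);
  return vec.map((x) => x / norm);
}

// Pretend these are two token states (real ones are 768-dim).
const tokenVectors = [
  [1, 0, 2],
  [3, 0, 0]
];

const embedding = l2Normalize(meanPool(tokenVectors));
console.log(embedding); // unit-length mean of the token vectors
```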
### Semantic Code Search
```javascript
import { pipeline, cos_sim } from '@huggingface/transformers';

const embedder = await pipeline('feature-extraction', 'sailesh27/unixcoder-base-onnx');

// Index your code
const codeSnippets = [
  'function login(user, pass) { ... }',
  'function formatDate(date) { ... }',
  'function validateEmail(email) { ... }'
];
const codeEmbeddings = await embedder(codeSnippets, { pooling: 'mean', normalize: true });

// Search with natural language
const query = 'user authentication';
const queryEmbedding = await embedder(query, { pooling: 'mean', normalize: true });

// Score every snippet against the query, then rank by similarity
const similarities = codeEmbeddings.tolist().map((emb, i) => ({
  code: codeSnippets[i],
  score: cos_sim(queryEmbedding.tolist()[0], emb)
}));
similarities.sort((a, b) => b.score - a.score);
console.log(similarities[0].code); // best match for the query
```
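For reference, `cos_sim` computes standard cosine similarity, `dot(a, b) / (|a| * |b|)`. A self-contained sketch of the same formula (with `normalize: true`, both vectors are already unit-length, so this reduces to a plain dot product):

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSim(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSim([1, 0], [1, 0])); // 1 (identical direction)
console.log(cosineSim([1, 0], [0, 1])); // 0 (orthogonal)
```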
## Technical Details
- **Architecture**: RoBERTa-based encoder
- **Hidden Size**: 768
- **Max Sequence Length**: 512 tokens
- **Output Dimensions**: 768
- **ONNX Opset**: 14
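A back-of-envelope sketch of what the 768-dimensional output implies for storing an embedding index client-side (the 10,000-snippet index size below is a made-up assumption for illustration):

```javascript
// Each embedding is 768 float32 values.
const DIM = 768;               // model's output dimensions
const BYTES_PER_FLOAT32 = 4;

const bytesPerEmbedding = DIM * BYTES_PER_FLOAT32;
console.log(bytesPerEmbedding); // 3072

// Hypothetical index of 10,000 code snippets.
const snippets = 10_000;
const totalMB = (snippets * bytesPerEmbedding) / (1024 * 1024);
console.log(totalMB.toFixed(1)); // "29.3" MB
```

At roughly 3 KB per snippet, a moderately sized codebase fits comfortably in browser storage such as IndexedDB.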
## About VibeAtlas
**VibeAtlas** is the reliability infrastructure for AI coding:
- Reduce AI token costs by 40-60%
- Improve code search accuracy with semantic understanding
- Add governance guardrails to AI workflows
**Links**:
- [Website](https://vibeatlas.dev)
- [VS Code Extension](https://marketplace.visualstudio.com/items?itemName=vibeatlas.vibeatlas)
- [GitHub](https://github.com/vibeatlas)
## Citation
```bibtex
@misc{unixcoder-onnx-2025,
  title={UniXcoder ONNX: Code Embeddings for JavaScript},
  author={VibeAtlas Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/sailesh27/unixcoder-base-onnx}
}
```
### Original UniXcoder Paper
```bibtex
@inproceedings{guo2022unixcoder,
  title={UniXcoder: Unified Cross-Modal Pre-training for Code Representation},
  author={Guo, Daya and Lu, Shuai and Duan, Nan and Wang, Yanlin and Zhou, Ming and Yin, Jian},
  booktitle={ACL},
  year={2022}
}
```
## License
Apache 2.0 (same as original UniXcoder)