---
language:
  - en
  - code
license: apache-2.0
library_name: transformers.js
tags:
  - code
  - embeddings
  - onnx
  - transformers.js
  - semantic-search
  - code-search
pipeline_tag: feature-extraction
base_model: microsoft/unixcoder-base
---

# UniXcoder ONNX for Code Search

**Converted by [VibeAtlas](https://vibeatlas.dev)** - AI Context Optimization for Developers

This is [Microsoft's UniXcoder](https://huggingface.co/microsoft/unixcoder-base) converted to ONNX format for use with **Transformers.js** in browser and Node.js environments.

## Why UniXcoder?

UniXcoder understands code **semantically**, not just as text:
- Pre-trained on six programming languages (Python, Java, JavaScript, PHP, Ruby, Go)
- Incorporates AST structure and data-flow information during pre-training
- 20-30% better code-search accuracy than generic text-embedding models

## Quick Start

### Transformers.js (Browser/Node.js)

```javascript
import { pipeline } from '@huggingface/transformers';

const embedder = await pipeline(
  'feature-extraction',
  'sailesh27/unixcoder-base-onnx'
);

const code = `function authenticate(user) {
  return user.isValid && user.hasPermission;
}`;

const embedding = await embedder(code, {
  pooling: 'mean',
  normalize: true
});

console.log(embedding.dims); // [1, 768]
```

### Semantic Code Search

```javascript
import { pipeline, cos_sim } from '@huggingface/transformers';

const embedder = await pipeline('feature-extraction', 'sailesh27/unixcoder-base-onnx');

// Index your code
const codeSnippets = [
  'function login(user, pass) { ... }',
  'function formatDate(date) { ... }',
  'function validateEmail(email) { ... }'
];

const codeEmbeddings = await embedder(codeSnippets, { pooling: 'mean', normalize: true });

// Search with natural language
const query = 'user authentication';
const queryEmbedding = await embedder(query, { pooling: 'mean', normalize: true });

// Rank snippets by similarity to the query
const similarities = codeEmbeddings.tolist().map((emb, i) => ({
  code: codeSnippets[i],
  score: cos_sim(queryEmbedding.tolist()[0], emb)
})).sort((a, b) => b.score - a.score);

console.log(similarities[0].code); // best match
```
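
Because both calls above request `normalize: true`, the returned vectors are unit length, so cosine similarity reduces to a plain dot product. A dependency-free sketch (the `dot` helper below is not part of Transformers.js, just an illustration of why the normalized scores behave like cosine similarity):

```javascript
// Cosine similarity for L2-normalized embedding vectors:
// for unit vectors, cos_sim(a, b) equals the dot product dot(a, b).
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

console.log(dot([1, 0], [0, 1])); // 0 (orthogonal vectors)
console.log(dot([1, 0], [1, 0])); // 1 (identical unit vectors)
```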

## Technical Details

- **Architecture**: RoBERTa-based encoder
- **Hidden Size**: 768
- **Max Sequence Length**: 512 tokens
- **Output Dimensions**: 768
- **ONNX Opset**: 14
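
Input beyond the 512-token limit is truncated, so long files should be split before embedding. A naive line-based chunker as a sketch (the 1,500-character budget is a rough stand-in for the token limit, not an exact token count):

```javascript
// Split source code into chunks under a rough character budget,
// breaking only at line boundaries so each chunk stays readable.
function chunkCode(source, maxChars = 1500) {
  const chunks = [];
  let current = '';
  for (const line of source.split('\n')) {
    // Start a new chunk when adding this line would exceed the budget.
    if (current && current.length + line.length + 1 > maxChars) {
      chunks.push(current);
      current = '';
    }
    current = current ? current + '\n' + line : line;
  }
  if (current) chunks.push(current);
  return chunks;
}

const file = ['function a() {}', 'function b() {}'].join('\n');
console.log(chunkCode(file, 16)); // each function lands in its own chunk
```

Each chunk can then be passed to the embedder as in the examples above; smarter strategies (e.g. splitting at function boundaries) will generally retrieve better.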

## About VibeAtlas

**VibeAtlas** is the reliability infrastructure for AI coding:

- Reduce AI token costs by 40-60%
- Improve code search accuracy with semantic understanding
- Add governance guardrails to AI workflows

**Links**:
- [Website](https://vibeatlas.dev)
- [VS Code Extension](https://marketplace.visualstudio.com/items?itemName=vibeatlas.vibeatlas)
- [GitHub](https://github.com/vibeatlas)

## Citation

```bibtex
@misc{unixcoder-onnx-2025,
  title={UniXcoder ONNX: Code Embeddings for JavaScript},
  author={VibeAtlas Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/sailesh27/unixcoder-base-onnx}
}
```

### Original UniXcoder Paper

```bibtex
@inproceedings{guo2022unixcoder,
  title={UniXcoder: Unified Cross-Modal Pre-training for Code Representation},
  author={Guo, Daya and Lu, Shuai and Duan, Nan and Wang, Yanlin and Zhou, Ming and Yin, Jian},
  booktitle={ACL},
  year={2022}
}
```

## License

Apache 2.0 (same as the original UniXcoder model)