---
license: apache-2.0
language:
- en
library_name: transformers.js
tags:
- code
- python
- maincoder
- code-generation
- reinforcement-learning
- mcpo
- onnx
pipeline_tag: text-generation
base_model: Maincode/Maincoder-1B
---
# Maincoder 1B — ONNX (Quantized, WebGPU)
This is a **quantized ONNX** version of [Maincode/Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B), optimized for in-browser inference with [Transformers.js](https://huggingface.co/docs/transformers.js) and WebGPU.
## Quantization
- **Format:** ONNX with int4 (MatMulNBits) quantization
- **Original model size:** ~5 GB (fp32)
- **Quantized model size:** ~1.5 GB (q4)
- **Quantization method:** `MatMulNBitsQuantizer` from `onnxruntime` with `block_size=32` and symmetric quantization
All tensor data is embedded in a single `.onnx` file (no external data files) for browser compatibility.
## Usage with Transformers.js
```javascript
import { AutoModelForCausalLM, AutoTokenizer } from "@huggingface/transformers";
const model = await AutoModelForCausalLM.from_pretrained(
  "shreyask/Maincoder-1B-ONNX-web",
  { dtype: "q4", device: "webgpu" },
);

const tokenizer = await AutoTokenizer.from_pretrained(
  "shreyask/Maincoder-1B-ONNX-web",
);

const messages = [
  { role: "system", content: "You are Maincoder, an expert code generation assistant." },
  { role: "user", content: "Write a binary search function in Python" },
];

const input = tokenizer.apply_chat_template(messages, {
  add_generation_prompt: true,
  return_dict: true,
});

const output = await model.generate({
  ...input,
  max_new_tokens: 1024,
  eos_token_id: [151643, 151645],
});

// Decode the generated token IDs back to text
const text = tokenizer.batch_decode(output, { skip_special_tokens: true });
console.log(text[0]);
```
## Base Model
This is a quantized conversion of [Maincode/Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B). See the base model card for training details, benchmarks, and intended use.