---
license: apache-2.0
language:
- en
library_name: transformers.js
tags:
- code
- python
- maincoder
- code-generation
- reinforcement-learning
- mcpo
- onnx
pipeline_tag: text-generation
base_model: Maincode/Maincoder-1B
---
# Maincoder 1B — ONNX (Quantized, WebGPU)
This is a **quantized ONNX** version of [Maincode/Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B), optimized for in-browser inference with [Transformers.js](https://huggingface.co/docs/transformers.js) and WebGPU.
## Quantization
- **Format:** ONNX with int4 (MatMulNBits) quantization
- **Original model size:** ~5 GB (fp32)
- **Quantized model size:** ~1.5 GB (q4)
- **Quantization method:** `MatMulNBitsQuantizer` from `onnxruntime` with `block_size=32` and symmetric quantization
All tensor data is embedded in a single `.onnx` file (no external data files) for browser compatibility.
## Usage with Transformers.js
```javascript
import { AutoModelForCausalLM, AutoTokenizer } from "@huggingface/transformers";
const model = await AutoModelForCausalLM.from_pretrained(
  "shreyask/Maincoder-1B-ONNX-web",
  { dtype: "q4", device: "webgpu" },
);

const tokenizer = await AutoTokenizer.from_pretrained(
  "shreyask/Maincoder-1B-ONNX-web",
);

const messages = [
  { role: "system", content: "You are Maincoder, an expert code generation assistant." },
  { role: "user", content: "Write a binary search function in Python" },
];

const input = tokenizer.apply_chat_template(messages, {
  add_generation_prompt: true,
  return_dict: true,
});

const output = await model.generate({
  ...input,
  max_new_tokens: 1024,
  eos_token_id: [151643, 151645],
});

// Decode the generated token IDs back to text
const text = tokenizer.batch_decode(output, { skip_special_tokens: true });
console.log(text[0]);
```
## Base Model
This is a quantized conversion of [Maincode/Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B). See the base model card for training details, benchmarks, and intended use.