---
license: apache-2.0
language:
  - en
library_name: transformers.js
tags:
  - code
  - python
  - maincoder
  - code-generation
  - reinforcement-learning
  - mcpo
  - onnx
pipeline_tag: text-generation
base_model: Maincode/Maincoder-1B
---

# Maincoder 1B — ONNX (Quantized, WebGPU)

This is a **quantized ONNX** version of [Maincode/Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B), optimized for in-browser inference with [Transformers.js](https://huggingface.co/docs/transformers.js) and WebGPU.

## Quantization

- **Format:** ONNX with int4 (MatMulNBits) quantization
- **Original model size:** ~5 GB (fp32)
- **Quantized model size:** ~1.5 GB (q4)
- **Quantization method:** `MatMulNBitsQuantizer` from `onnxruntime` with `block_size=32`, symmetric quantization

All tensor data is embedded in a single `.onnx` file (no external data files) for browser compatibility.

## Usage with Transformers.js

```javascript
import { AutoModelForCausalLM, AutoTokenizer } from "@huggingface/transformers";

const model = await AutoModelForCausalLM.from_pretrained(
  "shreyask/Maincoder-1B-ONNX-web",
  { dtype: "q4", device: "webgpu" }
);
const tokenizer = await AutoTokenizer.from_pretrained(
  "shreyask/Maincoder-1B-ONNX-web"
);

const messages = [
  { role: "system", content: "You are Maincoder, an expert code generation assistant." },
  { role: "user", content: "Write a binary search function in Python" },
];

const input = tokenizer.apply_chat_template(messages, {
  add_generation_prompt: true,
  return_dict: true,
});

const output = await model.generate({
  ...input,
  max_new_tokens: 1024,
  eos_token_id: [151643, 151645],
});

// Decode the generated token ids back into text.
const text = tokenizer.batch_decode(output, { skip_special_tokens: true });
console.log(text[0]);
```

## Base Model

This is a quantized conversion of [Maincode/Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B). See the base model card for training details, benchmarks, and intended use.
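## How int4 block quantization works

As a rough intuition for the quantization scheme described above, the sketch below round-trips one block of 32 weights through symmetric int4 quantization in plain JavaScript. This is illustrative only: the actual `MatMulNBitsQuantizer` operates on MatMul weight tensors inside the ONNX graph and packs two 4-bit values per byte, and the function names here (`quantizeBlockInt4`, `dequantizeBlock`) are hypothetical.

```javascript
// Symmetric int4 quantization of one block (block_size = 32):
// a single fp scale per block maps the largest |weight| to the int4 limit 7.
function quantizeBlockInt4(values) {
  const absMax = Math.max(...values.map(Math.abs));
  const scale = absMax / 7 || 1; // avoid division by zero for all-zero blocks
  const quantized = values.map((v) =>
    Math.max(-8, Math.min(7, Math.round(v / scale)))
  );
  return { scale, quantized };
}

// Dequantization is just a multiply by the block's scale.
function dequantizeBlock({ scale, quantized }) {
  return quantized.map((q) => q * scale);
}

// Round-trip a block of 32 synthetic weights and measure the error.
const block = Array.from({ length: 32 }, (_, i) => Math.sin(i) * 0.5);
const q = quantizeBlockInt4(block);
const restored = dequantizeBlock(q);
const maxErr = Math.max(...block.map((v, i) => Math.abs(v - restored[i])));
console.log(maxErr <= q.scale / 2 + 1e-12); // → true: error is at most half a step
```

Storing one scale per 32-value block (rather than per tensor) is what keeps the reconstruction error small enough for a usable ~3x size reduction (fp32 → q4 plus scales).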