shreyask committed (verified)
Commit 314c10a · Parent(s): dd73600

Upload README.md with huggingface_hub

Files changed (1): README.md added (+64 lines)
---
license: apache-2.0
language:
- en
library_name: transformers.js
tags:
- code
- python
- maincoder
- code-generation
- reinforcement-learning
- mcpo
- onnx
pipeline_tag: text-generation
base_model: Maincode/Maincoder-1B
---

# Maincoder 1B — ONNX (Quantized, WebGPU)

This is a **quantized ONNX** version of [Maincode/Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B), optimized for in-browser inference with [Transformers.js](https://huggingface.co/docs/transformers.js) and WebGPU.

## Quantization

- **Format:** ONNX with int4 (MatMulNBits) quantization
- **Original model size:** ~5 GB (fp32)
- **Quantized model size:** ~1.5 GB (q4)
- **Quantization method:** `MatMulNBitsQuantizer` from `onnxruntime` with `block_size=32`, symmetric quantization

All tensor data is embedded in a single `.onnx` file (no external data files) for browser compatibility.
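
To make the scheme concrete, here is a minimal, self-contained sketch of block-wise symmetric int4 quantization with a 32-element block, as described above. This is *not* the `onnxruntime` implementation; the helper names and the `scale = absmax / 7` convention (one common symmetric choice that keeps values in the int4 range) are illustrative assumptions:

```python
# Illustrative sketch of symmetric block-wise int4 quantization (block_size=32).
# Not the onnxruntime MatMulNBitsQuantizer code; conventions are assumptions.

def quantize_block_int4(block):
    """Quantize one block of floats to int4 values sharing a single scale.

    Symmetric convention assumed here: scale = absmax / 7, so the largest
    magnitude maps to +/-7; results are clamped to the int4 range [-8, 7].
    """
    absmax = max(abs(v) for v in block)
    scale = absmax / 7 if absmax else 1.0
    q = [max(-8, min(7, round(v / scale))) for v in block]
    return q, scale

def dequantize_block(q, scale):
    """Reconstruct approximate float values from int4 codes and the scale."""
    return [v * scale for v in q]

# One 32-element block of example weights
weights = [0.5, -1.2, 3.1, -0.05] * 8
q, scale = quantize_block_int4(weights)
recon = dequantize_block(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, recon))
```

Each block stores only its int4 codes plus one scale, which is where the roughly 3x size reduction over fp32 comes from.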

## Usage with Transformers.js

```javascript
import { AutoModelForCausalLM, AutoTokenizer } from "@huggingface/transformers";

const model = await AutoModelForCausalLM.from_pretrained(
  "shreyask/Maincoder-1B-ONNX-web",
  { dtype: "q4", device: "webgpu" }
);

const tokenizer = await AutoTokenizer.from_pretrained(
  "shreyask/Maincoder-1B-ONNX-web"
);

const messages = [
  { role: "system", content: "You are Maincoder, an expert code generation assistant." },
  { role: "user", content: "Write a binary search function in Python" },
];

const input = tokenizer.apply_chat_template(messages, {
  add_generation_prompt: true,
  return_dict: true,
});

const output = await model.generate({
  ...input,
  max_new_tokens: 1024,
  eos_token_id: [151643, 151645],
});

// Decode the generated token IDs back into text
const text = tokenizer.batch_decode(output, { skip_special_tokens: true });
console.log(text[0]);
```

## Base Model

This is a quantized conversion of [Maincode/Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B). See the base model card for training details, benchmarks, and intended use.