Source: bRadu/translategemma-4b-it-novision

Export using onnxruntime-genai:

    q4:    --precision int4 --execution_provider webgpu --extra_options int4_accuracy_level=1 int4_block_size=32 int4_algo_config=rtn enable_webgpu_graph=true
    q4f16: --precision int4 --execution_provider webgpu --extra_options int4_accuracy_level=2 int4_block_size=64 int4_algo_config=rtn enable_webgpu_graph=true

Using int4_accuracy_level=4 or int4_block_size=128 causes a significant drop in accuracy, even producing nonsensical output.
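The flags above plug into the onnxruntime-genai model builder. A minimal sketch of the full q4 export command (output path is illustrative):

```shell
# q4 export via the onnxruntime-genai model builder
python -m onnxruntime_genai.models.builder \
    -m bRadu/translategemma-4b-it-novision \
    -o ./translategemma-4b-it-onnx-q4 \
    -p int4 \
    -e webgpu \
    --extra_options int4_accuracy_level=1 int4_block_size=32 int4_algo_config=rtn enable_webgpu_graph=true
```

For the q4f16 variant, swap in the second set of `--extra_options` values listed above.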

What this is

This repo contains a converted Gemma3ForCausalLM checkpoint extracted from the language component of the original multimodal model:

  • Source: google/translategemma-4b-it
  • Target architecture: Gemma3ForCausalLM (text-only)
  • Precision: float16 weights
  • Removed components: vision tower / multimodal-only parts

Intended use

  • Translation and text generation with the same chat-format style expected by TranslateGemma.
  • Efficient local or server inference where vision capability is not needed.

Recommended system prompt (see https://arxiv.org/pdf/2601.09012 for source language codes)

Language code reference: https://huggingface.co/google/translategemma-4b-it/blob/main/chat_template.jinja.
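The recommended system prompt can be packaged as a small helper that builds the chat messages. A minimal sketch (the helper name and signature are illustrative; the prompt text matches the worker code below):

```typescript
type ChatMessage = { role: "system" | "user"; content: string };

// Builds the recommended translation messages for a source/target
// language pair. `sourceName`/`targetName` are language codes, e.g.
// the ones from the chat_template.jinja reference above.
function buildTranslationMessages(
    sourceName: string,
    targetName: string,
    text: string
): ChatMessage[] {
    return [
        {
            role: "system",
            content: `You are an expert multilingual translator. Your task is to accurately and fluently translate text from ${sourceName} into ${targetName}. Provide ONLY the translated text, without any additional explanations, commentary, or greetings.`,
        },
        { role: "user", content: `${text}.` },
    ];
}
```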

How to use (this model uses a different chat template from the original google/translategemma-4b-it page; it is similar to google/gemma-3-1b-it)

transformers.js

If you encounter dialogue problems, try skipping `apply_chat_template` and passing the messages array to the pipeline directly, e.g. `const result = await generator(messages, { ... })`.

    import { pipeline, TextStreamer, TextGenerationPipeline } from '@huggingface/transformers';

    const MODEL_ID = 'willopcbeta/translategemma-4b-it-Text-ONNX';
    let generator: TextGenerationPipeline | null = null;

    // Load the model once on WebGPU with quantized weights.
    async function initModel() {
        try {
            if (!generator) {
                // Check WebGPU support before loading
                if (!('gpu' in navigator)) {
                    throw new Error("Worker: WebGPU is not supported in this environment.");
                }

                generator = await pipeline('text-generation', MODEL_ID, {
                    device: 'webgpu',
                    dtype: 'q4f16', // or 'q4'
                    progress_callback: (info: any) => {
                        self.postMessage({
                            type: 'progress',
                            data: info
                        });
                    }
                }) as unknown as TextGenerationPipeline;
            }

            self.postMessage({ type: 'ready' });
            console.log('Worker: Model initialized successfully');

        } catch (error: any) {
            console.error('Worker: Initialization failed', error);
            self.postMessage({
                type: 'error',
                error: error.message || 'Unknown error during initialization'
            });
        }
    }

    // Translate `text` and stream partial output back to the main thread.
    async function translate(text: string) {
        try {
            console.log('Worker: Start translation...');
            if (!generator) {
                throw new Error('Worker: Model is not initialized.');
            }

            const sourceName = "jp-ja";
            const targetName = "zh-tw";

            const messages = [
                {
                    role: "system",
                    content: `You are an expert multilingual translator. Your task is to accurately and fluently translate text from ${sourceName} into ${targetName}. Provide ONLY the translated text, without any additional explanations, commentary, or greetings.`
                },
                {
                    role: "user",
                    content: `${text}.`
                    // For multiple paragraphs or long texts, use instead:
                    // content: `Translate the following to ${targetName}:\n\n${text}.`
                }
            ];

            const prompt = generator.tokenizer.apply_chat_template(messages, {
                tokenize: false,
                add_generation_prompt: true,
            }) as string;
            console.log("Check prompt:", prompt);

            // Batch streamed tokens so the UI is not flooded with tiny updates.
            let buffer = "";
            const streamer = new TextStreamer(generator.tokenizer, {
                skip_prompt: true,
                skip_special_tokens: true,
                callback_function: (output: string) => {
                    buffer += output;
                    if (buffer.length >= 3 || output.includes('。') || output.includes('\n')) {
                        self.postMessage({
                            type: 'update',
                            output: buffer
                        });
                        buffer = "";
                    }
                }
            });

            const result = await generator(prompt, {
                max_new_tokens: 1024,
                temperature: 0.6,        // ignored while do_sample is false (greedy decoding)
                repetition_penalty: 1.1,
                do_sample: false,
                return_full_text: false,
                streamer: streamer,
                // Gemma 3 stop tokens, if explicit stopping is needed:
                //eos_token_id: [1, 106],
                //stop_strings: ["<end_of_turn>", "<eos>"],
            });
            self.postMessage({ type: 'complete', result });
            console.log("Print result: ", result);
            console.log('Worker: Translation complete');

        } catch (error: any) {
            console.error('Worker: Translation failed', error);
            self.postMessage({
                type: 'error',
                error: error.message || 'Error during translation generation'
            });
        }
    }