GGUF version of Stockmark 2 100B Instruct
What is Stockmark 2 100B Instruct?
Stockmark-2-100B-Instruct is a 100-billion-parameter large language model by Stockmark Inc. built from scratch, with a particular focus on Japanese.
For more information, see the README file on the original model.
What is this?
This repository provides the corresponding GGUF files (including quantizations) for trying out Stockmark 2's capabilities on AI workstations (with large VRAM, using GPU or CPU+GPU inference) and/or powerful PCs (with large RAM, using CPU inference).
Beware that CPU inference is not practical for production use (because of the model's dense architecture). The primary purpose of this repository is experimentation and quality evaluation.
Note that this is just a format conversion and nothing more. The model itself is unchanged, except for quantization.
Although Stockmark 2 100B Instruct is built from scratch (avoiding contamination from the Llama licenses), it shares some similarities with Llama (including the tokenizer), and the converted GGUF files are recognized as a Llama variant.
What to try first?
For CPU inference with 128 GiB of RAM, Q3_K_M would be a balanced choice.
Alternatively, Q4_K_S may also work for you.
The author of this repository confirmed that quantizations up to Q6_K work on the same machine
(sometimes with mmap turned off).
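To see why these quantizations fit in 128 GiB, a rough back-of-the-envelope estimate of the weight footprint is enough. The bits-per-weight figures below are approximate round numbers chosen for illustration (not exact llama.cpp values), and the estimate ignores KV cache and runtime overhead:

```python
# Rough RAM-footprint estimate for the weights of a 100B dense model.
# Bits-per-weight values are illustrative approximations, NOT exact
# llama.cpp figures; real GGUF sizes differ somewhat per tensor mix.
N_PARAMS = 100e9  # ~100 billion parameters (dense architecture)

APPROX_BITS_PER_WEIGHT = {
    "Q3_K_M": 3.9,
    "Q4_K_S": 4.6,
    "Q6_K": 6.6,
    "Q8_0": 8.5,
    "BF16": 16.0,
}

def approx_model_gib(quant: str) -> float:
    """Approximate weight size in GiB (excludes KV cache and overhead)."""
    return N_PARAMS * APPROX_BITS_PER_WEIGHT[quant] / 8 / 2**30

for q in APPROX_BITS_PER_WEIGHT:
    print(f"{q:7s} ~ {approx_model_gib(q):6.1f} GiB")
```

Under these assumptions Q3_K_M lands around 45 GiB and even Q6_K around 77 GiB, both under 128 GiB, while the BF16 original (~186 GiB) does not fit, which matches the author's observations above.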
Format Conversion, Quantization and Splitting
First, the BFloat16 version of the GGUF file (matching the original model precision)
is generated using a patched version of llama.cpp (convert_hf_to_gguf.py),
and quantized versions are generated from it (with an unpatched llama-quantize).
All GGUF files exceeding 50GB are split using llama-gguf-split
to work around Hugging Face's 50GB file size limit
(note: 50GB here is in decimal).
- Input: Stockmark-2-100B-Instruct (commit 98e959a472d0; 2025-09-25)
- llama.cpp: build 6992 (commit aa3b7a90b407; 2025-11-08)
- A patch to convert Stockmark 2 100B Instruct as a Llama-like model with a custom vocabulary.
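The "50GB in decimal" note matters when estimating how many shards a split produces: Hugging Face's limit is 50 × 10⁹ bytes, not 50 GiB. A minimal sketch of the shard-count arithmetic (the helper `num_shards` is hypothetical, not part of llama-gguf-split, which takes a size limit directly):

```python
import math

# Hugging Face's per-file limit: 50 GB in DECIMAL bytes (50e9), not 50 GiB.
SPLIT_LIMIT_BYTES = 50 * 10**9

def num_shards(file_size_bytes: int) -> int:
    """How many pieces a GGUF file of the given size splits into
    under the 50 GB (decimal) per-file limit."""
    return max(1, math.ceil(file_size_bytes / SPLIT_LIMIT_BYTES))

# A BF16 GGUF of a ~100B-parameter model is roughly 2 bytes per weight,
# i.e. about 200e9 bytes (an illustrative figure, not the exact file size).
print(num_shards(200 * 10**9))  # → 4
```

A 52 × 10⁹-byte file would already need two shards, even though it is under 50 GiB (≈53.7 × 10⁹ bytes), which is exactly why the decimal/binary distinction is called out above.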
License
The base instruct model (to which Stockmark Inc. owns the rights) is licensed under the MIT license.
Conversion and quantization of the model (to produce these GGUF files) are considered non-copyrightable in themselves (so the author cannot set license terms for them).
For countries / regions where this argument would not apply, the author (a4lg) declares that these "contributions" are placed under the terms of CC0 1.0 Universal as follows:
Converted and processed in 2025 by a4lg
To the extent possible under law, the author has dedicated all copyright and related and neighboring rights possibly owned by the author to the public domain worldwide. These large language model files are distributed without any warranty.
You should have received a copy of the CC0 Public Domain Dedication along with those files. If not, see http://creativecommons.org/publicdomain/zero/1.0/.
Remember that these GGUF files are derivative works of Stockmark 2 100B Instruct, created by Stockmark Inc. You should also see the license of the original work at https://huggingface.co/stockmark/Stockmark-2-100B-Instruct.
For the llama.cpp patch (0001-Quick-Stockmark-2-100B-support.patch) in this repository,
CC0 1.0 Universal applies (though the author considers the patch itself non-copyrightable).
Written in 2025 by a4lg
To the extent possible under law, the author(s) have dedicated all copyright and related and neighboring rights to this software to the public domain worldwide. This software is distributed without any warranty.
You should have received a copy of the CC0 Public Domain Dedication along with this software. If not, see http://creativecommons.org/publicdomain/zero/1.0/.