This is a d-Matrix functional reference of the Llama-3.1-70B model. The reference provides the following functional configurations:

| Configuration | Explanation |
| --- | --- |
| `BASELINE` | a reference functionally equivalent to the original model |
| `BASIC` | all linear algebraic operands quantized to `MXINT8-64` |
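To build intuition for what `MXINT8-64` means, the following is a toy sketch of block-wise integer quantization in the MXINT style: each block of 64 values shares one power-of-two scale, and the values themselves are stored as 8-bit integers. This is an illustrative approximation, not the d-Matrix implementation; the function names and the exact scale convention are assumptions for the example.

```python
import torch

torch.manual_seed(0)

def mxint8_quantize(x: torch.Tensor, block: int = 64):
    """Toy MXINT8-style quantizer (illustrative, not d-Matrix's implementation):
    each block of `block` values shares one power-of-two scale, and elements
    are stored as int8."""
    assert x.numel() % block == 0, "pad the input to a multiple of the block size"
    blocks = x.reshape(-1, block)
    # Shared block exponent derived from the largest magnitude in the block.
    max_abs = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-38)
    exp = torch.floor(torch.log2(max_abs))
    scale = torch.pow(2.0, exp - 6)  # leave headroom so values fit in int8
    q = torch.clamp(torch.round(blocks / scale), -128, 127).to(torch.int8)
    return q, scale

def mxint8_dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

x = torch.randn(2, 64)
q, scale = mxint8_quantize(x.flatten())
x_hat = mxint8_dequantize(q, scale).reshape(x.shape)
```

Because each block reuses one scale, the per-element reconstruction error stays small relative to the block's largest value, which is the trade-off that makes shared-exponent formats cheap in hardware.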

## Usage

Install the d-Matrix `dmx_compressor` package first:

```shell
pip install dmx_compressor
```

The following is an example of how to instantiate the model. Note that this repo only contains custom modeling code and still uses weights from the official model repo.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig
from dmx.compressor import DmxModel

model_name = "d-matrix/Llama-3.1-70B"
official_model = "meta-llama/Llama-3.1-70B"

# Custom modeling configuration from this repo, weights from the official repo.
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(official_model)

model = AutoModelForCausalLM.from_pretrained(
    official_model,
    config=config,
    trust_remote_code=True,
    device_map="auto",
)
model = DmxModel.from_torch(model)

# A causal LM expects integer token ids, so tokenize a prompt rather than
# passing raw float tensors.
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)
```
