---
license: apache-2.0
---

# Xmodel-2

## Introduction

Xmodel-2 is a 1.2-billion-parameter large language model designed specifically for reasoning tasks. Its architecture enables different model scales to share a unified set of hyperparameters, allowing for extensive experimentation on smaller models and seamless transfer of optimal configurations to larger models. To maximize training efficiency and stability, Xmodel-2 employs the WSD learning rate scheduler from MiniCPM. Pretrained on 1.5 trillion tokens from diverse sources, Xmodel-2 achieves state-of-the-art performance in complex reasoning and agent-based tasks, while maintaining low training costs. These results highlight the potential of efficient model design and training strategies in advancing reasoning capabilities. Model checkpoints and code are publicly available on GitHub at https://github.com/XiaoduoAILab/Xmodel-2
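As a rough illustration of the WSD (warmup-stable-decay) idea mentioned above, here is a minimal sketch of such a schedule. The function name, phase lengths, and linear decay shape are illustrative assumptions, not Xmodel-2's or MiniCPM's actual settings (implementations differ in the decay curve they use).

```python
def wsd_lr(step, max_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Warmup-Stable-Decay learning-rate schedule (illustrative sketch).

    Three phases: linear warmup to max_lr, a long constant ("stable")
    phase, then a decay to min_lr. Linear decay is used here for
    simplicity; real implementations may use other decay shapes.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to max_lr.
        return max_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the peak learning rate constant.
        return max_lr
    # Decay: interpolate linearly from max_lr down to min_lr.
    t = (step - warmup_steps - stable_steps) / decay_steps
    return max_lr + (min_lr - max_lr) * min(t, 1.0)
```

A key appeal of WSD over cosine schedules is that the stable phase lets training continue indefinitely, with the short decay applied only when a checkpoint is to be finalized.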

For details, see the paper at https://huggingface.co/papers/2412.19638

Running inference with Xmodel-2 takes only a few lines of code, as shown below. Make sure you are using the latest code and an up-to-date virtual environment.

```python
import os

from transformers import AutoModelForCausalLM, AutoTokenizer

# Select the GPU to use (adjust to your setup).
os.environ["CUDA_VISIBLE_DEVICES"] = "5"

model_path = os.path.expanduser("~/models/Xmodel-2")

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True,
)

# Build a chat-formatted prompt from a single user message.
prompt = "Give me a short introduction to large language model."
messages = [{"role": "user", "content": prompt}]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

model_inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generation stops as soon as any of these strings is produced.
stop_tokens = ["<|im_end|>", "<|im_start|>"]

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=256,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
    stop_strings=stop_tokens,
    tokenizer=tokenizer,
)

# Decode only the newly generated tokens, skipping the prompt.
output = tokenizer.decode(
    generated_ids[0][len(model_inputs.input_ids[0]):],
    skip_special_tokens=True,
)

# Remove any stop markers that survived decoding.
for stop_token in stop_tokens:
    output = output.replace(stop_token, "")
output = output.strip()

print("Generated Response:")
print(output)
```
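The post-processing step above deletes every occurrence of a stop marker. An alternative, sketched below with a hypothetical helper name `trim_at_stop`, is to truncate the text at the first marker instead, which also discards anything the model generated after it:

```python
def trim_at_stop(text, stop_strings):
    """Truncate text at the earliest occurrence of any stop string.

    Unlike replacing every marker, truncation also drops content
    generated after the first stop, which is usually unwanted.
    """
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            # Keep the earliest cut point found so far.
            cut = min(cut, idx)
    return text[:cut].strip()
```

For example, `trim_at_stop("Hello world<|im_end|>stray text", ["<|im_end|>", "<|im_start|>"])` returns just `"Hello world"`.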