# Introduction

Xmodel-2 is a 1.2-billion-parameter large language model designed specifically for reasoning tasks. Its architecture enables different model scales to share a unified set of hyperparameters, allowing for extensive experimentation on smaller models and seamless transfer of optimal configurations to larger models. To maximize training efficiency and stability, Xmodel-2 employs the WSD learning rate scheduler from MiniCPM. Pretrained on 1.5 trillion tokens from diverse sources, Xmodel-2 achieves state-of-the-art performance in complex reasoning and agent-based tasks, while maintaining low training costs. These results highlight the potential of efficient model design and training strategies in advancing reasoning capabilities. Model checkpoints and code are publicly available on GitHub at https://github.com/XiaoduoAILab/Xmodel-2

For details, see the paper at https://huggingface.co/papers/2412.19638

Running inference with Xmodel-2 takes only a few lines of code, as demonstrated below. Please make sure you are using a recent version of the code and a matching Python environment.
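The WSD (Warmup-Stable-Decay) scheduler mentioned above warms the learning rate up, holds it constant for most of training, and decays it only near the end. A minimal sketch is shown below; the function name and the warmup/decay fractions are illustrative defaults, not Xmodel-2's actual training settings.

```python
def wsd_lr(step, total_steps, peak_lr, warmup_frac=0.01, decay_frac=0.1, min_lr=0.0):
    """Warmup-Stable-Decay: linear warmup, constant plateau, linear decay tail."""
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    stable_end = total_steps - decay_steps
    if step < warmup_steps:
        # Linear warmup from 0 to the peak learning rate.
        return peak_lr * step / max(warmup_steps, 1)
    if step < stable_end:
        # Stable phase: hold the peak learning rate.
        return peak_lr
    # Decay phase: linearly anneal from peak_lr down to min_lr.
    progress = (step - stable_end) / max(decay_steps, 1)
    return peak_lr + (min_lr - peak_lr) * progress
```

Because the plateau dominates the schedule, intermediate checkpoints remain useful for continued training, which is one reason WSD suits hyperparameter transfer across scales.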
```python
import os

from transformers import AutoModelForCausalLM, AutoTokenizer

# Pin inference to a single GPU; change the index to match your machine.
os.environ["CUDA_VISIBLE_DEVICES"] = "5"

# Path to a local copy of the Xmodel-2 checkpoint.
model_path = os.path.expanduser("~/models/Xmodel-2")

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True,
)

prompt = "Give me a short introduction to large language model."
messages = [{"role": "user", "content": prompt}]

# Render the chat template as text and append the generation prompt so the
# model continues as the assistant.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

model_inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generation halts as soon as any of these strings is emitted.
stop_tokens = ["<|im_end|>", "<|im_start|>"]

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=256,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
    stop_strings=stop_tokens,
    tokenizer=tokenizer,  # required when using `stop_strings`
)

# Decode only the newly generated tokens, dropping the echoed prompt.
output = tokenizer.decode(
    generated_ids[0][len(model_inputs.input_ids[0]):],
    skip_special_tokens=True,
)

# Strip any stop markers that survive decoding and trim whitespace.
for stop_token in stop_tokens:
    output = output.replace(stop_token, "")
output = output.strip()

print("Generated Response:")
print(output)
```
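The stop strings above (`<|im_start|>`, `<|im_end|>`) suggest a ChatML-style chat template. For intuition, here is a hypothetical sketch of roughly what `apply_chat_template` renders for such a template; the template actually bundled with the Xmodel-2 tokenizer is authoritative and may differ in detail.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Hypothetical ChatML-style rendering, for illustration only."""
    # Each turn is wrapped in <|im_start|>{role} ... <|im_end|> markers.
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

text = render_chatml([{"role": "user", "content": "Hello"}])
print(text)
# → <|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n
```

This also explains the stop strings: once the model emits `<|im_end|>` (or starts a new `<|im_start|>` turn), its answer is complete and generation can halt.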