MaziyarPanahi committed on
Commit
a4f8dc0
·
verified ·
1 Parent(s): 939d49e

Update README.md

  1. README.md +72 -0
README.md CHANGED
@@ -18,3 +18,75 @@ tags:
  - mlx
  pipeline_tag: text-generation
  ---
+
+ <div align="center">
+   <picture>
+     <img
+       src="https://cdn-uploads.huggingface.co/production/uploads/6435718aaaef013d1aec3b8b/i-v1KyAMOW_mgVGeic9WJ.png"
+       alt="Arcee Trinity Mini"
+       style="max-width: 100%; height: auto;"
+     >
+   </picture>
+ </div>
+
+ # Trinity Nano MLX 8bit
+
+ Trinity Nano Preview is a preview of Arcee AI's 6B MoE model with 1B active parameters. It is the small model in our new Trinity family, a series of open-weight models for enterprises and tinkerers alike.
+
+ This is a chat-tuned model with a delightful personality and charm we think users will love. It pushes the limits of sparsity in small language models, with only 800M non-embedding parameters active per token, and as such **may be unstable** in certain use cases, especially in this preview.
+
+ This is an *experimental* release. It's fun to talk to but will not be hosted anywhere, so download it and try it out yourself!
+
+ ***
+
+ Trinity Nano Preview was trained on 10T tokens gathered and curated through a key partnership with [Datology](https://www.datologyai.com/), building on the excellent dataset we used for [AFM-4.5B](https://huggingface.co/arcee-ai/AFM-4.5B), with additional math and code.
+
+ Training was performed on a cluster of 512 H200 GPUs powered by [Prime Intellect](https://www.primeintellect.ai/) using HSDP (Hybrid Sharded Data Parallel).
+
+ More details, including key architecture decisions, can be found in [our blog post](https://www.arcee.ai/blog/the-trinity-manifesto).
+
+ ***
+
+ ## Model Details
+
+ * **Model Architecture:** AfmoeForCausalLM
+ * **Parameters:** 6B, 1B active
+ * **Experts:** 128 total, 8 active, 1 shared
+ * **Context length:** 128k
+ * **Training Tokens:** 10T
+ * **License:** [Apache 2.0](https://huggingface.co/arcee-ai/Trinity-Mini#license)
+
+ ## Use with mlx
+
+ ```
+ pip install mlx-lm
+ ```
+
+ ```python
+ from mlx_lm import load, generate
+ from mlx_lm.sample_utils import make_sampler, make_logits_processors
+
+ # Download (if needed) and load the 8-bit MLX weights and tokenizer.
+ model, tokenizer = load("arcee-ai/Trinity-Nano-Preview-MLX-8bit")
+
+ prompt = "What is the capital of France?"
+
+ # Wrap the prompt in the model's chat template when one is defined.
+ if tokenizer.chat_template is not None:
+     messages = [{"role": "user", "content": prompt}]
+     prompt = tokenizer.apply_chat_template(
+         messages, tokenize=False, add_generation_prompt=True
+     )
+
+ # Low-temperature sampling with a mild repetition penalty.
+ sampler = make_sampler(temp=0.1, top_k=50, top_p=0.1)
+ logits_processors = make_logits_processors(repetition_penalty=1.05)
+
+ response = generate(
+     model,
+     tokenizer,
+     prompt=prompt,
+     max_tokens=512,
+     sampler=sampler,
+     logits_processors=logits_processors,
+     verbose=True,
+ )
+ ```
+
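Note (not part of the commit): the sampler settings in the README example (`temp=0.1`, `top_k=50`, `top_p=0.1`) make generation close to greedy. The sketch below illustrates why, using a plain-Python version of temperature scaling followed by top-k and top-p filtering. It is an illustration only: `filter_probs` and `sample` are hypothetical helpers, the order of the filtering steps is an assumption, and nothing here is claimed to mirror how `mlx_lm.sample_utils.make_sampler` is implemented.

```python
import math
import random


def filter_probs(logits, temp=0.1, top_k=50, top_p=0.1):
    """Return {token_index: probability} after temperature, top-k, and top-p filtering.

    Hypothetical helper for illustration; not the mlx-lm implementation.
    """
    # Temperature scaling: dividing by a small temp sharpens the distribution.
    scaled = [x / temp for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the surviving tokens.
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}


def sample(logits, rng=random, **kwargs):
    """Draw one token index from the filtered distribution."""
    dist = filter_probs(logits, **kwargs)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

With the README's defaults, a single dominant token usually already covers the `top_p=0.1` mass, so only that token survives the filter. Raising `temp` and `top_p` (for example `filter_probs(logits, temp=1.0, top_k=2, top_p=1.0)`) leaves more candidates in play and gives more varied output.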