kernelpool committed (verified)
Commit dcec42c · Parent(s): 6ea422b

Update README.md

Files changed (1): README.md (+36 −2)

README.md CHANGED
@@ -1,7 +1,41 @@
 ---
-language: en
+license: mit
+library_name: mlx
 pipeline_tag: text-generation
 tags:
+- transformers
 - mlx
-library_name: mlx
+base_model: meituan-longcat/LongCat-Flash-Thinking-2601
 ---
+
+# mlx-community/LongCat-Flash-Thinking-2601-4bit
+
+This model [mlx-community/LongCat-Flash-Thinking-2601-4bit](https://huggingface.co/mlx-community/LongCat-Flash-Thinking-2601-4bit) was
+converted to MLX format from [meituan-longcat/LongCat-Flash-Thinking-2601](https://huggingface.co/meituan-longcat/LongCat-Flash-Thinking-2601)
+using mlx-lm version **0.30.2**.
+
+## Warning
+
+This model is not yet fully compatible with MLX-LM beyond 8192 tokens; you can track progress in [this pull request](https://github.com/ml-explore/mlx-lm/pull/768) or test [this branch](https://github.com/kernelpool/mlx-lm/tree/longcat-flash).
+
+## Use with mlx
+
+```bash
+pip install mlx-lm
+```
+
+```python
+from mlx_lm import load, generate
+
+model, tokenizer = load("mlx-community/LongCat-Flash-Thinking-2601-4bit")
+
+prompt = "hello"
+
+if tokenizer.chat_template is not None:
+    messages = [{"role": "user", "content": prompt}]
+    prompt = tokenizer.apply_chat_template(
+        messages, add_generation_prompt=True, return_dict=False,
+    )
+
+response = generate(model, tokenizer, prompt=prompt, verbose=True)
+```
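
To try the compatibility branch mentioned in the Warning section before the upstream pull request lands, one option is to install mlx-lm directly from that branch. This is a sketch under the assumption that the `longcat-flash` branch linked above is still available; it may move or disappear once the PR is merged:

```shell
# Replace the PyPI release of mlx-lm with the longcat-flash branch
# from the fork linked in the Warning section.
pip install "git+https://github.com/kernelpool/mlx-lm.git@longcat-flash"
```

Reinstalling `mlx-lm` from PyPI afterwards restores the release version.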
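
mlx-lm also ships command-line entry points, so the Python example above has a rough one-line equivalent. This is a sketch, not part of this model card; the `mlx_lm.generate` command and the flags shown follow recent mlx-lm releases, so check `mlx_lm.generate --help` on your install before relying on them:

```shell
# Chat-templated generation from the terminal; the model is
# downloaded from the Hugging Face Hub on first use.
mlx_lm.generate --model mlx-community/LongCat-Flash-Thinking-2601-4bit \
  --prompt "hello" --max-tokens 256
```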