Instructions to use tiny-random/longcat-flash-lite with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use tiny-random/longcat-flash-lite with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="tiny-random/longcat-flash-lite", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("tiny-random/longcat-flash-lite", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use tiny-random/longcat-flash-lite with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "tiny-random/longcat-flash-lite"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tiny-random/longcat-flash-lite",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
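
The same request can be issued from Python. This is a minimal sketch using only the standard library; `build_chat_request` and `send_chat_request` are hypothetical helper names, and it assumes the server started above is listening on localhost:8000.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the decoded JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_chat_request(
    "http://localhost:8000", "tiny-random/longcat-flash-lite", "What is the capital of France?"
)
# With the server running: reply = send_chat_request(request)
```

Passing `http://localhost:30000` as `base_url` would target a server on port 30000 instead, such as the SGLang one.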

Use Docker

docker model run hf.co/tiny-random/longcat-flash-lite

SGLang

How to use tiny-random/longcat-flash-lite with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "tiny-random/longcat-flash-lite" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tiny-random/longcat-flash-lite",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "tiny-random/longcat-flash-lite" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tiny-random/longcat-flash-lite",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use tiny-random/longcat-flash-lite with Docker Model Runner:
```
docker model run hf.co/tiny-random/longcat-flash-lite
```

yujiepan committed on Feb 2

Commit 5ff2736 · verified · 1 Parent(s): d3ee99d

Upload folder using huggingface_hub

Files changed (2)

README.md +4 -7
config.json +3 -3

README.md CHANGED

@@ -126,17 +126,14 @@ model.model.ngram_embeddings = None  # avoid saving shared params
 model.save_pretrained(save_folder)
 torch.set_default_dtype(torch.float32)
-print(model.model.rotary_emb.inv_freq.shape)
-# 1 / 0
-# for n, m in model.named_modules():
-#     if 'LongcatFlashMLA' in str(type(m)):
-#         print(n, m.layer_idx)
 with open(f"{save_folder}/config.json", "r", encoding='utf-8') as f:
     config_json = json.load(f)
-    config_json['auto_map'] = {k: v.split('--')[-1] for k, v in config_json['auto_map'].items()}
 with open(f"{save_folder}/config.json", "w", encoding='utf-8') as f:
     json.dump(config_json, f, indent=2)
 ```
 </details>

 model.save_pretrained(save_folder)
 torch.set_default_dtype(torch.float32)
 with open(f"{save_folder}/config.json", "r", encoding='utf-8') as f:
     config_json = json.load(f)
+    config_json['auto_map'] = {k: source_model_id + '--' +
+                               v.split('--')[-1] for k, v in config_json['auto_map'].items()}
 with open(f"{save_folder}/config.json", "w", encoding='utf-8') as f:
     json.dump(config_json, f, indent=2)
+for f in Path(save_folder).glob('*.py'):
+    f.unlink()
 ```
 </details>
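
The net effect of the README change can be sketched in isolation. The snippet below is a minimal sketch, not the author's script: `prefix_auto_map` and `remove_local_code` are hypothetical helper names, and the temporary folder and placeholder file are for illustration only. It assumes `source_model_id` is the upstream repository id, as the resulting config.json values suggest.

```python
import json
import tempfile
from pathlib import Path


def prefix_auto_map(auto_map, source_model_id):
    """Re-point each auto_map entry at source_model_id, dropping any earlier repo prefix."""
    return {k: source_model_id + "--" + v.split("--")[-1] for k, v in auto_map.items()}


def remove_local_code(save_folder):
    """Delete every *.py file directly inside save_folder; return the removed file names."""
    removed = []
    for f in sorted(Path(save_folder).glob("*.py")):
        f.unlink()
        removed.append(f.name)
    return removed


# Illustrative run in a throwaway folder (file contents are placeholders).
with tempfile.TemporaryDirectory() as save_folder:
    config_path = Path(save_folder) / "config.json"
    config_path.write_text(json.dumps({
        "auto_map": {"AutoConfig": "configuration_longcat_ngram.LongcatFlashNgramConfig"}
    }), encoding="utf-8")
    (Path(save_folder) / "configuration_longcat_ngram.py").write_text(
        "# placeholder\n", encoding="utf-8"
    )

    config_json = json.loads(config_path.read_text(encoding="utf-8"))
    config_json["auto_map"] = prefix_auto_map(
        config_json["auto_map"], "meituan-longcat/LongCat-Flash-Lite"
    )
    config_path.write_text(json.dumps(config_json, indent=2), encoding="utf-8")

    removed_files = remove_local_code(save_folder)
    remaining_files = sorted(p.name for p in Path(save_folder).iterdir())
```

Because the helper keeps only the part after the last `--`, applying it twice does not stack prefixes.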

config.json CHANGED

@@ -5,9 +5,9 @@
   "attention_bias": false,
   "attention_dropout": 0.0,
   "auto_map": {
-    "AutoConfig": "configuration_longcat_ngram.LongcatFlashNgramConfig",
-    "AutoModel": "modeling_longcat_ngram.LongcatFlashNgramModel",
-    "AutoModelForCausalLM": "modeling_longcat_ngram.LongcatFlashNgramForCausalLM"
   },
   "bos_token_id": 1,
   "dtype": "bfloat16",

   "attention_bias": false,
   "attention_dropout": 0.0,
   "auto_map": {
+    "AutoConfig": "meituan-longcat/LongCat-Flash-Lite--configuration_longcat_ngram.LongcatFlashNgramConfig",
+    "AutoModel": "meituan-longcat/LongCat-Flash-Lite--modeling_longcat_ngram.LongcatFlashNgramModel",
+    "AutoModelForCausalLM": "meituan-longcat/LongCat-Flash-Lite--modeling_longcat_ngram.LongcatFlashNgramForCausalLM"
   },
   "bos_token_id": 1,
   "dtype": "bfloat16",
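
After this change the `auto_map` values take the form `<repo_id>--<module>.<Class>`. The part before `--` is understood to name the Hub repository that hosts the custom code, which fits with the model repo no longer shipping its own `.py` files. The sketch below only illustrates that naming convention; `split_auto_map_reference` is a hypothetical helper, not a Transformers API.

```python
def split_auto_map_reference(reference):
    """Split '<repo_id>--<module>.<Class>' into (repo_id, module, class_name).

    repo_id is None when the reference has no '--' prefix
    (the code is expected next to the config).
    """
    repo_id, sep, local_ref = reference.rpartition("--")
    module, _, class_name = local_ref.rpartition(".")
    return (repo_id if sep else None), module, class_name


old_ref = "modeling_longcat_ngram.LongcatFlashNgramForCausalLM"
new_ref = "meituan-longcat/LongCat-Flash-Lite--modeling_longcat_ngram.LongcatFlashNgramForCausalLM"

old_parts = split_auto_map_reference(old_ref)
new_parts = split_auto_map_reference(new_ref)
```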