MambaByte
MambaByte: Token-free Selective State Space Model (arxiv.org/abs/2401.13660)
Trained on 30B bytes. Model size: 353M parameters. See Table 2 in the MambaByte paper.
To use:
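import numpy as np
import torch
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# Load the pretrained byte-level model onto the GPU in bfloat16.
model = MambaLMHeadModel.from_pretrained("JunxiongWang/MambaByte_Arxiv", device="cuda", dtype=torch.bfloat16)

# The model is token-free: the prompt is fed as raw UTF-8 bytes, one id per byte.
# A raw string keeps the backslash in "\documentclass" literal.
text = r"\documentclass[12pt]{article}"
text_byte = np.frombuffer(text.encode("utf-8"), dtype=np.uint8)
# Add a batch dimension; copy() because np.frombuffer returns a read-only view.
input_ids = torch.from_numpy(text_byte[None, :].copy()).long().cuda()

sample = model.generate(
    input_ids=input_ids,
    max_length=2048,
    cg=True,  # use CUDA graphs for faster decoding
    return_dict_in_generate=True,
    output_scores=True,
    enable_timing=True,
    temperature=1,
    top_k=256,
    top_p=0.9,
)

# Sampling can end in the middle of a multi-byte character, so replace
# undecodable bytes instead of raising UnicodeDecodeError.
print(bytes(sample.sequences[0].tolist()).decode("utf-8", errors="replace"))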
Output:
\documentclass[12pt]{article}}}}^{{\mathbf{P}}\uplus{\mathbf{Q}}}}}}}{}}$ is a symmetric poset. This implies that $$\operatorname{end}({\mathscr{L}}) = \operatorname{end}({\mathscr{L}}\setminus\{\sigma_{{\mathbf{P}}}\}) = \operatorname{end}({\mathscr{L}}\setminus\{\sigma_{{\mathbf{Q}}}\}) = \operatorname{end}({\mathscr{L}}\setminus\{\sigma_{{\mathbf{P}}},\sigma_{{\mathbf{Q}}}\}),$$ i.e., ${\mathscr{L}}$ is $\{\sigma_{{\mathbf{P}}},\sigma_{{\mathbf{Q}}}\}$-bistochastic for any ${\mathbf{P}}\neq{\mathbf{Q}}$. Thus, ${\mathscr{L}}$ is reversible, and is in fact maximal among all $\{\sigma_{{\mathbf{P}}},\sigma_{{\mathbf{Q}}}\}$-bistochastic matrices.
Since ${\mathscr{L}}$ is in the same class as $\sigma_{{\mathbf{P}}}^{{\mathbf{Q}}}$, we have $\operatorname{end}({\mathscr{L}})\subseteq\operatorname{end}({\mathscr{L}})$. Conversely, if $\operatorname{end}({\mathscr{L}})=\operatorname{end}({\mathscr{L}})$, then $\sigma_{{\mathbf{P}}}^{{\mathbf{Q}}}$ is maximal in $\operatorname{end}({\mathscr{L}})$. Since ${\mathbf{P}}\setminus\{\sigma_{{\mathbf{P}}}\}\subseteq\operatorname{end}({\mathscr{L}})$, this implies that ${\mathscr{L}}$ is in the same class as $\sigma_{{\mathbf{P}}}^{{\mathbf{Q}}}$, and hence ${\mathscr{L}}$ is reversible.
We are now ready to show that $\{\sigma_{{\mathbf{P}}},\sigma_{{\mathbf{Q}}}\}$-bistochastic matrices form a symmetric poset of ends.
\[lem:end\_symm\_class\] Let ${\mathbf{P}},{\mathbf{Q}}\in{\mathscr{M}}$. Then $\sigma_{{\mathbf{P}}}^{{\mathbf{Q}}}$ is symmetric if and only if $\operatorname{end}({\mathscr{L}})=\operatorname{end}({\mathscr{L}})$.
Suppose that $\operatorname{end}({\mathscr{L}})=\operatorname{end}({\mathscr{L}})$, and we prove that $\sigma_{{\mathbf{P}}}^{{\mathbf{Q}}}$ is symmetric. Clearly, $\operatorname{end}({\mathscr{L}})$ contains exactly the ends of $\operatorname{end}({\mathscr{L}})$ by definition, and the only case that survives is when $\operatorname{end}({\mathscr{L}})=\operatorname{end}({\mathscr{L}})$. By construction, this means that $\sigma_{{\mathbf{P}}}
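
The usage snippet in this card feeds the prompt to the model as raw UTF-8 bytes and turns the generated ids back into text by decoding them as UTF-8. The following is a minimal, GPU-free sketch of that round trip in plain Python. The helper names `encode_prompt` and `decode_ids` are hypothetical (they are not part of `mamba_ssm`), and the use of `errors="replace"` is an assumption about how one might want to handle a sample that is cut off in the middle of a multi-byte character.

```python
# Hypothetical helpers illustrating byte-level (token-free) ids.
# They are not part of mamba_ssm; names are chosen for this sketch only.

def encode_prompt(text: str) -> list[int]:
    """Return one id per UTF-8 byte of `text` (each id is in 0..255)."""
    return list(text.encode("utf-8"))


def decode_ids(ids: list[int]) -> str:
    """Turn byte ids back into text.

    Invalid or truncated UTF-8 sequences are replaced with U+FFFD
    instead of raising UnicodeDecodeError.
    """
    return bytes(ids).decode("utf-8", errors="replace")


# Raw string so the backslash is kept literally.
prompt = r"\documentclass[12pt]{article}"
ids = encode_prompt(prompt)

print(len(ids))   # 29: this prompt is pure ASCII, so one byte per character
print(ids[:4])    # [92, 100, 111, 99] for "\", "d", "o", "c"
print(decode_ids(ids) == prompt)  # True: the round trip is lossless

# A non-ASCII character spans several ids; cutting it in half
# yields the replacement character rather than an exception.
e_acute = encode_prompt("é")
print(e_acute)                        # [195, 169]
print(repr(decode_ids(e_acute[:1])))  # '\ufffd'
```

Because every id is a single byte, sequence length counts bytes rather than subword tokens, so a limit such as `max_length=2048` bounds the number of bytes in the sequence, not words.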