mamba2_exp2

mamba2_exp2 is a Mamba2-architecture model with approximately 0.2 billion parameters. It has been pre-trained on a dataset of Chinese light novels (esjzone) and is intended for text generation and story-continuation tasks in Chinese.

Model Details

Model Description

This model utilizes the Mamba2 state-space model architecture, designed for efficient inference. It was pre-trained from scratch on a corpus of uncleaned Chinese light novels.

Note: This is a base model (pre-trained only); it has not undergone supervised fine-tuning (SFT) or preference alignment (RLHF). It is best suited for continuing text from a prompt rather than answering questions or following complex instructions.

  • Developed by: telecomadm1145
  • Model type: Mamba2 (State Space Model)
  • Language(s) (NLP): Chinese (zh)
  • License: MIT
  • Finetuned from model: None (Trained from scratch)
  • Model Size: ~0.2B parameters
  • Context Length: 1024 tokens

Uses

Direct Use

The model is designed for:

  • Creative Writing: Generating light novel-style stories.
  • Text Completion: Continuing a given narrative in Chinese (a minimal sketch follows this list).
  • Style Imitation: Mimicking the tropes and writing styles found in web novels.

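As an example of text completion, the sketch below uses the standard transformers text-generation pipeline; the prompt is illustrative, and trust_remote_code is assumed to be required here as in the full example further below.

from transformers import pipeline

# Illustrative continuation sketch for a Chinese narrative fragment.
generator = pipeline(
    "text-generation",
    model="telecomadm1145/mamba2_exp2",
    trust_remote_code=True,
)
prompt = "夜幕降临，少女独自走在回家的路上，"  # "Night fell; the girl walked home alone,"
print(generator(prompt, max_new_tokens=64, do_sample=True)[0]["generated_text"])
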
Out-of-Scope Use

  • Factual Question Answering: Since it is trained on fiction, it will likely hallucinate facts.
  • Instruction Following: It has not been fine-tuned to follow commands (e.g., "Write a summary of...").
  • Code Generation: Not trained on code.
  • Long-context retrieval: The model was trained with a context window of 1024 tokens; performance may degrade significantly beyond this length (see the truncation sketch below).

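Inputs can be kept within that limit at tokenization time. A minimal sketch, using standard transformers truncation options (the long_text variable is illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("telecomadm1145/mamba2_exp2")
tokenizer.truncation_side = "left"  # keep the most recent text for continuation

long_text = "很久很久以前，" * 500  # illustrative passage, longer than 1024 tokens
inputs = tokenizer(
    long_text,
    return_tensors="pt",
    truncation=True,
    max_length=1024,  # the model's training context length
)
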
Bias, Risks, and Limitations

  • Dataset Quality: The training data consists of uncleaned web novels. Consequently, the model may generate text containing typos, grammatical errors, or non-standard formatting present in the source material.
  • Content Warnings: The model may generate content that includes violence, mature themes, or offensive language, reflecting the nature of some web fiction genres.
  • Hallucinations: As a fiction-focused model, it invents content freely and should not be used as a knowledge base.

How to Get Started with the Model

Use the code below to get started with the model.

Note: Depending on your environment, you may need to install mamba-ssm and causal-conv1d (e.g., pip install mamba-ssm causal-conv1d) to use the optimized Mamba2 kernels; without them, transformers may fall back to a slower pure-PyTorch implementation.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_id = "telecomadm1145/mamba2_exp2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Generate text
text = "<replace your prompt here>"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(
    **inputs, 
    max_new_tokens=100, 
    do_sample=True, 
    top_k=50, 
    top_p=0.95,
    repetition_penalty=1.1
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
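
The sampling settings above (top_k=50, top_p=0.95, repetition_penalty=1.1) are reasonable starting points for creative generation; the repetition penalty in particular helps curb the looping that small base models are prone to, and adding a temperature argument gives finer control over randomness.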

Training Details

Training Data

  • Dataset Name: esjzone_novel_cn
  • Data Type: Chinese Light Novels (轻小说).
  • Data Size: Approximately 1GB.
  • Preprocessing: None; the text was used raw and uncleaned (a typical packing step is sketched after this list).

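The exact tokenization pipeline is not documented; the following is a minimal sketch of the usual approach of concatenating tokenized documents and packing them into fixed 1024-token blocks for causal language modeling (the function name and variables are hypothetical):

# Hypothetical sketch; the actual preprocessing for mamba2_exp2 is not published.
def pack_into_blocks(texts, tokenizer, block_size=1024):
    ids = []
    for text in texts:
        ids.extend(tokenizer(text)["input_ids"])
        ids.append(tokenizer.eos_token_id)  # mark document boundaries
    # Drop the ragged tail so every block is exactly block_size tokens long.
    n_blocks = len(ids) // block_size
    return [ids[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]
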
Training Procedure

Training Hyperparameters

  • Context Length: 1024 tokens
  • Training Stage: Pre-training (Causal Language Modeling)

Speeds, Sizes, Times

  • Hardware: 2x NVIDIA T4 GPUs
  • Training Duration: ~23 hours
  • Model Parameters: ~0.2 Billion

Environmental Impact

  • Hardware Type: NVIDIA T4 x2
  • Hours used: ~23
  • Compute Region: [Unknown/Cloud]

Technical Specifications

Model Architecture and Objective

The model follows the Mamba2 architecture, which is a type of State Space Model (SSM) designed to handle sequences efficiently. The objective was standard Causal Language Modeling (predicting the next token) on a dataset of fiction.

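The next-token objective is the standard shifted cross-entropy loss. A minimal PyTorch sketch (illustrative, not the actual training code):

import torch.nn.functional as F

# Causal LM objective: predict token t+1 from tokens 0..t.
# logits: (batch, seq_len, vocab) from the model; input_ids: (batch, seq_len)
def causal_lm_loss(logits, input_ids):
    shift_logits = logits[:, :-1, :]  # predictions for positions 0..L-2
    shift_labels = input_ids[:, 1:]   # targets are the next tokens 1..L-1
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )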
