mamba2_exp2

mamba2_exp2 is a Mamba2-architecture model with approximately 0.2 billion parameters. It has been pre-trained on a dataset of Chinese light novels (esjzone) and is intended for text generation and story-continuation tasks in Chinese.

Model Details

Model Description

This model utilizes the Mamba2 state-space model architecture, designed for efficient inference. It was pre-trained from scratch on a corpus of uncleaned Chinese light novels.

Note: This is a base model (pre-trained only); it has not undergone supervised fine-tuning (SFT) or preference alignment (RLHF). It is best suited for continuing text from a prompt rather than answering questions or following complex instructions.

  • Developed by: telecomadm1145
  • Model type: Mamba2 (State Space Model)
  • Language(s) (NLP): Chinese (zh)
  • License: MIT
  • Finetuned from model: None (Trained from scratch)
  • Model Size: ~0.2B parameters
  • Context Length: 1024 tokens

Uses

Direct Use

The model is designed for:

  • Creative Writing: Generating light novel-style stories.
  • Text Completion: Continuing a given narrative in Chinese (a minimal sketch follows this list).
  • Style Imitation: Mimicking the tropes and writing styles found in web novels.

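As an example of text completion, the sketch below uses the standard transformers text-generation pipeline; the prompt is illustrative, and trust_remote_code is assumed to be required here as in the full example further below.

from transformers import pipeline

# Illustrative continuation sketch for a Chinese narrative fragment.
generator = pipeline(
    "text-generation",
    model="telecomadm1145/mamba2_exp2",
    trust_remote_code=True,
)
prompt = "夜幕降临，少女独自走在回家的路上，"  # "Night fell; the girl walked home alone,"
print(generator(prompt, max_new_tokens=64, do_sample=True)[0]["generated_text"])
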
Out-of-Scope Use

  • Factual Question Answering: Since it is trained on fiction, it will likely hallucinate facts.
  • Instruction Following: It has not been fine-tuned to follow commands (e.g., "Write a summary of...").
  • Code Generation: Not trained on code.
  • Long-context retrieval: The model was trained with a context window of 1024 tokens; performance may degrade significantly beyond this length (see the truncation sketch below).

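Inputs can be kept within that limit at tokenization time. A minimal sketch, using standard transformers truncation options (the long_text variable is illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("telecomadm1145/mamba2_exp2")
tokenizer.truncation_side = "left"  # keep the most recent text for continuation

long_text = "很久很久以前，" * 500  # illustrative passage, longer than 1024 tokens
inputs = tokenizer(
    long_text,
    return_tensors="pt",
    truncation=True,
    max_length=1024,  # the model's training context length
)
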
Bias, Risks, and Limitations

  • Dataset Quality: The training data consists of uncleaned web novels. Consequently, the model may generate text containing typos, grammatical errors, or non-standard formatting present in the source material.
  • Content Warnings: The model may generate content that includes violence, mature themes, or offensive language, reflecting the nature of some web fiction genres.
  • Hallucinations: As a fiction-focused model, it invents content freely and should not be used as a knowledge base.

How to Get Started with the Model

Use the code below to get started with the model.

Note: Depending on your environment, you may need to install mamba-ssm and causal-conv1d (e.g., pip install mamba-ssm causal-conv1d) to use the optimized Mamba2 kernels; without them, transformers may fall back to a slower pure-PyTorch implementation.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_id = "telecomadm1145/mamba2_exp2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Generate text
text = "<replace your prompt here>"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(
    **inputs, 
    max_new_tokens=100, 
    do_sample=True, 
    top_k=50, 
    top_p=0.95,
    repetition_penalty=1.1
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
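
The sampling settings above (top_k=50, top_p=0.95, repetition_penalty=1.1) are reasonable starting points for creative generation; the repetition penalty in particular helps curb the looping that small base models are prone to, and adding a temperature argument gives finer control over randomness.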

Training Details

Training Data

  • Dataset Name: esjzone_novel_cn
  • Data Type: Chinese Light Novels (轻小说).
  • Data Size: Approximately 1GB.
  • Preprocessing: None; the text was used raw and uncleaned (a typical packing step is sketched after this list).

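The exact tokenization pipeline is not documented; the following is a minimal sketch of the usual approach of concatenating tokenized documents and packing them into fixed 1024-token blocks for causal language modeling (the function name and variables are hypothetical):

# Hypothetical sketch; the actual preprocessing for mamba2_exp2 is not published.
def pack_into_blocks(texts, tokenizer, block_size=1024):
    ids = []
    for text in texts:
        ids.extend(tokenizer(text)["input_ids"])
        ids.append(tokenizer.eos_token_id)  # mark document boundaries
    # Drop the ragged tail so every block is exactly block_size tokens long.
    n_blocks = len(ids) // block_size
    return [ids[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]
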
Training Procedure

Training Hyperparameters

  • Context Length: 1024 tokens
  • Training Stage: Pre-training (Causal Language Modeling)

Speeds, Sizes, Times

  • Hardware: 2x NVIDIA T4 GPUs
  • Training Duration: ~23 hours
  • Model Parameters: ~0.2 Billion

Environmental Impact

  • Hardware Type: NVIDIA T4 x2
  • Hours used: ~23
  • Compute Region: [Unknown/Cloud]

Technical Specifications

Model Architecture and Objective

The model follows the Mamba2 architecture, which is a type of State Space Model (SSM) designed to handle sequences efficiently. The objective was standard Causal Language Modeling (predicting the next token) on a dataset of fiction.

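The next-token objective is the standard shifted cross-entropy loss. A minimal PyTorch sketch (illustrative, not the actual training code):

import torch.nn.functional as F

# Causal LM objective: predict token t+1 from tokens 0..t.
# logits: (batch, seq_len, vocab) from the model; input_ids: (batch, seq_len)
def causal_lm_loss(logits, input_ids):
    shift_logits = logits[:, :-1, :]  # predictions for positions 0..L-2
    shift_labels = input_ids[:, 1:]   # targets are the next tokens 1..L-1
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )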
