Instructions for using alibidaran/Spark_Anime with libraries and local apps.

Libraries

How to use alibidaran/Spark_Anime with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="alibidaran/Spark_Anime")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("alibidaran/Spark_Anime")
model = AutoModelForCausalLM.from_pretrained("alibidaran/Spark_Anime")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use alibidaran/Spark_Anime with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "alibidaran/Spark_Anime"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "alibidaran/Spark_Anime",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
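
The same request can be built from Python instead of curl. The sketch below is a minimal illustration using only the standard library; the helper name `build_chat_request` is hypothetical, and the URL assumes the server started above is listening on port 8000. The actual network call is left commented out because it needs a running server.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_message: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint.

    `build_chat_request` is a hypothetical helper written for this example.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "http://localhost:8000",  # assumed: the local server started above
    "alibidaran/Spark_Anime",
    "What is the capital of France?",
)

# Sending the request requires a running server, so it is commented out here:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
#     print(reply["choices"][0]["message"]["content"])
```

For the SGLang server described further down, only the base URL would change (port 30000 instead of 8000).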

Use Docker

docker model run hf.co/alibidaran/Spark_Anime

SGLang

How to use alibidaran/Spark_Anime with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "alibidaran/Spark_Anime" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "alibidaran/Spark_Anime",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "alibidaran/Spark_Anime" \
        --host 0.0.0.0 \
        --port 30000

Unsloth Studio

How to use alibidaran/Spark_Anime with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for alibidaran/Spark_Anime to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for alibidaran/Spark_Anime to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for alibidaran/Spark_Anime to start chatting

Load model with FastModel

# Install Unsloth first (shell command): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="alibidaran/Spark_Anime",
    max_seq_length=2048,
)

Docker Model Runner
How to use alibidaran/Spark_Anime with Docker Model Runner:
```
docker model run hf.co/alibidaran/Spark_Anime
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

This repository provides a modular text-to-speech (TTS) model, trained on the Spark model, that supports controllable audio generation through semantic and global token conditioning. It is designed for immersive narration, guided visualization, and expressive AI agents.

## 🔊 Model Highlights

🎯 Task-specific generation using the `<|task_tts|>` prompt format

🧠 Semantic tokens capture content-related prosody and intonation

🌍 Global tokens control speaker identity, style, and other features

⚡ Optimized for fast inference with native acceleration

🧪 Example input: Guided fitness visualization prompt
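
To make the prompt format and the two token families concrete, here is a minimal sketch that builds a `<|task_tts|>` prompt and extracts semantic and global token IDs from decoded text. It mirrors the string layout and regular expressions used by the usage code in this card; the helper names and the sample output string are made up for illustration and do not come from the model.

```python
import re
from typing import List, Tuple


def build_tts_prompt(text: str) -> str:
    """Wrap the input text in the task and content markers used by this card."""
    return "".join([
        "<|task_tts|>",
        "<|start_content|>",
        text,
        "<|end_content|>",
        "<|start_global_token|>",
    ])


def parse_audio_tokens(decoded: str) -> Tuple[List[int], List[int]]:
    """Return (global_ids, semantic_ids) found in a decoded generation."""
    global_ids = [int(m) for m in re.findall(r"<\|bicodec_global_(\d+)\|>", decoded)]
    semantic_ids = [int(m) for m in re.findall(r"<\|bicodec_semantic_(\d+)\|>", decoded)]
    return global_ids, semantic_ids


prompt = build_tts_prompt("Frieren: Hello.")

# Hypothetical decoded output, invented for this example only.
sample_output = (
    "<|bicodec_global_12|><|bicodec_global_7|>"
    "<|bicodec_semantic_301|><|bicodec_semantic_45|>"
)
global_ids, semantic_ids = parse_audio_tokens(sample_output)
print(prompt)
print(global_ids, semantic_ids)
```

Under this reading, the global IDs carry speaker and style information while the semantic IDs carry the spoken content; both lists are then handed to the audio tokenizer for detokenization.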

## 📦 Installation

Install the required packages:

```bash
pip install torch torchaudio soundfile
```

## 🚀 Usage

```python
from unsloth import FastModel  # import Unsloth before torch

import re

import numpy as np
import torch

# Assumes `model`, `tokenizer`, and `audio_tokenizer` are already loaded (see Notes).
FastModel.for_inference(model)  # Enable 2x faster inference

input_text = "Frieren: Now, let's explore the imagery of your fitness journey..."

@torch.inference_mode()
def generate_speech_from_text(
    text: str,
    temperature: float = 0.8,
    top_k: int = 50,
    top_p: float = 1.0,
    max_new_audio_tokens: int = 2048,
    device: torch.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
) -> np.ndarray:
    prompt = "".join([
        "<|task_tts|>",
        "<|start_content|>",
        text,
        "<|end_content|>",
        "<|start_global_token|>"
    ])
    model_inputs = tokenizer([prompt], return_tensors="pt").to(device)

    print("Generating token sequence...")
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=max_new_audio_tokens,
        do_sample=True,
        temperature=temperature,
        top_k=top_k,
        top_p=top_p,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id
    )
    print("Token sequence generated.")

    generated_ids_trimmed = generated_ids[:, model_inputs.input_ids.shape[1]:]
    predicts_text = tokenizer.batch_decode(generated_ids_trimmed, skip_special_tokens=False)[0]

    semantic_matches = re.findall(r"<\|bicodec_semantic_(\d+)\|>", predicts_text)
    if not semantic_matches:
        print("Warning: No semantic tokens found.")
        return np.array([], dtype=np.float32)

    pred_semantic_ids = torch.tensor([int(token) for token in semantic_matches]).long().unsqueeze(0)

    global_matches = re.findall(r"<\|bicodec_global_(\d+)\|>", predicts_text)
    if not global_matches:
        print("Warning: No global tokens found. Using defaults.")
        pred_global_ids = torch.zeros((1, 1), dtype=torch.long)
    else:
        pred_global_ids = torch.tensor([int(token) for token in global_matches]).long().unsqueeze(0)

    pred_global_ids = pred_global_ids.unsqueeze(0)

    print(f"Found {pred_semantic_ids.shape[1]} semantic tokens.")
    print(f"Found {pred_global_ids.shape[2]} global tokens.")

    print("Detokenizing audio tokens...")
    audio_tokenizer.device = device
    audio_tokenizer.model.to(device)
    wav_np = audio_tokenizer.detokenize(
        pred_global_ids.to(device).squeeze(0),
        pred_semantic_ids.to(device)
    )
    print("Detokenization complete.")
    return wav_np


if __name__ == "__main__":
    print(f"Generating speech for: '{input_text}'")
    text = f"{chosen_voice}: " + input_text if 'chosen_voice' in globals() else input_text
    generated_waveform = generate_speech_from_text(text)

    if generated_waveform.size > 0:
        import soundfile as sf
        output_filename = "generated_speech_controllable.wav"
        sample_rate = audio_tokenizer.config.get("sample_rate", 16000)
        sf.write(output_filename, generated_waveform, sample_rate)
        print(f"Audio saved to {output_filename}")

        from IPython.display import Audio, display
        display(Audio(generated_waveform, rate=sample_rate))
    else:
        print("Audio generation failed (no tokens found?).")
```
## 🔧 Parameters

| Parameter              | Type          | Default | Description                                                 |
|------------------------|---------------|---------|-------------------------------------------------------------|
| `text`                 | `str`         | —       | The input text to be converted into speech.                 |
| `temperature`          | `float`       | `0.8`   | Sampling temperature for diversity in generation.           |
| `top_k`                | `int`         | `50`    | Limits sampling to top-k most likely tokens.                |
| `top_p`                | `float`       | `1.0`   | Nucleus sampling (select from top-p cumulative probability).|
| `max_new_audio_tokens` | `int`         | `2048`  | Maximum number of audio tokens to generate.                 |
| `device`               | `torch.device`| Auto    | Uses CUDA if available, otherwise CPU.                      |
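
As a rough illustration of how `temperature`, `top_k`, and `top_p` interact, the sketch below filters a toy logit vector in plain Python. It is a simplified teaching example under the usual definitions of these parameters, not the sampling code that `model.generate` runs internally; the function name `filter_distribution` and the toy logits are hypothetical.

```python
import math
from typing import Dict, List


def filter_distribution(
    logits: List[float],
    temperature: float = 0.8,
    top_k: int = 50,
    top_p: float = 1.0,
) -> Dict[int, float]:
    """Return {token_id: probability} after temperature, top-k, and top-p filtering."""
    # Temperature: divide logits before the softmax; values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-k: keep only the k most probable token ids, then renormalize.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    ranked_total = sum(probs[i] for i in ranked)

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept: List[int] = []
    cumulative = 0.0
    for token_id in ranked:
        kept.append(token_id)
        cumulative += probs[token_id] / ranked_total
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}


toy_logits = [2.0, 1.0, 0.0, -1.0]  # hypothetical scores for four tokens
filtered = filter_distribution(toy_logits, temperature=1.0, top_k=2, top_p=1.0)
print(filtered)
```

With the defaults in the table (`top_p=1.0`), nucleus filtering removes nothing, so only `temperature` and `top_k` shape the candidate set.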


## 📁 Output Format

Output: `generated_speech_controllable.wav`

Sample rate: read from `audio_tokenizer.config`; the usage code falls back to 16 kHz when no `sample_rate` entry is present.

## ⚠️ Notes

Make sure `model`, `tokenizer`, and `audio_tokenizer` are initialized before calling `generate_speech_from_text`.

Designed for research and development use.

Downloads last month: 10

Safetensors

Model size: 0.5B params

Tensor type: F32
