import torch

from taoTrain.data.async_loader import AsyncBatchIterator
from taoTrain.data.sft_jsonl import SFTJSONLDataset
from taoTrain.data.sft_utils import build_response_only_next_token_labels


def test_response_only_labels_mask_target_token_not_input_token():
    # Position i is supervised only when its target, token i + 1, is a response
    # token (mask == 1); every other position gets the ignore index -100.
    input_ids = [10, 20, 30, 40, 2, 0]
    mask = [0, 0, 1, 1, 0, 0]
    labels = build_response_only_next_token_labels(input_ids, mask)
    assert labels == [-100, 30, 40, -100, -100, -100]


def test_sft_dataset_direct_path_matches_response_only_helper():
    # Bypass __init__ and inject a single pre-tokenized chunk directly.
    dataset = SFTJSONLDataset.__new__(SFTJSONLDataset)
    dataset.chunk_manager = None
    dataset._current_chunk_data = {
        "input_ids": [[10, 20, 30, 40, 2, 0]],
        "attention_mask": [[1, 1, 1, 1, 1, 0]],
        "mask": [[0, 0, 1, 1, 0, 0]],
    }
    sample = dataset[0]
    # Indexing the dataset must produce the same labels as the helper.
    assert torch.equal(sample["labels"], torch.tensor([-100, 30, 40, -100, -100, -100]))


class _OneChunkQueue:
    """Test double for the tokenization queue: yields one chunk, then is exhausted."""

    def __init__(self):
        self._chunk = {
            "input_ids": [[10, 20, 30, 40, 2, 0]],
            "attention_mask": [[1, 1, 1, 1, 1, 0]],
            "mask": [[0, 0, 1, 1, 0, 0]],
        }
        self._returned = False
        # Private attributes mirrored from the real queue, presumably because
        # AsyncBatchIterator reads them.
        self._next_chunk_idx = 0
        self._chunk_order = [0]
        self._threads = [object()]

    def get_next_chunk(self, timeout=None):
        # Hand out the single chunk once; return None on every later call.
        if self._returned:
            return None
        self._returned = True
        return self._chunk

    def is_exhausted(self):
        return self._returned

    def shutdown(self, wait=True):
        return None

    def __len__(self):
        return 1


def test_async_sft_loader_matches_direct_dataset_labels():
    # The async path must build the same labels as the direct dataset path,
    # with an added batch dimension.
    iterator = AsyncBatchIterator(
        tokenization_queue=_OneChunkQueue(),
        batch_size=1,
        device=torch.device("cpu"),
        drop_last=True,
    )
    batch = next(iter(iterator))
    assert torch.equal(batch["labels"], torch.tensor([[-100, 30, 40, -100, -100, -100]]))
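

# The three tests above all expect the same label vector, so the labelling rule
# they pin down can be sketched in pure Python. This is inferred only from the
# expected values in this file, not taken from taoTrain: the function name
# sketch_response_only_labels and the ignore_index parameter are hypothetical,
# and the real build_response_only_next_token_labels may differ in details
# (input validation, tensor support, and so on).

```python
from typing import List, Sequence

# -100 is the default ignore_index of torch.nn.CrossEntropyLoss.
IGNORE_INDEX = -100


def sketch_response_only_labels(
    input_ids: Sequence[int],
    mask: Sequence[int],
    ignore_index: int = IGNORE_INDEX,
) -> List[int]:
    """Hypothetical sketch: next-token labels supervised on response tokens only."""
    if len(input_ids) != len(mask):
        raise ValueError("input_ids and mask must have the same length")

    # Start fully ignored; the last position never has a next token to predict.
    labels = [ignore_index] * len(input_ids)
    for pos in range(len(input_ids) - 1):
        # Supervise position pos only if its target (token pos + 1) is a
        # response token, regardless of whether token pos itself is one.
        if mask[pos + 1]:
            labels[pos] = input_ids[pos + 1]
    return labels


# Same inputs as the tests in this file.
print(sketch_response_only_labels([10, 20, 30, 40, 2, 0], [0, 0, 1, 1, 0, 0]))
# [-100, 30, 40, -100, -100, -100]
```

# Note that position 1 (token 20, a prompt token) is supervised because its
# target 30 is a response token, while position 3 (token 40, a response token)
# is ignored because its target 2 is not. That is the "mask target token, not
# input token" behaviour the first test name refers to.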