File size: 3,248 Bytes
{"turns": [{"user": "What is machine learning?", "assistant": "Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed. It uses algorithms to identify patterns in data."}, {"user": "Can you give me some examples?", "assistant": "Sure! Examples include: 1) Email spam detection - learns to identify spam patterns, 2) Image recognition - learns to identify objects in images, 3) Recommendation systems - learns user preferences to suggest content, 4) Natural language processing - learns language patterns for tasks like translation."}]}
{"turns": [{"user": "Explain neural networks", "assistant": "Neural networks are computing systems inspired by biological neural networks. They consist of interconnected nodes (neurons) organized in layers. Each connection has a weight that gets adjusted during training to learn patterns from data."}, {"user": "What layers do they have?", "assistant": "Neural networks typically have three types of layers: 1) Input layer - receives raw data, 2) Hidden layers - process data through weighted connections and activation functions, 3) Output layer - produces the final prediction or classification. Deep networks have multiple hidden layers."}]}
{"turns": [{"user": "What is a transformer model?", "assistant": "A transformer is a neural network architecture that uses self-attention mechanisms to weigh the importance of different input elements. It processes all input tokens in parallel (unlike RNNs) and captures long-range dependencies effectively. Transformers form the basis of modern large language models like GPT and BERT."}, {"user": "How does attention work?", "assistant": "Attention computes relevance scores between each token's query and the keys of all tokens, then uses those scores to take a weighted sum of the corresponding values. It learns which tokens to focus on when processing each position. Multi-head attention applies this in parallel with multiple learned projections. This allows the model to attend to different types of relationships simultaneously."}, {"user": "Why is it better than RNNs?", "assistant": "Transformers often outperform RNNs because: 1) Parallel processing - during training, all tokens are processed simultaneously instead of sequentially, 2) Better long-range dependencies - attention can directly connect distant tokens, 3) Easier to train - gradients do not have to flow through many recurrent steps, which reduces vanishing gradient problems, 4) Scalability - transformers scale better to large datasets and models."}]}
{"turns": [{"user": "What is the difference between supervised and unsupervised learning?", "assistant": "Supervised learning uses labeled data (input-output pairs) to train models, like predicting house prices from features. Unsupervised learning finds patterns in unlabeled data, like clustering similar customers. Supervised learning usually gives more accurate predictions for a specific target but requires labeled data; unsupervised learning is useful for exploration."}, {"user": "What about semi-supervised learning?", "assistant": "Semi-supervised learning uses both labeled and unlabeled data. It's useful when labeling is expensive. The unlabeled data helps improve performance by providing additional patterns. Common techniques include pseudo-labeling (labeling unlabeled data with the current model) and consistency regularization."}]}
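Each record above is one JSON object per line with a "turns" list of {"user", "assistant"} objects. A minimal sketch of reading that layout with the Python standard library follows. The function name `iter_conversations` and the inline sample are hypothetical, chosen only for illustration; this is not taken from the repository's own loading code.

```python
import json
from typing import Iterable, Iterator


def iter_conversations(lines: Iterable[str]) -> Iterator[list[tuple[str, str]]]:
    """Yield each conversation as a list of (user, assistant) pairs.

    Assumes the layout shown above: one JSON object per line, with a
    "turns" list whose items have "user" and "assistant" string fields.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        yield [(turn["user"], turn["assistant"]) for turn in record["turns"]]


# Hypothetical two-turn sample in the same shape as the records above.
sample = [
    '{"turns": [{"user": "Hi", "assistant": "Hello!"}, '
    '{"user": "Bye", "assistant": "Goodbye!"}]}',
    "",
]
conversations = list(iter_conversations(sample))
print(len(conversations), len(conversations[0]))
```

To read a file on disk, an open file handle can be passed in place of the list, since iterating a text file yields its lines.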