Text Generation
PyTorch
Transformers
English
language-model
graph-neural-network
sparse-attention
adaptive-depth
temporal-decay
mesh-attention
efficient-transformer
novel-architecture
causal-lm
research
preprint
mesh-transformer
dynamic-graph
early-exit
per-token-routing
Eval Results (legacy)
Instructions to use vigneshwar234/TemporalMesh-Transformer with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use vigneshwar234/TemporalMesh-Transformer with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="vigneshwar234/TemporalMesh-Transformer")

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("vigneshwar234/TemporalMesh-Transformer", dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use vigneshwar234/TemporalMesh-Transformer with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "vigneshwar234/TemporalMesh-Transformer"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "vigneshwar234/TemporalMesh-Transformer",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker
```shell
docker model run hf.co/vigneshwar234/TemporalMesh-Transformer
```
- SGLang
How to use vigneshwar234/TemporalMesh-Transformer with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "vigneshwar234/TemporalMesh-Transformer" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "vigneshwar234/TemporalMesh-Transformer",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "vigneshwar234/TemporalMesh-Transformer" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "vigneshwar234/TemporalMesh-Transformer",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
- Docker Model Runner
How to use vigneshwar234/TemporalMesh-Transformer with Docker Model Runner:
```shell
docker model run hf.co/vigneshwar234/TemporalMesh-Transformer
```
```python
"""
TMTConfig: central configuration for the TemporalMesh Transformer.

Novel vs. standard: a single config surface that governs dynamic graph topology
(graph_k), per-token adaptive depth (exit_threshold), the temporal decay rate
(decay_rate), and the dual-stream FFN, none of which exist in vanilla
transformer configs.
"""
from dataclasses import dataclass


@dataclass  # lets callers override any field by keyword, e.g. TMTConfig(d_model=768)
class TMTConfig:
    # Vocabulary & sequence
    vocab_size: int = 32000
    max_seq_len: int = 1024

    # Core dims
    d_model: int = 512
    n_heads: int = 8
    n_layers: int = 12

    # Innovation 1: Mesh Attention
    graph_k: int = 8  # each token connects to its k nearest neighbours by cosine similarity

    # Innovation 2: Temporal decay
    decay_rate: float = 0.1  # base for the learned temporal decay scalars

    # Innovation 3: Adaptive depth routing
    exit_threshold: float = 0.85  # confidence above which a token exits early

    # Dual-stream FFN
    dual_stream: bool = True
    ffn_stream_dim: int = 256  # each stream is d_model // 2

    # Memory anchors
    memory_anchors: int = 16  # number of persistent KV memory parameter vectors

    # Training
    dropout: float = 0.1
    layer_norm_eps: float = 1e-5

    def __repr__(self) -> str:
        # Rough parameter estimate (embedding + per-layer weights), not an exact count.
        total_params_est = (
            self.vocab_size * self.d_model  # embedding
            + self.n_layers * (
                4 * self.d_model * self.d_model  # attention projections
                + 2 * self.d_model * self.ffn_stream_dim  # dual-stream FFN
                + self.d_model  # exit gate + memory
            )
        )
        return (
            f"TMTConfig("
            f"vocab={self.vocab_size}, d={self.d_model}, "
            f"heads={self.n_heads}, layers={self.n_layers}, "
            f"k={self.graph_k}, decay={self.decay_rate}, "
            f"exit_thr={self.exit_threshold}, "
            f"~params={total_params_est / 1e6:.1f}M)"
        )
```
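The parameter count in `__repr__` is a rough estimate: one embedding matrix plus, per layer, four `d_model × d_model` attention projections, a `2 × d_model × ffn_stream_dim` dual-stream FFN term, and a `d_model`-sized term for the exit gate and memory. The standalone function below (a hypothetical helper, `estimate_params`, that does not import `TMTConfig`) mirrors that formula so the arithmetic for the defaults can be checked by hand. The formula never reads `memory_anchors` or `max_seq_len` and has no layer-norm term, so anchor vectors, layer norms, and any positional parameters the model may have are not counted.

```python
def estimate_params(
    vocab_size: int = 32000,
    d_model: int = 512,
    n_layers: int = 12,
    ffn_stream_dim: int = 256,
) -> int:
    """Reproduce the rough estimate in TMTConfig.__repr__ (not an exact count)."""
    embedding = vocab_size * d_model              # defaults: 32000 * 512 = 16,384,000
    per_layer = (
        4 * d_model * d_model                     # attention projections: 1,048,576
        + 2 * d_model * ffn_stream_dim            # dual-stream FFN term: 262,144
        + d_model                                 # exit gate + memory term: 512
    )                                             # defaults: 1,311,232 per layer
    return embedding + n_layers * per_layer       # 16,384,000 + 12 * 1,311,232


total = estimate_params()
print(total)                          # 32118784
print(f"~params={total / 1e6:.1f}M")  # ~params=32.1M
```

With the defaults, the embedding (about 16.4M) and the twelve layers (about 15.7M) contribute roughly equally, which is why the summary reads `~params=32.1M`.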
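`graph_k` is described only as "each token connects to k nearest neighbours by cosine sim". A minimal sketch of that neighbour selection follows. It assumes the comparison is made between per-token vectors (whether these are hidden states, queries, or keys is not stated) and that candidates are limited to positions j ≤ i so the graph stays causal for a causal LM. The helper names `cosine` and `causal_knn_graph` are hypothetical, not taken from the repository, and pure Python is used so the example runs without PyTorch.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def causal_knn_graph(tokens: List[Vector], k: int) -> List[List[int]]:
    """For each position i, return up to k positions j <= i whose vectors are
    most cosine-similar to token i (ties keep the earlier position).

    Only j <= i are candidates, so no token is connected to the future.
    """
    graph = []
    for i, query in enumerate(tokens):
        candidates = range(i + 1)  # causal: positions 0..i, including i itself
        ranked = sorted(candidates, key=lambda j: cosine(query, tokens[j]), reverse=True)
        graph.append(sorted(ranked[:k]))  # top-k, listed in positional order
    return graph


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.0, 1.0]]
print(causal_knn_graph(tokens, k=2))  # [[0], [0, 1], [0, 2], [1, 3]]
```

Because the neighbour sets are recomputed from whatever vectors are passed in, running this per layer would yield a different graph at every depth, consistent with the "dynamic graph" description. This naive version is quadratic in sequence length.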
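`decay_rate` is documented only as the "base for learned temporal decay scalars", without the exact form. One common way to bias attention toward recent tokens is a linear distance penalty on the logits, as in ALiBi: subtract `decay_rate * (i - j)` from the score of key j for query i. That is equivalent to multiplying the unnormalised weight by `exp(-decay_rate * (i - j))`. The sketch below assumes that form and uses a fixed float where the model presumably learns per-head scalars; `decayed_causal_attention` is a hypothetical name.

```python
import math
from typing import List


def decayed_causal_attention(scores: List[List[float]], decay_rate: float) -> List[List[float]]:
    """Causal softmax over raw query-key scores with a linear distance penalty:
    logit[i][j] = scores[i][j] - decay_rate * (i - j) for j <= i.

    Future positions (j > i) get weight 0.0. A larger decay_rate shifts each
    row's weight toward recent tokens.
    """
    weights = []
    for i, row in enumerate(scores):
        logits = [row[j] - decay_rate * (i - j) for j in range(i + 1)]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        weights.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return weights


# With decay_rate = ln 2, each step back in time halves the unnormalised weight.
uniform = [[0.0] * 3 for _ in range(3)]
for row in decayed_causal_attention(uniform, decay_rate=math.log(2)):
    print([round(w, 3) for w in row])
# [1.0, 0.0, 0.0]
# [0.333, 0.667, 0.0]
# [0.143, 0.286, 0.571]
```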
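`exit_threshold` is the "confidence above which a token exits early", but the config does not say how confidence is measured. The sketch below assumes the maximum softmax probability of a per-token exit head, which is one common choice for early exit, and models layers as per-token functions. A real transformer layer also mixes tokens through attention, and the states of exited tokens usually remain visible to later tokens; both details are omitted here. `adaptive_depth`, `exit_head`, and `softmax_confidence` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax_confidence(logits: Sequence[float]) -> float:
    """Maximum softmax probability of a logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return max(exps) / sum(exps)


def adaptive_depth(
    states: List[Vector],
    layers: Sequence[Callable[[Vector], Vector]],
    exit_head: Callable[[Vector], Vector],
    exit_threshold: float = 0.85,
) -> List[int]:
    """Apply layers token by token; a token stops after the first layer at which
    its exit-head confidence exceeds exit_threshold.

    Updates states in place and returns the number of layers each token used.
    """
    depth = [len(layers)] * len(states)  # tokens that never exit use every layer
    active = set(range(len(states)))
    for layer_idx, layer in enumerate(layers):
        for t in sorted(active):
            states[t] = layer(states[t])
            if softmax_confidence(exit_head(states[t])) > exit_threshold:
                depth[t] = layer_idx + 1
                active.discard(t)  # frozen: later layers skip this token
        if not active:
            break
    return depth


def double(v: Vector) -> Vector:
    """Toy stand-in for a transformer layer: scales the state by 2."""
    return [2.0 * x for x in v]


states = [[2.0, 0.0], [0.5, 0.0], [0.0, 0.0]]
print(adaptive_depth(states, [double] * 3, exit_head=lambda v: v))  # [1, 2, 3]
```

The first token becomes confident after one layer, the second after two, and the third (equal logits, confidence 0.5) never clears 0.85 and uses all three layers. This per-token variation in depth is what the config means by adaptive depth.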
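`memory_anchors` counts "persistent KV memory parameter vectors". A common reading of persistent memory is a set of learned key/value vectors prepended to every sequence's keys and values, so each query can attend to them regardless of position. The sketch below assumes that reading for a single query and a single head. `attend_with_anchors` is a hypothetical name, and in the model the anchors would be trainable parameters (16 by default) rather than plain lists.

```python
import math
from typing import List

Vector = List[float]


def attend_with_anchors(
    query: Vector,
    keys: List[Vector],
    values: List[Vector],
    anchor_keys: List[Vector],
    anchor_values: List[Vector],
) -> Vector:
    """Scaled dot-product attention for one query over anchors + sequence.

    keys/values should already be limited to the causal prefix (positions <= i).
    Anchors carry no position, so attending to them never leaks future tokens.
    """
    all_keys = anchor_keys + keys        # anchors come first
    all_values = anchor_values + values
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, key)) for key in all_keys]
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(all_values[0])
    # Weighted sum of value vectors, one output coordinate at a time.
    return [sum(w * v[d] for w, v in zip(weights, all_values)) for d in range(dim)]


# Even with an empty prefix (the first token), the query has anchors to attend to.
print(attend_with_anchors([1.0, 0.0], [], [], [[1.0, 0.0]], [[3.0, 4.0]]))  # [3.0, 4.0]
```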