EPT-I is the first generation of the Efficiency-Prioritized Token-mixer (EPT) series, a family of LLMs designed for efficient inference: architectural choices reduce both computation and memory footprint. EPT-I has 3 billion parameters (3B), which makes inference smoother on compute- or memory-constrained devices.
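As a rough illustration of why the parameter count matters on constrained devices, the sketch below estimates the memory needed just to hold the weights at a few common precisions. It assumes a nominal 3e9 parameters (the exact count is not stated here) and ignores activations, the KV cache, and runtime overhead. The function name `weight_memory_gib` is hypothetical.

```python
# Back-of-the-envelope weight memory for a model with a given parameter count.
# Assumption: 3e9 is the nominal "3B" figure, not EPT-I's exact count.

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the memory (in GiB) needed to store the weights alone."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in ("float32", "float16", "int8"):
        print(f"{dtype}: {weight_memory_gib(3_000_000_000, dtype):.2f} GiB")
```

Real memory use at inference time will be higher than these figures, since they cover the weights only.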
Primary Architectural Features
1D Convolution Layers: EPT-I replaces the conventional Multi-Layer Perceptron (MLP) feed-forward layers with depthwise 1D convolutions followed by pointwise computations. This lets the FFN layers process local context features and positional signals, so language is modeled continuously across neighboring tokens rather than discretely per token. The design improves both computational efficiency and model performance: higher token throughput, lower memory usage, shorter training time, and fewer parameters at the same hidden size and number of layers. The smaller parameter count is what makes the model well suited to on-device and edge inference.
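To make the idea of a depthwise convolution in place of an MLP concrete, here is a minimal pure-Python sketch. It is not EPT-I's implementation: the kernel size, the left (causal) zero-padding, the 4x MLP expansion factor, and all function names are assumptions chosen for illustration, and biases are ignored in the parameter counts.

```python
# Illustrative only: EPT-I's actual layer layout, kernel size, and padding
# scheme are not given in the model card.

def causal_depthwise_conv1d(x, kernels):
    """Depthwise 1D convolution over a sequence.

    x:       list of T time steps, each a list of C channel values
    kernels: list of C kernels, each a list of K weights (one kernel per channel)
    Left zero-padding (an assumption) keeps step t from seeing later steps.
    """
    out = []
    for t in range(len(x)):
        row = []
        for c, kernel in enumerate(kernels):
            k = len(kernel)
            acc = 0.0
            for j in range(k):
                src = t - (k - 1) + j  # last kernel weight aligns with step t
                if src >= 0:
                    acc += kernel[j] * x[src][c]
            row.append(acc)
        out.append(row)
    return out


def mlp_ffn_params(hidden, expansion=4):
    """Weights in a two-matrix MLP: hidden -> expansion * hidden -> hidden."""
    return 2 * hidden * expansion * hidden


def conv_ffn_params(hidden, kernel_size):
    """Weights in one depthwise conv plus one pointwise (hidden x hidden) map."""
    return hidden * kernel_size + hidden * hidden


if __name__ == "__main__":
    x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
    kernels = [[1.0, 1.0, 1.0], [0.0, 0.0, 1.0]]  # running sum, identity
    print(causal_depthwise_conv1d(x, kernels))
    print(mlp_ffn_params(2048), conv_ffn_params(2048, kernel_size=4))
```

Each channel is filtered independently along the sequence, which is where the local-context mixing comes from. The pointwise step is what mixes information across channels. Under these assumed shapes the convolutional block needs far fewer weights than the MLP at the same hidden size, which is the kind of saving the paragraph above describes.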
Multi-Head Latent Attention (MLA): EPT-I uses Multi-Head Latent Attention, inspired by DeepSeek, to limit KV cache growth during long-context inference. This keeps memory usage low while preserving model quality in long-context scenarios.
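The memory effect can be estimated with simple arithmetic. The sketch below compares the per-sequence KV cache of standard multi-head attention with an MLA-style cache that stores one compressed latent vector plus a small positional key per token and layer, following the scheme published for DeepSeek-V2. Every configuration value (layer count, head count, latent size, context length) is a hypothetical placeholder, not an EPT-I setting, and the function names are made up for this example.

```python
# Illustrative config values only; none of these are EPT-I's real settings.

def mha_kv_cache_bytes(layers, heads, head_dim, seq_len, bytes_per_value=2):
    """Standard multi-head attention: cache full keys and values per head."""
    return 2 * layers * heads * head_dim * seq_len * bytes_per_value


def mla_kv_cache_bytes(layers, latent_dim, rope_dim, seq_len, bytes_per_value=2):
    """MLA-style cache: one compressed latent plus a small positional key
    per token and layer (layout assumed from the DeepSeek-V2 description)."""
    return layers * (latent_dim + rope_dim) * seq_len * bytes_per_value


if __name__ == "__main__":
    seq_len = 32_768  # hypothetical long-context length
    mha = mha_kv_cache_bytes(layers=32, heads=32, head_dim=128, seq_len=seq_len)
    mla = mla_kv_cache_bytes(layers=32, latent_dim=512, rope_dim=64, seq_len=seq_len)
    print(f"MHA cache: {mha / 2**30:.3f} GiB")
    print(f"MLA cache: {mla / 2**30:.3f} GiB")
    print(f"reduction: {mha / mla:.1f}x")
```

Both caches grow linearly with sequence length. The saving comes from the much smaller per-token footprint, so the gap widens in absolute terms as the context gets longer.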
Multi-Token Prediction (MTP): Instead of predicting one token at a time, the model predicts multiple tokens simultaneously, which speeds up both training and inference.
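One way to picture multi-token prediction is through the training targets: each position is paired with the next several tokens rather than only the next one. The sketch below builds such targets for a token sequence. How EPT-I actually implements MTP (number of future tokens, extra prediction heads, use at decoding time) is not described here, so `multi_token_targets` and its `num_future` parameter are hypothetical.

```python
# Illustrative target layout for multi-token prediction; not EPT-I's code.

def multi_token_targets(token_ids, num_future):
    """Pair each position t with the next `num_future` tokens.

    Positions that do not have `num_future` tokens after them are skipped.
    With num_future=1 this reduces to ordinary next-token prediction.
    """
    pairs = []
    last_start = len(token_ids) - num_future
    for t in range(max(last_start, 0)):
        pairs.append((t, token_ids[t + 1 : t + 1 + num_future]))
    return pairs


if __name__ == "__main__":
    tokens = [10, 11, 12, 13, 14]
    for position, targets in multi_token_targets(tokens, num_future=2):
        print(position, targets)
```

With `num_future=2`, position 0 is trained to predict tokens 11 and 12 together, position 1 to predict 12 and 13, and so on.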
Intended Use
EPT-I is designed primarily as an educational assistant, but it can also serve as a general-purpose LLM. The recommended use is as a chatbot that supports students' academic work; it can also be used for other purposes, such as accelerating STEM research.
Out of Scope Use