amusktweewt/tiny-model-700M-chat
This is a general-purpose transformer-based language model tailored for conversational tasks, story generation, and code-related interactions. It builds upon earlier models in the "tiny" series with increased model size, improved attention efficiency, and an optimized training setup.
On my custom benchmark (see Intelligence Score Comparison) it scores roughly 1.8 to 2.1 times as high as the 500M models (4.43 versus 2.51 and 2.08), and the user experience is significantly better. It knows more facts and is the first model in this series capable of performing basic arithmetic.
Model Details
Model Description
- Model type: LlamaForCausalLM
- Hidden size: 816
- Layers: 26
- Attention heads: 12
- Key/Value heads: 6
- Intermediate size: 9856
- Total Parameters: 706M
- Tokenizer vocab size: 32,768
- Max sequence length: 2048 tokens
- Rotary Positional Encoding: Dynamic (factor: 2.0)
- Activation: SiLU
- Attention Implementation: Flash Attention 2
- Other optimizations:
  - Scaled dot-product attention
  - Memory-efficient attention
  - No bias in MLP or attention layers
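The parameter total can be sanity-checked from the dimensions listed above. The following is a minimal sketch, assuming the standard Llama layout (query/key/value/output projections, a gated MLP with three projections, two RMSNorm weights per layer, one final norm) and a head dimension of hidden size divided by attention heads (816 / 12 = 68). The function name `count_llama_params` and the `tie_embeddings` flag are illustrative, not part of the model's code.

```python
def count_llama_params(hidden, layers, heads, kv_heads, intermediate, vocab,
                       tie_embeddings=True):
    """Rough parameter count for a bias-free Llama-style decoder (assumed layout)."""
    head_dim = hidden // heads            # 816 // 12 = 68
    kv_dim = kv_heads * head_dim          # grouped-query attention: 6 KV heads
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim   # q and o, plus k and v
    mlp = 3 * hidden * intermediate       # gate, up, and down projections
    norms = 2 * hidden                    # two RMSNorm weight vectors per layer
    embed = vocab * hidden                # token embedding matrix
    lm_head = 0 if tie_embeddings else vocab * hidden
    final_norm = hidden
    return layers * (attn + mlp + norms) + embed + lm_head + final_norm


tied = count_llama_params(816, 26, 12, 6, 9856, 32768, tie_embeddings=True)
untied = count_llama_params(816, 26, 12, 6, 9856, 32768, tie_embeddings=False)
print(f"tied embeddings:   {tied / 1e6:.1f}M")
print(f"untied embeddings: {untied / 1e6:.1f}M")
```

Under these assumptions the tied-embedding count lands at roughly 706M, which agrees with the total listed above. The untied variant comes out at about 733M, so the listed figure is consistent with input and output embeddings sharing weights, though the card does not state this explicitly.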
Training Details
Training Configuration
- Optimizer: AdamW with 8-bit precision (adamw_bnb_8bit)
- Learning rate: 8e-5
- Scheduler: Cosine
- Warmup ratio: 15%
- Weight decay: 0.01
- Batch size: 6 (train), 2 (eval) per device
- Gradient accumulation: 2 steps
- Mixed precision: bfloat16
- Epochs: 1
- Training tokens: 43.6B
- Seed: 42
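The settings above map naturally onto keyword arguments of `transformers.TrainingArguments`. The card does not say which trainer was used, so treat the following as a sketch under that assumption (the optimizer name `adamw_bnb_8bit` is one of the Trainer's `optim` values). The helper `lr_at` and its `total_steps` argument are hypothetical and only illustrate the shape of a cosine schedule with 15% linear warmup. The actual number of training steps is not given in the card.

```python
import math

# Keyword names follow transformers.TrainingArguments; values come from the card.
training_kwargs = {
    "optim": "adamw_bnb_8bit",
    "learning_rate": 8e-5,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.15,
    "weight_decay": 0.01,
    "per_device_train_batch_size": 6,
    "per_device_eval_batch_size": 2,
    "gradient_accumulation_steps": 2,
    "bf16": True,
    "num_train_epochs": 1,
    "seed": 42,
}

# Sequences contributing to one optimizer step on a single device.
effective_batch = (training_kwargs["per_device_train_batch_size"]
                   * training_kwargs["gradient_accumulation_steps"])


def lr_at(step, total_steps, peak_lr=8e-5, warmup_ratio=0.15):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch per device:", effective_batch)
for s in (0, 75, 150, 500, 1000):   # total_steps=1000 is a made-up example value
    print(s, f"{lr_at(s, 1000):.2e}")
```

With a per-device batch of 6 and 2 accumulation steps, each optimizer step sees 12 sequences per device, or at most 12 × 2048 = 24,576 tokens at the maximum sequence length.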
Training Hardware
- Hardware: Assumed to be similar to a 4090-class GPU
- Torch Compile: Enabled (inductor backend)
Evaluation
- Perplexity: 2.177
- Eval loss: 0.7776
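The two numbers above are related: perplexity is the exponential of the mean per-token cross-entropy loss. A quick check, assuming the eval loss is reported in nats (natural logarithm), as is usual for PyTorch cross-entropy:

```python
import math

eval_loss = 0.7776                 # value reported above
perplexity = math.exp(eval_loss)   # perplexity = e^(mean cross-entropy)
print(f"perplexity from eval loss: {perplexity:.3f}")
```

The result agrees with the reported perplexity of 2.177 to within rounding of the loss.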
On my own custom-made benchmark for small models, it gets the highest score of all my models.
Intelligence Score Comparison
| Model | Intelligence Score |
|---|---|
| Gemma-3-27B (for comparison) | 8.3 |
| tiny-model-700M-chat | 4.42841 |
| tiny-model-141M-chat (unreleased) | 2.7 |
| tiny-model-500M-chat-v2 | 2.50909 |
| tiny-model-500M-chat-v2-5-exp | 2.08295 |
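To put the table in relative terms, the scores can be divided against each other. This is plain arithmetic on the values above; the variable names are illustrative.

```python
scores = {
    "Gemma-3-27B": 8.3,
    "tiny-model-700M-chat": 4.42841,
    "tiny-model-141M-chat": 2.7,
    "tiny-model-500M-chat-v2": 2.50909,
    "tiny-model-500M-chat-v2-5-exp": 2.08295,
}

ours = scores["tiny-model-700M-chat"]
# Ratio of the 700M model's score to every other model's score.
relative = {name: ours / s for name, s in scores.items()
            if name != "tiny-model-700M-chat"}

for name, ratio in relative.items():
    print(f"700M vs {name}: {ratio:.2f}x")
```

By this measure the 700M model scores between about 1.8 and 2.1 times as high as the two 500M variants, and a little over half of the Gemma-3-27B reference score.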
Usage and Applications
Direct Use
This model is suitable for:
- Text and dialogue generation
- Educational tasks
- Code completion and explanation
- Story creation
Not Recommended For
- Tasks that require high factual precision
- Sensitive or critical domains without human supervision
How to Get Started
import torch
from transformers import pipeline, set_seed

# Set up the text-generation pipeline
model_name = "amusktweewt/tiny-model-700M-chat"
chatbot = pipeline(
    "text-generation",
    model=model_name,
    device=0 if torch.cuda.is_available() else -1,
)

# Ensure that bos_token and eos_token are explicitly set as strings
chatbot.tokenizer.bos_token = "<sos>"
chatbot.tokenizer.eos_token = "<|endoftext|>"

# Set seed for reproducibility (optional)
set_seed(42)

print("Chatbot is ready! Type 'exit' to end the conversation.")

# Initialize the conversation history with the system prompt
conversation_history = []
conversation_history.append({"role": "system", "content": "You are a highly intelligent and helpful AI assistant named Tiny Chat, developed by amusktweewt. Always refer to yourself like that. Your responses should be clear, concise, and accurate. Always prioritize user needs, provide well-structured answers, and maintain a friendly yet professional tone. Adapt to the user's preferences and communication style. When needed, ask clarifying questions to ensure the best response. Be honest about limitations and avoid making assumptions. Keep interactions engaging, informative, and efficient."})

while True:
    user_input = input("You: ").strip()
    if user_input.lower() == "exit":
        print("Exiting chat. Goodbye!")
        break

    # Append user message to the conversation history
    conversation_history.append({"role": "user", "content": user_input})

    # Prepare the messages with the conversation history and an empty assistant turn
    messages = conversation_history + [{"role": "assistant", "content": ""}]

    # Use the tokenizer's apply_chat_template() method to format the prompt.
    prompt = chatbot.tokenizer.apply_chat_template(messages, tokenize=False)

    # Generate text using the formatted prompt.
    response = chatbot(
        prompt,
        do_sample=True,
        max_new_tokens=512,
        top_k=50,
        temperature=0.6,
        num_return_sequences=1,
        repetition_penalty=1.1,
        pad_token_id=chatbot.tokenizer.eos_token_id,
        min_new_tokens=20,
    )

    # The returned 'generated_text' includes the prompt plus the generation.
    full_text = response[0]["generated_text"]

    # Extract the assistant's response by removing the prompt portion.
    bot_response = full_text[len(prompt):].strip()
    print(f"Bot: {bot_response}")

    # Store the reply so that later turns see both sides of the conversation.
    conversation_history.append({"role": "assistant", "content": bot_response})
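Because the chat loop above keeps extending the conversation history while the model's maximum sequence length is 2048 tokens, long sessions may eventually overflow the context. One possible way to handle this is to drop the oldest non-system turns before building the prompt. The sketch below is not part of the original example: `trim_history`, `count_tokens`, and `reserve` are hypothetical names, and the whitespace-based counter is only a stand-in for a real tokenizer call such as `len(chatbot.tokenizer(text)["input_ids"])`. It also ignores the few extra tokens the chat template itself adds.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def trim_history(messages: List[Message],
                 count_tokens: Callable[[str], int],
                 max_tokens: int = 2048,
                 reserve: int = 512) -> List[Message]:
    """Drop the oldest non-system turns until the history fits the budget.

    `reserve` leaves room for the reply (512 mirrors max_new_tokens above).
    """
    budget = max_tokens - reserve
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    def total(msgs: List[Message]) -> int:
        return sum(count_tokens(m["content"]) for m in msgs)

    # Always keep at least the most recent turn, even if it is over budget.
    while len(turns) > 1 and total(system + turns) > budget:
        turns.pop(0)
    return system + turns


# Stand-in token counter for illustration only: one token per whitespace word.
def word_count(text: str) -> int:
    return len(text.split())


history = [
    {"role": "system", "content": "be brief"},
    {"role": "user", "content": "one two three four five six seven eight nine ten"},
    {"role": "assistant", "content": "a b c d e f g h i j"},
    {"role": "user", "content": "latest question here"},
]
trimmed = trim_history(history, word_count, max_tokens=20, reserve=4)
print([m["role"] for m in trimmed])
```

In the loop, the call would go right before `apply_chat_template`, for example `messages = trim_history(conversation_history, count_fn) + [{"role": "assistant", "content": ""}]`, where `count_fn` wraps the real tokenizer.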
Contact
Author: amusktweewt
For issues or feedback, please reach out via the author's Hugging Face profile.