

🌌 Aspire_1.1B

Long-Context Frontier Language Model

“Built to think across distance.”

🌌 Overview

Aspire_1.1B is a 1.1-billion-parameter (~1.12B) frontier language model engineered for long-context reasoning, instruction following, and efficient inference at scale.

Developed for persistent cognition workflows, Aspire_1.1B supports a native 256K context window while maintaining strong reasoning coherence and efficient memory utilization through:

  • Grouped Query Attention (GQA)
  • dynamically scaled RoPE embeddings
  • optimized transformer routing
  • TPU-native bfloat16 training

Unlike conventional small-scale models constrained by short context windows, Aspire_1.1B is designed for:

  • long-form reasoning
  • extended conversational continuity
  • large document understanding
  • retrieval-heavy workflows
  • persistent agent memory systems
  • scalable frontier experimentation

The architecture balances:

  • efficiency
  • reasoning capability
  • long-context retention
  • deployment practicality

⚡ Model Highlights

| Attribute | Value |
|---|---|
| Parameters | ~1.12B |
| Architecture | Llama-based Causal LM |
| Context Window | 262,144 Tokens (256K) |
| Precision | bfloat16 |
| Hidden Size | 2048 |
| Layers | 22 |
| Attention Heads | 16 |
| KV Heads | 4 (GQA) |
| Vocabulary | 32K Custom BPE |
| Optimization | Adafactor |
| Training Hardware | Google Cloud TPUs |
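To give a feel for what these numbers mean at full context, here is a rough back-of-the-envelope estimate of the KV-cache size for a single 262,144-token sequence. It is a sketch, not a measurement: it assumes the head dimension equals hidden size divided by attention heads (the usual Llama layout, not stated in this card) and ignores any framework overhead.

```python
# Values taken from the Model Highlights table.
HIDDEN_SIZE = 2048
NUM_LAYERS = 22
NUM_ATTENTION_HEADS = 16
NUM_KV_HEADS = 4
CONTEXT_TOKENS = 262_144
BYTES_PER_VALUE = 2  # bfloat16

# Assumption: head_dim = hidden_size / attention_heads (standard Llama layout).
HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS


def kv_cache_bytes(num_kv_heads: int, tokens: int) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # The leading 2 accounts for storing both a key and a value per head.
    per_token = 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * BYTES_PER_VALUE
    return per_token * tokens


gqa_gib = kv_cache_bytes(NUM_KV_HEADS, CONTEXT_TOKENS) / 2**30
mha_gib = kv_cache_bytes(NUM_ATTENTION_HEADS, CONTEXT_TOKENS) / 2**30
print(f"GQA (4 KV heads):  {gqa_gib:.1f} GiB")  # 11.0 GiB
print(f"MHA (16 KV heads): {mha_gib:.1f} GiB")  # 44.0 GiB
```

Under these assumptions, using 4 KV heads instead of 16 cuts the cache for a full-length sequence to a quarter of its size.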

🧠 Architecture

Aspire_1.1B is built around a highly optimized transformer stack designed for efficient long-context scaling.

Core architectural features include:

  • Grouped Query Attention (GQA)
  • high-base Rotary Positional Embeddings (RoPE)
  • TPU-optimized training pathways
  • efficient KV-cache scaling
  • long-sequence extrapolation support

The architecture is optimized for:

  • inference efficiency
  • stable long-context attention
  • reduced memory overhead
  • scalable deployment workflows
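As an illustration of how Grouped Query Attention shares key/value heads, the sketch below maps each of the 16 query heads to one of the 4 KV heads. The contiguous grouping shown here is an assumption based on the common Llama-style layout; the card does not describe the exact head assignment, and `kv_head_for` is a hypothetical helper name.

```python
NUM_ATTENTION_HEADS = 16  # query heads, from the Model Highlights table
NUM_KV_HEADS = 4          # key/value heads, from the Model Highlights table


def kv_head_for(query_head: int) -> int:
    """Index of the KV head that a given query head attends with.

    Assumes contiguous groups: query heads 0-3 share KV head 0, and so on.
    """
    group_size = NUM_ATTENTION_HEADS // NUM_KV_HEADS
    return query_head // group_size


# Collect the query heads served by each KV head.
groups = {}
for q in range(NUM_ATTENTION_HEADS):
    groups.setdefault(kv_head_for(q), []).append(q)

for kv, queries in groups.items():
    print(f"KV head {kv} serves query heads {queries}")
```

Each KV head is computed and cached once and then reused by four query heads, which is where the memory saving comes from.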

🌌 Long-Context Design

256K Context Window

Aspire_1.1B supports:

  • 262,144 token context processing
  • persistent conversational memory
  • large-document reasoning
  • long-form analytical workflows
  • retrieval-augmented generation systems

The model utilizes:

  • dynamically scaled RoPE embeddings
  • Grouped Query Attention
  • optimized attention routing

to maintain coherence across extremely long sequences.
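The basic rotary mechanism can be sketched in a few lines of plain Python. This is a minimal illustration only: the base of 10000 is the common default rather than this model's value (the card does not state the base or the exact dynamic scaling rule), and the interleaved-pair rotation shown here is one of several equivalent layouts. The point is that the attention score between a rotated query and key depends only on their relative offset, not on their absolute positions.

```python
import math


def rope_inv_freq(head_dim: int, base: float) -> list:
    """Per-pair rotation frequencies: base^(-2i/d) for i in [0, d/2)."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope(vec: list, position: int, base: float = 10000.0) -> list:
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by position * frequency."""
    out = []
    for i, freq in enumerate(rope_inv_freq(len(vec), base)):
        angle = position * freq
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a: list, b: list) -> float:
    return sum(x * y for x, y in zip(a, b))


q = [0.5, -1.0, 2.0, 0.25]
k = [1.0, 0.5, -0.5, 1.5]

# Same relative offset (3 tokens) at very different absolute positions.
near = dot(apply_rope(q, 10), apply_rope(k, 7))
far = dot(apply_rope(q, 200_010), apply_rope(k, 200_007))
print(near, far)  # the two scores agree up to floating-point error
```

Raising the base lowers every frequency in `rope_inv_freq`, so the rotation angles grow more slowly with position; that is the general idea behind high-base and scaled RoPE variants for long sequences.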

🔬 Training Details

Hardware

| Component | Configuration |
|---|---|
| Accelerator | Google Cloud TPUs (Kaggle TPU Environment) |
| Precision | bfloat16 |
| Optimization | Adafactor |
| Framework | Hugging Face Transformers + XLA |

The model was trained using TPU-native workflows optimized for:

  • efficient large-scale sequence processing
  • stable long-context convergence
  • reduced memory fragmentation
  • uninterrupted checkpoint recovery
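One reason Adafactor suits memory-constrained training is that it can store factored second-moment statistics (one value per row and per column of a weight matrix) instead of a full matrix. The sketch below compares optimizer-state sizes for a single square projection. It assumes Adafactor runs without momentum and compares against Adam's two full moment buffers; the actual optimizer settings used for this model are not stated here.

```python
def adam_state_values(rows: int, cols: int) -> int:
    """Adam keeps first and second moments, each the size of the weight."""
    return 2 * rows * cols


def adafactor_state_values(rows: int, cols: int) -> int:
    """Factored Adafactor (no momentum) keeps one row and one column vector."""
    return rows + cols


rows = cols = 2048  # a hidden_size x hidden_size projection
adam = adam_state_values(rows, cols)
adafactor = adafactor_state_values(rows, cols)
print(f"Adam:      {adam:,} values")       # 8,388,608 values
print(f"Adafactor: {adafactor:,} values")  # 4,096 values
```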

📚 Training Datasets

Aspire_1.1B was pretrained on a curated combination of reasoning and instruction-following datasets.

🧠 OpenThoughts-114k

A dense reasoning dataset focused on:

  • chain-of-thought reasoning
  • logical deduction
  • structured inference
  • analytical problem solving

Dataset: OpenThoughts-114k

⚡ WizardLM Evol Instruct 70K

An evolved instruction-following dataset designed to improve:

  • prompt adherence
  • formatting consistency
  • complex instruction execution
  • conversational alignment

Dataset: WizardLM Evol Instruct 70K

💻 Usage

Loading the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo_id = "GODsStrongestSoldier/Aspire_1.1B"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
```

Text Generation Example

```python
prompt = """
Explain the concept of RoPE (Rotary Positional Embeddings) and how it benefits 256K context windows.

Answer:
"""

inputs = tokenizer(
    prompt,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,  # sampling must be enabled for temperature/top_p to apply
    temperature=0.7,
    top_p=0.9
)

response = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True
)

print(response)
```
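With very long prompts it helps to check the token budget before generating. The helper below is a small sketch, not part of the model's API: `max_new_tokens_allowed` is a hypothetical name, and it assumes the prompt and the generated tokens together must fit within the 262,144-token window.

```python
CONTEXT_WINDOW = 262_144  # from the Model Highlights table


def max_new_tokens_allowed(prompt_tokens: int, requested: int,
                           context_window: int = CONTEXT_WINDOW) -> int:
    """Clamp the requested generation length to what still fits in the window."""
    if prompt_tokens >= context_window:
        raise ValueError(
            f"Prompt has {prompt_tokens} tokens; the window holds {context_window}."
        )
    return min(requested, context_window - prompt_tokens)


print(max_new_tokens_allowed(1_000, 512))    # 512
print(max_new_tokens_allowed(262_000, 512))  # 144
```

In practice the prompt length can be read from the tokenizer output, for example `inputs["input_ids"].shape[1]`, and the result passed as `max_new_tokens`.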

🔄 Checkpointing & Recovery

Aspire_1.1B was trained using a robust checkpointing system that continuously saved training state directly to the Hugging Face Hub.

This workflow enabled:

  • uninterrupted TPU training continuation
  • session recovery across Kaggle runtime limits
  • persistent optimizer state management
  • scalable long-duration pretraining workflows
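A resume step of this kind can be sketched as follows. This is an illustrative helper rather than the actual training code: it assumes checkpoints are stored locally in folders named `checkpoint-<step>` (the naming used by the Transformers `Trainer`), and `latest_checkpoint` is a hypothetical function name. Downloading from and uploading to the Hub are left out.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint folder with the highest step, or None if absent."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path


# Demo with throwaway folders standing in for saved checkpoints.
with tempfile.TemporaryDirectory() as tmp:
    for step in (500, 1500, 1000):
        (Path(tmp) / f"checkpoint-{step}").mkdir()
    found = latest_checkpoint(tmp)
    print(found.name)  # checkpoint-1500
```

A new session would call a helper like this at startup and continue from the returned folder, or start from scratch when it returns `None`.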

⚙️ Intended Use Cases

| Domain | Purpose |
|---|---|
| Long-Context Chat | Persistent conversational memory |
| Document Analysis | Large-scale text understanding |
| Frontier Research | Long-sequence experimentation |
| Instruction Following | Complex prompt execution |
| Retrieval Systems | RAG & memory augmentation |
| Agentic Workflows | Persistent reasoning systems |

⚠️ Limitations

Aspire_1.1B is an experimental open language model. Human verification is recommended for:

  • medical information
  • legal advice
  • financial decisions
  • safety-critical applications

🌵 Origin

Developed through independent frontier AI experimentation using:

  • Kaggle TPU infrastructure
  • Hugging Face Transformers
  • open reasoning datasets
  • long-context architecture research

Focused on:

  • efficient frontier models
  • scalable context systems
  • accessible open AI research
  • persistent reasoning architectures

👑 Final Motto

“Long context is memory. Memory is continuity. Continuity is intelligence.”
