---
license: apache-2.0
language:
  - en
pipeline_tag: text-generation
library_name: transformers
tags:
  - transformers
  - llama
  - long-context
  - 256k-context
  - reasoning
  - instruction-following
  - causal-lm
  - text-generation-inference
  - gqa
  - rope-scaling
  - bfloat16
  - safetensors
  - withinusai
  - Aspire_1.1B
datasets:
  - open-thoughts/OpenThoughts-114k
  - WizardLMTeam/WizardLM_evol_instruct_70k
---

🌌 Aspire_1.1B

Long-Context Frontier Language Model

“Built to think across distance.”

🌌 Overview

Aspire_1.1B is a highly capable 1.1-billion-parameter frontier language model engineered for extreme long-context reasoning, instruction following, and efficient, scalable inference.

Developed for persistent cognition workflows, Aspire_1.1B supports a native 256K context window while maintaining strong reasoning coherence and efficient memory utilization through:

  • Grouped Query Attention (GQA)
  • dynamically scaled RoPE embeddings
  • optimized transformer routing
  • TPU-native bfloat16 training
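To make the GQA bullet concrete, here is a minimal, framework-free sketch of grouped-query attention for a single decoding position. It assumes the head counts from the Model Highlights table (16 query heads sharing 4 KV heads, so each KV head serves a group of 4 query heads). The function names are hypothetical and this is an illustration of the technique, not the model's actual implementation.

```python
import math

NUM_HEADS = 16     # query heads (from the Model Highlights table)
NUM_KV_HEADS = 4   # shared key/value heads (GQA)


def kv_head_for(q_head: int, num_heads: int = NUM_HEADS, num_kv_heads: int = NUM_KV_HEADS) -> int:
    """Map a query head to the KV head its group shares."""
    group_size = num_heads // num_kv_heads
    return q_head // group_size


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gqa_attention(queries, keys, values):
    """Grouped-query attention for one query position.

    queries: num_heads vectors of length head_dim
    keys, values: num_kv_heads lists, each holding seq_len vectors of length head_dim
    Returns one output vector per query head.
    """
    num_heads, num_kv_heads = len(queries), len(keys)
    assert num_heads % num_kv_heads == 0, "query heads must split evenly into KV groups"
    head_dim = len(queries[0])
    outputs = []
    for h, q in enumerate(queries):
        kv = kv_head_for(h, num_heads, num_kv_heads)
        # Scaled dot-product scores against the shared KV head's keys
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(head_dim) for k in keys[kv]]
        weights = softmax(scores)
        # Weighted sum of the shared KV head's values
        outputs.append([sum(w * v[d] for w, v in zip(weights, values[kv])) for d in range(head_dim)])
    return outputs
```

Only the 4 KV heads have to be cached during generation, which is where the memory saving comes from; the same grouping can also be implemented by repeating each KV head across its query group before a standard attention call.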

Unlike conventional small-scale models constrained by short context windows, Aspire_1.1B is designed for:

  • long-form reasoning
  • extended conversational continuity
  • large document understanding
  • retrieval-heavy workflows
  • persistent agent memory systems
  • scalable frontier experimentation

The architecture balances:

  • efficiency
  • reasoning capability
  • long-context retention
  • deployment practicality

⚡ Model Highlights

| Attribute | Value |
|---|---|
| Parameters | ~1.12B |
| Architecture | Llama-based Causal LM |
| Context Window | 262,144 Tokens (256K) |
| Precision | bfloat16 |
| Hidden Size | 2048 |
| Layers | 22 |
| Attention Heads | 16 |
| KV Heads | 4 (GQA) |
| Vocabulary | 32K Custom BPE |
| Optimization | Adafactor |
| Training Hardware | Google Cloud TPUs |
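As a sanity check on the ~1.12B figure, the parameter count of a Llama-style decoder can be tallied from the values in the table. The card does not list the MLP intermediate size or whether the input and output embeddings are tied, so this sketch assumes a SwiGLU MLP with `intermediate=5632`, untied embeddings, and a vocabulary of exactly 32,000 tokens. Those three values are assumptions, not published specs.

```python
def llama_param_count(
    hidden=2048,
    layers=22,
    heads=16,
    kv_heads=4,
    vocab=32_000,           # assumption: "32K" taken as 32,000
    intermediate=5_632,     # assumption: not stated in the model card
    tied_embeddings=False,  # assumption: separate lm_head weights
):
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim
    # q and o projections are hidden x hidden; k and v are hidden x kv_dim (GQA); no biases
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim
    # SwiGLU MLP: gate, up and down projections
    mlp = 3 * hidden * intermediate
    norms = 2 * hidden  # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms
    embeddings = vocab * hidden * (1 if tied_embeddings else 2)
    final_norm = hidden
    return layers * per_layer + embeddings + final_norm


total = llama_param_count()
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

Under these assumptions the function returns 1,123,117,056, about 1.12B, in line with the table; a different intermediate size or vocabulary would shift the total.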

🧠 Architecture

Aspire_1.1B is built around a highly optimized transformer stack designed for efficient long-context scaling.

Core architectural features include:

  • Grouped Query Attention (GQA)
  • high-base Rotary Positional Embeddings (RoPE)
  • TPU-optimized training pathways
  • efficient KV-cache scaling
  • long-sequence extrapolation support
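The RoPE bullets can be illustrated with a minimal rotary-embedding sketch. It rotates adjacent pairs of dimensions by position-dependent angles, checks the property RoPE relies on (the query-key score depends only on the relative offset between positions), and shows why a high base helps long contexts: the slowest-rotating pair gets a much longer wavelength. The head dimension of 128 (2048 / 16) follows from the table; the two bases compared are illustrative, since the card does not state Aspire_1.1B's actual `rope_theta`. The Transformers Llama code pairs dimension i with i + head_dim/2 ("rotate half") rather than adjacent dimensions; the relative-position property holds for either pairing.

```python
import math


def inv_frequencies(head_dim: int, base: float) -> list[float]:
    """Per-pair rotation frequencies: base^(-2i/head_dim) for i in [0, head_dim/2)."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope(x: list[float], pos: int, base: float) -> list[float]:
    """Rotate each adjacent pair (x[2i], x[2i+1]) by angle pos * inv_freq[i]."""
    out = []
    for i, freq in enumerate(inv_frequencies(len(x), base)):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(pos * freq), math.sin(pos * freq)
        out.extend([a * c - b * s, a * s + b * c])
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def longest_wavelength(head_dim: int, base: float) -> float:
    """Tokens needed for the slowest pair to complete one full rotation."""
    return 2 * math.pi / inv_frequencies(head_dim, base)[-1]


HEAD_DIM = 128  # hidden size 2048 / 16 attention heads
q = [math.sin(i + 1.0) for i in range(HEAD_DIM)]
k = [math.cos(i * 0.5) for i in range(HEAD_DIM)]

# The same 7-token offset at very different absolute positions gives the same score
near = dot(apply_rope(q, 10, 10_000.0), apply_rope(k, 3, 10_000.0))
far = dot(apply_rope(q, 200_007, 10_000.0), apply_rope(k, 200_000, 10_000.0))
print(abs(near - far) < 1e-6)

for base in (10_000.0, 1_000_000.0):  # illustrative bases only
    print(f"base={base:,.0f}: longest wavelength ~ {longest_wavelength(HEAD_DIM, base):,.0f} tokens")
```

With base 10,000 the slowest pair wraps around after roughly 54K tokens, well short of 256K, while base 1,000,000 stretches it to roughly 5M tokens; this is one common intuition for raising the RoPE base in long-context models.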

The architecture is optimized for:

  • inference efficiency
  • stable long-context attention
  • reduced memory overhead
  • scalable deployment workflows
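The memory effect of GQA at full context can be estimated with a back-of-the-envelope KV-cache calculation. This sketch assumes a bfloat16 cache (2 bytes per value), `head_dim = 2048 / 16 = 128`, a batch size of 1, and no framework overhead; it compares the stated 4 KV heads with a hypothetical multi-head configuration that keeps all 16.

```python
def kv_cache_bytes(seq_len, layers=22, kv_heads=4, head_dim=128, bytes_per_value=2, batch=1):
    # Factor 2: both keys and values are cached at every layer
    return 2 * layers * kv_heads * head_dim * bytes_per_value * seq_len * batch


CONTEXT = 262_144  # 256K tokens
GIB = 2 ** 30

gqa = kv_cache_bytes(CONTEXT, kv_heads=4)
mha = kv_cache_bytes(CONTEXT, kv_heads=16)  # hypothetical: one KV head per query head
print(f"GQA (4 KV heads):  {gqa / GIB:.1f} GiB")
print(f"MHA (16 KV heads): {mha / GIB:.1f} GiB")
```

Under these assumptions a full 256K-token cache takes 11.0 GiB with 4 KV heads versus 44.0 GiB with 16, a 4x reduction that scales linearly with sequence length and batch size.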

🌌 Long-Context Design

256K Context Window

Aspire_1.1B supports:

  • 262,144 token context processing
  • persistent conversational memory
  • large-document reasoning
  • long-form analytical workflows
  • retrieval-augmented generation systems
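The prompt and the generated tokens share the same 262,144-token window, so long documents or conversation histories need a token budget. A minimal sketch, assuming left-truncation (keeping the most recent tokens) is acceptable for the workload; `fit_to_context` is a hypothetical helper, not part of Transformers.

```python
MAX_CONTEXT = 262_144  # 256K-token window from the model card


def fit_to_context(token_ids: list[int], max_new_tokens: int, max_context: int = MAX_CONTEXT) -> list[int]:
    """Keep the most recent tokens so that prompt + generation fits in the window."""
    if max_new_tokens >= max_context:
        raise ValueError("max_new_tokens must leave room for the prompt")
    budget = max_context - max_new_tokens
    # Drop the oldest tokens first (left truncation)
    return token_ids[-budget:] if len(token_ids) > budget else token_ids
```

In practice `token_ids` could come from `tokenizer(text)["input_ids"]`; for retrieval or document QA, dropping the least relevant chunks is often better than dropping the oldest tokens.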

The model utilizes:

  • dynamically scaled RoPE embeddings
  • Grouped Query Attention
  • optimized attention routing

to maintain coherence across extremely long sequences.
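The card does not say which dynamic RoPE scheme is used. One common option in Transformers is "dynamic" NTK-aware scaling (`rope_scaling` type `"dynamic"`), which enlarges the RoPE base once the sequence grows past `max_position_embeddings`, slowing every rotation frequency. The sketch below reproduces that base-adjustment formula with assumed values (`rope_theta` 10,000, a 32,768-position window, factor 8.0, head dimension 128); these numbers are illustrative, not Aspire_1.1B's real configuration.

```python
def dynamic_ntk_base(seq_len, base=10_000.0, max_position_embeddings=32_768, factor=8.0, head_dim=128):
    """Dynamic NTK-aware RoPE base (illustrative defaults, not the model's real config)."""
    # No adjustment until the sequence exceeds the configured window
    seq_len = max(seq_len, max_position_embeddings)
    scale = (factor * seq_len / max_position_embeddings) - (factor - 1)
    return base * scale ** (head_dim / (head_dim - 2))


for n in (16_384, 32_768, 131_072, 262_144):
    print(f"seq_len={n:>7,}: base ~ {dynamic_ntk_base(n):,.0f}")
```

Because the base is recomputed from the current length, short prompts keep the original frequencies exactly, and only longer sequences pay for the stretched positional resolution.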

🔬 Training Details

Hardware

| Component | Configuration |
|---|---|
| Accelerator | Google Cloud TPUs (Kaggle TPU Environment) |
| Precision | bfloat16 |
| Optimization | Adafactor |
| Framework | Hugging Face Transformers + XLA |
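Adafactor suits memory-constrained accelerators because, for 2-D weight matrices, it can keep factored second-moment statistics (one running average per row and one per column) instead of a full per-parameter estimate, and it can run without a momentum buffer. The card does not give the exact Adafactor settings; this sketch assumes the factored, momentum-free variant and compares its optimizer state with Adam's two full moment buffers. The 2048 x 5632 shape is an illustrative MLP projection size, not a published value.

```python
def adam_state_values(rows: int, cols: int) -> int:
    # Adam keeps first and second moments for every parameter
    return 2 * rows * cols


def adafactor_state_values(rows: int, cols: int, momentum: bool = False) -> int:
    # Factored second moment: one running average per row plus one per column
    state = rows + cols
    if momentum:  # an optional first moment is a full-size buffer
        state += rows * cols
    return state


rows, cols = 2048, 5632  # illustrative weight shape (assumption)
adam = adam_state_values(rows, cols)
ada = adafactor_state_values(rows, cols)
print(f"Adam state:      {adam:,} values")
print(f"Adafactor state: {ada:,} values ({adam / ada:,.0f}x smaller)")
```

In Transformers, `transformers.optimization.Adafactor` runs without momentum when `beta1=None`, which is its default.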

The model was trained using TPU-native workflows optimized for:

  • efficient large-scale sequence processing
  • stable long-context convergence
  • reduced memory fragmentation
  • uninterrupted checkpoint recovery

📚 Training Datasets

Aspire_1.1B was pretrained on a curated combination of reasoning and instruction-following datasets.

🧠 OpenThoughts-114k

A dense reasoning dataset focused on:

  • chain-of-thought reasoning
  • logical deduction
  • structured inference
  • analytical problem solving

Dataset: OpenThoughts-114k

⚡ WizardLM Evol Instruct 70K

An evolved instruction-following dataset designed to improve:

  • prompt adherence
  • formatting consistency
  • complex instruction execution
  • conversational alignment

Dataset: WizardLM Evol Instruct 70K

💻 Usage

Loading the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo_id = "GODsStrongestSoldier/Aspire_1.1B"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # load weights in bfloat16
    device_map="auto",           # requires the `accelerate` package
)
```

Text Generation Example

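```python
prompt = """Explain the concept of RoPE (Rotary Positional Embeddings) and how it benefits 256K context windows.

Answer:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# temperature and top_p only take effect when sampling is enabled
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # Llama tokenizers often define no pad token
)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response)
```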

🔄 Checkpointing & Recovery

Aspire_1.1B was trained using a robust checkpointing system that continuously saved training state directly to the Hugging Face Hub.

This workflow enabled:

  • uninterrupted TPU training continuation
  • session recovery across Kaggle runtime limits
  • persistent optimizer state management
  • scalable long-duration pretraining workflows
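The card does not publish its checkpointing code. A minimal sketch of the resume step, assuming checkpoints are pushed to the Hub as `checkpoint-<step>/` folders (the naming convention used by the Transformers `Trainer`): given the repository's file listing, pick the folder with the highest step. The listing could come from `huggingface_hub.list_repo_files(repo_id)`, and the chosen folder could then be fetched with `snapshot_download(repo_id, allow_patterns=...)`; the helper name `latest_checkpoint` and the folder layout are assumptions.

```python
import re

# Hypothetical layout: files stored under "checkpoint-<step>/..." in the Hub repo
CHECKPOINT_DIR = re.compile(r"^checkpoint-(\d+)/")


def latest_checkpoint(repo_files):
    """Return the 'checkpoint-<step>' folder with the highest step, or None if there is none."""
    best_step, best_dir = -1, None
    for path in repo_files:
        match = CHECKPOINT_DIR.match(path)
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            best_dir = f"checkpoint-{best_step}"
    return best_dir
```

Comparing steps numerically matters: a plain string sort would rank `checkpoint-900` above `checkpoint-1500`. With a `Trainer`, the downloaded folder can be passed to `trainer.train(resume_from_checkpoint=...)` to restore model, optimizer, and scheduler state.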

⚙️ Intended Use Cases

| Domain | Purpose |
|---|---|
| Long-Context Chat | Persistent conversational memory |
| Document Analysis | Large-scale text understanding |
| Frontier Research | Long-sequence experimentation |
| Instruction Following | Complex prompt execution |
| Retrieval Systems | RAG & memory augmentation |
| Agentic Workflows | Persistent reasoning systems |

⚠️ Limitations

Aspire_1.1B is an experimental open language model. Human verification is recommended for:

  • medical information
  • legal advice
  • financial decisions
  • safety-critical applications

🌵 Origin

Developed through independent frontier AI experimentation using:

  • Kaggle TPU infrastructure
  • Hugging Face Transformers
  • open reasoning datasets
  • long-context architecture research

Focused on:

  • efficient frontier models
  • scalable context systems
  • accessible open AI research
  • persistent reasoning architectures

👑 Final Motto

“Long context is memory. Memory is continuity. Continuity is intelligence.”