i3 Model - Memory-Optimized Efficient Conversational Language Model

Model Description

The i3 Model is a memory-optimized language model designed for conversational understanding. This version uses streaming tokenization to minimize RAM usage during training.

Model Statistics

Vocabulary Size: 4,466 (variable-length chunks)
Hidden Dimension: 512
Number of Layers: 24
Max Sequence Length: 256
Total Parameters: 22,640,626
Tokenization: Memory-efficient variable-length chunking (2-3 characters)
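
The 2-3 character chunking listed above can be sketched as follows. This is a minimal illustration, not the tokenizer shipped with i3: the greedy longest-match rule, the toy vocabulary, the `<unk>` fallback, and the function name `chunk_encode` are all assumptions made for the example.

```python
def chunk_encode(text, vocab, unk_id=0):
    """Greedy variable-length chunking (assumed rule, not confirmed by the model card).

    At each position, try a 3-character chunk first, then a 2-character chunk.
    If neither is in the vocabulary, emit unk_id and advance by one character.
    """
    ids = []
    i = 0
    while i < len(text):
        for size in (3, 2):
            chunk = text[i:i + size]
            if len(chunk) == size and chunk in vocab:
                ids.append(vocab[chunk])
                i += size
                break
        else:
            # No 2- or 3-character chunk matched at this position.
            ids.append(unk_id)
            i += 1
    return ids


# Hypothetical toy vocabulary; the real model reports 4,466 chunks.
toy_vocab = {"<unk>": 0, "hel": 1, "lo": 2, " wo": 3, "rl": 4, "he": 5}
print(chunk_encode("hello world", toy_vocab))  # [1, 2, 3, 4, 0]
```

In this toy case, 11 characters become 5 tokens, which illustrates how chunking shortens the sequence the model has to process.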

To use the model, see user.py.

Key Features

Memory-Optimized: Streaming tokenization reduces RAM usage significantly
Proprietary Hybrid Architecture: Advanced sequence processing with linear complexity
Variable-Length Tokenization: Smart chunking strategy for better compression
Conversational Focus: Specialized for dialogue and emotional understanding

Training Details

Dataset: TinyChat
Training Objective: Next-token prediction with proprietary optimization
Framework: PyTorch
Memory Optimization: Streaming dataset processing
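
Streaming dataset processing generally means reading and tokenizing one conversation at a time instead of loading the whole corpus into memory. The sketch below illustrates the idea with a plain Python generator. The one-conversation-per-line layout, the character-level stand-in tokenizer, and the names `stream_token_windows` and `char_encode` are assumptions for illustration, not details of the actual i3 training code.

```python
import io
from typing import Callable, Iterable, Iterator, List


def stream_token_windows(
    lines: Iterable[str],
    encode: Callable[[str], List[int]],
    max_len: int = 256,
) -> Iterator[List[int]]:
    """Lazily yield token windows of at most max_len tokens, one line at a time."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        ids = encode(line)
        # Split long conversations into windows that fit the context length.
        for start in range(0, len(ids), max_len):
            yield ids[start:start + max_len]


def char_encode(text: str) -> List[int]:
    """Stand-in tokenizer (assumption): one id per character."""
    return [ord(c) for c in text]


# io.StringIO stands in for an open text file; only one line is held at a time.
corpus = io.StringIO("hi there\n\nhow are you?\n")
windows = list(stream_token_windows(corpus, char_encode, max_len=8))
print([len(w) for w in windows])  # [8, 8, 4]
```

Because the generator never materializes the full token list, peak memory depends on the longest single line rather than on the corpus size.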

Technical Report: i3 Pre-training

Executive Summary

The i3 model, a small-scale text generation architecture, successfully completed its initial pre-training phase. This training was conducted on an NVIDIA GeForce RTX 3060 and required approximately 17 hours of continuous processing. The resulting model artifacts are configured for deployment on the HuggingFace platform. The model is characterized by a compact architecture featuring 24 layers and a hidden dimension of 512, paired with a custom "chunk" tokenization strategy designed for efficiency on conversational data.

Model Configuration and Architecture

The i3Model architecture is designed for efficiency. The presence of low-rank and state-space parameters (rank and d_state) in its configuration suggests that it incorporates elements of a State Space Model (SSM).

Parameter	Value	Description
Model Type	i3Model	Custom, high-efficiency architecture (likely SSM-enhanced).
Hidden Dimension (d_model)	512	The size of the vector space for internal representations.
Number of Layers (n_layers)	24	The depth of the model's processing blocks.
Attention Heads (n_heads)	16	The number of parallel attention mechanisms (if applicable).
State Dimension (d_state)	64	Indicates the size of the recurrent state, common in SSMs.
Rank	128	Potentially used for low-rank projection in attention or state mechanisms.
Max Sequence Length	256	The maximum number of tokens/chunks the model can process at once.
Vocabulary Size	4,466	The total number of unique chunks/tokens in the vocabulary.
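
For reference, the table can be collected into a small configuration object. The class name `I3Config` and its field names are illustrative assumptions and may not match the keys in the model's actual config.json. The derived quantities are plain arithmetic on the table values, and the per-head size is only meaningful if the model does use its 16 heads for attention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class I3Config:
    # Values taken from the table; field names are hypothetical.
    d_model: int = 512
    n_layers: int = 24
    n_heads: int = 16
    d_state: int = 64
    rank: int = 128
    max_seq_len: int = 256
    vocab_size: int = 4466

    @property
    def head_dim(self) -> int:
        """Per-head width, assuming d_model is split evenly across heads."""
        return self.d_model // self.n_heads

    @property
    def embedding_params(self) -> int:
        """Weights in a vocab_size x d_model token-embedding matrix."""
        return self.vocab_size * self.d_model


cfg = I3Config()
print(cfg.head_dim)          # 32
print(cfg.embedding_params)  # 2286592
```

Under these assumptions, a single token-embedding matrix would hold 2,286,592 weights. How the remaining parameters are distributed across the 24 layers is not described in this report.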

Training Environment and Duration

The complete pre-training run finished on consumer-grade hardware in well under a day.

Hardware Used: NVIDIA GeForce RTX 3060 (12GB VRAM assumed).
Total Training Time: Approximately 17 hours.
Framework: PyTorch (with HuggingFace Transformers for generation of final files).
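
As a rough cross-check, the total training time can be combined with the per-iteration timings reported in the performance metrics (about 8.2 s to 12.7 s) to bound the number of iterations. This is a back-of-the-envelope sketch: it assumes those timings, which were measured over the last 500 iterations, are representative of the whole run, and the helper name `estimate_iterations` is made up for the example.

```python
def estimate_iterations(total_hours: float, seconds_per_iter: float) -> float:
    """Iterations that fit in total_hours at a constant seconds_per_iter."""
    return total_hours * 3600 / seconds_per_iter


# 17 hours at the slowest and fastest reported per-iteration times.
low = estimate_iterations(17, 12.7)
high = estimate_iterations(17, 8.2)
print(f"roughly {low:,.0f} to {high:,.0f} iterations")
```

The actual iteration count is not stated in the report, so this range is only an estimate.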

Training Data and Procedure

Dataset

The model was pre-trained using the TinyChat dataset, which comprised 1,000,000 conversations. This suggests the model is optimized for rapid, short-form conversational tasks.

Tokenization Strategy

A crucial element of the model's efficiency is its custom tokenization approach:

Tokenizer Type: chunk
Strategy: variable_2_3

Vocabulary: The vocabulary size is notably small (4,466 chunks), indicating that the tokenizer is designed to aggregate common sequences of text into single tokens, significantly reducing the effective sequence length and computational cost during training.

Performance Metrics

Training proceeded at a consistent pace per iteration. The log reported the following metrics as the run concluded:

Metric	Range (Last 500 Iterations)	Observation
Loss	1.98 - 2.27	Training loss remained relatively stable, suggesting convergence towards the end of the run.
Perplexity (PPL)	7.29 - 9.70	Perplexity is a measure of how well the model predicts the next token. This range is typical for raw pre-training logs and indicates the model has learned basic sequence dependencies.
Time per Iteration	~8.2 s - 12.7 s	Processing time per iteration shows a sustained and efficient training throughput.
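
The loss and perplexity rows are related. Assuming the logged loss is the mean cross-entropy in nats (the default for PyTorch's cross-entropy loss), perplexity is its exponential. The short check below applies that relation to the rounded loss bounds from the table; the function name `perplexity` is chosen for the example.

```python
import math


def perplexity(loss: float) -> float:
    """Perplexity for a mean cross-entropy loss measured in nats."""
    return math.exp(loss)


for loss in (1.98, 2.27):
    print(f"loss {loss:.2f} -> perplexity {perplexity(loss):.2f}")
# loss 1.98 -> perplexity 7.24
# loss 2.27 -> perplexity 9.68
```

These values land close to, but not exactly on, the logged range of 7.29 - 9.70. The log does not state the reason for the small gap.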

Deliverables

Upon completion, the necessary files for deployment were generated into the i3_model_hf/ directory, ensuring immediate compatibility with the HuggingFace ecosystem:

pytorch_model.bin (Model Weights)
config.json (Model Configuration)
tokenizer.json (Vocabulary File)
tokenizer_config.json (Tokenizer Configuration)

The model is now ready for fine-tuning on a specific downstream task or for evaluation of its foundational text generation capabilities.
