Update README.md

README.md (CHANGED)

@@ -7,88 +7,84 @@ library_name: transformers
tags:
- i3-arhitecture
---
# i3-tiny: The i3 Architecture (v1)
A specialized hybrid recurrence component manages sequential dependencies efficiently, avoiding the quadratic complexity of standard self-attention.
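
The exact i3 recurrence is proprietary, so the following PyTorch sketch is only a generic illustration of the pattern: a gated linear recurrence that processes the sequence in a single O(T) scan over a fixed-size state instead of forming a T×T attention matrix. Names such as `SimpleRecurrence` and the default `d_state=16` (taken from the configuration table below) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SimpleRecurrence(nn.Module):
    """Illustrative gated linear recurrence: O(T) in sequence length,
    carrying a fixed-size state instead of attending over all past tokens."""

    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.in_proj = nn.Linear(d_model, d_state)   # token -> candidate state update
        self.gate = nn.Linear(d_model, d_state)      # token -> per-step decay gate
        self.out_proj = nn.Linear(d_state, d_model)  # state -> model dimension

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch, seq_len, _ = x.shape
        state = x.new_zeros(batch, self.in_proj.out_features)
        u = self.in_proj(x)                 # candidate updates for every step
        a = torch.sigmoid(self.gate(x))     # gates in (0, 1), one per step and state channel
        outputs = []
        for t in range(seq_len):            # single left-to-right scan, no quadratic attention
            state = a[:, t] * state + (1 - a[:, t]) * u[:, t]
            outputs.append(self.out_proj(state))
        return torch.stack(outputs, dim=1)  # (batch, seq_len, d_model)

block = SimpleRecurrence(d_model=256)
print(block(torch.randn(2, 128, 256)).shape)  # torch.Size([2, 128, 256])
```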
Attention mechanisms use highly factorized, low-rank projections, significantly reducing memory and compute costs associated with Key, Query, and Value matrices.
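
As a generic sketch of this technique (not the actual i3 layers), a d_model × d_model projection can be factorized into two thin matrices of rank r; with the rank of 8 listed in the configuration table below, each Q/K/V projection shrinks from 65,536 to 4,096 weights:

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Factorizes a (d_in x d_out) matrix as (d_in x r) @ (r x d_out),
    cutting parameters from d_in*d_out down to r*(d_in + d_out)."""

    def __init__(self, d_in: int, d_out: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)
        self.up = nn.Linear(rank, d_out, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))

d_model, rank = 256, 8
q_proj = LowRankLinear(d_model, d_model, rank)        # same construction for k_proj and v_proj
q = q_proj(torch.randn(2, 128, d_model))              # (batch, seq, d_model), ready to split into heads
print(d_model * d_model, rank * (d_model + d_model))  # 65536 vs 4096 parameters per projection
```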
The standard FFN is replaced by proprietary low-rank factorization layers to maximize parameter efficiency throughout the block.
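
The proprietary layers themselves are not shown here; the sketch below is one standard way to build such a block, replacing each dense FFN matrix with a pair of rank-8 factors. The 4× expansion (`d_ff = 1024`) and the function name are assumptions for illustration only.

```python
import torch
import torch.nn as nn

def factorized_ffn(d_model: int = 256, d_ff: int = 1024, rank: int = 8) -> nn.Sequential:
    """Feed-forward block in which every dense matrix is a product of two rank-`rank` factors."""
    return nn.Sequential(
        nn.Linear(d_model, rank, bias=False), nn.Linear(rank, d_ff, bias=False),  # W1 ~ A1 @ B1
        nn.GELU(),
        nn.Linear(d_ff, rank, bias=False), nn.Linear(rank, d_model, bias=False),  # W2 ~ A2 @ B2
    )

ffn = factorized_ffn()
dense_params = 2 * 256 * 1024                              # a standard two-matrix FFN
low_rank_params = sum(p.numel() for p in ffn.parameters())
print(dense_params, low_rank_params)                       # 524288 vs 20480
print(ffn(torch.randn(2, 128, 256)).shape)                 # (batch, seq, d_model) preserved
```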

## Model Configuration

| **Parameter** | **Value** | **Notes** |
| ------------------- | ------------------------- | -------------------------------------------------- |
| **Model Size** | Approx. 40–50M parameters | Exact count is printed when training starts. |
| **d_model** | 256 | Hidden dimension size. |
| **n_layers** | 6 | Number of hybrid i3 blocks. |
| **n_heads** | 8 | Number of attention heads. |
| **Recurrence Rank** | 16 (d_state for Mamba) | Size of the proprietary recurrence state. |
| **Low-Rank Rank** | 8 | Rank used for low-rank factorizations. |
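
For readers who prefer code, the table above maps to a configuration object along the following lines; the field names are illustrative and are not taken from the i3 source.

```python
from dataclasses import dataclass

@dataclass
class I3TinyConfig:
    # Values from the table above; field names are illustrative, not the actual i3 code.
    d_model: int = 256     # hidden dimension size
    n_layers: int = 6      # number of hybrid i3 blocks
    n_heads: int = 8       # attention heads
    d_state: int = 16      # recurrence state size
    low_rank: int = 8      # rank of the low-rank factorizations
    vocab_size: int = 35   # character-level vocabulary, roughly 32-35 unique characters

print(I3TinyConfig())
```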

---

## Intended Use & Limitations
* **Proof of Concept:** Demonstrating an ultra-efficient training and inference paradigm.
* **General Language Tasks:** Due to the extremely small and repetitive training dataset (even with 10× repetition), the model has a very limited understanding of grammar, syntax, and semantics. It will primarily generate repetitive and fragmented text based on corpus patterns.

---

## Training Data

| **Attribute** | **Value** | **Notes** |
| ----------------- | ------------------------- | ------------------------------------------------------------------------- |
| **Source** | Public Domain Sample Text | The original sample text provided in the source code. |
| **Volume** | 10× Repetition | The original text was repeated 10 times to increase training data volume. |
| **Tokenization** | Character-level | Vocabulary size is determined by unique characters (≈32–35). |
| **Preprocessing** | Lowercased | All training data is normalized to lowercase characters. |
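
A character-level pipeline matching the table above can be sketched as follows; the sample string and helper names are placeholders, not the actual corpus or tokenizer code.

```python
# Build the vocabulary from the lowercased corpus: every unique character gets an id.
text = ("Public domain sample text used for the prototype. " * 10).lower()  # 10x repetition

chars = sorted(set(text))                        # unique characters define the vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}     # char -> id
itos = {i: ch for ch, i in stoi.items()}         # id -> char

def encode(s: str) -> list[int]:
    return [stoi[c] for c in s.lower()]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

print(len(chars))                  # vocabulary size for this placeholder text
print(decode(encode("sample")))    # round-trips to "sample"
```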

---

## Training Configuration

| **Parameter** | **Value** | **Notes** |
| ------------------- | ------------------ | --------- |
| **Max Iterations** | 2000 | |
| **Batch Size** | 2 | |
| **Sequence Length** | 128 | |
| **Loss Function** | Cross-Entropy Loss | |
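
Combined with the optimizer settings listed in the updated card further below (AdamW at a 3e-4 learning rate), the configuration above corresponds to a training loop of roughly the following shape. The model class here is a deliberately trivial stand-in, since the actual i3 architecture is not public.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hyperparameters from the table above; AdamW and 3e-4 come from the updated card below.
max_iters, batch_size, seq_len, lr, vocab_size = 2000, 2, 128, 3e-4, 35

class StandInLM(nn.Module):
    """Placeholder causal LM (embedding + linear head) used only to make the loop runnable."""
    def __init__(self, vocab_size: int, d_model: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, idx: torch.Tensor) -> torch.Tensor:  # idx: (batch, seq_len)
        return self.head(self.embed(idx))                  # logits: (batch, seq_len, vocab)

data = torch.randint(0, vocab_size, (10_000,))              # placeholder token stream

def get_batch():
    ix = torch.randint(0, len(data) - seq_len - 1, (batch_size,))
    x = torch.stack([data[i:i + seq_len] for i in ix])
    y = torch.stack([data[i + 1:i + seq_len + 1] for i in ix])  # next-token targets
    return x, y

model = StandInLM(vocab_size)
optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.01)

for step in range(max_iters):
    x, y = get_batch()
    logits = model(x)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), y.reshape(-1))  # cross-entropy objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 500 == 0:
        print(step, round(loss.item(), 3))
```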

---

## Expected Initial Loss
The model should start with a Cross-Entropy Loss between **3.0** and **4.0**, depending on the final character vocabulary size (≈ ln V).
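
The estimate can be checked directly: a model that starts out assigning roughly uniform probability to V characters has an expected cross-entropy of ln V.

```python
import math

# Expected initial loss for a near-uniform prediction over V characters is ln(V).
for vocab_size in (32, 35):
    print(vocab_size, round(math.log(vocab_size), 2))
# 32 -> 3.47, 35 -> 3.56, both inside the expected 3.0-4.0 range
```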

---
# Model Card: i3Model (Hybrid Efficient LLM)

## Overview
**i3Model** is a research-focused, ultra-efficient large language model prototype designed for exploring advanced hybrid architectures that balance **performance, scalability, and memory efficiency**. It integrates several experimental mechanisms for sequence modeling, low-rank parameterization, and quantization-aware training to achieve strong performance under resource constraints.

This model was developed for experimentation in lightweight large language modeling, particularly for tasks such as:

* Character- or token-level language modeling
* Text generation and continuation (a minimal sampling sketch follows this list)
* Research into efficient training and deployment techniques
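
The generation task above reduces to a plain autoregressive sampling loop. The sketch below is a minimal, hypothetical version in PyTorch; `model` stands for any causal LM that returns per-position logits and is not the actual i3 implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx: torch.Tensor, max_new_tokens: int = 100, temperature: float = 1.0) -> torch.Tensor:
    """Autoregressive sampling: feed the sequence so far, sample the next id, append, repeat.

    `model` is assumed to map (batch, seq) token ids to (batch, seq, vocab) logits.
    """
    for _ in range(max_new_tokens):
        logits = model(idx)                       # (batch, seq, vocab)
        logits = logits[:, -1, :] / temperature   # keep only the final position
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)    # append the sampled token and continue
    return idx
```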

> **Note:** Architectural details are proprietary and are intentionally omitted.

---

## Intended Use

The model is intended for:

* Academic research on hybrid recurrent-transformer architectures
* Prototyping efficient LLMs for low-resource environments
* Studying low-rank adaptation and quantization for model compression

It is **not** optimized or tested for production deployment, safety-critical applications, or real-world text generation beyond controlled research settings.

---

## Key Features

* **Hybrid Recurrent–Sequence Modeling:** Combines sequence-mixing and dynamic state-space mechanisms for temporal reasoning.
* **Low-Rank Parameterization:** Reduces parameter footprint while maintaining expressivity.
* **Quantization-Aware Design:** Uses a 4-bit quantization scheme with FP32 master weights for training stability (a minimal sketch follows this list).
* **Causal Autoregressive Training:** Enables next-token prediction and controlled text generation.
* **Modular and Extensible:** Supports layer-wise experimentation and scalable configuration.
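
The exact 4-bit scheme is not documented here; the sketch below shows one standard way such a design can work: "fake" 4-bit quantization of the weights during the forward pass, with gradients flowing straight through to FP32 master weights that the optimizer updates. The class and function names are illustrative, not the i3 code.

```python
import torch
import torch.nn as nn

def fake_quant_4bit(w: torch.Tensor) -> torch.Tensor:
    """Symmetric 4-bit fake quantization: the forward pass sees weights rounded to 15 levels,
    while gradients pass straight through to the underlying FP32 tensor."""
    scale = w.abs().max().clamp(min=1e-8) / 7            # symmetric int4-style range [-7, 7]
    w_q = torch.clamp(torch.round(w / scale), -7, 7) * scale
    return w + (w_q - w).detach()                        # straight-through estimator

class QuantLinear(nn.Linear):
    """Linear layer that keeps FP32 master weights but computes with quantized weights."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(x, fake_quant_4bit(self.weight), self.bias)

layer = QuantLinear(256, 256)
out = layer(torch.randn(2, 128, 256))
out.sum().backward()                   # gradients reach the FP32 master weights
print(layer.weight.grad is not None)   # True
```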

---

## Training Details

* **Training Objective:** Next-token prediction (causal language modeling)
* **Dataset:** Custom small-scale character-level corpus derived from public domain text passages
* **Sequence Length:** 128 tokens (for prototype training)
* **Optimization:** AdamW optimizer with weight decay
* **Learning Rate:** 3e-4
* **Training Duration:** ~2000 iterations on a small dataset
* **Batch Size:** 2

The model was trained primarily for demonstration and performance measurement purposes rather than benchmark-level convergence.

---

## Evaluation

Evaluation focused on:

* **Training stability**
* **Generation coherence at small scale**
* **Speed and memory performance metrics**

While not benchmarked on large-scale NLP datasets, the model demonstrates promising early results in lightweight text generation with efficient runtime characteristics.

---

## Limitations

* The model is trained on a very limited dataset and may produce incoherent or repetitive outputs.
* It lacks fine-tuning for alignment, safety, or factual consistency.
* It is unsuitable for deployment in sensitive or user-facing contexts.
* Generation quality is constrained by vocabulary and training corpus diversity.

---

## Ethical Considerations

This model is intended solely for **research and educational use**. Users should:

* Avoid using it to generate misleading or harmful content.
* Not deploy it in systems interacting with the public without additional alignment and safety layers.
* Attribute the model appropriately if adapted or redistributed.