Artha-1
A compact JEPA-style LNN language model. Artha-1 is built around a single core idea: a Liquid Neural Network operating as a JEPA-style predictor over a frozen text embedding space. There is no token-level loss, no autoregressive token-by-token decoder inside the model, and no separately trained generative head. The entire learned component of Artha-1 is one JEPA-style LNN.
| Property | Value |
|---|---|
| Parameters | ~400M |
| Model type | JEPA-style LNN |
| Embedding space | Bottleneck-T5 latents (frozen, 768-dim) |
| Learned component | Liquid Neural Network JEPA-style predictor |
| Format | PyTorch (.pth) with a Python pipeline wrapper |
| License | MIT |
| Repo | vyomie/artha-1 |
Overview
Artha-1 is, end to end, a JEPA-style LNN. The term is meant literally:
- JEPA-style because the model predicts in an abstract embedding space rather than at the token level, following the Joint Embedding Predictive Architecture framework proposed by Yann LeCun and collaborators.
- LNN because the predictor doing that JEPA-style prediction is a Liquid Neural Network, with continuous-time dynamics and input-dependent behavior.
The frozen Bottleneck-T5 autoencoder is not part of the "model" in any learned sense. It defines a fixed embedding space, in the same way that pixels are a fixed input space for a vision model. All learning happens inside the JEPA-style LNN.
This makes Artha-1 one of the first independent open-source attempts at a JEPA-style LNN for language. At ~400M parameters it sits in the small-model range, light enough to load and experiment with on a single consumer GPU, and intentionally compact so the JEPA-style LNN can be studied in isolation without the overhead of a frontier-scale stack.
What a JEPA-style LNN Is
A JEPA-style LNN is the fusion of two ideas:
The JEPA-style part. Joint Embedding Predictive Architectures replace generative, output-level prediction with prediction in a learned embedding space. Loss is measured between embeddings, not between raw outputs. This means:
- The model focuses on what comes next semantically, not how it is spelled.
- Cross-entropy on tokens, which heavily penalizes valid alternative phrasings, is gone. Cosine similarity between embeddings rewards "close enough in meaning."
- The predictor is a small, focused network. The heavy lifting of language understanding lives in the frozen encoder/decoder.
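To make the contrast with token-level cross-entropy concrete, here is a minimal sketch of an embedding-space objective: one minus the cosine similarity between a predicted vector and a target vector. The card says only that the objective is cosine similarity. Minimizing `1 - cos` is a common way to turn that into a loss, but whether Artha-1 minimizes this or maximizes the similarity directly is an assumption (the two differ only by a constant). The 4-dimensional vectors are toy values, not real Bottleneck-T5 latents, and `embedding_loss` is a hypothetical helper rather than part of the Artha-1 code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embedding_loss(predicted: list[float], target: list[float]) -> float:
    """Hypothetical JEPA-style loss: 0 when vectors point the same way, up to 2 when opposite."""
    return 1.0 - cosine_similarity(predicted, target)


# Toy latents (made up for illustration): a target, a nearby "paraphrase"
# prediction, and an unrelated prediction.
target = [0.9, 0.1, 0.4, 0.0]
paraphrase = [0.85, 0.15, 0.45, 0.05]
unrelated = [-0.2, 0.9, -0.1, 0.6]

print("paraphrase loss:", round(embedding_loss(paraphrase, target), 4))
print("unrelated loss: ", round(embedding_loss(unrelated, target), 4))
```

Nothing in this loss inspects tokens. A prediction is penalized only by how far its direction is from the target's, so two phrasings whose embeddings point the same way score identically.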
The LNN part. Liquid Neural Networks have continuous-time, ODE-based dynamics and adaptive, input-dependent behavior. They are well suited to navigating a continuous manifold, which is exactly what JEPA-style prediction asks a predictor to do: take a point in embedding space (the context) and produce another point (the target). Discrete-token Transformers excel at sequences of symbols. LNNs are a more natural fit for "move smoothly from one point in meaning-space to another."
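The card does not give Artha-1's exact cell equations. As an illustration of what "continuous-time, input-dependent" means, here is a sketch of a generic liquid time-constant (LTC) cell in the style of Hasani et al. Its hidden state follows dx/dt = -(1/τ + f(x, u)) · x + f(x, u) · A, where the gate f depends on both the input u and the current state x. This makes the effective time constant change with the input. The state is advanced with the fused semi-implicit Euler step commonly used for LTCs. Sizes, initialization, τ, and Δt are hypothetical toy values. This is not Artha-1's implementation.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class LTCCell:
    """Toy liquid time-constant cell (hypothetical sizes and initialization).

    dx/dt = -(1/tau + f) * x + f * A,  with  f = sigmoid(W_in @ u + W_rec @ x + b).
    """

    def __init__(self, input_dim: int, hidden_dim: int, tau: float = 1.0, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.tau = tau
        self.w_in = [[rng.gauss(0, 0.5) for _ in range(input_dim)] for _ in range(hidden_dim)]
        self.w_rec = [[rng.gauss(0, 0.5) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
        self.bias = [0.0] * hidden_dim
        self.a = [rng.uniform(-1, 1) for _ in range(hidden_dim)]  # per-unit target level A

    def gate(self, x: list[float], u: list[float]) -> list[float]:
        """Input- and state-dependent gate f(x, u), one value per hidden unit."""
        return [
            sigmoid(
                sum(w * ui for w, ui in zip(self.w_in[i], u))
                + sum(w * xj for w, xj in zip(self.w_rec[i], x))
                + self.bias[i]
            )
            for i in range(self.hidden_dim)
        ]

    def step(self, x: list[float], u: list[float], dt: float = 0.1) -> list[float]:
        """One fused (semi-implicit) Euler step of the LTC ODE."""
        f = self.gate(x, u)
        return [
            (x[i] + dt * f[i] * self.a[i]) / (1.0 + dt * (1.0 / self.tau + f[i]))
            for i in range(self.hidden_dim)
        ]

    def run(self, u: list[float], steps: int = 20, dt: float = 0.1) -> list[float]:
        """Integrate from a zero state under a constant input u."""
        x = [0.0] * self.hidden_dim
        for _ in range(steps):
            x = self.step(x, u, dt)
        return x


cell = LTCCell(input_dim=3, hidden_dim=4)
print(cell.run([1.0, 0.0, -1.0]))
print(cell.run([0.0, 1.0, 0.5]))
```

Because the update divides by a term that grows with the gate, the state stays bounded and settles at a rate that depends on the input. This is the smooth, input-adaptive motion through a continuous space that the paragraph above describes.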
Put together, a JEPA-style LNN is a model whose only learned component is a Liquid Neural Network trained to predict embeddings from embeddings under a JEPA-style objective. That is Artha-1.
Architecture
Three components, only one of them learned:
- Context Encoder (frozen). Bottleneck-T5 from thesephist/contra-bottleneck-t5-base-wikipedia. Maps the input prompt to a 768-dim latent.
- JEPA-style LNN (trained). A Liquid Neural Network with gated recurrent layers, dynamic temporal memory, and 4,000 hidden units. Takes the context latent and predicts a target latent in the same embedding space.
- Decoder (frozen). The same Bottleneck-T5 in reverse, used at inference time to turn the predicted latent back into text.
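The data flow through these three components can be sketched as a function composition. In this sketch the predictor is called once per prompt, and text appears only at the two ends. `encode`, `predict`, and `decode` are hypothetical stand-ins so the sketch runs offline: a letter-frequency encoder, an identity predictor (used only to keep the example deterministic), and a nearest-neighbour lookup over a tiny candidate list. In Artha-1 the first and last would be the frozen Bottleneck-T5 and the middle would be the trained JEPA-style LNN. The real decoder generates text rather than looking it up.

```python
import math
from typing import Callable, Sequence

Vector = list[float]


def answer(prompt: str,
           encode: Callable[[str], Vector],
           predict: Callable[[Vector], Vector],
           decode: Callable[[Vector], str]) -> str:
    """Frozen encoder -> single predictor call in latent space -> frozen decoder."""
    context_latent = encode(prompt)          # frozen: text -> latent
    target_latent = predict(context_latent)  # learned: latent -> latent, called once
    return decode(target_latent)             # frozen: latent -> text


# ---- Hypothetical toy components so the sketch runs without downloads ----
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def toy_encode(text: str) -> Vector:
    """Unit-norm letter-frequency vector; stands in for the Bottleneck-T5 encoder."""
    counts = [float(text.lower().count(ch)) for ch in ALPHABET]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def toy_predict(latent: Vector) -> Vector:
    """Identity map; stands in for the trained JEPA-style LNN."""
    return list(latent)


def make_toy_decoder(candidates: Sequence[str]) -> Callable[[Vector], str]:
    """Nearest-neighbour lookup; stands in for the generative Bottleneck-T5 decoder."""
    encoded = [(toy_encode(c), c) for c in candidates]

    def decode(latent: Vector) -> str:
        return max(encoded, key=lambda pair: sum(a * b for a, b in zip(pair[0], latent)))[1]

    return decode


print(answer("hello there", toy_encode, toy_predict, make_toy_decoder(["hello", "goodbye"])))
# prints: hello
```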
The training signal is cosine similarity between the JEPA-style LNN's predicted latent and the encoder's latent for the ground-truth continuation. No token-level loss. No autoregressive loop inside the JEPA-style LNN. Prediction happens in one shot in embedding space. Tokens are only unrolled by the frozen decoder at inference time, and they are downstream of the JEPA-style LNN, not part of its training objective.
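A single training step under this objective can be sketched as follows. Two fixed vectors play the role of frozen-encoder latents for the prompt and the ground-truth continuation, so no gradient flows into them. Only the predictor's weights change. To keep the sketch self-contained, several things are hypothetical stand-ins: a plain linear map replaces the 4,000-unit LNN, plain gradient descent replaces the optimizer, and the latents are toy 4-dim vectors instead of 768-dim Bottleneck-T5 latents. The gradient of `1 - cos` is derived by hand.

```python
import math
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def train_step(w: Matrix, context: Vector, target: Vector, lr: float = 0.1) -> float:
    """One gradient step on loss = 1 - cos(W @ context, target); updates w in place.

    Returns the loss measured before the update.
    """
    pred = matvec(w, context)
    p_norm = math.sqrt(sum(p * p for p in pred))
    t_norm = math.sqrt(sum(t * t for t in target))
    cos = sum(p * t for p, t in zip(pred, target)) / (p_norm * t_norm)
    # d(cos)/d(pred) = target / (|p||t|) - cos * pred / |p|^2, and loss = 1 - cos.
    grad_pred = [-(t / (p_norm * t_norm) - cos * p / p_norm ** 2) for p, t in zip(pred, target)]
    # Chain rule through pred = W @ context: dL/dW[i][j] = grad_pred[i] * context[j].
    for i, g in enumerate(grad_pred):
        for j, c in enumerate(context):
            w[i][j] -= lr * g * c
    return 1.0 - cos


rng = random.Random(0)
dim = 4
w = [[rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)]  # toy predictor weights
context = [0.2, -0.5, 0.7, 0.1]  # toy "prompt" latent (frozen)
target = [0.6, 0.3, -0.2, 0.4]   # toy "ground-truth continuation" latent (frozen)

losses = [train_step(w, context, target) for _ in range(50)]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The loss compares two vectors. No token is generated or scored during training, which is what "no token-level loss" means in practice.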
JEPA roles in Artha-1
| JEPA-style role | Artha-1 component | Trained? |
|---|---|---|
| Context encoder | Bottleneck-T5 encoder | No (frozen) |
| Target encoder | Bottleneck-T5 encoder (same network) | No (frozen) |
| Predictor | JEPA-style LNN | Yes (trained) |
| Decoder (inference only) | Bottleneck-T5 decoder | No (frozen) |
Classical I-JEPA and V-JEPA train the encoder jointly with the predictor and use a separate EMA target network. Artha-1 fixes the encoder to keep training cheap on consumer hardware, but the predictor itself, the part that actually learns, is fully JEPA-style and fully an LNN.
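For readers coming from I-JEPA, the difference can be shown with the exponential-moving-average (EMA) rule that classical JEPA uses to update its target encoder: target <- m · target + (1 - m) · online. Artha-1 has no online encoder being trained. Context and target latents both come from the same frozen Bottleneck-T5, which behaves like the m = 1 case, where the target never moves. The parameter lists and the momentum of 0.95 are toy values chosen so the drift is visible in 100 steps (real EMA setups typically use momenta much closer to 1). This is an illustration, not code from Artha-1 or I-JEPA.

```python
Params = list[float]


def ema_update(target: Params, online: Params, momentum: float) -> Params:
    """Classical JEPA target-encoder update: target <- m * target + (1 - m) * online."""
    return [momentum * t + (1.0 - momentum) * o for t, o in zip(target, online)]


online = [1.0, -2.0, 0.5]        # toy "online encoder" weights (held fixed here for simplicity)
target_ema = [0.0, 0.0, 0.0]     # I-JEPA-style target: momentum < 1, slowly tracks online
target_frozen = [0.0, 0.0, 0.0]  # Artha-1-style target: momentum = 1, never changes

for _ in range(100):
    target_ema = ema_update(target_ema, online, momentum=0.95)
    target_frozen = ema_update(target_frozen, online, momentum=1.0)

print("EMA target:   ", [round(x, 3) for x in target_ema])
print("frozen target:", target_frozen)
```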
Training
| Setting | Value |
|---|---|
| Data | Synthetic Q&A pairs generated with open-source LLMs |
| Embedding dim | 768 (Bottleneck-T5 latent) |
| JEPA-style LNN hidden units | 4,000 |
| Optimizer | AdamW with Stochastic Weight Averaging (SWA) |
| Objective | Cosine similarity between predicted and target embeddings |
| Token-level loss | None |
| Duration | ~2 to 3 days on mid-range consumer GPUs |
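The card names AdamW with Stochastic Weight Averaging (SWA) but gives no hyperparameters. As a sketch of what those two pieces do, here is a from-scratch version on a toy quadratic objective. The AdamW part applies decoupled weight decay, following the update rule of PyTorch's `torch.optim.AdamW`. SWA is an equal-weight running mean of the weights collected after a chosen start step, which is the idea behind `torch.optim.swa_utils.AveragedModel`. The learning rate, weight decay, step counts, and SWA start point are placeholders, not Artha-1's settings.

```python
import math


class AdamW:
    """Minimal AdamW (decoupled weight decay) over a flat list of parameters."""

    def __init__(self, n_params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01):
        self.lr, self.eps, self.wd = lr, eps, weight_decay
        self.b1, self.b2 = betas
        self.m = [0.0] * n_params  # first-moment estimates
        self.v = [0.0] * n_params  # second-moment estimates
        self.t = 0

    def step(self, params, grads):
        self.t += 1
        for i, g in enumerate(grads):
            params[i] -= self.lr * self.wd * params[i]  # decoupled weight decay
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g
            m_hat = self.m[i] / (1 - self.b1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


class SWA:
    """Equal-weight running average of parameter snapshots."""

    def __init__(self, n_params):
        self.avg = [0.0] * n_params
        self.count = 0

    def update(self, params):
        self.count += 1
        for i, p in enumerate(params):
            self.avg[i] += (p - self.avg[i]) / self.count  # incremental mean


# Toy objective: f(theta) = sum((theta_i - 3)^2), with gradient 2 * (theta_i - 3).
params = [0.0, 0.0]
opt = AdamW(len(params), lr=0.05, weight_decay=0.01)
swa = SWA(len(params))
snapshots = []
for step in range(300):
    grads = [2.0 * (p - 3.0) for p in params]
    opt.step(params, grads)
    if step >= 200:  # hypothetical SWA start point
        swa.update(params)
        snapshots.append(list(params))

print("last iterate:", params)
print("SWA average: ", swa.avg)
```

The averaged weights, not the last iterate, are what SWA would keep. Averaging late-training iterates tends to land in a flatter, more stable region than any single step.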
About the Author
Artha-1 was built by Vyom N. Patel, an independent teen researcher working on alternative LLM architectures, low-resource training, and JEPA-style LNNs. The project was designed and trained end-to-end on consumer hardware without institutional backing.
Intended Use
Artha-1 is suitable for:
- Research on JEPA-style LNNs for language
- Experimentation with embedding-space prediction as an alternative to next-token modeling
- Educational demos and architecture walkthroughs of JEPA-style systems
- Distillation, probing, or fine-tuning experiments on compact JEPA-style LNNs
Not Intended For
- High-stakes domains (medical, legal, safety-critical)
- Tasks requiring factual reliability or robustness
- A drop-in replacement for general-purpose LLMs (GPT, LLaMA, Claude, and similar)
Outputs may be inconsistent, factually wrong, or incoherent. Treat anything Artha-1 says as a research artifact, not a source of truth. The JEPA-style LNN setup is experimental, and behavior on out-of-distribution prompts is unpredictable.
Usage
Option 1: Install the package
```bash
pip install arthaLM
```

```python
from arthaLM import Pipeline

pipe = Pipeline(model_name="vyomie/artha-1")
print(pipe("Hello, how are you?"))
```
Option 2: Load directly from the Hub
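```bash
pip install torch transformers==4.36.1 huggingface_hub
```

```python
import importlib.util
import os
import sys

from huggingface_hub import snapshot_download

LOCAL_DIR = "/tmp/vyomie_artha-1"

# Download the repository files (weights, model.py, etc.) into a local folder.
snapshot_download(
    "vyomie/artha-1",
    local_dir=LOCAL_DIR,
    # Only relevant on older huggingface_hub releases; newer ones copy files into local_dir by default.
    local_dir_use_symlinks=False,
)

# Import the repository's model.py as a module named "model".
spec = importlib.util.spec_from_file_location("model", os.path.join(LOCAL_DIR, "model.py"))
model = importlib.util.module_from_spec(spec)
sys.modules["model"] = model  # register before executing so the module is importable by name
spec.loader.exec_module(model)

# Build the pipeline defined in model.py and run a prompt through it.
pipe = model.Pipeline("vyomie/artha-1")
print(pipe("Hi, how are you?"))
```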
Limitations
- Trained on synthetic data, so outputs reflect the biases and gaps of the generator models.
- The JEPA-style LNN setup is experimental; behavior on out-of-distribution prompts is unpredictable.
- The encoder and decoder are frozen, so the embedding space is whatever Bottleneck-T5 provides. The JEPA-style LNN cannot reshape that space.
- No safety tuning, alignment, or RLHF has been applied.
License
Released under the MIT License.
Citation
```bibtex
@misc{artha1,
  title  = {Artha-1: A JEPA-Style LNN for Language},
  author = {Vyom N. Patel},
  url    = {https://huggingface.co/vyomie/artha-1}
}
```