Artha-1

A compact JEPA-style LNN language model. Artha-1 is built around a single core idea: a Liquid Neural Network operating as a JEPA-style predictor over a frozen text embedding space. There is no token-level loss, no autoregressive token-by-token decoder inside the model, and no separately trained generative head. The entire learned component of Artha-1 is one JEPA-style LNN.

Parameters: ~400M
Model type: JEPA-style LNN
Embedding space: Bottleneck-T5 latents (frozen, 768-dim)
Learned component: Liquid Neural Network JEPA-style predictor
Format: PyTorch (.pth) with a Python pipeline wrapper
License: MIT
Repo: vyomie/artha-1

Overview

Artha-1 is, end to end, a JEPA-style LNN. The term is meant literally:

  • JEPA-style because the model predicts in an abstract embedding space rather than at the token level, following the Joint Embedding Predictive Architecture framework proposed by Yann LeCun and collaborators.
  • LNN because the predictor doing that JEPA-style prediction is a Liquid Neural Network, with continuous-time dynamics and input-dependent behavior.

The frozen Bottleneck-T5 autoencoder is not part of the "model" in any learned sense. It is a fixed embedding space, the same way pixels are a fixed input space for a vision model. All learning happens inside the JEPA-style LNN.

This makes Artha-1 one of the first independent open-source attempts at a JEPA-style LNN for language. At ~400M parameters it sits in the small-model range, light enough to load and experiment with on a single consumer GPU, and intentionally compact so the JEPA-style LNN can be studied in isolation without the overhead of a frontier-scale stack.

What a JEPA-style LNN Is

A JEPA-style LNN is the fusion of two ideas:

The JEPA-style part. Joint Embedding Predictive Architectures replace generative, output-level prediction with prediction in a learned embedding space. Loss is measured between embeddings, not between raw outputs. This means:

  • The model focuses on what comes next semantically, not how it is spelled.
  • Cross-entropy on tokens, which heavily penalizes valid alternative phrasings, is gone. Cosine similarity between embeddings rewards "close enough in meaning."
  • The predictor is a small, focused network. The heavy lifting of language understanding lives in the frozen encoder/decoder.
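
The contrast between the two losses can be seen in a toy case. The sketch below is only an illustration: the four-word vocabulary, the logits, and the three-dimensional embeddings are invented numbers, not values taken from Artha-1 or Bottleneck-T5. It scores a prediction of "big" against a reference of "large", once with token-level cross-entropy and once with a cosine loss between embeddings.

```python
import torch
import torch.nn.functional as F

# Hypothetical 4-word vocabulary; every number below is invented for illustration.
vocab = ["big", "large", "tiny", "red"]

# A token-level model that confidently predicts "big" when the reference says "large".
logits = torch.tensor([[4.0, 0.0, 0.0, 0.0]])
reference = torch.tensor([vocab.index("large")])
token_loss = F.cross_entropy(logits, reference)

# Invented embeddings in which the two synonyms point in nearly the same direction.
emb = {
    "big": torch.tensor([1.0, 0.9, 0.0]),
    "large": torch.tensor([0.9, 1.0, 0.0]),
}
embedding_loss = 1.0 - F.cosine_similarity(emb["big"], emb["large"], dim=0)

print(f"token-level cross-entropy: {token_loss.item():.3f}")
print(f"embedding cosine loss:     {embedding_loss.item():.3f}")
```

Under these made-up numbers the token-level loss is large because the exact token is wrong, while the embedding loss is close to zero because the two vectors are nearly parallel.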

The LNN part. Liquid Neural Networks have continuous-time, ODE-based dynamics and adaptive, input-dependent behavior. They are well suited to navigating a continuous manifold, which is exactly what JEPA-style prediction asks a predictor to do: take a point in embedding space (the context) and produce another point (the target). Discrete-token Transformers excel at sequences of symbols. LNNs are a more natural fit for "move smoothly from one point in meaning-space to another."
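
To make "continuous-time, input-dependent dynamics" concrete, here is a minimal sketch of a single liquid time-constant style cell advanced with a fused Euler step. It is not Artha-1's actual predictor, whose internals this card does not specify beyond gated recurrent layers and 4,000 hidden units. The class name `LiquidCell`, the parameters `log_tau` and `A`, the step size `dt`, and the hidden size of 64 are all assumptions made for illustration.

```python
import torch
from torch import nn


class LiquidCell(nn.Module):
    """Illustrative liquid time-constant style cell (not Artha-1's real layer).

    Dynamics: dx/dt = -x / tau + f(u, x) * (A - x), where the gate f depends on
    both the input u and the state x, so the effective time constant is
    input-dependent.
    """

    def __init__(self, input_dim: int, hidden_dim: int, dt: float = 0.1):
        super().__init__()
        self.in_proj = nn.Linear(input_dim, hidden_dim)
        self.rec_proj = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.log_tau = nn.Parameter(torch.zeros(hidden_dim))  # tau = exp(log_tau) > 0
        self.A = nn.Parameter(torch.randn(hidden_dim))        # per-unit target level
        self.dt = dt

    def forward(self, u: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        f = torch.sigmoid(self.in_proj(u) + self.rec_proj(x))  # gate in (0, 1)
        tau = torch.exp(self.log_tau)
        # Fused (semi-implicit) Euler update of the ODE above.
        return (x + self.dt * f * self.A) / (1.0 + self.dt * (1.0 / tau + f))


torch.manual_seed(0)
cell = LiquidCell(input_dim=768, hidden_dim=64)
context_latent = torch.randn(2, 768)  # random stand-in for frozen-encoder latents
state = torch.zeros(2, 64)
for _ in range(10):                   # unroll the dynamics for a few steps
    state = cell(context_latent, state)
```

Because the gate sits in both the numerator and the denominator, each unit's state stays bounded by the magnitude of its own `A`, which is one reason this update form is convenient for stable unrolling.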

Put together, a JEPA-style LNN is a model whose only learned component is a Liquid Neural Network trained to predict embeddings from embeddings under a JEPA-style objective. That is Artha-1.

Architecture

Three components, only one of them learned:

  1. Context Encoder (frozen). Bottleneck-T5 from thesephist/contra-bottleneck-t5-base-wikipedia. Maps the input prompt to a 768-dim latent.
  2. JEPA-style LNN (trained). A Liquid Neural Network with gated recurrent layers, dynamic temporal memory, and 4,000 hidden units. Takes the context latent and predicts a target latent in the same embedding space.
  3. Decoder (frozen). The same Bottleneck-T5 in reverse, used at inference time to turn the predicted latent back into text.

The training signal is cosine similarity between the JEPA-style LNN's predicted latent and the encoder's latent for the ground-truth continuation. No token-level loss. No autoregressive loop inside the JEPA-style LNN. Prediction happens in one shot in embedding space. Tokens are only unrolled by the frozen decoder at inference time, and they are downstream of the JEPA-style LNN, not part of its training objective.
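
The objective described above can be sketched as a single training step. This is a minimal sketch under stated assumptions, not the project's training code: the predictor is a small placeholder MLP rather than an LNN, the context and target latents are random tensors standing in for frozen Bottleneck-T5 encoder outputs, and the loss is written as 1 minus cosine similarity, which is one common way to turn the similarity into something to minimize. The names `predictor` and `train_step` and the learning rate are hypothetical.

```python
import torch
import torch.nn.functional as F
from torch import nn

LATENT_DIM = 768  # Bottleneck-T5 latent size stated in this card

torch.manual_seed(0)

# Hypothetical stand-in predictor; Artha-1's real predictor is an LNN.
predictor = nn.Sequential(
    nn.Linear(LATENT_DIM, 256), nn.Tanh(), nn.Linear(256, LATENT_DIM)
)
optimizer = torch.optim.AdamW(predictor.parameters(), lr=1e-3)

# Random tensors standing in for frozen-encoder latents of prompts and of
# ground-truth continuations. They carry no gradient, like a frozen encoder's output.
context_latent = torch.randn(8, LATENT_DIM)
target_latent = torch.randn(8, LATENT_DIM)


def train_step(context: torch.Tensor, target: torch.Tensor) -> float:
    """One update: predict the target latent in one shot and score it by cosine."""
    optimizer.zero_grad()
    predicted = predictor(context)
    loss = (1.0 - F.cosine_similarity(predicted, target, dim=-1)).mean()
    loss.backward()
    optimizer.step()
    return loss.item()


losses = [train_step(context_latent, target_latent) for _ in range(20)]
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

No tokens appear anywhere in the step: the only tensors involved are latents, which is the sense in which the objective has no token-level loss.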

JEPA roles in Artha-1

JEPA-style role | Artha-1 component | Trained?
Context encoder | Bottleneck-T5 encoder | ❌ frozen
Target encoder | Bottleneck-T5 encoder (same network) | ❌ frozen
Predictor | JEPA-style LNN | ✅ trained
Decoder (inference only) | Bottleneck-T5 decoder | ❌ frozen

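Classical I-JEPA and V-JEPA train the context encoder jointly with the predictor and maintain a separate target encoder whose weights are an exponential moving average (EMA) of the context encoder. Artha-1 instead freezes a single encoder and uses it for both roles, which keeps training cheap on consumer hardware. The predictor, the only part that learns, is still trained with a JEPA-style objective and is fully an LNN.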
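
For readers unfamiliar with the distinction, the two ways of handling the target encoder can be sketched side by side. This is an illustrative sketch, not code from I-JEPA, V-JEPA, or Artha-1: the helper names `ema_update` and `freeze` and the decay value are assumptions.

```python
import torch
from torch import nn


@torch.no_grad()
def ema_update(target: nn.Module, online: nn.Module, decay: float = 0.99) -> None:
    """EMA-target approach: move each target weight a fraction (1 - decay)
    toward the corresponding online (context-encoder) weight."""
    for p_target, p_online in zip(target.parameters(), online.parameters()):
        p_target.mul_(decay).add_(p_online, alpha=1.0 - decay)


def freeze(module: nn.Module) -> nn.Module:
    """Frozen-encoder approach, as described for Artha-1: no updates at all,
    so a single encoder can serve as both context and target encoder."""
    for p in module.parameters():
        p.requires_grad_(False)
    return module.eval()
```

With a frozen encoder there is no second copy of the weights to maintain and no encoder gradients to compute, which is where the saving in training cost comes from.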

Training

Data: Synthetic Q&A pairs generated with open-source LLMs
Embedding dim: 768 (Bottleneck-T5 latent)
JEPA-style LNN hidden units: 4,000
Optimizer: AdamW with Stochastic Weight Averaging (SWA)
Objective: Cosine similarity between predicted and target embeddings
Token-level loss: None
Duration: ~2 to 3 days on mid-range consumer GPUs
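
The optimizer row can be illustrated with PyTorch's built-in SWA utilities. This is a sketch of how AdamW and SWA are commonly combined, not the project's actual training script: the linear placeholder predictor, the random latents, the learning rates, the weight decay, and the step counts are all assumptions.

```python
import torch
import torch.nn.functional as F
from torch import nn
from torch.optim.swa_utils import AveragedModel, SWALR

torch.manual_seed(0)

predictor = nn.Linear(768, 768)  # placeholder for the LNN predictor
optimizer = torch.optim.AdamW(predictor.parameters(), lr=1e-3, weight_decay=0.01)
swa_predictor = AveragedModel(predictor)       # running average of the weights
swa_scheduler = SWALR(optimizer, swa_lr=5e-4)  # learning rate used while averaging

TOTAL_STEPS, SWA_START = 20, 10  # illustrative values only
context = torch.randn(16, 768)   # random stand-ins for frozen-encoder latents
target = torch.randn(16, 768)

for step in range(TOTAL_STEPS):
    optimizer.zero_grad()
    predicted = predictor(context)
    loss = (1.0 - F.cosine_similarity(predicted, target, dim=-1)).mean()
    loss.backward()
    optimizer.step()
    if step >= SWA_START:
        # After the warm-up phase, fold the current weights into the average.
        swa_predictor.update_parameters(predictor)
        swa_scheduler.step()

# The averaged weights, not the last iterate, would be the ones to evaluate or save.
averaged_prediction = swa_predictor(context)
```

Averaging only the later iterates is the usual pattern: the early steps are still far from a good region, so including them would drag the average away from it.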

About the Author

Artha-1 was built by Vyom N. Patel, an independent teen researcher working on alternative LLM architectures, low-resource training, and JEPA-style LNNs. The project was designed and trained end-to-end on consumer hardware without institutional backing.

Intended Use

Artha-1 is suitable for:

  • Research on JEPA-style LNNs for language
  • Experimentation with embedding-space prediction as an alternative to next-token modeling
  • Educational demos and architecture walkthroughs of JEPA-style systems
  • Distillation, probing, or fine-tuning experiments on compact JEPA-style LNNs

Not Intended For

  • High-stakes domains (medical, legal, safety-critical)
  • Tasks requiring factual reliability or robustness
  • A drop-in replacement for general-purpose LLMs (GPT, LLaMA, Claude, and similar)

Outputs may be inconsistent, factually wrong, or incoherent. Treat anything Artha-1 says as a research artifact, not a source of truth.

Usage

Option 1: Install the package

pip install arthaLM

from arthaLM import Pipeline

# Build the pipeline from the Hub repo and run a single prompt.
pipe = Pipeline(model_name="vyomie/artha-1")
print(pipe("Hello, how are you?"))

Option 2: Load directly from the Hub

pip install torch transformers==4.36.1 huggingface_hub

import sys, importlib.util
from huggingface_hub import snapshot_download

# Download the full repo (weights plus model.py) into a local directory.
snapshot_download(
    "vyomie/artha-1",
    local_dir="/tmp/vyomie_artha-1",
    local_dir_use_symlinks=False,
)

# Import the downloaded model.py as a module named "model".
spec = importlib.util.spec_from_file_location(
    "model", "/tmp/vyomie_artha-1/model.py"
)
model = importlib.util.module_from_spec(spec)
sys.modules["model"] = model
spec.loader.exec_module(model)

# Build the pipeline defined in model.py and run a single prompt.
pipe = model.Pipeline("vyomie/artha-1")
print(pipe("Hi, how are you?"))

Limitations

  • Trained on synthetic data, so outputs reflect the biases and gaps of the generator models.
  • The JEPA-style LNN setup is experimental; behavior on out-of-distribution prompts is unpredictable.
  • The encoder and decoder are frozen, so the embedding space is whatever Bottleneck-T5 provides. The JEPA-style LNN cannot reshape that space.
  • No safety tuning, alignment, or RLHF has been applied.

License

Released under the MIT License.

Citation

@misc{artha1,
  title  = {Artha-1: A JEPA-Style LNN for Language},
  author = {Vyom N. Patel},
  url    = {https://huggingface.co/vyomie/artha-1}
}