Instructions for using codewithdark/latent-recurrent-depth-lm with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use codewithdark/latent-recurrent-depth-lm with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="codewithdark/latent-recurrent-depth-lm", trust_remote_code=True)

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("codewithdark/latent-recurrent-depth-lm", trust_remote_code=True, dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use codewithdark/latent-recurrent-depth-lm with vLLM:
Install from pip and serve the model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "codewithdark/latent-recurrent-depth-lm"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "codewithdark/latent-recurrent-depth-lm",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
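Once the server is running, it can also be queried from Python via the OpenAI client instead of curl. A minimal sketch, assuming the default local endpoint started above:

```python
# Query the vLLM server through its OpenAI-compatible API
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM ignores the key by default

completion = client.completions.create(
    model="codewithdark/latent-recurrent-depth-lm",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)
```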
- SGLang
How to use codewithdark/latent-recurrent-depth-lm with SGLang:
Install from pip and serve the model
```sh
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "codewithdark/latent-recurrent-depth-lm" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "codewithdark/latent-recurrent-depth-lm",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```sh
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "codewithdark/latent-recurrent-depth-lm" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "codewithdark/latent-recurrent-depth-lm",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use codewithdark/latent-recurrent-depth-lm with Docker Model Runner:
```sh
docker model run hf.co/codewithdark/latent-recurrent-depth-lm
```
Latent Recurrent Depth Language Model
Overview
The Latent Recurrent Depth Language Model (LRD-LM) is an experimental text-generation architecture designed to capture deeper contextual information through iterative, latent processing. Instead of generating verbose chain-of-thought sequences, LRD-LM refines its internal state over multiple recurrent iterations to improve text generation quality while keeping the parameter count modest.
Architecture
The model is built around three key components:
- Prelude Block: handles the initial processing by embedding input tokens and applying self-attention with positional encodings.
- Recurrent Block: a core, weight-shared block that iteratively refines a latent state. By repeatedly processing the prelude output together with its own evolving state, the model effectively “thinks” over the input without emitting intermediate tokens.
- Coda Block: decodes the refined latent state into output token probabilities.
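To make the three-block flow concrete, here is a minimal PyTorch sketch of the idea. It is an illustration only, not the repository's actual implementation: the layer choices, dimensions, and zero-initialized latent state are all assumptions.

```python
import torch
import torch.nn as nn

class LatentRecurrentSketch(nn.Module):
    """Illustrative three-block structure; not the released model's code."""
    def __init__(self, vocab_size=50257, d_model=256, n_heads=4, max_len=512):
        super().__init__()
        # Prelude: token + positional embeddings, one self-attention layer
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.prelude = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Recurrent block: one set of weights shared across all iterations
        self.recurrent = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Coda: project the refined latent state to vocabulary logits
        self.coda = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids, num_iterations=3):
        pos = torch.arange(input_ids.size(1), device=input_ids.device)
        h = self.prelude(self.tok_emb(input_ids) + self.pos_emb(pos))
        state = torch.zeros_like(h)  # latent state, refined over iterations
        for _ in range(num_iterations):
            # Each pass re-reads the prelude output together with the state
            state = self.recurrent(state + h)
        return self.coda(state)  # (batch, seq, vocab) logits
```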
Applications & Limitations
Intended Uses:
- Text Generation: generate creative text, dialogue, code, or other natural language content.
- Research: serve as a testbed for exploring novel architectures and techniques in language modeling.
Limitations:
- Data Constraints:
Trained on a small subset (first 1000 samples) of the Wikitext-2-raw-v1 dataset, which may limit its performance compared to models trained on larger corpora. - Performance:
While it demonstrates the potential of latent recurrent depth, its overall performance is experimental and may not match state-of-the-art models. - Computational Overhead:
The iterative processing introduces extra computation. - Bias:
As with all language models, generated outputs may reflect biases present in the training data.
Training Details
The model was fine-tuned on a subset of the Wikitext-2-raw-v1 dataset (first 1000 samples) using the AdamW optimizer and a cosine annealing learning rate scheduler. The training configuration and hyperparameters are provided in the accompanying code, and adjustments may be needed for improved performance.
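The card does not list the exact hyperparameters, but the described setup corresponds roughly to the following sketch. The learning rate, batch size, sequence length, and iteration count here are assumptions, and `LatentRecurrentSketch` is the illustrative module from the Architecture section, not the released model.

```python
import torch
import torch.nn.functional as F
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR
from datasets import load_dataset
from transformers import AutoTokenizer

# Data: the first 1000 samples of Wikitext-2-raw-v1, as stated in the card
texts = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1000]")["text"]

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # assumption: a GPT-2-style BPE tokenizer
tokenizer.pad_token = tokenizer.eos_token
enc = tokenizer([t for t in texts if t.strip()], truncation=True, max_length=128,
                padding="max_length", return_tensors="pt")["input_ids"]

model = LatentRecurrentSketch()          # illustrative module from the Architecture section
optimizer = AdamW(model.parameters(), lr=5e-4, weight_decay=0.01)  # values are assumptions
batch_size = 8
scheduler = CosineAnnealingLR(optimizer, T_max=enc.size(0) // batch_size)

model.train()
for i in range(0, enc.size(0), batch_size):
    batch = enc[i:i + batch_size]
    logits = model(batch, num_iterations=3)
    # Next-token objective: predict token t+1 from positions up to t
    loss = F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)),
                           batch[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```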
Usage
The model can be used for text generation via its integrated generate() method, which allows you to control parameters such as the maximum sequence length, number of recurrent iterations, temperature, and top‑k filtering.
Example: Direct Inference
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the hub
model = AutoModelForCausalLM.from_pretrained("codewithdark/latent-recurrent-depth-lm", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("codewithdark/latent-recurrent-depth-lm")

prompt = "In the realm of language modeling"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Run the forward pass with a specified number of recurrent iterations
logits = model(input_ids, num_iterations=3)

# Sample a single next token from the final-position logits
probs = torch.softmax(logits[:, -1, :], dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
generated_ids = torch.cat([input_ids, next_token], dim=1)

generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
clean_text = generated_text.replace("Ġ", "")  # strip BPE word-boundary markers
print(clean_text)
```
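The snippet above samples only a single next token; multi-token generation is the same forward-and-sample step in a loop. A sketch reusing the `model`, `tokenizer`, and `input_ids` defined above:

```python
import torch

generated_ids = input_ids
for _ in range(50):  # generate up to 50 additional tokens
    logits = model(generated_ids, num_iterations=3)
    probs = torch.softmax(logits[:, -1, :], dim=-1)
    next_token = torch.multinomial(probs, num_samples=1)
    generated_ids = torch.cat([generated_ids, next_token], dim=1)

print(tokenizer.decode(generated_ids[0], skip_special_tokens=True).replace("Ġ", ""))
```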
Alternative: Using the generate() Method
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("codewithdark/latent-recurrent-depth-lm")
model = AutoModel.from_pretrained("codewithdark/latent-recurrent-depth-lm", trust_remote_code=True)

prompt = "In the realm of language modeling"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# generate() exposes the recurrent depth alongside the usual sampling controls
generated_ids = model.generate(input_ids, max_length=50, num_iterations=10, temperature=0.5, top_k=50)

generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
clean_text = generated_text.replace("Ġ", "")  # strip BPE word-boundary markers
print(clean_text)
```
Ethical Considerations
This model is intended for research and experimental use. Users must ensure ethical application and carefully consider potential biases and misuse when deploying or further developing this technology.
License
This project is licensed under the MIT License.