Model Card for echo-Llama-3.1-8B-Instruct
This model performs Conditioned Comment Prediction (CCP): it acts as a "silicon subject" for computational social science, predicting how a specific social media user will respond to a given stimulus by combining an explicit user profile with implicit behavioral history.
Model Details
Model Description
The model was fine-tuned using Supervised Fine-Tuning (SFT) to optimize its ability to generate high-fidelity, user-specific replies to online content. It isolates the capability of response generation, prioritizing operational validity over surface-level plausibility by benchmarking against authentic digital traces.
- Developed by: Nils Schwager, Simon Münker, Alistair Plum, Achim Rettinger (Trier University, University of Luxembourg)
- Funded by: EU's Horizon Europe Framework (HORIZON-CL2-2022-DEMOCRACY-01-07)
- Model type: Autoregressive Large Language Model (Instruction-tuned)
- Language(s) (NLP): English (en)
- License: llama3.1
- Finetuned from model: meta-llama/Llama-3.1-8B-Instruct
Model Sources
- Repository: https://github.com/nsschw/Conditioned-Comment-Prediction
- Paper: Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction (arXiv:2602.22752v1)
Uses
Direct Use
The model is built for computational social scientists and researchers modeling discourse dynamics and individual-level behavioral patterns. It is strictly optimized for predicting first-order text replies to textual stimuli (such as social media posts or news articles) given a specific user context.
Out-of-Scope Use
The model cannot process non-textual stimuli (e.g., URLs, images, GIFs) or simulate non-verbal interactions (e.g., liking behavior). It is not intended for generating coordinated inauthentic behavior, micro-targeted disinformation, or impersonating individuals for deceptive purposes.
Bias, Risks, and Limitations
- Privacy: The training utilized authentic, public digital traces from real X (Twitter) users. These individuals did not provide explicit informed consent for their communication patterns to be replicated by generative models.
- Dual-Use Risks: The framework demonstrated here enables the generation of synthetic content that mimics individual communication patterns with measurable fidelity. Malicious actors could exploit this for sophisticated bot campaigns or targeted persuasion.
- Semantic Boundaries: While SFT aligns the surface-level structure of the text output, the model's semantic grounding remains bounded by the representational capacity of the 8B parameter class.
How to Get Started with the Model
The model requires a combined explicit and implicit conditioning format. Structure your prompt as a native chat sequence containing the generated user profile (system prompt) followed by up to 29 historical interactions.
System: [Insert Generated Biography]
User: Comment on the following content: [Historical Stimulus 1]
Assistant: [Authentic Historical Reply 1]
...
User: Comment on the following content: [New Target Stimulus]
Assistant:
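As an illustrative sketch of the conditioning format above, the biography and history can be assembled into a standard chat message list (the function and variable names here are my own, not taken from the released repository):

```python
def build_ccp_messages(biography, history, new_stimulus):
    """Assemble the CCP chat sequence: biography as the system prompt,
    then alternating historical stimulus/reply turns, then the new target."""
    messages = [{"role": "system", "content": biography}]
    for stimulus, reply in history[-29:]:  # at most 29 historical interactions
        messages.append({"role": "user",
                         "content": f"Comment on the following content: {stimulus}"})
        messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "user",
                     "content": f"Comment on the following content: {new_stimulus}"})
    return messages

# The resulting list can then be rendered with the model tokenizer, e.g.
# tokenizer.apply_chat_template(messages, add_generation_prompt=True),
# so the model completes the final assistant turn.
```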
Training Details
Training Data
The model was fine-tuned on a corpus of English X (Twitter) discourse collected up to August 2023.
- Sample Size: 3,800 politically active users.
- Context Limit: Up to 30 stimulus-response interactions per user.
- Exclusions: Interactions containing media, and users with fewer than four historical replies.
Training Procedure
Training Hyperparameters
- Training regime: 8-bit quantization, Paged AdamW optimizer.
- Epochs: 1
- Max sequence length: 4,500 tokens
- Training paradigm: SFT on complete input sequences (system prompt, user prompts, and model completions) using TRL defaults.
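A hedged sketch of a matching TRL setup (argument names assume a recent TRL release; this is a reconstruction from the listed hyperparameters, not the authors' released training script):

```python
# Sketch only: mirrors the stated hyperparameters (1 epoch, 4,500-token
# sequences, Paged AdamW with 8-bit state) using TRL's SFTConfig.
from trl import SFTConfig, SFTTrainer

config = SFTConfig(
    num_train_epochs=1,
    max_seq_length=4500,
    optim="paged_adamw_8bit",  # Paged AdamW optimizer, 8-bit states
    output_dir="echo-llama-3.1-8b-instruct-sft",
)
# trainer = SFTTrainer(model=..., args=config, train_dataset=...)
# trainer.train()
```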
Evaluation
Testing Data, Factors & Metrics
Testing Data
Evaluations were conducted on a deterministic, held-out test split of 650 English users to prevent cross-user data leakage.
Metrics
The evaluation utilized automated metrics across five independent generation runs (decoding temperature 0.75, max new tokens 500) to capture both lexical overlap and semantic alignment.
- BLEU & ROUGE-1: Measure precision-oriented n-gram overlap and unigram overlap, respectively.
- Length Ratio: Ratio of generated output length to the authentic reference length.
- Embedding Distance: Assesses semantic intent alignment via cosine distance using Qwen3-Embedding-8B.
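To make two of these metrics concrete, here is a dependency-free toy approximation of length ratio and ROUGE-1-style unigram F1 (the evaluation itself presumably relied on standard metric libraries; this sketch is only illustrative):

```python
def length_ratio(generated, reference):
    """Token-count ratio of the generated output to the authentic reference."""
    return len(generated.split()) / max(len(reference.split()), 1)

def rouge1_f1(generated, reference):
    """Toy unigram-overlap F1 in the spirit of ROUGE-1."""
    gen, ref = generated.lower().split(), reference.lower().split()
    overlap = sum(min(gen.count(w), ref.count(w)) for w in set(gen))
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```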
Results
- BLEU: 0.083
- ROUGE-1: 0.229
- Length Ratio: 0.961
- Embedding Distance: 0.397
Observation: Explicit biography conditioning becomes largely redundant post-fine-tuning, as the model performs latent inference directly from the provided behavioral histories.
Technical Specifications
Compute Infrastructure
Hardware
The model was fine-tuned on a single NVIDIA L40S GPU (48GB VRAM).
Citation
BibTeX:
@article{schwager2026towards,
  title={Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction},
  author={Schwager, Nils and M{\"u}nker, Simon and Plum, Alistair and Rettinger, Achim},
  journal={arXiv preprint arXiv:2602.22752},
  year={2026}
}