Model Card for echo-Llama-3.1-8B-Instruct

This model performs Conditioned Comment Prediction (CCP), designed to act as a "silicon subject" for computational social science. It predicts how a specific social media user will respond to a given stimulus by utilizing both an explicit user profile and implicit behavioral history.

Model Details

Model Description

The model was fine-tuned using Supervised Fine-Tuning (SFT) to optimize its ability to generate high-fidelity, user-specific replies to online content. The fine-tuning isolates the response-generation capability and prioritizes operational validity over surface-level plausibility by benchmarking outputs against authentic digital traces.

  • Developed by: Nils Schwager, Simon Münker, Alistair Plum, Achim Rettinger (Trier University, University of Luxembourg)
  • Funded by: EU's Horizon Europe Framework (HORIZON-CL2-2022-DEMOCRACY-01-07)
  • Model type: Autoregressive Large Language Model (Instruction-tuned)
  • Language(s) (NLP): English (en)
  • License: llama3.1
  • Finetuned from model: meta-llama/Llama-3.1-8B-Instruct

Uses

Direct Use

The model is built for computational social scientists and researchers modeling discourse dynamics and individual-level behavioral patterns. It is strictly optimized for predicting first-order text replies to textual stimuli (such as social media posts or news articles) given a specific user context.

Out-of-Scope Use

The model cannot process multi-modal inputs (e.g., URLs, images, GIFs) or simulate non-verbal interactions (e.g., liking behavior). It is not intended for generating coordinated inauthentic behavior, micro-targeted disinformation, or impersonating individuals for deceptive purposes.

Bias, Risks, and Limitations

  • Privacy: The training utilized authentic, public digital traces from real X (Twitter) users. These individuals did not provide explicit informed consent for their communication patterns to be replicated by generative models.
  • Dual-Use Risks: The framework demonstrated here enables the generation of synthetic content that mimics individual communication patterns with measurable fidelity. Malicious actors could exploit this for sophisticated bot campaigns or targeted persuasion.
  • Semantic Boundaries: While SFT aligns the surface-level structure of the text output, the model's semantic grounding remains bounded by the representational capacity of the 8B parameter class.

How to Get Started with the Model

The model requires a combined explicit and implicit conditioning format. Structure your prompt as a native chat sequence containing the generated user profile (as the system prompt), followed by up to 29 historical stimulus-reply interactions, and ending with the new target stimulus.

System: [Insert Generated Biography]
User: Comment on the following content: [Historical Stimulus 1]
Assistant: [Authentic Historical Reply 1]
...
User: Comment on the following content: [New Target Stimulus]
Assistant:
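The template above can be assembled programmatically. This is a minimal sketch: the user/assistant phrasing ("Comment on the following content: ...") follows the card, but the helper name and the placeholder texts in the example are illustrative, not part of the release.

```python
def build_ccp_prompt(biography, history, target_stimulus):
    """Assemble the chat sequence: user profile as system prompt, up to 29
    historical stimulus-reply pairs, then the new target stimulus."""
    messages = [{"role": "system", "content": biography}]
    for stimulus, reply in history[:29]:  # context limit stated in the card
        messages.append({"role": "user",
                         "content": f"Comment on the following content: {stimulus}"})
        messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "user",
                     "content": f"Comment on the following content: {target_stimulus}"})
    return messages

# Example usage with placeholder traces:
msgs = build_ccp_prompt(
    biography="A politically active X user who posts about climate policy.",
    history=[("Post about a new climate bill", "About time they acted on this!")],
    target_stimulus="Breaking: parliament delays the vote.",
)
```

The resulting list can be passed to the tokenizer's `apply_chat_template(..., add_generation_prompt=True)` so the model generates the next assistant turn.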

Training Details

Training Data

The model was fine-tuned on a corpus of English X (Twitter) discourse collected up to August 2023.

  • Sample Size: 3,800 politically active users.
  • Context Limit: Up to 30 stimulus-response interactions per user.
  • Exclusions: Interactions containing media, and users with fewer than four historical replies.
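The exclusion criteria above can be sketched as a simple filter. This is an assumption-laden illustration: the card does not specify the data schema, whether the four-reply threshold is applied before or after removing media interactions, or how the 30-interaction cap is selected, so the `has_media` field and the ordering here are hypothetical.

```python
def filter_users(users):
    """Apply the card's exclusions to a {user: [interaction, ...]} mapping:
    drop interactions containing media, drop users with fewer than four
    remaining replies, and cap the context at 30 interactions per user."""
    kept = {}
    for user, interactions in users.items():
        text_only = [i for i in interactions if not i.get("has_media")]
        if len(text_only) >= 4:
            kept[user] = text_only[:30]  # context limit from the card
    return kept
```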

Training Procedure

Training Hyperparameters

  • Training regime: 8-bit quantization, Paged AdamW optimizer.
  • Epochs: 1
  • Max sequence length: 4,500 tokens
  • Training paradigm: SFT on complete input sequences (system prompt, user prompts, and model completions) using TRL defaults.
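A TRL setup matching the stated regime might look like the following. This is a configuration sketch, not the authors' script: argument names follow recent `trl`/`transformers` releases and may need adjustment for your installed versions, the 8-bit paged AdamW variant is an assumption (the card only says "Paged AdamW"), and `train_dataset` stands in for the (non-public) fine-tuning corpus.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit regime
)
args = SFTConfig(
    num_train_epochs=1,            # from the card
    max_seq_length=4500,           # from the card
    optim="paged_adamw_8bit",      # assumed variant; card says "Paged AdamW"
    output_dir="echo-llama-sft",   # hypothetical path
)
# TRL defaults otherwise; SFT runs over the complete input sequences.
trainer = SFTTrainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```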

Evaluation

Testing Data, Factors & Metrics

Testing Data

Evaluations were conducted on a deterministic, held-out test split of 650 English users to prevent cross-user data leakage.

Metrics

The evaluation utilized automated metrics across five independent generation runs (decoding temperature 0.75, max new tokens 500) to capture both lexical overlap and semantic alignment.

  • BLEU & ROUGE-1: Measure precision-oriented n-gram overlap (BLEU) and unigram overlap (ROUGE-1).
  • Length Ratio: Quantifies output volume alignment against the authentic reference length.
  • Embedding Distance: Assesses semantic intent alignment via cosine distance using Qwen3-Embedding-8B.
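Two of the simpler metrics can be sketched directly; production evaluations typically rely on libraries such as sacrebleu and rouge-score, and the card does not state whether ROUGE-1 is reported as recall, precision, or F1, so the recall variant below is an assumption.

```python
from collections import Counter

def length_ratio(hypothesis, reference):
    """Output volume alignment: generated word count over reference word count."""
    return len(hypothesis.split()) / len(reference.split())

def rouge1_recall(hypothesis, reference):
    """Unigram overlap: fraction of reference unigrams recovered in the output."""
    hyp, ref = Counter(hypothesis.split()), Counter(reference.split())
    overlap = sum((hyp & ref).values())  # clipped unigram matches
    return overlap / max(sum(ref.values()), 1)

length_ratio("a b c", "a b c d")        # → 0.75
rouge1_recall("the cat sat", "the cat ran")  # → 0.666...
```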

Results

  • BLEU: 0.083
  • ROUGE-1: 0.229
  • Length Ratio: 0.961
  • Embedding Distance: 0.397

Observation: Explicit biography conditioning becomes largely redundant post-fine-tuning, as the model successfully performs latent inference directly from the provided behavioral histories.

Technical Specifications

Compute Infrastructure

Hardware

The model was fine-tuned on a single NVIDIA L40S GPU (48GB VRAM).

Citation

BibTeX:

@article{schwager2026towards,
  title={Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction},
  author={Schwager, Nils and M{\"u}nker, Simon and Plum, Alistair and Rettinger, Achim},
  journal={arXiv preprint arXiv:2602.22752},
  year={2026}
}