Model Card for echo-Llama-3.1-8B-Instruct
This model performs Conditioned Comment Prediction (CCP): it acts as a "silicon subject" for computational social science, predicting how a specific social media user will respond to a given stimulus by combining an explicit user profile with implicit behavioral history.
Model Details
Model Description
The model was fine-tuned using Supervised Fine-Tuning (SFT) to optimize its ability to generate high-fidelity, user-specific replies to online content. It isolates the capability of response generation, prioritizing operational validity over surface-level plausibility by benchmarking against authentic digital traces.
- Developed by: Nils Schwager, Simon Münker, Alistair Plum, Achim Rettinger (Trier University, University of Luxembourg)
- Funded by: EU's Horizon Europe Framework (HORIZON-CL2-2022-DEMOCRACY-01-07)
- Model type: Autoregressive Large Language Model (Instruction-tuned)
- Language(s) (NLP): English (en)
- License: llama3.1
- Finetuned from model: meta-llama/Llama-3.1-8B-Instruct
Model Sources
- Repository: https://github.com/nsschw/Conditioned-Comment-Prediction
- Paper: Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction (arXiv:2602.22752v1)
Uses
Direct Use
The model is built for computational social scientists and researchers modeling discourse dynamics and individual-level behavioral patterns. It is strictly optimized for predicting first-order text replies to textual stimuli (such as social media posts or news articles) given a specific user context.
Out-of-Scope Use
The model cannot process non-textual stimuli (e.g., URLs, images, GIFs) or simulate non-verbal interactions (e.g., liking behavior). It is not intended for generating coordinated inauthentic behavior, micro-targeted disinformation, or impersonating individuals for deceptive purposes.
Bias, Risks, and Limitations
- Privacy: The training utilized authentic, public digital traces from real X (Twitter) users. These individuals did not provide explicit informed consent for their communication patterns to be replicated by generative models.
- Dual-Use Risks: The framework demonstrated here enables the generation of synthetic content that mimics individual communication patterns with measurable fidelity. Malicious actors could exploit this for sophisticated bot campaigns or targeted persuasion.
- Semantic Boundaries: While SFT aligns the surface-level structure of the text output, the model's semantic grounding remains bounded by the representational capacity of the 8B parameter class.
How to Get Started with the Model
The model requires a combined explicit and implicit conditioning format. Structure your prompt as a native chat sequence containing the generated user profile (system prompt) followed by up to 29 historical interactions.
System: [Insert Generated Biography]
User: Comment on the following content: [Historical Stimulus 1]
Assistant: [Authentic Historical Reply 1]
...
User: Comment on the following content: [New Target Stimulus]
Assistant:
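As an illustrative sketch of the conditioning format above, the biography and history can be assembled into a standard chat message list (the function and variable names here are my own, not taken from the released repository):

```python
def build_ccp_messages(biography, history, new_stimulus):
    """Assemble the CCP chat sequence: biography as the system prompt,
    then alternating historical stimulus/reply turns, then the new target."""
    messages = [{"role": "system", "content": biography}]
    for stimulus, reply in history[-29:]:  # at most 29 historical interactions
        messages.append({"role": "user",
                         "content": f"Comment on the following content: {stimulus}"})
        messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "user",
                     "content": f"Comment on the following content: {new_stimulus}"})
    return messages

# The resulting list can then be rendered with the model tokenizer, e.g.
# tokenizer.apply_chat_template(messages, add_generation_prompt=True),
# so the model completes the final assistant turn.
```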
Training Details
Training Data
The model was fine-tuned on a corpus of English X (Twitter) discourse collected up to August 2023.
- Sample Size: 3,800 politically active users.
- Context Limit: Up to 30 stimulus-response interactions per user.
- Exclusions: Interactions containing media, and users with fewer than four historical replies.
Training Procedure
Training Hyperparameters
- Training regime: 8-bit quantization, Paged AdamW optimizer.
- Epochs: 1
- Max sequence length: 4,500 tokens
- Training paradigm: SFT on complete input sequences (system prompt, user prompts, and model completions) using TRL defaults.
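A hedged sketch of a matching TRL setup (argument names assume a recent TRL release; this is a reconstruction from the listed hyperparameters, not the authors' released training script):

```python
# Sketch only: mirrors the stated hyperparameters (1 epoch, 4,500-token
# sequences, Paged AdamW with 8-bit state) using TRL's SFTConfig.
from trl import SFTConfig, SFTTrainer

config = SFTConfig(
    num_train_epochs=1,
    max_seq_length=4500,
    optim="paged_adamw_8bit",  # Paged AdamW optimizer, 8-bit states
    output_dir="echo-llama-3.1-8b-instruct-sft",
)
# trainer = SFTTrainer(model=..., args=config, train_dataset=...)
# trainer.train()
```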
Evaluation
Testing Data, Factors & Metrics
Testing Data
Evaluations were conducted on a deterministic, held-out test split of 650 English users to prevent cross-user data leakage.
Metrics
The evaluation utilized automated metrics across five independent generation runs (decoding temperature 0.75, max new tokens 500) to capture both lexical overlap and semantic alignment.
- BLEU & ROUGE-1: Measure precision-oriented n-gram overlap and unigram overlap, respectively.
- Length Ratio: Ratio of generated output length to the authentic reference length.
- Embedding Distance: Assesses semantic intent alignment via cosine distance using Qwen3-Embedding-8B.
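To make two of these metrics concrete, here is a dependency-free toy approximation of length ratio and ROUGE-1-style unigram F1 (the evaluation itself presumably relied on standard metric libraries; this sketch is only illustrative):

```python
def length_ratio(generated, reference):
    """Token-count ratio of the generated output to the authentic reference."""
    return len(generated.split()) / max(len(reference.split()), 1)

def rouge1_f1(generated, reference):
    """Toy unigram-overlap F1 in the spirit of ROUGE-1."""
    gen, ref = generated.lower().split(), reference.lower().split()
    overlap = sum(min(gen.count(w), ref.count(w)) for w in set(gen))
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```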
Results
- BLEU: 0.083
- ROUGE-1: 0.229
- Length Ratio: 0.961
- Embedding Distance: 0.397
Observation: Explicit biography conditioning becomes largely redundant post-fine-tuning, as the model performs latent inference directly from the provided behavioral histories.
Technical Specifications
Compute Infrastructure
Hardware
The model was fine-tuned on a single NVIDIA L40S GPU (48GB VRAM).
Citation
BibTeX:
@article{schwager2026towards,
  title={Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction},
  author={Schwager, Nils and M{\"u}nker, Simon and Plum, Alistair and Rettinger, Achim},
  journal={arXiv preprint arXiv:2602.22752},
  year={2026}
}