echo-Llama-3.1-8B-Instruct-de
This model performs Conditioned Comment Prediction (CCP), designed to act as a "silicon subject" for computational social science. It predicts how a specific social media user will respond to a given stimulus by utilizing both an explicit user profile and implicit behavioral history.
Model Details
Model Description
The model was fine-tuned using Supervised Fine-Tuning (SFT) to optimize its ability to generate high-fidelity, user-specific replies to online content. It isolates the capability of response generation, prioritizing operational validity over surface-level plausibility by benchmarking against authentic digital traces.
- Developed by: Nils Schwager, Simon Münker, Alistair Plum, Achim Rettinger (Trier University, University of Luxembourg)
- Funded by: EU's Horizon Europe Framework (HORIZON-CL2-2022-DEMOCRACY-01-07) under grant agreement number 101095095
- Model type: Autoregressive Large Language Model (Instruction-tuned)
- Language(s) (NLP): German (de)
- License: llama3.1
- Finetuned from model: meta-llama/Llama-3.1-8B-Instruct
Model Sources
- Repository: https://github.com/nsschw/Conditioned-Comment-Prediction
- Paper: Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction
Uses
Direct Use
The model is built for computational social scientists and researchers modeling discourse dynamics and individual-level behavioral patterns. It is strictly optimized for predicting first-order text replies to textual stimuli (such as social media posts or news articles) given a specific user context.
Out-of-Scope Use
The model cannot process multi-modal inputs (e.g., URLs, images, GIFs) or simulate non-verbal interactions (e.g., liking behavior). It is not intended for generating coordinated inauthentic behavior, micro-targeted disinformation, or impersonating individuals for deceptive purposes.
Bias, Risks, and Limitations
- Privacy: The training utilized authentic, public digital traces from real X users. These individuals did not provide explicit informed consent for their communication patterns to be replicated by generative models.
- Dual-Use Risks: The framework demonstrated here enables the generation of synthetic content that mimics individual communication patterns with measurable fidelity. Malicious actors could exploit this for sophisticated bot campaigns or targeted persuasion.
- Semantic Boundaries: In this German-language setting, SFT functions primarily as a tool for formatting control rather than semantic enhancement: it refines style and surface-level structure, but the model's semantic grounding does not deepen beyond the base model's capabilities.
How to Get Started with the Model
The model requires a combined explicit and implicit conditioning format. Structure your prompt as a native chat sequence containing the generated user profile (system prompt) followed by up to 29 historical interactions.
System: [Insert Generated Biography]
User: Comment on the following content: [Historical Stimulus 1]
Assistant: [Authentic Historical Reply 1]
...
User: Comment on the following content: [New Target Stimulus]
Assistant:
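The chat sequence above can be assembled programmatically before being passed to the tokenizer's chat template. A minimal sketch — the function name `build_ccp_messages` is illustrative and not part of the released code:

```python
def build_ccp_messages(profile, history, target_stimulus, max_history=29):
    """Assemble the CCP chat sequence: system profile, up to 29 historical
    stimulus-reply pairs, then the new target stimulus as the final user turn."""
    messages = [{"role": "system", "content": profile}]
    for stimulus, reply in history[-max_history:]:
        messages.append({"role": "user",
                         "content": f"Comment on the following content: {stimulus}"})
        messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "user",
                     "content": f"Comment on the following content: {target_stimulus}"})
    return messages

# The resulting list can be fed to tokenizer.apply_chat_template(...) for generation.
msgs = build_ccp_messages("Biography ...", [("Post A", "Reply A")], "New post")
```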
Training Details
Training Data
The model was fine-tuned on a corpus of German X data collected around keywords related to German political discourse during the first half of 2023.
- Sample Size: 3,800 users sampled from a raw corpus of 3.38M tweets.
- Context Limit: Up to 30 stimulus-response interactions per user.
- Exclusions: Interactions containing URLs, images, or GIFs, and users with fewer than four historical replies.
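The exclusion rules above can be expressed as a simple filter. A sketch under the assumptions that a user's history is a list of (stimulus, reply) string pairs and that a URL regex stands in for the full URL/image/GIF check (helper names are hypothetical):

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")

def is_text_only(stimulus, reply):
    """Reject interactions containing URLs (stand-in for the URL/image/GIF exclusion)."""
    return not (URL_PATTERN.search(stimulus) or URL_PATTERN.search(reply))

def filter_user(history, min_replies=4, max_interactions=30):
    """Keep text-only interactions, drop users with fewer than four
    remaining replies, and cap at 30 interactions per user."""
    kept = [(s, r) for s, r in history if is_text_only(s, r)]
    if len(kept) < min_replies:
        return None  # user excluded from the corpus
    return kept[:max_interactions]
```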
Training Procedure
Training Hyperparameters
- Training regime: 8-bit quantization, Paged AdamW optimizer.
- Epochs: 1
- Max sequence length: 4,500 tokens
- Training paradigm: SFT on complete input sequences (system prompt, user prompts, and model completions) using TRL defaults.
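In code, a setup matching these hyperparameters might look as follows. This is a hedged configuration sketch using TRL's `SFTTrainer` with bitsandbytes 8-bit loading; exact argument names vary across TRL versions, and nothing here is taken from the released training script (`train_dataset` and `output_dir` are placeholders):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit quantization
)

config = SFTConfig(
    num_train_epochs=1,            # single epoch
    max_seq_length=4500,           # max sequence length in tokens
    optim="paged_adamw_8bit",      # Paged AdamW optimizer
    output_dir="echo-llama-3.1-8b-de",  # placeholder
)

# train_dataset: assumed pre-built chat-formatted dataset
trainer = SFTTrainer(model=model, args=config, train_dataset=train_dataset)
trainer.train()
```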
Evaluation
Testing Data, Factors & Metrics
Testing Data
Evaluations were conducted on a deterministic, held-out test split of 650 German users to prevent cross-user data leakage.
Metrics
The evaluation utilized automated metrics across five independent generation runs (decoding temperature 0.75, max new tokens 500) to capture both lexical overlap and semantic alignment.
- BLEU & ROUGE-1: Measure precision-oriented n-gram overlap and unigram overlap, respectively.
- Length Ratio: Quantifies output volume alignment against the authentic reference length.
- Embedding Distance: Assesses semantic intent alignment via cosine distance using Qwen3-Embedding-8B.
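Two of these metrics are straightforward to reproduce. A minimal sketch of length ratio and cosine distance in plain Python — the embedding model itself is not included here, so `cosine_distance` operates on precomputed vectors:

```python
import math

def length_ratio(generated, reference):
    """Ratio of generated to reference length, in whitespace tokens."""
    return len(generated.split()) / max(len(reference.split()), 1)

def cosine_distance(u, v):
    """1 - cosine similarity between two embedding vectors
    (e.g., sentence embeddings from Qwen3-Embedding-8B)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm
```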
Results
- BLEU: 0.095
- ROUGE-1: 0.192
- Length Ratio: 0.915
- Embedding Distance: 0.504
Observation: A critical limitation emerges in the German-language scenario: while SFT improves lexical metrics (BLEU rises from 0.065 to 0.095), semantic alignment stagnates (embedding distance ~0.50). This indicates that SFT refines style but struggles to deepen semantic grounding beyond the base model's inherent capabilities.
Technical Specifications
Compute Infrastructure
Hardware
The model was fine-tuned on a single NVIDIA L40S GPU (48GB VRAM).
Citation
BibTeX:
@article{schwager2026towards,
  title={Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction},
  author={Schwager, Nils and M{\"u}nker, Simon and Plum, Alistair and Rettinger, Achim},
  journal={arXiv preprint arXiv:2602.22752},
  year={2026}
}