echo-Llama-3.1-8B-Instruct-de

This model performs Conditioned Comment Prediction (CCP), designed to act as a "silicon subject" for computational social science. It predicts how a specific social media user will respond to a given stimulus by utilizing both an explicit user profile and implicit behavioral history.

Model Details

Model Description

The model was fine-tuned using Supervised Fine-Tuning (SFT) to optimize its ability to generate high-fidelity, user-specific replies to online content. It isolates the capability of response generation, prioritizing operational validity over surface-level plausibility by benchmarking against authentic digital traces.

  • Developed by: Nils Schwager, Simon Münker, Alistair Plum, Achim Rettinger (Trier University, University of Luxembourg)
  • Funded by: EU's Horizon Europe Framework (HORIZON-CL2-2022-DEMOCRACY-01-07) under grant agreement number 101095095
  • Model type: Autoregressive Large Language Model (Instruction-tuned)
  • Language(s) (NLP): German (de)
  • License: llama3.1
  • Finetuned from model: meta-llama/Llama-3.1-8B-Instruct

Uses

Direct Use

The model is built for computational social scientists and researchers modeling discourse dynamics and individual-level behavioral patterns. It is strictly optimized for predicting first-order text replies to textual stimuli (such as social media posts or news articles) given a specific user context.

Out-of-Scope Use

The model cannot process multi-modal inputs (e.g., URLs, images, GIFs) or simulate non-verbal interactions (e.g., liking behavior). It is not intended for generating coordinated inauthentic behavior, micro-targeted disinformation, or impersonating individuals for deceptive purposes.

Bias, Risks, and Limitations

  • Privacy: The training utilized authentic, public digital traces from real X users. These individuals did not provide explicit informed consent for their communication patterns to be replicated by generative models.
  • Dual-Use Risks: The framework demonstrated here enables the generation of synthetic content that mimics individual communication patterns with measurable fidelity. Malicious actors could exploit this for sophisticated bot campaigns or targeted persuasion.
  • Semantic Boundaries: In this linguistic environment, SFT primarily serves as a tool for formatting control rather than semantic enhancement: it refines style and surface-level structure, but the model's semantic grounding does not deepen beyond the base model's capabilities.

How to Get Started with the Model

The model requires a combined explicit and implicit conditioning format. Structure the prompt as a native chat sequence: the generated user profile as the system prompt, followed by up to 29 historical stimulus-reply interactions, and closing with the new target stimulus as the final user turn.

System: [Insert Generated Biography]
User: Comment on the following content: [Historical Stimulus 1]
Assistant: [Authentic Historical Reply 1]
...
User: Comment on the following content: [New Target Stimulus]
Assistant:
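
A minimal Python sketch of this format using transformers; the biography, history pair, and target stimulus are placeholders, and the decoding settings mirror the evaluation setup below (temperature 0.75, up to 500 new tokens). Adjust the repository id if your hub path differs.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nsschw/echo-Llama-3.1-8B-Instruct-ger"  # adjust if the hub name differs
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Explicit conditioning: the generated user biography as the system prompt.
messages = [{"role": "system", "content": "[Insert Generated Biography]"}]

# Implicit conditioning: up to 29 historical stimulus-reply pairs.
history = [("[Historical Stimulus 1]", "[Authentic Historical Reply 1]")]
for stimulus, reply in history:
    messages.append({"role": "user", "content": f"Comment on the following content: {stimulus}"})
    messages.append({"role": "assistant", "content": reply})

# The new target stimulus the model should reply to.
messages.append({"role": "user", "content": "Comment on the following content: [New Target Stimulus]"})

inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# Decoding settings taken from the evaluation section (temperature 0.75, max 500 new tokens).
output = model.generate(inputs, do_sample=True, temperature=0.75, max_new_tokens=500)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))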

Training Details

Training Data

The model was fine-tuned on a corpus of German X data collected around keywords related to German political discourse during the first half of 2023.

  • Sample Size: 3,800 users sampled from a raw corpus of 3.38M tweets.
  • Context Limit: Up to 30 stimulus-response interactions per user.
  • Exclusions: Interactions containing URLs, images, or GIFs, and users with fewer than four historical replies.
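
For illustration, the exclusions above could be implemented as a user-level filter. This is a sketch, not the authors' pipeline; the record keys (user_id, stimulus, reply, has_media) are hypothetical stand-ins for whatever schema the raw corpus uses.

import re
from collections import defaultdict

URL_PATTERN = re.compile(r"https?://\S+")

def filter_corpus(interactions, min_replies=4, max_history=30):
    """Keep users with enough clean, text-only interactions.

    `interactions` is assumed to be a list of dicts with hypothetical
    keys: user_id, stimulus, reply, has_media.
    """
    per_user = defaultdict(list)
    for item in interactions:
        # Exclusion: drop interactions containing URLs, images, or GIFs.
        if item["has_media"] or URL_PATTERN.search(item["stimulus"] + item["reply"]):
            continue
        per_user[item["user_id"]].append(item)

    # Exclusion: drop users with fewer than four historical replies;
    # cap the context at 30 stimulus-response interactions per user.
    return {
        user: items[:max_history]
        for user, items in per_user.items()
        if len(items) >= min_replies
    }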

Training Procedure

Training Hyperparameters

  • Training regime: 8-bit quantization, Paged AdamW optimizer.
  • Epochs: 1
  • Max sequence length: 4,500 tokens
  • Training paradigm: SFT on complete input sequences (system prompt, user prompts, and model completions) using TRL defaults.
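
The exact training script is not published on this card; the sketch below shows a comparable TRL setup under stated assumptions: a dummy dataset, an assumed output path, an assumed 8-bit Paged AdamW variant, and LoRA adapters via peft (transformers cannot fine-tune an 8-bit-quantized model without adapters, and the adapter configuration is not documented here).

import torch
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

base_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Base model loaded in 8-bit, matching the training regime above.
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Placeholder conversational dataset in the prompt format shown earlier.
train_dataset = Dataset.from_list([{
    "messages": [
        {"role": "system", "content": "[Generated Biography]"},
        {"role": "user", "content": "Comment on the following content: [Stimulus]"},
        {"role": "assistant", "content": "[Authentic Reply]"},
    ]
}])

args = SFTConfig(
    output_dir="echo-sft",          # assumed output path
    num_train_epochs=1,             # Epochs: 1
    max_seq_length=4500,            # Max sequence length (renamed to max_length in newer TRL)
    optim="paged_adamw_8bit",       # Paged AdamW; the bit width is an assumption
)

# TRL defaults compute the loss over the complete sequence
# (system prompt, user prompts, and model completion).
trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # adapters assumed, not documented on the card
)
trainer.train()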

Evaluation

Testing Data, Factors & Metrics

Testing Data

Evaluations were conducted on a deterministic, held-out test split of 650 German users to prevent cross-user data leakage.
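
The split procedure is not described beyond being deterministic and user-level. One common way to achieve this, sketched here purely as an assumption, is hash-based assignment, which keeps all of a user's interactions on one side of the split across re-runs (650 of 3,800 users is roughly 17%).

import hashlib

def is_test_user(user_id: str, test_fraction: float = 0.17) -> bool:
    # Hash-based assignment is deterministic across runs and machines,
    # and operates at the user level, so no user's interactions can
    # leak between the training and test splits.
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 10_000 < test_fraction * 10_000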

Metrics

The evaluation utilized automated metrics across five independent generation runs (decoding temperature 0.75, max new tokens 500) to capture both lexical overlap and semantic alignment.

  • BLEU & ROUGE-1: Measure precision-oriented n-gram overlap and unigram overlap, respectively (see the computation sketch after this list).
  • Length Ratio: Quantifies output volume alignment against the authentic reference length.
  • Embedding Distance: Assesses semantic intent alignment via cosine distance using Qwen3-Embedding-8B.
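
A sketch of how these four metrics could be computed per generated reply; the library choices (sacrebleu, rouge_score, sentence_transformers) and loading Qwen3-Embedding-8B through SentenceTransformer are tooling assumptions, not the published evaluation script.

import sacrebleu
from rouge_score import rouge_scorer
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("Qwen/Qwen3-Embedding-8B")
scorer = rouge_scorer.RougeScorer(["rouge1"])

def score_reply(prediction: str, reference: str) -> dict:
    # BLEU: precision-oriented n-gram overlap (sacrebleu reports 0-100).
    bleu = sacrebleu.sentence_bleu(prediction, [reference]).score / 100
    # ROUGE-1: unigram overlap F-measure.
    rouge1 = scorer.score(reference, prediction)["rouge1"].fmeasure
    # Length Ratio: output volume relative to the authentic reference.
    length_ratio = len(prediction.split()) / max(len(reference.split()), 1)
    # Embedding Distance: 1 - cosine similarity of normalized embeddings.
    emb = embedder.encode([prediction, reference], normalize_embeddings=True)
    emb_distance = 1.0 - float(emb[0] @ emb[1])
    return {"bleu": bleu, "rouge1": rouge1,
            "length_ratio": length_ratio, "embedding_distance": emb_distance}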

Results

  • BLEU: 0.095
  • ROUGE-1: 0.192
  • Length Ratio: 0.915
  • Embedding Distance: 0.504

Observation: A critical limitation emerges in the German-language scenario: while SFT improves the lexical metrics (BLEU rises from 0.065 to 0.095), semantic alignment remains stagnant (embedding distance ~0.50). SFT thus refines style but does not deepen semantic grounding beyond the base model's inherent capabilities.

Technical Specifications

Compute Infrastructure

Hardware

The model was fine-tuned on a single NVIDIA L40S GPU (48GB VRAM).

Citation

BibTeX:

@article{schwager2026towards,
  title   = {Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction},
  author  = {Schwager, Nils and M{\"u}nker, Simon and Plum, Alistair and Rettinger, Achim},
  journal = {arXiv preprint arXiv:2602.22752},
  year    = {2026}
}
