dotoking committed
Commit a7fedd9 · verified · 1 Parent(s): 7b06c22

Upload 5 files

Files changed (5)
  1. README.md +99 -12
  2. app.py +49 -0
  3. cear_model.py +69 -0
  4. platform_weights.json +9 -0
  5. requirements.txt +7 -0
README.md CHANGED
@@ -1,12 +1,99 @@
- ---
- title: CEAR
- emoji: 📚
- colorFrom: indigo
- colorTo: purple
- sdk: gradio
- sdk_version: 6.0.2
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
---
title: Cultural Exposure and Algorithmic Risk Model
emoji: "🧭"
colorFrom: "blue"
colorTo: "green"
sdk: gradio
app_file: app.py
---
# Cultural Exposure & Algorithmic Risk (CEAR) Baseline v1.0

## Model Description

The **Cultural Exposure & Algorithmic Risk (CEAR) Model** is an **analytic, rule-based scoring system** designed to help users and researchers interpret social media usage in terms of its potential impact on cultural awareness and algorithmic vulnerability.

This version is a V1 baseline: it is **deterministic** (theory-driven, with fixed rules and weights) and relies on neither supervised machine learning nor proprietary user data.

### 🎯 Key Outputs

1. **Cultural Connectedness Score (C-Score):** Estimates exposure to viral and trending content, modeled with diminishing returns on time.
2. **Algorithmic Risk Score (A-Risk):** Quantifies vulnerability incurred from concentrated time on high-intensity, opaque algorithmic feeds.
3. **Platform Diversity Index (D-Index):** Measures the concentration or spread of usage across platforms, using the inverse Herfindahl-Hirschman Index ($1/\text{HHI}$; defined just below this list).
4. **Cultural Efficiency:** Per-platform estimates of C-Score gained per minute spent.

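Concretely, the D-Index is the inverse Herfindahl-Hirschman Index of each platform's share of total minutes, matching the computation in `cear_model.py`:

$$D_{Index} = \frac{1}{\sum_{i} s_i^2}, \qquad s_i = \frac{\text{Min}_i}{\sum_{j} \text{Min}_j}$$

A D-Index of 1 means all time sits on a single platform; a value of $n$ means time is spread evenly across $n$ platforms.
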
## ⚙️ Analytic Basis & Scoring Logic

The model is defined by transparent assumptions encoded in the Python code (`cear_model.py`) and the platform weights (`platform_weights.json`).

### Core Formulas

The key to the C-Score is the **Diminishing Returns Function** ($f_{DR}$), which prevents the C-Score from increasing linearly with time, acknowledging that the first hour is likely more valuable than the tenth.

$$f_{DR}(\text{Min}) = \log_{10}(\text{Min} + 1)$$

The final scores are calculated as:

$$C_{Score} = \sum_{i} \left[ W_{C,i} \times f_{DR}(\text{Min}_i) \right]$$

$$A_{Risk} = \sum_{i} \left[ W_{A,i} \times \text{Min}_i \right]$$

*(Where $W_{C}$ is the Trend Density Weight and $W_{A}$ is the Algorithmic Risk Weight, defined in `platform_weights.json`.)*

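The formulas can be checked by hand. The sketch below scores the same example inputs used in the Model Integration section further down, using the weights shipped in `platform_weights.json`:

```python
import math

# Worked example: the inputs from the Model Integration section,
# scored directly with the formulas above.
usage = {"TikTok": 450, "YouTube": 200, "Reddit": 50}   # minutes per week
weights = {
    "TikTok":  {"W_C": 0.95, "W_A": 0.90},
    "YouTube": {"W_C": 0.70, "W_A": 0.75},
    "Reddit":  {"W_C": 0.60, "W_A": 0.40},
}

# C-Score: trend-density weight times log10(minutes + 1) per platform
c_score = sum(w["W_C"] * math.log10(usage[p] + 1) for p, w in weights.items())

# A-Risk: algorithmic-risk weight times raw minutes per platform
a_risk = sum(w["W_A"] * usage[p] for p, w in weights.items())

# D-Index: inverse HHI over each platform's share of total minutes
total = sum(usage.values())
d_index = 1 / sum((m / total) ** 2 for m in usage.values())

print(f"C-Score: {c_score:.2f}, A-Risk: {a_risk:.1f}, D-Index: {d_index:.2f}")
# C-Score: 5.16, A-Risk: 575.0, D-Index: 2.00
```
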
## 🚀 Deployment & Usage (Hugging Face Space)

This repository contains the core logic (`cear_model.py`) and the application interface (`app.py`) for a Hugging Face Space.

### Model Integration (The Engine)

The core logic can be imported and run in any environment:

```python
import pandas as pd
from cear_model import CEARModel

# Example Input Data
user_data = pd.DataFrame([
    {'platform_name': 'TikTok', 'minutes_per_week': 450},
    {'platform_name': 'YouTube', 'minutes_per_week': 200},
    {'platform_name': 'Reddit', 'minutes_per_week': 50},
])

model = CEARModel()
results = model.calculate_scores(user_data)
# {'C_Score': 5.16, 'A_Risk': 575.0, ...}
```

### Application Interface (The App - app.py)

The `app.py` script uses the Gradio library to create an interactive web interface. It handles:

- Collecting user input via a table component.
- Calling the `CEARModel.calculate_scores()` method.
- Generating a qualitative natural-language summary based on the quadrant of the C-Score and A-Risk (e.g., "High C, Low A"); a sketch of such a classifier follows this list.

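The quadrant step itself is not shown in the committed `app.py`, so the following is a minimal sketch of what such a classifier could look like; the thresholds and messages are illustrative assumptions, not values defined anywhere in this repository:

```python
# Hypothetical quadrant classifier. The cut-offs and wording below are
# illustrative assumptions; this repository defines no thresholds.
def quadrant_summary(c_score: float, a_risk: float,
                     c_cut: float = 4.0, a_cut: float = 400.0) -> str:
    c_label = "High C" if c_score >= c_cut else "Low C"
    a_label = "High A" if a_risk >= a_cut else "Low A"
    messages = {
        ("High C", "Low A"): "Broad cultural exposure at relatively contained algorithmic risk.",
        ("High C", "High A"): "Strong cultural exposure, but driven by high-intensity feeds.",
        ("Low C", "Low A"): "Light, low-risk usage with limited trend exposure.",
        ("Low C", "High A"): "Concentrated time on opaque feeds with little cultural payoff.",
    }
    return f"{c_label}, {a_label}: {messages[(c_label, a_label)]}"

# e.g. quadrant_summary(5.16, 575.0) -> "High C, High A: Strong cultural exposure, ..."
```
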
## ⚠️ Limitations and Ethical Considerations

1. **Theoretical, not validated:** The scores are based on fixed, theoretical assumptions about platform design. They are not calibrated against real-world user survey data or outcomes (e.g., actual cultural literacy, actual regret). Scores are relative estimates only.

2. **No content analysis:** The model uses only time and platform. It cannot distinguish between a productive hour watching educational content and an unproductive hour scrolling low-quality content.

3. **Future work:** This deterministic model serves as a foundation. Future versions are intended to use the same input schema to train supervised machine learning models that directly predict outcomes (e.g., user-reported "felt caught up" or "post-scroll regret").

---

## `requirements.txt` (For Deployment)

This file lists the necessary Python packages for the Gradio Space to run the model and interface correctly:

```text
# requirements.txt

# Core Model Dependencies
pandas
numpy

# Gradio Space Dependencies
# Gradio is used to build the simple web application interface (app.py)
gradio
```
app.py ADDED
@@ -0,0 +1,49 @@
# app.py (Simplified Gradio code)

import gradio as gr
from cear_model import CEARModel  # loads platform_weights.json on import
import pandas as pd

# Instantiate the model globally so a single instance serves every request
cear_analyzer = CEARModel()

def analyze_user_data(input_table):
    # 1. Convert the Gradio table input into a typed DataFrame
    user_data_df = pd.DataFrame(input_table, columns=['platform_name', 'minutes_per_week'])
    user_data_df['minutes_per_week'] = pd.to_numeric(user_data_df['minutes_per_week'], errors='coerce').fillna(0)

    # 2. Call the core model
    raw_scores = cear_analyzer.calculate_scores(user_data_df)

    # 3. Format output for the user (the "app" layer)
    summary = f"""
## 📊 Analysis Summary
- **Cultural Connectedness Score (C-Score):** **{raw_scores['C_Score']:.2f}**
- **Algorithmic Risk Score (A-Risk):** **{raw_scores['A_Risk']:.2f}**
- **Platform Diversity Index (D-Index):** **{raw_scores['D_Index']:.2f}**
---
### 📝 Interpretation
*Your C-Score is based on logarithmically scaled time, reflecting diminishing returns. Your A-Risk is based on raw time, reflecting concentrated attention.*
"""

    # Return the formatted summary and a per-platform efficiency table
    return summary, pd.DataFrame(raw_scores['Per_Platform_Efficiency'])

# Define the Gradio interface
iface = gr.Interface(
    fn=analyze_user_data,
    inputs=gr.Dataframe(
        headers=['platform_name', 'minutes_per_week'],
        row_count=5,
        col_count=(2, 'fixed'),
        label="Weekly Screen Time Input (Source data from OS Tracker)"
    ),
    outputs=[
        gr.Markdown(label="Score Results"),
        gr.Dataframe(label="Per-Platform Cultural Efficiency")
    ],
    title="CEAR Baseline: Cultural Exposure & Algorithmic Risk Analyzer"
)

iface.launch()
cear_model.py ADDED
@@ -0,0 +1,69 @@
# cear_model.py
import numpy as np
import pandas as pd
import json
import os  # needed to locate the JSON file next to this script

# --- 1. Load the PLATFORM_WEIGHTS variable from JSON ---
PLATFORM_WEIGHTS = {}  # default value

try:
    # Get the directory of the current script (cear_model.py)
    script_dir = os.path.dirname(os.path.abspath(__file__))
    json_path = os.path.join(script_dir, 'platform_weights.json')

    with open(json_path, 'r') as f:
        # Load the configuration data into the module-level variable
        PLATFORM_WEIGHTS = json.load(f)

except FileNotFoundError:
    # Non-fatal: the empty default dict stands in if the file is missing
    print("WARNING: platform_weights.json not found; using empty weights.")

# --- 2. Define the Model Class ---
# The class can safely reference the module-level PLATFORM_WEIGHTS variable
class CEARModel:
    def __init__(self, weights=PLATFORM_WEIGHTS):
        # Weights default to the JSON config but can be overridden per instance
        self.weights = weights

    def _diminishing_returns(self, minutes):
        # f_DR(Min) = log10(Min + 1): early minutes count more than later ones
        return np.log10(minutes + 1)

    def calculate_scores(self, user_input_df: pd.DataFrame):
        # 1. Merge weights with user input
        df = user_input_df.merge(
            pd.DataFrame.from_dict(self.weights, orient='index'),
            left_on='platform_name',
            right_index=True,
            how='left'
        ).fillna(0)  # platforms missing from the weights file get weight 0

        total_mins = df['minutes_per_week'].sum()

        # 2. Calculate core scores
        df['C_Contrib'] = df.apply(lambda row: row['W_C'] * self._diminishing_returns(row['minutes_per_week']), axis=1)
        df['A_Contrib'] = df.apply(lambda row: row['W_A'] * row['minutes_per_week'], axis=1)

        C_Score = df['C_Contrib'].sum()
        A_Risk = df['A_Contrib'].sum()

        # 3. Calculate D-Index (platform diversity = inverse HHI of minute shares)
        df['Min_Share'] = df['minutes_per_week'] / total_mins
        D_Index = 1 / (df['Min_Share']**2).sum() if total_mins > 0 else 0

        # 4. Calculate Cultural Efficiency (C contribution per minute)
        df['Cultural_Efficiency'] = df['C_Contrib'] / df['minutes_per_week'].replace(0, np.nan)  # avoid div by zero

        return {
            "C_Score": C_Score,
            "A_Risk": A_Risk,
            "D_Index": D_Index,
            "Per_Platform_Efficiency": df[['platform_name', 'Cultural_Efficiency']].dropna().to_dict('records')
        }

# Example Usage:
# user_data = pd.DataFrame([{'platform_name': 'TikTok', 'minutes_per_week': 300}, ...])
# model = CEARModel()
# model.calculate_scores(user_data)
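
Because `CEARModel.__init__` accepts a `weights` dict, the JSON defaults can also be overridden per instance, e.g. for sensitivity checks. A minimal sketch, with made-up alternative weights (the numbers below are illustrative, not from `platform_weights.json`):

```python
import pandas as pd
from cear_model import CEARModel

# Hypothetical alternative weights for a sensitivity check
custom_weights = {
    "TikTok": {"W_C": 0.80, "W_A": 0.95},
    "Reddit": {"W_C": 0.70, "W_A": 0.35},
}

model = CEARModel(weights=custom_weights)
scores = model.calculate_scores(pd.DataFrame([
    {'platform_name': 'TikTok', 'minutes_per_week': 300},
    {'platform_name': 'Reddit', 'minutes_per_week': 120},
]))
print(scores['C_Score'], scores['A_Risk'])
```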
platform_weights.json ADDED
@@ -0,0 +1,9 @@
{
  "TikTok": {"W_C": 0.95, "W_A": 0.90},
  "Instagram": {"W_C": 0.85, "W_A": 0.85},
  "YouTube": {"W_C": 0.70, "W_A": 0.75},
  "X/Twitter": {"W_C": 0.80, "W_A": 0.70},
  "Facebook": {"W_C": 0.50, "W_A": 0.60},
  "Reddit": {"W_C": 0.60, "W_A": 0.40},
  "LinkedIn": {"W_C": 0.10, "W_A": 0.20}
}
requirements.txt ADDED
@@ -0,0 +1,7 @@
# Core Model Dependencies
pandas
numpy

# Gradio Space Dependencies
# Gradio is used to build the simple web application interface (app.py)
gradio