test_config_utils.py
- Functions under test
  - `load_config(path)` — reads settings from a YAML file.
  - `get_secret(key)` — retrieves a secret first from `os.environ`, then from `streamlit.secrets`, else raises.
- Patching & mocking
  - Environment variables via `os.environ` or `monkeypatch.setenv()`/`monkeypatch.delenv()`.
  - `reddit_analysis.config_utils.HAS_STREAMLIT` toggled to simulate the presence of Streamlit.
  - `streamlit.secrets` replaced with a `MockSecrets` object exposing a `.get(key)` method.
- Example inputs
  - A temporary `config.yaml` with keys like `repo_id: test/repo`, `batch_size: 16`, `replicate_model: test/model`.
  - Secret key `"TEST_SECRET"` set in `os.environ` or returned by `MockSecrets.get()`.
  - A missing secret triggers `ValueError("Required secret TEST_SECRET not found…")`.
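The env-then-secrets lookup order is the main behavior these tests pin down. A minimal sketch of the pattern, using a hypothetical stand-in for `get_secret` alongside the `MockSecrets` test double (the real implementation lives in `reddit_analysis.config_utils` and may differ in detail):

```python
import os

class MockSecrets:
    """Test double for streamlit.secrets, exposing a .get(key) method."""
    def __init__(self, data):
        self._data = data

    def get(self, key):
        return self._data.get(key)

# Hypothetical stand-in: check os.environ first, then secrets, else raise.
def get_secret(key, secrets=None):
    if key in os.environ:
        return os.environ[key]
    if secrets is not None and secrets.get(key) is not None:
        return secrets.get(key)
    raise ValueError(f"Required secret {key} not found")

# The environment takes precedence over streamlit.secrets.
os.environ["TEST_SECRET"] = "from-env"
assert get_secret("TEST_SECRET", MockSecrets({"TEST_SECRET": "from-secrets"})) == "from-env"

# With no env var and no secret, a ValueError is raised.
del os.environ["TEST_SECRET"]
try:
    get_secret("TEST_SECRET", MockSecrets({}))
except ValueError as exc:
    assert "TEST_SECRET" in str(exc)
```

In the real tests, `monkeypatch.setenv()`/`monkeypatch.delenv()` would replace the direct `os.environ` mutation so the environment is restored after each test.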
test_scrape.py
- Methods under test
  - `RedditScraper.get_posts(subreddit)` — calls the PRAW client's `.subreddit(...).top()` and returns a DataFrame with columns `post_id`, `title`, `text`, `score`, `subreddit`, `created_utc`, `url`, `num_comments`.
  - `RedditScraper.upload_to_hf(df, date)` — downloads the existing parquet via `hf_hub_download`, deduplicates by `post_id`, then calls `hf_api.upload_file(...)`.
  - `main(date)` CLI — loads config, checks for Reddit credentials, raises if missing.
- Patching & mocking
  - A fake PRAW client (`mock_reddit_client`) whose `.subreddit().top()` yields two `Mock` submissions (ids `post0`, `post1`).
  - `hf_hub_download` patched to return a path for a “previous” parquet file containing `prev_df`.
  - `mock_hf_api.upload_file` stubbed to capture the uploaded parquet path.
  - Environment via `monkeypatch` and `reddit_analysis.config_utils.HAS_STREAMLIT` + `streamlit.secrets`.
- Example inputs
  - `get_posts` uses two submissions with `id='post0'`, `title='Test Post 0'`, etc., expecting a 2-row DataFrame.
  - `upload_to_hf` combines `prev_df` (posts 0 & 1) with `new_df` (posts 1 & 2), resulting in only `post1` & `post2` being uploaded.
  - The CLI invoked with no Reddit env vars raises `ValueError("Missing required Reddit API credentials")`.
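The fake PRAW client can be built with `unittest.mock` alone; a sketch under the assumptions above (the submission attributes beyond `id` and `title` are guesses at what `get_posts` reads, and the helper name `make_mock_reddit_client` is hypothetical):

```python
from unittest.mock import MagicMock

def make_mock_reddit_client(n=2):
    """Fake PRAW client whose .subreddit(...).top() yields n Mock submissions."""
    submissions = []
    for i in range(n):
        sub = MagicMock()
        sub.id = f"post{i}"
        sub.title = f"Test Post {i}"
        sub.score = 10 * (i + 1)
        sub.num_comments = i
        submissions.append(sub)
    client = MagicMock()
    # Any .subreddit(...).top(...) call returns the canned submissions.
    client.subreddit.return_value.top.return_value = submissions
    return client

client = make_mock_reddit_client()
posts = list(client.subreddit("python").top(limit=25))
assert [p.id for p in posts] == ["post0", "post1"]
```

Because `MagicMock` records calls, a test can also assert that `get_posts` requested the right subreddit via `client.subreddit.assert_called_with(...)`.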
test_summarize.py
- Methods under test
  - `RedditSummarizer.summarize_date(date)` — downloads the scored parquet, groups by `subreddit`, and computes `mean_sentiment`, `count`, `total_score`, `weighted_sentiment`, plus `date`.
  - `RedditSummarizer.update_summary(df)` — appends to or creates `summary_file`, preserving chronological order.
  - CLI entrypoint `main(date)` — validates the date format and the scored file's existence.
- Patching & mocking
  - `hf_hub_download` patched to return a temp parquet containing `sample_scored_data` (4 rows across two subreddits).
  - `reddit_analysis.config_utils.HAS_STREAMLIT` and `streamlit.secrets.get(...)` for missing-file tests.
- Example inputs & expectations
  - `summarize_date`: given

    ```python
    sample_scored_data = pd.DataFrame({
        'subreddit': ['test1', 'test1', 'test2', 'test2'],
        'sentiment': [0.8, 0.6, 0.4, 0.2],
        'score': [10, 20, 30, 40],
        …
    })
    ```

    expect two summary rows:
    - test1: `mean_sentiment` ≈ 0.7, `count` = 2, `total_score` = 30, `weighted_sentiment` ≈ 0.6667
    - test2: `mean_sentiment` ≈ 0.3, `count` = 2, `total_score` = 70, `weighted_sentiment` ≈ 0.2857
  - `update_summary`: merges an initial 2-row file for `2025-04-19` with a new 2-row file for `2025-04-20`, ending with 4 total rows.
  - CLI invalid date: `main('2025-04-20-invalid')` → `ValueError("Invalid date format")`.
  - Missing scored file: a patched `hf_hub_download` raises → `ValueError("Failed to download scored file…")`.
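The expected numbers follow from a score-weighted mean per subreddit. A minimal sketch of that aggregation, hypothetical but mirroring what `summarize_date` is described as computing:

```python
import pandas as pd

sample_scored_data = pd.DataFrame({
    'subreddit': ['test1', 'test1', 'test2', 'test2'],
    'sentiment': [0.8, 0.6, 0.4, 0.2],
    'score': [10, 20, 30, 40],
})

grouped = sample_scored_data.groupby('subreddit')
summary = pd.DataFrame({
    'mean_sentiment': grouped['sentiment'].mean(),
    'count': grouped['sentiment'].size(),
    'total_score': grouped['score'].sum(),
})
# weighted_sentiment = sum(sentiment * score) / sum(score), per subreddit
weighted = sample_scored_data['sentiment'] * sample_scored_data['score']
summary['weighted_sentiment'] = (
    weighted.groupby(sample_scored_data['subreddit']).sum() / summary['total_score']
)
summary = summary.reset_index()

# test1: mean 0.7, count 2, total 30, weighted 20/30 ≈ 0.6667
# test2: mean 0.3, count 2, total 70, weighted 20/70 ≈ 0.2857
```

Working the arithmetic by hand confirms the expectations above: for test1, (0.8·10 + 0.6·20) / 30 = 20/30 ≈ 0.6667; for test2, (0.4·30 + 0.2·40) / 70 = 20/70 ≈ 0.2857.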
test_score.py
- Class & functions under test
  - `RedditScorer.score_date(date)` — downloads the input parquet, asserts the required columns (`text`, `score`, `post_id`, `subreddit`), splits the rows into batches, calls `replicate_client.run()`, injects `sentiment` & `confidence`, writes a parquet, then calls `hf_api.upload_file()`.
  - CLI `main(date)` — reads `.env` or `streamlit.secrets`, requires `REPLICATE_API_TOKEN`, else raises.
- Patching & mocking
  - `hf_hub_download` patched to return a temp parquet for the “input” DataFrame.
  - `mock_hf_api` supplying a stubbed `upload_file` method.
  - A `mock_replicate_client.run` side effect that alternates labels:

    ```python
    texts = json.loads(input['texts'])
    sentiments = ['positive' if i % 2 == 0 else 'negative' for i in range(len(texts))]
    confidences = [0.9 if i % 2 == 0 else 0.8 for i in range(len(texts))]
    ```

  - `reddit_analysis.config_utils.HAS_STREAMLIT` + `streamlit.secrets.get(...)` for the CLI missing-token test.
- Example inputs & expectations
  - `test_score_date`: an input DataFrame with two rows (`'Test text 1'`, `'Test text 2'`); expects the uploaded parquet to have `sentiment=['positive','negative']`, `confidence=[0.9,0.8]`, and all six columns present.
  - `test_score_date_missing_columns`: input missing `post_id`/`subreddit` → `ValueError("missing expected columns")`.
  - `test_score_date_batch_processing`: input of 5 texts with `batch_size=2` → `replicate_client.run` called 3 times; the final uploaded file contains all 5 rows.
  - `test_cli_missing_token`: no `REPLICATE_API_TOKEN` in env or secrets → `ValueError("REPLICATE_API_TOKEN is required for scoring")`.
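The batch-processing expectation (5 texts, `batch_size=2`, 3 calls) can be checked by wiring the described stub in as a `side_effect` and counting calls. A sketch under the assumptions above; the stub's return shape is a guess, since the source only specifies the sentiment/confidence alternation:

```python
import json
from unittest.mock import MagicMock

def fake_run(model, input):
    """Side effect mirroring the described stub: alternate positive/negative."""
    texts = json.loads(input['texts'])
    sentiments = ['positive' if i % 2 == 0 else 'negative' for i in range(len(texts))]
    confidences = [0.9 if i % 2 == 0 else 0.8 for i in range(len(texts))]
    # The JSON return shape is an assumption, not the real model's contract.
    return json.dumps([{'sentiment': s, 'confidence': c}
                       for s, c in zip(sentiments, confidences)])

mock_replicate_client = MagicMock()
mock_replicate_client.run.side_effect = fake_run

# Batch splitting as described: 5 texts with batch_size=2 → ceil(5/2) = 3 calls.
texts = [f'Test text {i}' for i in range(5)]
batch_size = 2
results = []
for start in range(0, len(texts), batch_size):
    batch = texts[start:start + batch_size]
    out = mock_replicate_client.run('test/model', input={'texts': json.dumps(batch)})
    results.extend(json.loads(out))

assert mock_replicate_client.run.call_count == 3
assert len(results) == 5
```

Note that because the alternation restarts in each batch, the labels are per-batch, not global; a test asserting exact labels has to account for that.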