test_config_utils.py
- Functions under test
  - `load_config(path)` — reads settings from a YAML file.
  - `get_secret(key)` — retrieves a secret first from `os.environ`, then from `streamlit.secrets`, else raises.
- Patching & mocking
  - Environment variables via `os.environ` or `monkeypatch.setenv()` / `monkeypatch.delenv()`.
  - `reddit_analysis.config_utils.HAS_STREAMLIT` toggled to simulate presence of Streamlit.
  - `streamlit.secrets` replaced with a `MockSecrets` object exposing a `.get(key)` method.
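The `MockSecrets` stand-in described above could be as small as the following sketch; the class name and the `.get(key)` interface come from the tests, while the constructor and default handling are assumptions:

```python
class MockSecrets:
    """Minimal stand-in for streamlit.secrets exposing a .get(key) method."""

    def __init__(self, values=None):
        self._values = dict(values or {})

    def get(self, key, default=None):
        # Mirror dict.get semantics: return default when the key is absent.
        return self._values.get(key, default)
```

In a test it would presumably be patched in with something like `monkeypatch.setattr("streamlit.secrets", MockSecrets({...}))` alongside the `HAS_STREAMLIT` toggle.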
- Example inputs
  - A temporary `config.yaml` with keys like `repo_id: test/repo`, `batch_size: 16`, `replicate_model: test/model`.
  - Secret key `"TEST_SECRET"` set in `os.environ` or returned by `MockSecrets.get()`.
  - Missing secret scenario triggers `ValueError("Required secret TEST_SECRET not found…")`.
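The lookup order these tests exercise (environment first, then `streamlit.secrets`, else raise) can be sketched as follows. The `HAS_STREAMLIT` flag and the error wording are taken from the section above; the module-level `_secrets` handle and the exact control flow are assumptions, not the project's actual implementation:

```python
import os

HAS_STREAMLIT = False  # toggled by tests to simulate Streamlit being installed
_secrets = None        # stands in for streamlit.secrets when HAS_STREAMLIT is True

def get_secret(key):
    """Return a secret from os.environ, then streamlit.secrets, else raise."""
    if key in os.environ:
        return os.environ[key]
    if HAS_STREAMLIT and _secrets is not None:
        value = _secrets.get(key)
        if value is not None:
            return value
    raise ValueError(f"Required secret {key} not found")
```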
test_scrape.py
- Methods under test
  - `RedditScraper.get_posts(subreddit)` — calls the PRAW client’s `.subreddit(...).top()` and returns a DataFrame with columns `post_id, title, text, score, subreddit, created_utc, url, num_comments`.
  - `RedditScraper.upload_to_hf(df, date)` — downloads the existing parquet via `hf_hub_download`, deduplicates by `post_id`, then calls `hf_api.upload_file(...)`.
  - `main(date)` CLI — loads config, checks for Reddit credentials, raises if missing.
- Patching & mocking
  - A fake PRAW client (`mock_reddit_client`) whose `.subreddit().top()` yields two `Mock` submissions (ids `post0`, `post1`).
  - `hf_hub_download` patched to return a path for a “previous” parquet file containing `prev_df`.
  - `mock_hf_api.upload_file` to capture the uploaded parquet path.
  - Environment via `monkeypatch` and `reddit_analysis.config_utils.HAS_STREAMLIT` + `streamlit.secrets`.
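A fake PRAW client of this shape is typically built from nested `Mock` return values. The ids and titles below come from the tests; the other submission attributes (`selftext`, `score`) are plausible PRAW field names assumed for illustration:

```python
from unittest.mock import Mock

def make_submission(i):
    """Build one fake PRAW submission with the attributes get_posts reads."""
    sub = Mock()
    sub.id = f"post{i}"
    sub.title = f"Test Post {i}"
    sub.selftext = f"Body {i}"   # assumed field name
    sub.score = i * 10           # assumed field name
    return sub

mock_reddit_client = Mock()
# .subreddit(...) returns a mock whose .top(...) yields the two submissions.
mock_reddit_client.subreddit.return_value.top.return_value = [
    make_submission(0),
    make_submission(1),
]
```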
- Example inputs
  - `get_posts` uses two submissions with `id='post0'`, `title='Test Post 0'`, etc., expecting a 2‑row DataFrame.
  - `upload_to_hf` combines `prev_df` (posts 0 & 1) with `new_df` (posts 1 & 2), resulting in only `post1` & `post2` uploaded.
  - CLI invoked with no Reddit env vars, raising `ValueError("Missing required Reddit API credentials")`.
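The deduplicate-by-`post_id` step is commonly written as a `pandas.concat` followed by `drop_duplicates`; the sketch below shows that generic pattern, keeping the freshly scraped row when a `post_id` appears in both frames. Note the test above expects only `post1` and `post2` in the uploaded file, so the project's real implementation may additionally filter the merged result; this is only the dedupe idiom, not the actual code:

```python
import pandas as pd

prev_df = pd.DataFrame({"post_id": ["post0", "post1"], "score": [1, 2]})
new_df = pd.DataFrame({"post_id": ["post1", "post2"], "score": [5, 7]})

# Keep each post_id once, preferring the row from new_df on conflicts.
combined = (
    pd.concat([prev_df, new_df])
    .drop_duplicates(subset="post_id", keep="last")
    .reset_index(drop=True)
)
# combined now holds post0, post1 (score 5, from new_df), post2
```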
test_summarize.py
- Methods under test
  - `RedditSummarizer.summarize_date(date)` — downloads the scored parquet, groups by `subreddit`, and computes `mean_sentiment`, `count`, `total_score`, `weighted_sentiment`, plus `date`.
  - `RedditSummarizer.update_summary(df)` — appends to or creates `summary_file`, preserving chronological order.
  - CLI entrypoint in `main(date)` — validates date format or scored-file existence.
- Patching & mocking
  - `hf_hub_download` patched to return a temp parquet containing `sample_scored_data` (4 rows for two subreddits).
  - `reddit_analysis.config_utils.HAS_STREAMLIT` and `streamlit.secrets.get(...)` for missing-file tests.
- Example inputs & expectations
  - `summarize_date`: expect two summary rows from

    ```python
    sample_scored_data = pd.DataFrame({
        'subreddit': ['test1', 'test1', 'test2', 'test2'],
        'sentiment': [0.8, 0.6, 0.4, 0.2],
        'score': [10, 20, 30, 40],
        …
    })
    ```

    - test1: `mean_sentiment≈0.7`, `count=2`, `total_score=30`, `weighted_sentiment≈0.6667`
    - test2: `mean_sentiment≈0.3`, `count=2`, `total_score=70`, `weighted_sentiment≈0.2857`
  - `update_summary`: merges an initial 2‑row file for `2025-04-19` with a new 2‑row file for `2025-04-20`, ending with 4 total rows.
  - CLI invalid date: `main('2025-04-20-invalid')` → `ValueError("Invalid date format")`.
  - Missing scored file: patched `hf_hub_download` raises → `ValueError("Failed to download scored file…")`.
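The expected values above follow from a straightforward groupby: for test1, `weighted_sentiment = (0.8·10 + 0.6·20) / 30 ≈ 0.6667`, and for test2 `(0.4·30 + 0.2·40) / 70 ≈ 0.2857`. A minimal sketch reproducing them (column and aggregate names come from the section; the `summarize` helper itself is hypothetical, not the project's method):

```python
import pandas as pd

sample_scored_data = pd.DataFrame({
    "subreddit": ["test1", "test1", "test2", "test2"],
    "sentiment": [0.8, 0.6, 0.4, 0.2],
    "score": [10, 20, 30, 40],
})

def summarize(df):
    """Per-subreddit aggregates matching the expectations above."""
    out = df.groupby("subreddit").agg(
        mean_sentiment=("sentiment", "mean"),
        count=("sentiment", "size"),
        total_score=("score", "sum"),
    )
    # Score-weighted sentiment: sum(sentiment * score) / sum(score)
    weighted_num = (df["sentiment"] * df["score"]).groupby(df["subreddit"]).sum()
    out["weighted_sentiment"] = weighted_num / out["total_score"]
    return out.reset_index()
```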
test_score.py
- Class & functions under test
  - `RedditScorer.score_date(date)` — downloads the input parquet, asserts required columns (`text`, `score`, `post_id`, `subreddit`), splits into batches, calls `replicate_client.run()`, injects `sentiment` & `confidence`, writes parquet, then calls `hf_api.upload_file()`.
  - CLI `main(date)` — reads `.env` or `streamlit.secrets`, requires `REPLICATE_API_TOKEN`, else raises.
- Patching & mocking
  - `hf_hub_download` patched to return a temp parquet for the “input” DataFrame.
  - `mock_hf_api` supplying a stubbed `upload_file` method.
  - `mock_replicate_client.run` side‑effect that:

    ```python
    texts = json.loads(input['texts'])
    sentiments = ['positive' if i % 2 == 0 else 'negative' for i in range(len(texts))]
    confidences = [0.9 if i % 2 == 0 else 0.8 for i in range(len(texts))]
    ```

  - `reddit_analysis.config_utils.HAS_STREAMLIT` + `streamlit.secrets.get(...)` for the CLI missing‑token test.
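Wired into a `Mock`, the side-effect above might look like this. The `input['texts']` JSON contract and the alternating labels come from the snippet; the function signature and the JSON return shape are assumptions:

```python
import json
from unittest.mock import Mock

def fake_replicate_run(model, input):
    """Deterministic stand-in for replicate_client.run: alternates labels."""
    texts = json.loads(input["texts"])
    sentiments = ["positive" if i % 2 == 0 else "negative" for i in range(len(texts))]
    confidences = [0.9 if i % 2 == 0 else 0.8 for i in range(len(texts))]
    # Assumed return shape: a JSON payload the scorer parses back out.
    return json.dumps({"sentiments": sentiments, "confidences": confidences})

mock_replicate_client = Mock()
mock_replicate_client.run.side_effect = fake_replicate_run
```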
- Example inputs & expectations
  - `test_score_date`: input DataFrame with two rows (`'Test text 1'`, `'Test text 2'`), expects the uploaded parquet to have `sentiment=['positive','negative']`, `confidence=[0.9,0.8]` and all six columns present.
  - `test_score_date_missing_columns`: input missing `post_id`/`subreddit` → `ValueError("missing expected columns")`.
  - `test_score_date_batch_processing`: input of 5 texts, `batch_size=2` → `replicate_client.run` called 3 times, final uploaded file contains all 5 rows.
  - `test_cli_missing_token`: no `REPLICATE_API_TOKEN` in env or secrets → `ValueError("REPLICATE_API_TOKEN is required for scoring")`.
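The 3-call expectation in `test_score_date_batch_processing` is just ceiling division: 5 texts at `batch_size=2` yields batches of 2, 2, and 1. A sketch of the batching loop (only the 5-text input and `batch_size=2` come from the test; the helper is an assumed implementation):

```python
def iter_batches(texts, batch_size):
    """Yield consecutive slices of at most batch_size texts."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

batches = list(iter_batches([f"text {i}" for i in range(5)], batch_size=2))
# 5 texts with batch_size=2 → 3 batches of sizes 2, 2, 1
```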