
Talk:Pliciloricus

From Wikipedia, the free encyclopedia

Q4

# Q4 - Assessing the impact (15 points)
# To assess the impact of air quality, we will explore the aggregated data
# from the Behavioral Risk Factor Surveillance System (BRFSS) in brfss_06_20.json.
# The structure of the data is:
# {year: {metric name: {state code: average days out of 30 across all individuals}}}
# Please wrangle this data into a data frame called brfss_df, where each row is a state and year combination
# and the columns are year, state_code, and every health metric that appears in the data.
# Here are the definitions for the metrics:
# "energetic_days": "How many days full of energy in past 30 days"
# "bad_mental_health_days": "Now thinking about your mental health,
#  which includes stress, depression, and problems with emotions, for how many days during the past 30 days was your mental health not good?"
# "bad_physical_health_days": "Now thinking about your physical health, which includes physical illness and injury, 
# for how many days during the past 30 days was your physical health not good?"
# Please report:
# - the number of rows and columns in brfss_df
# - the average of each health metric across all states and all years in brfss_df
# Make sure the averages are sensible: different years may have different metrics or different ways of handling the data.
# Using negative values for a metric that must be non-negative is an old way of encoding missing values.
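A minimal sketch of the reshaping the task asks for, run on a tiny hypothetical sample instead of the real brfss_06_20.json (the state codes and numbers below are made up for illustration). It flattens the nested dictionary into long format, treats negative values as missing, and pivots to one row per (year, state_code) with one column per metric. A metric absent in a given year simply shows up as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy sample in the nested {year: {metric: {state_code: value}}}
# shape described above; the real file's values and state codes will differ.
sample = {
    "2006": {
        "bad_mental_health_days": {"AL": 3.5, "AK": -7},
        "bad_physical_health_days": {"AL": 4.0, "AK": 3.2},
    },
    "2007": {
        "energetic_days": {"AL": 14.1, "AK": 15.0},
        "bad_mental_health_days": {"AL": 3.6, "AK": 3.4},
    },
}

# Flatten to long format: one record per (year, state_code, metric),
# treating negative values (old missing-value code) as NaN.
records = [
    {
        "year": int(year),
        "state_code": state,
        "metric": metric,
        "value": np.nan if value < 0 else float(value),
    }
    for year, metrics in sample.items()
    for metric, states in metrics.items()
    for state, value in states.items()
]
long_df = pd.DataFrame(records)

# Pivot to wide format: one row per (year, state_code), one column per metric.
wide = (
    long_df
    .pivot(index=["year", "state_code"], columns="metric", values="value")
    .reset_index()
)
wide.columns.name = None
print(wide)
```

Because `pivot` raises on duplicate (year, state_code, metric) keys rather than silently picking one, it also doubles as a sanity check that the flattening produced one value per cell.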

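import json

import numpy as np
import pandas as pd

with open('brfss_06_20.json', 'r') as f:
    brfss_data = json.load(f)

# Flatten {year: {metric: {state_code: value}}} into long format:
# one record per (year, state_code, metric).
records = []
for year, metrics in brfss_data.items():
    for metric, state_data in metrics.items():
        for state_code, value in state_data.items():
            # Convert before comparing, so numbers stored as strings work;
            # None or non-numeric entries become NaN.
            try:
                value = float(value)
            except (TypeError, ValueError):
                value = np.nan
            # Negative values are an old encoding for "missing".
            if value < 0:
                value = np.nan
            records.append({
                'year': int(year),
                'state_code': state_code,
                'metric': metric,
                'value': value,
            })

long_df = pd.DataFrame(records)

# Pivot to wide format: one row per (year, state_code) and one column per
# metric found in the data. A year that lacks a metric gets NaN in that column.
brfss_df = (
    long_df
    .pivot(index=['year', 'state_code'], columns='metric', values='value')
    .reset_index()
)
brfss_df.columns.name = None

print(f"#rows in brfss_df: {brfss_df.shape[0]}, #columns: {brfss_df.shape[1]}")

# Every column except the two keys is a health metric.
metric_cols = [c for c in brfss_df.columns if c not in ('year', 'state_code')]
for metric in metric_cols:
    # mean() skips NaN, so the former negative codes no longer pull the averages down.
    print(f"Average {metric}: {brfss_df[metric].mean():.2f}")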
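To check that the averages are sensible, one option is to summarize the raw values per metric and year before cleaning: how many states report each metric, the value range, and how many entries are negative or above 30 (impossible for a count of days out of 30). The sketch below assumes a long-format frame with metric, year, state_code and value columns holding the raw, uncleaned values; the function name yearly_summary and the sample numbers are hypothetical.

```python
import pandas as pd


def yearly_summary(long_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per (metric, year), count reporting states,
    give the raw value range, and count out-of-range entries."""
    flags = long_df.assign(
        is_negative=long_df["value"] < 0,   # old missing-value code
        is_above_30=long_df["value"] > 30,  # impossible for "days out of 30"
    )
    return flags.groupby(["metric", "year"]).agg(
        n_states=("state_code", "nunique"),
        min_value=("value", "min"),
        max_value=("value", "max"),
        n_negative=("is_negative", "sum"),
        n_above_30=("is_above_30", "sum"),
    )


# Hypothetical raw values (before negatives are converted to NaN).
raw = pd.DataFrame({
    "metric": ["bad_mental_health_days"] * 4 + ["energetic_days"] * 2,
    "year": [2006, 2006, 2007, 2007, 2007, 2007],
    "state_code": ["AL", "AK", "AL", "AK", "AL", "AK"],
    "value": [3.5, -7.0, 3.6, 3.4, 14.1, 45.0],
})
summary = yearly_summary(raw)
print(summary)
```

A year whose range or missing-value count differs sharply from its neighbours is a hint that it encodes the metric differently and needs its own cleaning rule before averaging.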

This code has not been debugged because I don't have brfss_06_20.json; if it isn't sent to me, please debug it yourself. 160.30.98.69 (talk) 01:14, 7 December 2024 (UTC)