Feature Selection:A careful selection of features from result was undertaken. Essential columns, including ‘% DIABETIC’, ‘% OBESE’, and ‘% INACTIVE’, were identified for their relevance to health indicators. This strategic feature selection ensured a focused and meaningful exploration of the datasets.
Statistical Analysis – ANOVA:To unravel the statistical relationships with categorical variables, the Analysis of Variance (ANOVA) function was applied. Categorical predictors such as FIPS, COUNTY, and STATE were scrutinized for their correlation with the target variable ‘% DIABETIC’. The ANOVA results, gauged through p-values, informed the selection of influential variables impacting diabetes rates.
Linear Regression Models:The research delved into linear regression models to discern nuanced relationships. Individual analyses were executed to gauge the influence of specific health indicators. Linear regression models were employed to understand the impact of ‘% INACTIVE’ and ‘% OBESE’ on ‘% DIABETIC’.