I have conducted a data analysis focusing on the relationship between diabetes and inactivity. Initially, I analysed the data points using Microsoft Excel for the basic understanding and found that there are common data points among diabetes, inactivity and obesity. I found that FIPS data points of inactivity is a subset of FIPS data points of diabetes.
• After going through the pdf “CDC Diabetes 2018”, Observed the basic metric of evaluation such as mean, median, skewness, standard deviation for the data points.I understood that there is a slight skewness, with a kurtosis of about 4 for %diabetes. Also we can observe the deviation in normality from the quantile plot.
• Similarly, for %inactivity, the skewness is in other direction, with a kurtosis less than the kurtosis of normal distribution which is 3.
• From the scatter plot between diabetes and inactivity common data pairs and linear model that is fit for the data points, 20% (approx.) of the variation in diabetes can be interpreted for variation in inactivity.
• I understood that there is a deviation from the normality for the residuals of the data points from the linear model which resulted in Heteroscedasticity.
• The points in the plot between the residuals and the predicted values in the linear model shows the fanning out of the residuals which says that the linear model is not a suitable model.
However, I’m still very enthusiastic to learn all the above stats using python and I’m trying to do the same.