In this data-driven analysis, we explore the relationship between obesity, physical inactivity, and diabetes rates across counties in the United States. Our primary goal is to fit a Decision Tree regression model that predicts %diabetic from %obese and %inactive. We split the data into training and testing sets, with 80% used for training and 20% for testing. A Decision Tree regressor is a flexible tool for modeling how predictor variables influence a target variable. The tree was trained on the training data, and its performance was evaluated using Mean Squared Error (MSE) and R-squared (R2) metrics.
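The workflow described above can be sketched roughly as follows. This is a minimal illustration and not the exact code behind the reported numbers: it assumes scikit-learn, uses default tree hyperparameters, and replaces the county dataset with synthetic stand-in values. The variable names and the synthetic relationship are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the county data (hypothetical values, NOT the real dataset).
rng = np.random.default_rng(0)
n_counties = 300
pct_obese = rng.uniform(15.0, 25.0, n_counties)
pct_inactive = rng.uniform(12.0, 20.0, n_counties)
pct_diabetic = (
    2.0 + 0.2 * pct_obese + 0.15 * pct_inactive + rng.normal(0.0, 0.8, n_counties)
)

# Predictors: %obese and %inactive. Target: %diabetic.
X = np.column_stack([pct_obese, pct_inactive])
y = pct_diabetic

# 80% of the rows for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the tree on the training rows only (default hyperparameters assumed).
model = DecisionTreeRegressor(random_state=42)
model.fit(X_train, y_train)

# Score the predictions on the held-out rows.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.2f}, R2: {r2:.2f}")
```

With the real data, X would hold the %obese and %inactive columns and y the %diabetic column. The printed values here come from the synthetic data and will not match the results discussed below.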
Mean Squared Error (MSE):
The MSE is the average squared difference between the actual values and the predicted values. In our case, an MSE of 0.71 means that the squared prediction error averages 0.71. Taking the square root gives a root mean squared error of about 0.84, so a typical prediction is off by roughly 0.84 percentage points of %diabetic. Whether that is small depends on the scale and spread of the target variable: an error of this size is only good if %diabetic varies by considerably more than that from county to county.
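To make the definition concrete, here is a small sketch that computes MSE by hand on a few made-up values (the numbers are illustrative, not taken from the county data) and converts the reported MSE of 0.71 into an error in the original units.

```python
import math


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical %diabetic values for four counties and a model's predictions.
actual = [7.0, 8.0, 9.0, 7.5]
predicted = [7.5, 7.0, 9.5, 8.0]

toy_mse = mean_squared_error(actual, predicted)
print(f"toy MSE: {toy_mse:.4f}")

# The square root of the MSE (RMSE) is in the same units as the target.
reported_mse = 0.71
reported_rmse = math.sqrt(reported_mse)
print(f"RMSE for the reported MSE: {reported_rmse:.2f} percentage points")
```

Because the errors are squared before averaging, a single large miss (the 1.0 error in the second county here) contributes far more than several small ones.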
R-squared (R2):
An R2 score of -0.08 indicates that the model explains none of the variance in %diabetic. R2 is defined as 1 minus the ratio of the model’s squared error to the squared error of a constant prediction equal to the mean of the actual values, so a negative score means the model performs worse than that horizontal line. Here the model’s squared error is about 8% larger than the baseline’s. This indicates that the model does not fit the data well and, as trained, is not a good choice for predicting %diabetic based solely on %obese and %inactive.
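The sketch below shows how R2 becomes negative, using made-up values rather than the county data. It also backs out the variance of %diabetic implied by the two reported metrics, under the assumption that the MSE and R2 were computed on the same set of rows.

```python
def r2_score(actual, predicted):
    """R2 = 1 - SS_res / SS_tot, where SS_tot uses the mean of the actual values."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical %diabetic values for four counties.
actual = [7.0, 8.0, 9.0, 8.0]

perfect = [7.0, 8.0, 9.0, 8.0]     # matches every value
mean_only = [8.0, 8.0, 8.0, 8.0]   # constant prediction of the mean
misleading = [9.0, 8.0, 7.0, 8.0]  # trends in the wrong direction

r2_perfect = r2_score(actual, perfect)
r2_mean = r2_score(actual, mean_only)
r2_bad = r2_score(actual, misleading)
print(r2_perfect, r2_mean, r2_bad)

# Since R2 = 1 - MSE / variance, the reported metrics imply a target variance
# (assumption: both were computed on the same rows).
reported_mse = 0.71
reported_r2 = -0.08
implied_variance = reported_mse / (1.0 - reported_r2)
print(f"implied variance of %diabetic: {implied_variance:.2f}")
```

Under that assumption, the implied variance is smaller than the MSE, which is exactly what a negative R2 means: the MSE looks small in absolute terms only because %diabetic itself varies little across the evaluated counties.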
From these results, we see that the trained Decision Tree model did not explain the variance in %diabetic using %obese and %inactive as predictors. The negative R2 score indicates that the model’s predictions are less accurate than simply predicting the mean, so the model does not capture the underlying patterns in the data.