Checking homoscedasticity with residual plots

Homoscedasticity means that the residuals have constant variance. To check for it, we can call plot(fit). However, in an interactive session this shows one plot at a time and asks you to hit Enter to show the next one, which does not suit the automated process we are building. So we need a small adjustment: we call par(mfrow = c(2, 2)) to tell plot() to draw all four plots in a single image. We wrap these calls in the fit_plot() function, together with our already familiar mechanism for saving PNGs, and we're all set:

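fit_plot <- function(fit, save_to = "") {
    # not_empty() is assumed to be the helper defined earlier
    # Open a PNG device only when a file path was given
    if (not_empty(save_to)) png(save_to)
    # Arrange the four diagnostic plots in a 2 x 2 grid
    par(mfrow = c(2, 2))
    plot(fit)
    # Close the PNG device so the file is written to disk
    if (not_empty(save_to)) dev.off()
}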

With the fit_plot() function in place, we can show the regression's graphical results with the following:

fit_plot(fit)
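To try the function outside of this chapter's project, a minimal self-contained sketch follows. It makes several assumptions that are not part of the original code: the not_empty() helper shown here is a hypothetical stand-in for the one used earlier, the model is fitted on R's built-in mtcars data rather than our own data, and the names example_fit and output_file are made up for illustration. fit_plot() is repeated only so that the snippet runs on its own.

```r
# Hypothetical stand-in for the not_empty() helper; the real one may differ
not_empty <- function(x) !is.null(x) && nchar(x) > 0

# Same function as in the text, repeated so this sketch is self-contained
fit_plot <- function(fit, save_to = "") {
    if (not_empty(save_to)) png(save_to)
    par(mfrow = c(2, 2))
    plot(fit)
    if (not_empty(save_to)) dev.off()
}

# Illustrative model on a built-in dataset (not the data used in the text)
example_fit <- lm(mpg ~ wt + hp, data = mtcars)

# Save all four diagnostic plots into a single PNG in a temporary directory
output_file <- file.path(tempdir(), "fit_diagnostics.png")
fit_plot(example_fit, save_to = output_file)

# Confirm that the image was written
file.exists(output_file)
```

Because the 2 x 2 grid has room for all four plots, plot() should not pause to ask for Enter between them, which is what makes the function usable in an automated script.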

The information we're looking for is in the plots on the left-hand side, which show fitted values on the x axis and residuals on the y axis. In these plots, we want the residuals to be randomly distributed in a tubular pattern, indicated by the dotted lines. We do not want residuals whose pattern resembles a fan or a funnel, or that is in any way curvilinear. As we can see, the pattern does resemble a tube, so we can say the assumption of homoscedasticity holds for the data. As an extra, the quantile-quantile plot in the top-right shows that the residuals approximately follow a normal distribution, which is also good. The plot in the lower-right involves a statistical concept that we won't go into, called Cook's distance, which is used to find influential observations in a regression. To read more about it, you may look at John Fox's Regression Diagnostics (1991).
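If only the two left-hand plots are of interest, they can be drawn on their own. The following is a sketch under stated assumptions rather than part of the original code: it relies on the which argument of plot() for lm objects, where 1 selects the residuals versus fitted values plot and 3 selects the scale-location plot, and it again uses an illustrative model on the built-in mtcars data. The names example_fit, fitted_values, and residual_values are made up for this example.

```r
# Illustrative model on a built-in dataset (not the data used in the text)
example_fit <- lm(mpg ~ wt + hp, data = mtcars)

# Draw only the two left-hand panels, side by side:
# which = 1 is Residuals vs Fitted, which = 3 is Scale-Location
old_par <- par(mfrow = c(1, 2))
plot(example_fit, which = c(1, 3))
par(old_par)  # restore the previous layout

# The same quantities can be extracted to build the plot by hand
fitted_values <- fitted(example_fit)
residual_values <- resid(example_fit)
plot(fitted_values, residual_values,
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 3)  # dotted reference line at zero
```

In the hand-made version, a roughly even vertical spread of points around the zero line across the whole range of fitted values is the tubular pattern described above.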