Results of model fitting to the average daily fat content data from @Henderson1990-bd. a) observed average daily fat content (points) and estimated lactation curves (lines) from Wood's [-@Wood1967-re] model, a Tweedie GLM, and a Tweedie GAM, with associated 95% confidence intervals (Wood's model) or 95% credible intervals (GLM and GAM). Response residuals for Wood's model (b), the Tweedie GLM (c), and the Tweedie GAM (d), with scatterplot smoothers (lines) and 95% credible intervals (shaded ribbons).
The fitted lactation curves have an inverted-U shape, with a long tail to the right (later in lactation). The GAM curve fits the data well, but the curves from Wood's model and its GLM equivalent fit the data poorly: they overpredict the amount of fat produced at the peak of lactation and only crudely capture the decline in fat production later in lactation. The remaining panels show the raw response residuals for the three models and highlight this poor fit; the residuals from Wood's model and the GLM show clear systematic pattern, whereas no residual pattern is seen for the GAM.
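Wood's model, y = a t^b exp(-ct), is linear on the log scale: log y = log a + b log t - c t. A log-link GLM with log(t) and t as covariates therefore has the same mean structure as Wood's model. The Python sketch below illustrates only this linearisation. It fits the curve by ordinary least squares on log(y), a simplification rather than the Tweedie GLM used in the figure, and the function names (`wood`, `fit_wood_loglinear`) and the noise-free example values are hypothetical.

```python
import numpy as np


def wood(t, a, b, c):
    """Wood's (1967) lactation curve: y = a * t**b * exp(-c * t)."""
    return a * t**b * np.exp(-c * t)


def fit_wood_loglinear(t, y):
    """Fit Wood's model by ordinary least squares on log(y).

    log(y) = log(a) + b*log(t) - c*t is linear in (log a, b, c) given the
    covariates log(t) and t. This is a simplification for illustration,
    NOT the Tweedie GLM fitted in the manuscript.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept, log(t), and -t (so the third coefficient is c)
    X = np.column_stack([np.ones_like(t), np.log(t), -t])
    coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    log_a, b, c = coef
    return np.exp(log_a), b, c


# Hypothetical noise-free example: weeks 1-44 with made-up parameters
weeks = np.arange(1, 45)
a_hat, b_hat, c_hat = fit_wood_loglinear(weeks, wood(weeks, 0.4, 0.25, 0.03))
```

With real, noisy data, least squares on log(y) targets the mean of log(y) rather than the mean of y, and it cannot handle zero responses. Both are reasons to prefer a GLM with a suitable response distribution such as the Tweedie.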
Quantities of interest derived from Wood's model, a Tweedie GLM, and a Tweedie GAM fitted to the lactation data example: a) the estimated week of peak average daily fat content, b) the estimated average daily fat content at the peak, and c) the rate of change (first derivative) of the lactation curve, estimated at the point midway between the peak fat content and the end of lactation. The points are the estimated values and the lines are 95% uncertainty intervals. Each uncertainty interval is based on the 0.025 and 0.975 percentiles of the bootstrap distribution of model coefficient estimates (Wood's model) or of the posterior distribution (GLM and GAM).
Each panel shows three point estimates with uncertainty intervals, from a GAM, a GLM, and Wood's lactation model. The first panel shows the estimated timing of the peak of lactation. The GAM captures the fact that the peak in the data occurs much later in lactation (around week 11), while the other two models confidently place the peak at around weeks 8-9. The GAM estimate has a much wider credible interval, whose lower end does include the estimates from Wood's model and the GLM. This wider interval reflects the uncertainty in the timing of the peak, which arises because the data have a broad, flat peak.
The second panel shows the estimated fat content at the peak, which is broadly similar across the three models at ~0.7 kg fat per day. In the final panel, the persistency estimate from the GAM diverges from those of the GLM and Wood's model. Again, the latter two models are overly confident in their estimates of this biologically relevant parameter, even though their fitted lactation curves do not follow the data closely.
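For Wood's model, the quantities in this figure have closed forms. Setting d log y/dt = b/t - c to zero gives the peak time t* = b/c. The peak yield is a (b/c)^b exp(-b), and the slope is dy/dt = y (b/t - c). The minimal Python sketch below applies these formulas to each of a set of coefficient draws and summarises the results by the median and the 0.025/0.975 percentiles. The draws are simulated normal values standing in for bootstrap replicates. The helper names and the end-of-lactation week (`t_end = 44`) are hypothetical. For a GAM there is no closed form, so the peak and slope would instead be found numerically on each posterior draw of the curve.

```python
import numpy as np


def wood(t, a, b, c):
    """Wood's (1967) lactation curve y = a * t**b * exp(-c * t)."""
    return a * t**b * np.exp(-c * t)


def wood_peak_time(b, c):
    """Peak time: d/dt log(y) = b/t - c = 0  =>  t* = b / c."""
    return b / c


def wood_peak_yield(a, b, c):
    """Yield at the peak: a * (b/c)**b * exp(-b)."""
    return a * (b / c) ** b * np.exp(-b)


def wood_slope(t, a, b, c):
    """First derivative of the curve: dy/dt = y * (b/t - c)."""
    return wood(t, a, b, c) * (b / t - c)


def summarise(draws, probs=(0.025, 0.975)):
    """Median and percentile interval of a set of draws."""
    lo, hi = np.quantile(draws, probs)
    return np.median(draws), lo, hi


# Hypothetical coefficient draws standing in for bootstrap replicates
rng = np.random.default_rng(1)
n_draws = 2000
a = rng.normal(0.40, 0.02, n_draws)
b = rng.normal(0.25, 0.02, n_draws)
c = rng.normal(0.030, 0.002, n_draws)

t_end = 44.0                          # hypothetical final week of lactation
t_peak = wood_peak_time(b, c)         # one peak week per draw
t_mid = (t_peak + t_end) / 2          # midway between peak and end, per draw

peak_summary = summarise(t_peak)
yield_summary = summarise(wood_peak_yield(a, b, c))
slope_summary = summarise(wood_slope(t_mid, a, b, c))
```

Computing each derived quantity per draw before taking percentiles propagates the joint uncertainty in (a, b, c). Plugging in percentiles of the individual coefficients would not do this.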
a) Estimated daily growth rate on November 15^th^, 2021, with 95% Bayesian credible intervals, for the 18 pigs in the pig growth example. b) Posterior distribution of the daily growth rate on November 15^th^, 2021, for three pigs (numbers 2, 13, and 17) whose weight observations ceased before November 1^st^, 2021. In b), the shaded region is the posterior distribution, the point is the posterior median, and the thick and thin bars are the 66% and 95% posterior intervals, respectively.
With the fitted growth curves, we can estimate each pig's growth rate on any given day. In this figure I'm showing the estimated growth rate of each pig in the example on November 15th. This growth rate is the first derivative of the fitted growth curve (smooth function). I used posterior sampling to produce the posterior distribution of the growth rate for each pig. These distributions are summarised as a point estimate (median) and credible interval in the first panel. Most pigs are growing at ~1-1.5 kg per day by November 15th, with uncertainties on the order of ±0.5 kg per day.
The second panel shows the entire posterior distribution of the estimated growth rate for three pigs (2, 13, and 17) for whom there were no weight observations after November 1st. Here, the model borrows strength from the other pigs to help extrapolate the growth curves for these three pigs. Pig-specific details remain, however: the posterior distribution for pig 17 is much more diffuse (wider) than for either pig 2 or pig 13, reflecting greater uncertainty about pig 17's growth rate.
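The growth-rate calculation can be sketched in three steps. First, evaluate the derivative of the basis functions by central finite differences. Second, multiply by posterior draws of the coefficients to get one derivative per draw. Third, summarise those draws with the median and the 66% and 95% intervals used in panel b). This Python sketch assumes a hypothetical cubic polynomial basis for a single pig and a multivariate normal posterior for the coefficients with made-up mean and covariance. It does not reproduce the hierarchical model in the manuscript. In mgcv, the analogous basis matrix at new time points is available from `predict()` with `type = "lpmatrix"`.

```python
import numpy as np


def basis(t):
    """Hypothetical basis: cubic polynomial in time (days).

    A real GAM would use a penalised spline basis; the derivative
    logic below does not depend on which basis is used.
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.column_stack([np.ones_like(t), t, t**2, t**3])


def growth_rate_draws(t, beta_draws, h=1e-3):
    """Central finite-difference derivative of f(t) = basis(t) @ beta, per draw."""
    Xp = (basis(t + h) - basis(t - h)) / (2 * h)   # derivative of each basis function
    return beta_draws @ Xp.T                       # shape (n_draws, len(t))


# Hypothetical posterior for one pig: coefficient mean and covariance
beta_hat = np.array([20.0, 0.3, 0.005, -1e-5])
V = np.diag([1.0, 1e-2, 1e-7, 1e-12])
rng = np.random.default_rng(42)
beta_draws = rng.multivariate_normal(beta_hat, V, size=4000)

t_new = 150.0                                       # hypothetical day of interest
rate = growth_rate_draws(t_new, beta_draws)[:, 0]   # kg/day, one value per draw

median = np.median(rate)
ci66 = np.quantile(rate, [0.17, 0.83])              # 66% interval
ci95 = np.quantile(rate, [0.025, 0.975])            # 95% interval
```

The derivative of f(t) = X(t)β is linear in β. The same finite-difference matrix is therefore reused for every draw, so the whole posterior of the growth rate costs a single matrix product.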
Just updated my manuscript on using #GAMs in #AnimalScience, now on arXiv: doi.org/10.48550/arX...
🐄🐖🪶
Extended examples now show how GAMs go beyond prediction, helping estimate biologically meaningful traits from data.
Code: github.com/gavinsimpson...
🧪 #RStats #mgcv #Statistics #OpenScience