Chapter 4 Introduction {mod_intro}

Some significant applications are demonstrated in this chapter.

4.1 Example one

4.2 Generalized additive models

  • splines

    • A cubic spline is a piecewise polynomial that is continuously differentiable up to second order.
    • The number of knots acts as a smoothing parameter in unpenalized splines.
    • Knots placed at the quantiles of the data make the spline flexible in dense regions and less flexible in sparse regions, which is desirable.
    • In general, it is more appropriate to select more knots than expected to be needed and use a penalty term to control smoothness, avoiding the need to select the number of knots.
    • For l knots and degree r, the space of polynomial splines is a vector space whose dimension equals the number of free parameters. The l − 1 polynomial pieces of degree r have (r + 1)(l − 1) parameters. The condition of being r − 1 times continuously differentiable generates r constraints at each of the l − 2 interior knots. Then, the number of free parameters is (r + 1)(l − 1) − r(l − 2) = r + l − 1.
    • Natural splines assume that the curvature, i.e. the second derivative, is zero at the first and last knots. Then, a natural cubic spline has l free parameters.
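    The dimension count above can be checked numerically; a minimal sketch in plain Python, just verifying the arithmetic:

    ```python
    def spline_space_dim(l, r):
        """Dimension of the space of degree-r polynomial splines on l knots."""
        pieces = (r + 1) * (l - 1)     # free parameters of the l - 1 raw pieces
        constraints = r * (l - 2)      # r continuity constraints at each interior knot
        return pieces - constraints

    # (r + 1)(l - 1) - r(l - 2) should always equal r + l - 1
    for l in range(3, 10):
        for r in range(1, 6):
            assert spline_space_dim(l, r) == r + l - 1

    # A cubic spline (r = 3) has l + 2 free parameters; the two natural
    # boundary conditions (zero curvature at the end knots) reduce this to l.
    print(spline_space_dim(5, 3))      # 7, i.e. l + 2 for l = 5
    ```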

    Wood (2016)

    B-splines are constructed from polynomial pieces in a way that gives them many attractive computational properties, as described by de Boor (1978).

    donnell 2016
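    As an illustration of a B-spline basis, a sketch using scipy (assuming scipy ≥ 1.8 for `BSpline.design_matrix`; the knot vector and evaluation grid are arbitrary choices):

    ```python
    import numpy as np
    from scipy.interpolate import BSpline

    k = 3                                       # cubic
    interior = np.linspace(0, 1, 7)             # 7 knots on [0, 1]
    t = np.r_[[0.0] * k, interior, [1.0] * k]   # clamped knot vector
    x = np.linspace(0.01, 0.99, 50)             # points strictly inside the domain

    # One column per B-spline basis function: len(t) - k - 1 = 9 columns here
    B = BSpline.design_matrix(x, t, k).toarray()

    # Local support keeps B sparse; inside the domain the basis sums to one
    print(B.shape)                              # (50, 9)
    ```

    The compact support of each basis function is what makes the design and penalty matrices banded, which is the computational attraction noted above.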

4.3 GAM

  • smoothing bases
    • natural cubic splines are the smoothest interpolators
    • cubic smoothing splines
    • cubic regression splines
    • cyclic cubic regression spline
    • p-splines
    • thin plate regression splines
    • tensor product smooths
  • polynomial spline
  • cubic spline
    • a knot at each data point
  • regression spline
    • knots evenly spaced or at quantiles of the data
    • penalizing to avoid overfitting
    • for known λ, it is still an augmented linear model
    • select λ with ordinary cross validation
    • computational and invariance advantages of generalized cross-validation
    • Implementation
    • initialize λ
    • given λ, obtain β
    • compute the GCV score, and iterate with the previous step
    • Identifiability: constrain all but one of the intercept parameters to zero.
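    The penalized regression spline recipe above can be sketched end to end. This is a minimal illustration, not mgcv's algorithm: it uses a truncated-power cubic basis with knots at quantiles (an arbitrary choice) and replaces the iterative λ update with a simple grid search over the GCV score:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    x = np.sort(rng.uniform(0, 1, n))
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

    # Truncated-power cubic basis with knots at quantiles (illustrative choice)
    knots = np.quantile(x, np.linspace(0.1, 0.9, 10))
    X = np.column_stack([np.ones(n), x, x**2, x**3] +
                        [np.clip(x - k, 0, None)**3 for k in knots])

    # Penalize only the truncated-power coefficients, not the polynomial part
    D = np.diag([0.0] * 4 + [1.0] * len(knots))

    def fit(lam):
        """Penalized least squares: beta = (X'X + lam D)^(-1) X'y."""
        A = X.T @ X + lam * D
        beta = np.linalg.solve(A, X.T @ y)
        edf = np.trace(X @ np.linalg.solve(A, X.T))  # trace of the influence matrix
        return beta, edf

    def gcv(lam):
        beta, edf = fit(lam)
        rss = np.sum((y - X @ beta) ** 2)
        return n * rss / (n - edf) ** 2

    # "initialize lambda, obtain beta, compute GCV, iterate" -- here a grid search
    lams = 10.0 ** np.linspace(-6, 2, 40)
    best = min(lams, key=gcv)
    beta, edf = fit(best)
    ```

    The effective degrees of freedom `edf` interpolate between the 4 unpenalized polynomial coefficients (λ → ∞) and the full 14 columns (λ → 0).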

4.4 GAM

  • penalized likelihood maximization
    • there is no simple trick to produce an unpenalized GLM whose likelihood is equivalent to the penalized likelihood of the GAM
    • penalized iteratively re-weighted least squares
    • The suggestion of representing GAMs using spline like penalized regression smoothers was made in section 9.3.6 of Hastie and Tibshirani (1990)
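    A minimal sketch of penalized IRLS for a Poisson response with log link and a fixed λ (the basis and λ here are illustrative assumptions, not mgcv's choices; mgcv additionally estimates λ inside or around this loop):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n = 300
    x = np.sort(rng.uniform(0, 1, n))
    f = 1.0 + np.sin(2 * np.pi * x)
    y = rng.poisson(np.exp(f))

    # Simple penalized basis (truncated-power cubic; an illustrative choice)
    knots = np.quantile(x, np.linspace(0.1, 0.9, 8))
    X = np.column_stack([np.ones(n), x, x**2, x**3] +
                        [np.clip(x - k, 0, None)**3 for k in knots])
    S = np.diag([0.0] * 4 + [1.0] * len(knots))
    lam = 1e-3

    # Penalized IRLS: at each step solve a penalized weighted least squares
    # problem on the working response z.
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())           # start from the constant model
    for _ in range(50):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # working response (log link: dmu/deta = mu)
        W = mu                           # IRLS weights for the Poisson family
        XtW = X.T * W                    # X' diag(W)
        beta_new = np.linalg.solve(XtW @ X + lam * S, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < 1e-8:
            beta = beta_new
            break
        beta = beta_new
    ```

    Each iteration is an augmented weighted least squares fit, which is why the unpenalized IRLS machinery carries over once the penalty λS is added to the normal equations.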

4.5 Practice MGCV

4.5.1 Basics of gam model

  • When the relationship is almost linear (df = 1), the confidence interval has zero width where the estimate is zero, due to the identifiability constraint. This restriction sets the mean value of f to zero, so there is no uncertainty where f = 0.
  • The points on the plots of the smoothed effects are the partial residuals, which are simply the Pearson residuals plus the estimated smooth for the covariate being plotted.
  • Consider an initial model with k1 knots and df < k1 − 1; then, increasing the number of knots to k2 can modify the effective degrees of freedom. This happens because different subspaces of functions are obtained for k = k1 and k = k2, even for the same df.
  • Smoother functions can be obtained by introducing an additional parameter γ into the GCV score. For example, γ = 1.4 is suggested to avoid overfitting without compromising model fit.
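The γ modification can be written as GCV(γ) = n·RSS / (n − γ·edf)², so each effective degree of freedom is charged γ times over; a one-line sketch:

```python
def gcv_score(rss, n, edf, gamma=1.0):
    """GCV score with model complexity inflated by gamma (gamma = 1 is ordinary GCV)."""
    return n * rss / (n - gamma * edf) ** 2

# gamma > 1 charges more per effective degree of freedom, favoring smoother fits
print(gcv_score(10.0, 100, 8.0) < gcv_score(10.0, 100, 8.0, gamma=1.4))  # True
```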

4.5.2 Smoothing several variables

  • We can use thin plate regression splines or tensor product smooths.
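A tensor product smooth builds its design matrix from the row-wise Kronecker product of the marginal bases; a sketch with simple polynomial bases standing in for spline bases:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1, x2 = rng.uniform(size=(2, n))

# Marginal bases for the two covariates (stand-ins for spline bases)
B1 = np.column_stack([np.ones(n), x1, x1**2])         # n x d1
B2 = np.column_stack([np.ones(n), x2, x2**2, x2**3])  # n x d2

# Row-wise Kronecker product: row i is kron(B1[i], B2[i]), giving d1*d2 columns
T = np.einsum('ij,ik->ijk', B1, B2).reshape(n, -1)

print(T.shape)  # (100, 12)
```

The corresponding penalty combines the marginal penalties, so smoothness in each covariate direction can be controlled by its own λ.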