Model at aggregated resolution


Simple modelling for each class

We model the counts of 25 \(m^2\) cells corresponding to a specific land class per 1 \(km^2\) using generalized additive models (GAMs) with a Binomial distribution. This type of model extends generalized linear models (GLMs) by allowing flexible non-linear functions of the covariates in the linear predictor. Non-linear functions of growing degree days, maximum annual temperature, absolute minimum temperature, soil moisture deficit, and soil moisture surplus are used to model all classes except urban. In addition, we include non-linear functions of elevation and slope for the arable and forest classes. The urban class is modelled using distance to major cities, population, and gross disposable household income.
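
As a sketch of the per-class model described above, one possible formulation is given below. It assumes a logit link, which the text does not state (only the Binomial distribution is specified). The symbols \(y_{ij}\) (number of cells of class \(j\) in the 1 \(km^2\) unit \(i\)), \(n_i\) (total number of cells in unit \(i\)), \(\beta_{0j}\) (intercept), and \(s_{mj}\) (smooth function of the \(m\)-th covariate \(x_{mi}\)) are notation introduced here for illustration only.

\[\begin{equation} y_{ij} \sim \text{Binomial}(n_i, \pi_{ij}), \qquad \text{logit}(\pi_{ij}) = \beta_{0j} + \sum_{m=1}^{M} s_{mj}(x_{mi}) \end{equation}\]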

Overview

The observed values are counts \(y_i = (y_{i1}, y_{i2}, \dots, y_{i(K+1)})\) of land cover for cell \(i\) and \(K+1\) categories. The associated random variable has a multinomial probability function:

\[\begin{equation} \text{Pr}(y_i) = \binom{n_i}{y_{i1}, \dots, y_{i(K+1)}} \pi_{i1}^{y_{i1}} \dots \pi_{i(K+1)}^{y_{i(K+1)}} \end{equation}\]

where \(\pi_{ij}\) is the probability that the land cover of a cell at location \(i\) is the land class \(j\).
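
To make the multinomial probability function concrete, the following minimal Python sketch evaluates it for a single cell. The function name `multinomial_pmf` and the example counts and probabilities are hypothetical and chosen only for illustration.

```python
import math


def multinomial_pmf(counts, probs):
    """Multinomial probability of `counts` given class probabilities `probs`."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    # An outcome with positive count in a zero-probability class is impossible.
    if any(y > 0 and p == 0 for y, p in zip(counts, probs)):
        return 0.0
    n = sum(counts)
    # Log of the multinomial coefficient n! / (y_1! ... y_{K+1}!).
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(y + 1) for y in counts)
    # Log of the product of pi_j ** y_j; zero counts contribute nothing.
    log_kernel = sum(y * math.log(p) for y, p in zip(counts, probs) if y > 0)
    return math.exp(log_coef + log_kernel)


# Hypothetical cell with n_i = 4 sub-cells and three land classes.
example = multinomial_pmf([2, 1, 1], [0.5, 0.3, 0.2])
```

Working on the log scale avoids overflow in the factorials when \(n_i\) is large, as it would be for many small cells aggregated into one unit.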

An interpretable approach?

Let \(LS_i\) and \(LC_i\) be random variables representing land suitability and land cover, respectively, for a cell inside \(i\). By the law of total probability:

\[\begin{equation} \pi_{ij} = \sum_{k=1}^{K+1}\text{Pr}(LS_{i} = k)\text{Pr}(LC_i = j\mid LS_{i} = k) \end{equation}\]
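
The decomposition above is a vector-matrix product: the vector of suitability probabilities multiplied by a matrix of conditional land cover probabilities. A minimal Python sketch, with a hypothetical function name and made-up numbers for three classes:

```python
def land_cover_probs(suitability, transition):
    """Return pi_j = sum_k Pr(LS = k) * Pr(LC = j | LS = k) for every class j."""
    n = len(suitability)
    return [
        sum(suitability[k] * transition[k][j] for k in range(n))
        for j in range(n)
    ]


# Hypothetical values: Pr(LS_i = k) for three classes.
suitability = [0.6, 0.3, 0.1]
# Hypothetical values: row k holds Pr(LC_i = j | LS_i = k); each row sums to 1.
transition = [
    [0.8, 0.1, 0.1],
    [0.0, 1.0, 0.0],
    [0.2, 0.0, 0.8],
]
pi = land_cover_probs(suitability, transition)
```

Because each row of the conditional matrix sums to one, the resulting \(\pi_{ij}\) also sum to one over \(j\).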

For flexibility, we can use an auxiliary variable \(Z_{ij}\) that defines land suitability such that: \[\begin{equation} \text{Pr}(LS_{i} = j) = \text{Pr}(\max(Z_{i1}, \dots, Z_{i(K+1)}) = Z_{ij}). \end{equation}\] Notice that land suitability \(LS_i\) is deterministic given \(Z_{i} = (Z_{i1}, \dots, Z_{i(K+1)})\). Then \(Z_{ij}\) should include a variety of terms that explain the uncertainty of land suitability: \[\begin{equation} Z_{ij} = f_{1j}(\text{elev}_i) + f_{2j}(\text{soil water avail}) + \dots + f_{pj}(\text{temperature indicators}) + W_{ij} + \epsilon_{ij}, \end{equation}\] where the \(f_{1j}, \dots, f_{pj}\) and the \(W_{ij}\) represent non-linear functions and spatial stochastic processes, respectively. The model could also include spatial correlation between categories if necessary.
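
The probability that \(Z_{ij}\) is the largest component generally has no simple closed form, but it can be approximated by simulation. The sketch below is only an illustration under strong assumptions: the systematic part of \(Z_{ij}\) (the sum of the non-linear functions and the spatial term) is collapsed into a fixed number per class, and \(\epsilon_{ij}\) is assumed to be independent Gaussian noise, which the text does not specify. The function name and all numeric values are hypothetical.

```python
import random


def suitability_probs(means, n_draws=20000, noise_sd=1.0, seed=0):
    """Monte Carlo estimate of Pr(Z_j is the maximum) for each class j.

    `means` stands in for the systematic part of Z_ij; the noise is
    assumed independent Gaussian with standard deviation `noise_sd`.
    """
    rng = random.Random(seed)
    wins = [0] * len(means)
    for _ in range(n_draws):
        # One draw of the latent vector Z_i = (Z_i1, ..., Z_i(K+1)).
        z = [m + rng.gauss(0.0, noise_sd) for m in means]
        # Given Z_i, land suitability is deterministic: the index of the maximum.
        wins[z.index(max(z))] += 1
    return [w / n_draws for w in wins]


# Hypothetical systematic components for three classes.
probs = suitability_probs([1.0, 0.0, -1.0])
```

Classes with a larger systematic component win more often, so their estimated suitability probability is higher.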

The second term, \(\text{Pr}(LC_i = j\mid LS_{i} = k) = p_{ikj}\), can be thought of as the elements of \((K+1)\times(K+1)\) transition matrices \(P_{i}\) between categories, explained by human behaviour. Modelling this term can become complicated and can make the model unidentifiable if no assumptions are made. In general, it can be modelled as: \[\begin{equation} \text{logit}(p_{ikj}) = g_{1kj}(\text{population}) + g_{2kj}(\text{distance to major cities}) + \dots \end{equation}\]

In order to make the model identifiable, we can make certain assumptions for \(P_{i}\). For example:

  • Letting \(P_i=I\) is equivalent to assuming that human behaviour has no influence on land cover, so that land cover coincides with land suitability.
  • The next simplest option is to assume that \(P_i=P\), meaning that the transition matrix does not depend on location \(i\).
  • We could add more flexibility by setting \(P_i = \alpha_iP + (1-\alpha_i)I\), where \(\alpha_i \in [0,1]\) is a standardization of distance to major cities. This implies that \(P_i\) shrinks to \(I\) at greater distances, provided that \(\alpha_i\) decreases with distance.
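
The third option in the list above is a convex combination of a common transition matrix and the identity. A minimal Python sketch, with a hypothetical function name and a made-up matrix \(P\):

```python
def blended_transition(alpha, P):
    """Return alpha * P + (1 - alpha) * I for a square row-stochastic matrix P."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n = len(P)
    return [
        [alpha * P[k][j] + (1.0 - alpha) * (1.0 if k == j else 0.0)
         for j in range(n)]
        for k in range(n)
    ]


# Hypothetical common transition matrix for three classes (rows sum to 1).
P = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
P_far = blended_transition(0.0, P)   # alpha_i = 0: no human influence
P_mid = blended_transition(0.5, P)   # intermediate influence
P_near = blended_transition(1.0, P)  # alpha_i = 1: full transition matrix
```

Since both \(P\) and \(I\) have rows summing to one, every blended matrix is still a valid transition matrix.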

A simpler approach

In order to include a variety of flexible terms, we can use a latent variable \(Z_{ij}\) that defines the probability \(\pi_{ij}\) directly, such that: \[\begin{equation} \pi_{ij} = \text{Pr}(\max(Z_{i1}, \dots, Z_{i(K+1)}) = Z_{ij}) \end{equation}\]

Then \(Z_{ij}\) can include a variety of terms without restriction (at least theoretically):

\[\begin{equation} Z_{ij} = LS_{ij} + HB_i \end{equation}\]

Land suitability is defined by environmental variables and some spatial structure:

\[\begin{equation} LS_{ij} = f_1(\text{elevation}_i) + f_2(\text{soil water availability}) + \dots + f_p(\text{temperature indicators}) + W_{ij} \end{equation}\]

The human behaviour term is defined by socio-economic variables:

\[\begin{equation} HB_{i} = g_1(\text{population}) + g_2(\text{distance to major cities}) + \dots \end{equation}\]