Model at aggregated resolution


Simple modelling for each class

We model the number of 25 m² cells belonging to a specific land class within each 1 km² cell using generalized additive models (GAMs) with a binomial distribution. GAMs extend generalized linear models (GLMs) by allowing flexible non-linear functions of the covariates in the linear predictor. Non-linear functions of growing degree days, maximum annual temperature, absolute minimum temperature, soil moisture deficit, and soil moisture surplus are used to model all classes except urban. In addition, non-linear functions of elevation and slope are included for the arable and forest classes. The urban class is modelled using distance to major cities, population, and gross disposable household income.
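To make the binomial-count GAM concrete, the following is a minimal sketch for a single covariate in plain Python. It is not the fitting code used for these models: the smooth function is represented, as an assumption, by a truncated-linear spline basis with a ridge penalty on the knot coefficients, and it is fitted by penalized iteratively reweighted least squares (P-IRLS). The knot positions and the fixed penalty `lam` are illustrative; a full GAM fit would also estimate the smoothing parameter (for example by REML or GCV) and combine several smooth terms.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def basis(x, knots):
    """Truncated-linear spline basis: 1, x, (x - knot)_+ for each knot."""
    return [1.0, x] + [max(0.0, x - k) for k in knots]


def fit_binomial_gam(xs, ys, ns, knots, lam=1.0, iters=25):
    """Penalized IRLS for y_i ~ Binomial(n_i, expit(f(x_i))) with f a penalized spline.

    The ridge penalty lam acts on the knot coefficients only, so the
    intercept and linear trend are left unpenalized.
    """
    X = [basis(x, knots) for x in xs]
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        XtWX = [[0.0] * p for _ in range(p)]
        XtWz = [0.0] * p
        for row, y, n in zip(X, ys, ns):
            eta = sum(bj * xj for bj, xj in zip(beta, row))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = n * mu * (1.0 - mu)        # IRLS weight under the logit link
            z = eta + (y - n * mu) / w     # working response
            for a in range(p):
                XtWz[a] += row[a] * w * z
                for c in range(p):
                    XtWX[a][c] += row[a] * w * row[c]
        for a in range(2, p):              # penalize knot terms only
            XtWX[a][a] += lam
        beta = solve(XtWX, XtWz)
    return beta


def predict(beta, x, knots):
    """Fitted probability expit(f(x)) at covariate value x."""
    eta = sum(bj * xj for bj, xj in zip(beta, basis(x, knots)))
    return 1.0 / (1.0 + math.exp(-eta))
```

Given counts `ys` out of `ns` fine cells at covariate values `xs`, `beta = fit_binomial_gam(xs, ys, ns, knots)` followed by `predict(beta, x, knots)` returns the fitted probability of the class at `x`; larger `lam` gives a smoother curve.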

Overview

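The observed values are the counts $y_i = (y_{i1}, y_{i2}, \ldots, y_{i(K+1)})$ of land cover in cell $i$ for $K+1$ categories. The associated random variable has a multinomial probability function:

$$\Pr(y_i) = \binom{n_i}{y_{i1}, \ldots, y_{i(K+1)}} \pi_{i1}^{y_{i1}} \cdots \pi_{i(K+1)}^{y_{i(K+1)}},$$

where $n_i = \sum_{j=1}^{K+1} y_{ij}$ is the total number of fine cells in cell $i$ and $\pi_{ij}$ is the probability that the land cover of a fine cell within $i$ is land class $j$.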
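As a quick check of the formula, the multinomial probability can be evaluated on the log scale, using `math.lgamma` for the multinomial coefficient so that large $n_i$ do not overflow. This is a plain illustration, not part of the modelling code; with two categories (a class versus all others) it reduces to the binomial model of the previous section.

```python
import math


def multinomial_logpmf(y, pi):
    """log Pr(y) for counts y = (y_1, ..., y_{K+1}) and class probabilities pi."""
    if abs(sum(pi) - 1.0) > 1e-9:
        raise ValueError("class probabilities must sum to one")
    n = sum(y)
    # log of n! / (y_1! ... y_{K+1}!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in y)
    # sum_j y_j log pi_j, skipping empty classes so that pi_j = 0 is allowed there
    log_kernel = sum(c * math.log(p) for c, p in zip(y, pi) if c > 0)
    return log_coef + log_kernel
```

For example, `math.exp(multinomial_logpmf([2, 1], [0.5, 0.5]))` gives 3 × 0.5³ = 0.375.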

An interpretable approach?

Let $LS_i$ and $LC_i$ be random variables representing land suitability and land cover, respectively, for a fine cell inside $i$. By the law of total probability,

$$\pi_{ij} = \sum_{k=1}^{K+1} \Pr(LS_i = k)\,\Pr(LC_i = j \mid LS_i = k).$$
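The decomposition is a vector-matrix product: the land-cover probabilities of cell $i$ are the suitability probabilities pushed through the rows of the conditional matrix. A minimal sketch, with hypothetical three-class numbers used only for illustration:

```python
def land_cover_probs(suitability, transition):
    """pi_j = sum_k Pr(LS = k) * Pr(LC = j | LS = k).

    suitability: list of Pr(LS_i = k), length K+1.
    transition:  (K+1) x (K+1) list with transition[k][j] = Pr(LC_i = j | LS_i = k).
    """
    m = len(suitability)
    return [sum(suitability[k] * transition[k][j] for k in range(m)) for j in range(m)]


# Hypothetical example: suitability (0.6, 0.3, 0.1) and a conditional matrix
# in which part of class 0 and class 2 is converted to class 1.
pi = land_cover_probs([0.6, 0.3, 0.1],
                      [[0.8, 0.2, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.5, 0.5]])
```

Here `pi` is approximately `[0.48, 0.47, 0.05]`; because each row of the conditional matrix sums to one, the result still sums to one.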

For flexibility, we can use an auxiliary variable $Z_{ij}$ that defines land suitability such that $\Pr(LS_i = j) = \Pr(\max(Z_{i1}, \ldots, Z_{i(K+1)}) = Z_{ij})$. Notice that land suitability $LS_i$ is deterministic given $Z_i = (Z_{i1}, \ldots, Z_{i(K+1)})$. Then $Z_{ij}$ should include a variety of terms that explain the uncertainty of land suitability:

$$Z_{ij} = f_{1j}(\text{elev}_i) + f_{2j}(\text{soil water avail}) + \cdots + f_{pj}(\text{temperature indicators}) + W_{ij} + \epsilon_{ij},$$

where the $f$'s are non-linear functions, the $W_{ij}$'s are spatial stochastic processes and $\epsilon_{ij}$ is an error term. $Z_{ij}$ could also include spatial correlation between categories if necessary.
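The distribution of $\epsilon_{ij}$ is not specified here. As an assumption for illustration only: if the $\epsilon_{ij}$ are independent standard Gumbel variables and the remaining terms are held fixed, the argmax probabilities reduce to the softmax of those terms, i.e. a multinomial logit model. The sketch below checks this by Monte Carlo with hypothetical category means:

```python
import math
import random


def softmax(v):
    """exp(v_j) / sum_k exp(v_k), computed stably."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def argmax_probs_mc(mean, n_draws=100_000, seed=1):
    """Monte Carlo estimate of Pr(max_k Z_k = Z_j) with Z_j = mean_j + Gumbel(0, 1) noise."""
    rng = random.Random(seed)
    counts = [0] * len(mean)
    for _ in range(n_draws):
        # Inverse-CDF Gumbel draws; the interval avoids log(0) at either end.
        z = [m - math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12))) for m in mean]
        counts[max(range(len(z)), key=z.__getitem__)] += 1
    return [c / n_draws for c in counts]
```

With `mean = [1.0, 0.0, -0.5]`, `argmax_probs_mc(mean)` agrees with `softmax(mean)` (about 0.63, 0.23, 0.14) up to Monte Carlo error. Other error distributions (for example normal, giving a multinomial probit) do not have this closed form.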

The second term, $\Pr(LC_i = j \mid LS_i = k) = p_{ikj}$, can be thought of as the elements of a $(K+1) \times (K+1)$ transition matrix $P_i$ between categories, explained by human behaviour; each row satisfies $\sum_{j} p_{ikj} = 1$. Modelling this term might get complicated and make the model unidentifiable if no assumptions are made. In general, it can be modelled as:

$$\text{logit}(p_{ikj}) = g_{1kj}(\text{population}) + g_{2kj}(\text{distance to major cities}) + \cdots$$
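Separate logits for each $p_{ikj}$ do not by themselves make each row of $P_i$ sum to one. One way to enforce the row constraint, shown here as an assumption rather than the approach adopted in these notes, is a row-wise multinomial logit (softmax over $j$). The sketch uses hypothetical linear stand-ins for the smooth $g$'s; since a row-wise softmax is unchanged when a constant is added to a whole row, one reference entry per row (here the diagonal) is fixed at zero.

```python
import math


def row_softmax(eta):
    """Map a (K+1) x (K+1) matrix of linear predictors eta[k][j] to
    p[k][j] = Pr(LC = j | LS = k) so that every row sums to one."""
    P = []
    for row in eta:
        m = max(row)                      # subtract the row maximum for stability
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        P.append([v / s for v in e])
    return P


def linear_predictors(population, distance, a, b, c):
    """Hypothetical linear stand-ins for the g's:
    eta[k][j] = a[k][j] + b[k][j] * population + c[k][j] * distance."""
    size = len(a)
    return [[a[k][j] + b[k][j] * population + c[k][j] * distance
             for j in range(size)] for k in range(size)]
```

With two classes, hypothetical coefficients `a = [[0, -1], [-2, 0]]`, `b = [[0, 0.5], [0, 0]]`, `c = [[0, -0.3], [0, 0]]` and covariates (1, 2), the first row of `row_softmax(...)` is about (0.75, 0.25).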

In order to make the model identifiable, we can make certain assumptions about $P_i$. For example:

  • Letting $P_i = I$ is equivalent to assuming that human behaviour has no influence on land cover, so land cover is the same as land suitability.
  • The next simplest option is to assume $P_i = P$, meaning that the transition matrix does not depend on location $i$.
  • We could add more flexibility by setting $P_i = \alpha_i P + (1 - \alpha_i) I$, where $\alpha_i \in [0, 1]$ is a standardization of distance to major cities that decreases with distance (e.g. one minus the min-max standardized distance). This implies that $P_i$ shrinks to $I$ at greater distances.
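The shrinkage option can be sketched directly. The mapping from distance to $\alpha_i$ below (one minus the min-max standardized distance, clipped to $[0, 1]$) and the function names are hypothetical choices; any decreasing map into $[0, 1]$ gives the same qualitative behaviour.

```python
def alpha_from_distance(d, d_min, d_max):
    """Hypothetical alpha_i: 1 at the closest distance, 0 at the farthest (min-max scaling)."""
    return min(1.0, max(0.0, 1.0 - (d - d_min) / (d_max - d_min)))


def local_transition(P, alpha):
    """P_i = alpha * P + (1 - alpha) * I.

    A convex combination of two row-stochastic matrices, so each row of P_i
    still sums to one for any alpha in [0, 1]."""
    m = len(P)
    return [[alpha * P[k][j] + (1.0 - alpha) * (1.0 if k == j else 0.0)
             for j in range(m)] for k in range(m)]
```

Close to a major city `local_transition(P, alpha_from_distance(d_min, d_min, d_max))` returns `P` itself, while at the largest distance it returns the identity, i.e. land cover equals land suitability.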

A simpler approach

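To include a variety of flexible terms, we can use a latent variable $Z_{ij}$ that defines the probability $\pi_{ij}$ directly:

$$\pi_{ij} = \Pr(\max(Z_{i1}, \ldots, Z_{i(K+1)}) = Z_{ij}).$$

$Z_{ij}$ can then include a variety of terms without restriction (at least theoretically):

$$Z_{ij} = LS_{ij} + HB_{ij}.$$

Only terms that differ between categories affect $\pi_{ij}$: adding the same quantity to every $Z_{i1}, \ldots, Z_{i(K+1)}$ does not change which one is the maximum, so both terms must carry the category index $j$.

Land suitability is defined by environmental variables and some spatial structure:

$$LS_{ij} = f_{1j}(\text{elevation}_i) + f_{2j}(\text{soil water availability}) + \cdots + f_{pj}(\text{temperature indicators}) + W_{ij}$$

The human-behaviour term uses socio-economic covariates:

$$HB_{ij} = g_{1j}(\text{population}) + g_{2j}(\text{distance to major cities}) + \cdots$$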
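The sketch below estimates $\pi_{ij}$ by Monte Carlo for one cell and checks numerically which human-behaviour terms actually matter. The category means are hypothetical, and $W_{ij}$ is replaced, as an assumption, by independent normal noise that ignores its spatial structure.

```python
import random


def class_probs(ls_mean, hb, n_draws=50_000, sd=1.0, seed=7):
    """Monte Carlo estimate of pi_j = Pr(argmax_k Z_k = j) for a single cell,
    with Z_j = ls_mean[j] + W_j + hb[j].

    W_j is drawn as independent N(0, sd^2) noise, a stand-in that ignores the
    spatial structure of the real W_ij.
    """
    rng = random.Random(seed)
    m = len(ls_mean)
    counts = [0] * m
    for _ in range(n_draws):
        z = [ls_mean[j] + rng.gauss(0.0, sd) + hb[j] for j in range(m)]
        counts[max(range(m), key=z.__getitem__)] += 1
    return [c / n_draws for c in counts]
```

With `ls_mean = [0.5, 0.0, -0.5]`, calling `class_probs` with `hb = [3, 3, 3]` gives the same probabilities as `hb = [0, 0, 0]` (the shared value cancels in the maximum), whereas `hb = [0, 1, 0]` moves probability towards the second category.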