10 Local Measures of Spatial Autocorrelation

Published

January 25, 2023

Modified

September 26, 2025

10.1 Overview

Local Measures of Spatial Autocorrelation (LMSA) focus on the relationships between each observation and its surroundings, rather than providing a single summary of these relationships across the map. In this sense, they are not summary statistics but scores that allow us to learn more about the spatial structure in our data. The general intuition behind the metrics however is similar to that of global ones. Some of them are even mathematically connected, where the global version can be decomposed into a collection of local ones. One such example are Local Indicators of Spatial Association (LISA). Beside LISA, Getis-Ord’s Gi-statistics will be introduce as an alternative LMSA statistics that present complementary information or allow us to obtain similar insights for geographically referenced data.

In this hands-on exercise, you will learn how to compute Local Measures of Spatial Autocorrelation (LMSA) by using sfdep package. By the end to this hands-on exercise, you will be able to:

import geospatial data using appropriate function(s) of sf package,
import csv file using appropriate function of readr package,
perform relational join using appropriate join function of dplyr package,
compute Local Indicator of Spatial Association (LISA) statistics for detecting clusters and outliers by using appropriate functions sfdep package;
compute Getis-Ord’s Gi-statistics for detecting hot spot or/and cold spot area by using appropriate functions of sfdep package; and
to visualise the analysis output by using tmap package.

10.2 Getting Started

10.2.1 The analytical question

In spatial policy, one of the main development objective of the local govenment and planners is to ensure equal distribution of development in the province. Our task in this study, hence, is to apply appropriate spatial statistical methods to discover if development are even distributed geographically. If the answer is No. Then, our next question will be “is there sign of spatial clustering?”. And, if the answer for this question is yes, then our next question will be “where are these clusters?”

In this case study, we are interested to examine the spatial pattern of a selected development indicator (i.e. GDP per capita) of Hunan Provice, People Republic of China.(https://en.wikipedia.org/wiki/Hunan)

10.2.2 The Study Area and Data

Two data sets will be used in this hands-on exercise, they are:

Hunan province administrative boundary layer at county level. This is a geospatial data set in ESRI shapefile format.
Hunan_2012.csv: This csv file contains selected Hunan’s local development indicators in 2012.

10.2.3 Setting the Analytical Toolls

Before we get started, we need to ensure that sf, sfdep, tmap and tidyverse packages of R are currently installed in your R.

sf is use for importing and handling geospatial data in R,
tidyverse is mainly use for wrangling attribute data in R,
sfdep will be used to compute spatial weights, global and local spatial autocorrelation statistics, and
tmap will be used to prepare cartographic quality choropleth map.

The code chunk below is used to perform the following tasks:

creating a package list containing the necessary R packages,
checking if the R packages in the package list have been installed in R,
- if they have yet to be installed, RStudio will installed the missing packages,
launching the packages into R environment.

pacman::p_load(sf, sfdep, tmap, tidyverse)

10.3 Getting the Data Into R Environment

In this section, you will learn how to bring a geospatial data and its associated attribute table into R environment. The geospatial data is in ESRI shapefile format and the attribute table is in csv fomat.

10.3.1 Import shapefile into r environment

The code chunk below uses st_read() of sf package to import Hunan shapefile into R. The imported shapefile will be simple features Object of sf.

hunan <- st_read(dsn = "chap10/data/geospatial", 
                 layer = "Hunan") %>% 
  st_transform(crs = 32650)

Reading layer `Hunan' from data source `C:\tskam\r4gdsa\chap10\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 88 features and 7 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 108.7831 ymin: 24.6342 xmax: 114.2544 ymax: 30.12812
Geodetic CRS:  WGS 84

Tip

The raw data is in WGS 84 geographic coordinates system. For geospatial analysis, it is appropriate to use projected coordinates system. In the code chunk above, st_transform() is used to transform Hunan geospatial data from WGS 84 to UTM zone 50N (i.e. EPSG: 32650).

10.3.2 Import csv file into r environment

Next, we will import Hunan_2012.csv into R by using read_csv() of readr package. The output is R data frame class.

hunan2012 <- read_csv("chap10/data/aspatial/Hunan_2012.csv")

10.3.3 Performing relational join

The code chunk below will be used to update the attribute table of hunan’s sf data frame with the attribute fields of hunan2012 dataframe. This is performed by using left_join() of dplyr package.

hunan <- left_join(hunan,hunan2012) %>%
  select(1:4, 7, 15)

10.3.4 Visualising Regional Development Indicator

In the code chunks below, tmap functions are used:

to build two choropleth maps by using equal interval (i.e equal) and quantile (i.e. quantile) classification methods, and
to plot both maps next to each other by using tmap_arrange().

equal <- tm_shape(hunan) +
  tm_polygons(fill = "GDPPC",
              fill.scale = tm_scale_intervals(
                style = "equal",
                n = 5,
                values = "brewer.blues"),
              fill.legend = tm_legend(
                title = "GDPPC",
                position = tm_pos_in(
                  "left", "bottom"))) + 
  tm_borders(fill_alpha = 0.5) +
  tm_title("Equal interval classification")

quantile <- tm_shape(hunan) +
  tm_polygons(fill = "GDPPC",
              fill.scale = tm_scale_intervals(
                style = "quantile",
                n = 5,
                values = "brewer.blues"),
              fill.legend = tm_legend(
                title = "GDPPC",
                position = tm_pos_in(
                  "left", "bottom"))) + 
  tm_borders(fill_alpha = 0.5) +
  tm_title("Quantile interval classification")

tmap_arrange(equal, 
             quantile, 
             asp=1, 
             ncol=2)

Important

Does the plot above reveal any outliers or clusters?
Does the plot above indicate the presence of hot spots or cold spots?

10.4 Local Indicators of Spatial Association(LISA)

Local Indicators of Spatial Association or LISA are statistics that evaluate the existence of clusters and/or outliers in the spatial arrangement of a given variable. For instance if we are studying distribution of GDP per capita of Hunan Provice, People Republic of China, local clusters in GDP per capita mean that there are counties that have higher or lower rates than is to be expected by chance alone; that is, the values occurring are above or below those of a random distribution in space.

In this section, you will learn how to apply appropriate Local Indicators for Spatial Association (LISA), especially local Moran’I to detect cluster and/or outlier from GDP per capita 2012 of Hunan Province, PRC.

10.4.1 Computing Contiguity Spatial Weights

Before we can compute the local spatial autocorrelation statistics, we need to construct a spatial weights of the study area. The spatial weights is used to define the neighbourhood relationships between the geographical units (i.e. county) in the study area.

In the code chunk below, st_contiguity() of sfdep package is used to compute contiguity weight matrices for the study area. This function builds a neighbours list based on regions with contiguous boundaries. If you look at the documentation you will see that you can pass a “queen” argument that takes TRUE or FALSE as options. If you do not specify this argument the default is set to TRUE, that is, if you don’t specify queen = FALSE this function will return a list of first order neighbours using the Queen criteria.

Next, st_weights() is used to calculate polygon spatial weights from the nb list.

st_weights() provides three arguments, they are:

nb: A neighbor list object as created by st_neighbors().
style: Default “W” for row standardized weights. This value can also be “B”, “C”, “U”, “minmax”, and “S”. B is the basic binary coding, W is row standardised (sums over all links to n), C is globally standardised (sums over all links to n), U is equal to C divided by the number of neighbours (sums over all links to unity), while S is the variance-stabilizing coding scheme proposed by Tiefelsdorf et al. 1999, p. 167-168 (sums over all links to n).
allow_zero: If TRUE, assigns zero as lagged value to zone without neighbors.

wm_q <- hunan %>%
  mutate(nb = st_contiguity(geometry),
         wt = st_weights(
           nb, style = "W"),
         .before = 1)

Tip

In the code chunk above, mutate() is used to write the computed nb and wt values back into hunan sf data table.

To reveal the neighbour list, code chunk below is used.

summary(wm_q$nb)

Neighbour list object:
Number of regions: 88 
Number of nonzero links: 448 
Percentage nonzero weights: 5.785124 
Average number of links: 5.090909 
Link number distribution:

 1  2  3  4  5  6  7  8  9 11 
 2  2 12 16 24 14 11  4  2  1 
2 least connected regions:
30 65 with 1 link
1 most connected region:
85 with 11 links

The summary report above shows that there are 88 area units in Hunan. The most connected area unit has 11 neighbours. There are two area units with only one neighbour.

10.4.2 Row-standardised weights matrix

Next, we need to assign weights to each neighboring polygon. In our case, each neighboring polygon will be assigned equal weight (style=“W”). This is accomplished by assigning the fraction 1/(#ofneighbors) to each neighboring county then summing the weighted income values. While this is the most intuitive way to summaries the neighbors’ values it has one drawback in that polygons along the edges of the study area will base their lagged values on fewer polygons thus potentially over- or under-estimating the true nature of the spatial autocorrelation in the data. For this example, we’ll stick with the style=“W” option for simplicity’s sake but note that other more robust options are available, notably style=“B”.

What can we learn from the code chunk above?

The input of nb2listw() must be an object of class nb. The syntax of the function has two major arguments, namely style and zero.poly.
style can take values “W”, “B”, “C”, “U”, “minmax” and “S”. B is the basic binary coding, W is row standardised (sums over all links to n), C is globally standardised (sums over all links to n), U is equal to C divided by the number of neighbours (sums over all links to unity), while S is the variance-stabilizing coding scheme proposed by Tiefelsdorf et al. 1999, p. 167-168 (sums over all links to n).
If zero policy is set to TRUE, weights vectors of zero length are inserted for regions without neighbour in the neighbours list. These will in turn generate lag values of zero, equivalent to the sum of products of the zero row t(rep(0, length=length(neighbours))) %*% x, for arbitrary numerical vector x of length length(neighbours). The spatially lagged value of x for the zero-neighbour region will then be zero, which may (or may not) be a sensible choice.

10.4.3 Computing local Moran’s I

To compute local Moran’s I, the local_moran() function of sfdep will be used. It computes Ii values, given a set of zi values and a listw object providing neighbour weighting information for the polygon associated with the zi values.

The code chunks below are used to compute local Moran’s I of GDPPC2012 at the county level.

set.seed(1234)

lisa <- wm_q %>% 
  mutate(local_moran = local_moran(
    GDPPC, nb, wt, nsim = 99),
         .before = 1) %>%
  unnest(local_moran)

local_moran() function returns a matrix of values whose columns are:

Ii: the local Moran’s I statistics
E.Ii: the expectation of local moran statistic under the randomisation hypothesis
Var.Ii: the variance of local moran statistic under the randomisation hypothesis
Z.Ii:the standard deviate of local moran statistic
Pr(): the p-value of local moran statistic

glimpse(lisa)

Rows: 88
Columns: 21
$ ii           <dbl> -1.468468e-03, 2.587817e-02, -1.198765e-02, 1.022468e-03,…
$ eii          <dbl> 1.472724e-03, -1.337157e-02, 2.542108e-03, -9.522978e-05,…
$ var_ii       <dbl> 5.187992e-04, 1.413972e-02, 1.109689e-01, 4.768012e-06, 1…
$ z_ii         <dbl> -0.12912898, 0.33007787, -0.04361717, 0.51186518, 0.33919…
$ p_ii         <dbl> 8.972556e-01, 7.413411e-01, 9.652096e-01, 6.087454e-01, 7…
$ p_ii_sim     <dbl> 0.80, 0.98, 0.92, 0.56, 0.58, 0.86, 0.06, 0.12, 0.02, 0.2…
$ p_folded_sim <dbl> 0.40, 0.49, 0.46, 0.28, 0.29, 0.43, 0.03, 0.06, 0.01, 0.1…
$ skewness     <dbl> -1.0560692, -1.0439328, 0.8429629, 0.8423923, 1.3810032, …
$ kurtosis     <dbl> 1.31492710, 0.76654076, 0.12251048, 0.37514702, 2.3883407…
$ mean         <fct> Low-High, Low-Low, High-Low, High-High, High-High, High-L…
$ median       <fct> High-High, High-High, High-High, High-High, High-High, Hi…
$ pysal        <fct> Low-High, Low-Low, High-Low, High-High, High-High, High-L…
$ nb           <nb> <2, 3, 4, 57, 85>, <1, 57, 58, 78, 85>, <1, 4, 5, 85>, <1,…
$ wt           <list> <0.2, 0.2, 0.2, 0.2, 0.2>, <0.2, 0.2, 0.2, 0.2, 0.2>, <0…
$ NAME_2       <chr> "Changde", "Changde", "Changde", "Changde", "Changde", "C…
$ ID_3         <int> 21098, 21100, 21101, 21102, 21103, 21104, 21109, 21110, 2…
$ NAME_3       <chr> "Anxiang", "Hanshou", "Jinshi", "Li", "Linli", "Shimen", …
$ ENGTYPE_3    <chr> "County", "County", "County City", "County", "County", "C…
$ County       <chr> "Anxiang", "Hanshou", "Jinshi", "Li", "Linli", "Shimen", …
$ GDPPC        <dbl> 23667, 20981, 34592, 24473, 25554, 27137, 63118, 62202, 7…
$ geometry     <POLYGON [m]> POLYGON ((22320.48 3301894,..., POLYGON ((35522.9…

10.4.3.1 Mapping the local Moran’s I

10.4.3.2 Mapping local Moran’s I values

Using choropleth mapping functions of tmap package, we can plot the local Moran’s I values by using the code chinks below.

tm_shape(lisa) +
  tm_polygons(fill = "ii",
              fill.scale = tm_scale_intervals(
                style = "pretty",
                n = 5,
                values = "brewer.RdBu"),
              fill.legend = tm_legend(
                title = "Local Morans'I",
                position = tm_pos_out())) + 
  tm_borders(fill_alpha = 0.5) +
  tm_title("Loal Morans'I of GDPPC (Queen's method)")

10.4.3.3 Mapping local Moran’s I p-values

The choropleth shows there is evidence for both positive and negative Ii values. However, it is useful to consider the p-values for each of these values, as consider above.

The code chunks below produce a choropleth map of Moran’s I p-values by using functions of tmap package.

tm_shape(lisa) +
  tm_polygons(fill = "p_ii", 
              fill.scale = tm_scale_intervals(
                breaks = c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
                values = "-brewer.Reds"),
              fill.legend = tm_legend(
                title = "p-value",
      position = tm_pos_out())) + 
  tm_borders(fill_alpha = 0.5) +
  tm_title("p-values of Loal Moran's I of GDPPC (Queen's method)")

10.4.3.4 Mapping both local Moran’s I values and p-values

For effective interpretation, it is better to plot both the local Moran’s I values map and its corresponding p-values map next to each other.

The code chunk below will be used to create such visualisation.

ii.map <- tm_shape(lisa) +
  tm_polygons(fill = "ii",
              fill.scale = tm_scale_intervals(
                style = "pretty",
                n = 5,
                values = "brewer.RdBu"),
              fill.legend = tm_legend(
                title = "Local Moran's I",
                position = tm_pos_in(
                  "left", "bottom"))) + 
  tm_borders(fill_alpha = 0.5) +
  tm_title("Loal Moran's I of GDPPC (Queen's method)")

p_ii.map <- tm_shape(lisa) +
  tm_polygons(fill = "p_ii", 
              fill.scale = tm_scale_intervals(
                breaks = c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
                values = "-brewer.Reds"),
              fill.legend = tm_legend(
                title = "p-value",
      position = tm_pos_in("left", "bottom")
    )) + 
  tm_borders(fill_alpha = 0.5) +
  tm_title("p-values of Loal Moran's I of GDPPC (Queen's method)")

tmap_arrange(ii.map, p_ii.map, asp=1, ncol=2)

10.4.4 Preparing and Visualising LISA Map

Visualizing LISA involves creating maps that highlight significant spatial patterns, such as hot spots (high values clustered together) and cold spots (low values clustered together), as well as identifying spatial outliers. Common methods include choropleth maps showing LISA values, Moran scatterplots which depict local spatial autocorrelation for individual locations, and hot spot analysis maps.

The LISA Cluster Map shows the significant locations color coded by type of spatial autocorrelation. The first step before we can generate the LISA cluster map is to plot the Moran scatterplot.

10.4.4.1 Plotting Moran scatterplot

A Moran Scatterplot is a graphical tool in spatial analysis that visualizes the relationship between a variable’s values and the average values of its neighboring locations (its spatial lag) to identify spatial autocorrelation. By plotting standardized values of the variable on the x-axis and its spatial lag on the y-axis, the plot’s slope represents Moran’s I, revealing patterns of high-high or low-low value clusters and high-low or low-high value outliers.

In the code chunk below, st_lag() of sfdep package is used to calculates the spatial lag of GDPPC given a neighbor (i.e. nb) and weights list (i.e. wt).

lisa <- lisa %>%
  mutate(lag_GDPPC = st_lag(
    GDPPC, nb, wt),
    .before = 1) %>%
  unnest(lag_GDPPC)

Then, the code chunk below is used to plot the Moran Scatterplot of GDPPC against its spatial lag version as shown in the code cunk below.

ggplot(data = lisa, 
       aes(x = GDPPC, 
           y = lag_GDPPC)) +
  geom_point() +
  geom_smooth(method = "lm", 
              se = FALSE, 
              color = "red") +
  labs(x = "GDPPC",
       y = "Spatial Lag of GDPPC",
       title = "Moran Scatterplot") +
  theme_minimal()

The Moran Scatterplot above is not very intuitive. To support effective visual communication, the code chunk below is used.

ggplot(data = lisa, 
       aes(x = GDPPC, 
           y = lag_GDPPC, 
           color = mean)) +
  geom_point(size = 2) +
  geom_smooth(method = "lm", 
              se = FALSE, 
              color = "black") +
  geom_hline(yintercept=mean(lisa$lag_GDPPC), lty=2) + 
    geom_vline(xintercept=mean(lisa$GDPPC), lty=2) +
  scale_color_manual(
    values = c("High-High" = "red", 
               "Low-Low" = "blue",
               "Low-High" = "lightblue", 
               "High-Low" = "pink")) +
  labs(x = "GDPPC",
       y = "Spatial Lag of GDPPC",
       title = "Moran Scatterplot with LISA Quadrants") +
  theme_minimal()

Note

Notice that the plot is split in 4 quadrants. The top right corner belongs to areas that have high GDPPC and are surrounded by other areas that have the average level of GDPPC. This are the high-high locations in the lesson slide.

10.4.4.2 Plotting Moran scatterplot with standardised variable

First we will use scale() to centers and scales the variable. Here centering is done by subtracting the mean (omitting NAs) the corresponding columns, and scaling is done by dividing the (centered) variable by their standard deviations.

lisa <- lisa %>%
  mutate(z_GDPPC = scale(GDPPC),
         z_lag_GDPPC = scale(lag_GDPPC),
         .before = 1)

The as.vector() added to the end is to make sure that the data type we get out of this is a vector, that map neatly into out dataframe.

Now, we are ready to plot the Moran scatterplot again by using the code chunk below.

ggplot(data = lisa, 
       aes(x = z_GDPPC, 
           y = z_lag_GDPPC, 
           color = mean)) +
  geom_point(size = 2) +
  geom_smooth(method = "lm", 
              se = FALSE, 
              color = "black") +
  geom_hline(yintercept=mean(lisa$z_lag_GDPPC), lty=2) + 
    geom_vline(xintercept=mean(lisa$z_GDPPC), lty=2) +
  scale_color_manual(
    values = c("High-High" = "red", 
               "Low-Low" = "blue",
               "Low-High" = "lightblue", 
               "High-Low" = "pink")) +
  labs(x = "Standardised GDPPC",
       y = "Standardised Spatial Lag of GDPPC",
       title = "Standardised Moran Scatterplot with LISA Quadrants") +
  theme_minimal()

10.4.4.3 Preparing LISA map classes

The code chunks below show the steps to prepare a LISA cluster map.

Next, derives the spatially lagged variable of interest (i.e. GDPPC) and centers the spatially lagged variable around its mean.

This is follow by centering the local Moran’s around the mean.

Next, we will set a statistical significance level for the local Moran.

signif <- 0.05

10.4.4.4 Plotting and visualising LISA map

By default, the LISA classes are stored in mean, median and pysal fields of lisa sf data frame. To prepare the LISA map, we will derive a new field called LISA-cluster from either one of these three field by using the code chunk below.

lisa <- lisa %>%
  mutate(
    LISA_cluster = ifelse(
      p_ii < 0.05,
      as.character(mean), 
      "Insignificant"),
    LISA_cluster = factor(
      LISA_cluster,
      levels = c("Insignificant", "Low-Low", "Low-High", "High-Low", "High-High")
    )
  )

Note

Notice that a new class called Insignificant has been added for records whereby p_ii > 0.05.

Now, we can plot the LISA map by using the code chunk below.

lisa_map <- tm_shape(lisa) + 
  tm_polygons(
    fill = "LISA_cluster",
    fill.scale = tm_scale_categorical(
      values = c(
        "grey80",      # Insignificant
        "blue",        # Low-Low
        "lightblue",   # Low-High
        "pink",        # High-Low
        "red"          # High-High
      )
    ),
    fill.legend = tm_legend(
      title = "LISA Cluster",
      position = tm_pos_in("left", "bottom"))
  ) +
  tm_borders() +
  tm_title("Local Moran's I Clusters (p < 0.05)")
lisa_map

For effective interpretation, it is better to plot both the local Moran’s I values map and its corresponding p-values map next to each other.

The code chunk below will be used to create such visualisation.

We can also include the local Moran’s I map and p-value map as shown below for easy comparison.

Question: What statistical observations can you draw from the LISA map above?

10.5 Hot Spots and Cold Spots Analysis (HCSA)

Hot spots and cold spots analysis is a technique used in spatial statistics to identify statistically significant clusters of high values (hot spots) and low values (cold spots) of a geographically referenced attribute (i.e. GDPPC). These analyses help visualize and understand spatial patterns, revealing areas where data points are clustered more densely than would be expected by random chance.

Figure below shows the hot spots and cold spots of GDP per capita of Hunan Provice, People Republic of China.

The term hot spots refer to areas where high values are concentrated and surrounded by other high values. It has been used generically across disciplines to describe a region or value that is higher relative to its surroundings (Lepers et al 2005, Aben et al 2012, Isobe et al 2015). For example, a hot spot analysis might reveal clusters of high GDP per capita in a province, indicating areas with significantly higher GDP per capita than other parts of the province.

On the other hands, the term cold spots refer to areas where low values are concentrated and surrounded by other low values. An example could be clusters of low GDP per capita, indicating areas where GDP per capita is consistently low.

HCSA uses statistical methods, like the Getis-Ord Gi* statistic, to determine if the observed clustering is statistically significant or could have occurred randomly.

The analysis consists of four steps:

Deriving spatial weight matrix
Computing Gi* statistics
Mapping Gi* statistics
Communicating the analysis results

10.5.1 Deriving distance weight matrix

Whist the spatial autocorrelation considered units which shared borders, for Getis-Ord Gi* statistics, we are defining neighbours based on distance.

There are two type of distance-based proximity matrix, they are:

fixed distance weight matrix,
adaptive distance weight matrix, and
inverse distance weights.

10.5.1.1 Computing fixed distance weights

In this section, you will learn how to compute fixed distance weight matrix by using st_dist_band() and st_weights() of sfdep package.

Firstly, we need to determine the upper limit for distance band by using the steps below:

ct <- critical_threshold(st_geometry(hunan))
ct

[1] 60799.91

The summary report shows that the largest first nearest neighbour distance is 60.80 km, so using this as the upper threshold gives certainty that all units will have at least one neighbour.

Using the critical threshold value computed above, code chunk below is used to compute the fixed distance weight.

hunan_fdw <- hunan %>% 
  mutate(nb = include_self(
    st_dist_band(
      st_geometry(geometry), 
                 upper = ct)),
    wt = st_weights(nb, 
                      style = "W"),
    .before = 1)

Note

Noted that for Gi* include_self() must be used to endure that .

10.5.1.2 Computing adaptive distance weights

One of the characteristics of fixed distance weight matrix is that more densely settled areas (usually the urban areas) tend to have more neighbours and the less densely settled areas (usually the rural counties) tend to have lesser neighbours. Having many neighbours smooth the neighbour relationship across more neighbours.

It is possible to control the numbers of neighbours directly using k-nearest neighbours, either accepting asymmetric neighbours or imposing symmetry as shown in the code chunk below.

In this section, you will learn how to compute fixed distance weight matrix by using st_dist_band() and st_weights() of sfdep package.

hunan_adw <- hunan %>%
  mutate(nb = include_self(
    st_knn(
      st_geometry(geometry),
      k = 6)),
    wt = st_weights(
      nb, style = "W"),
    .before = 1)

10.5.1.3 Computing adaptive distance weights

10.5.2 Computing local Gi*

Local Gi* (pronounced “G-i-star”) is a Getis-Ord statistic used in spatial analysis to identify statistically significant hot spots and cold spots of high or low attribute values within a dataset. It measures the spatial clustering of values by comparing the total value within a specified neighborhood to the global total, producing a Z-score and p-value for each location to determine the likelihood that the observed pattern is not due to random chance.

In this sub-section, you will gain hands-on experience on performing Gi* analysis by using local_gstar_perm() of sfdep package.

HCSA_fdw <- hunan_fdw %>%
  mutate(
    gistar = local_gstar_perm(
      GDPPC, nb, wt, nsim = 99),
    .before = 1) %>%
  unnest(gistar)

10.5.3 Preparing and Visualising HCSA Map

10.5.3.1 Mapping Gi* with fix distance weights

In the code chunk below, thematic mapping functions of tmaps are used to prepare a choropleth map showing the distribution of Gi* values.

tm_shape(HCSA_fdw) +
  tm_polygons(fill = "gi_star",
              fill.scale = tm_scale_intervals(
                style = "pretty",
                n = 6,
                values = "brewer.rd_bu"),
              fill.legend = tm_legend(
                title = "Gi*",
                position = tm_pos_out())) + 
  tm_borders(fill_alpha = 0.5) +
  tm_title("Gi* of GDPPC (Fixed Bandwidth d = 60799.91m)")

10.5.3.2 Mapping local Gi* p-values

The choropleth shows there is evidence for both positive and negative Ii values. However, it is useful to consider the p-values for each of these values, as consider above.

The code chunk below plots a choropleth map of p-values of Gi* by using functions of tmap package.

tm_shape(HCSA_fdw) +
  tm_polygons(fill = "p_sim", 
              fill.scale = tm_scale_intervals(
                breaks = c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
                values = "-brewer.Reds"),
              fill.legend = tm_legend(
                title = "simulated p-value",
      position = tm_pos_out())) + 
  tm_borders(fill_alpha = 0.5) +
  tm_title("p-values of local Gi* of GDPPC (Fixed distance)")

10.5.3.3 Mapping both Gi* values and p-values

For effective interpretation, it is better to plot both the Gi* values map and its corresponding p-values map next to each other.

The code chunk below will be used to create such visualisation.

Gi_star_map <- tm_shape(HCSA_fdw) +
  tm_polygons(fill = "gi_star",
              fill.scale = tm_scale_intervals(
                style = "pretty",
                n = 5,
                values = "-brewer.rd_bu"),
              fill.legend = tm_legend(
                title = "local Gi*",
                position = tm_pos_in(
                  "left", "bottom"))) + 
  tm_borders(fill_alpha = 0.5) +
  tm_title("Local Gi* of GDPPC")

p_values_map <- tm_shape(HCSA_fdw) +
  tm_polygons(fill = "p_sim", 
              fill.scale = tm_scale_intervals(
                breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                values = "-brewer.reds"),
              fill.legend = tm_legend(
                title = "p-value",
      position = tm_pos_in("left", "bottom")
    )) + 
  tm_borders(fill_alpha = 0.5) +
  tm_title("p-values of local Gi* of GDPPC (fixed distance)")

tmap_arrange(Gi_star_map, p_values_map, asp=1, ncol=2)

10.5.3.4 Plotting and visualising HCSA Map

By default, the HCSA classes are stored in cluster field of HCSA_fdw sf data frame. To prepare the HSCA map, we will derive a new field called HCSA_cluster from cluster field by using the code chunk below.

HCSA_fdw <- HCSA_fdw %>%
  mutate(HCSA_cluster = case_when(
    p_sim > 0.05 ~ "Insignificant",
    p_sim <= 0.05 & cluster == "High" ~ "Hot spot",
    p_sim <= 0.05 & cluster == "Low"  ~ "Cold spot",
    TRUE ~ "Other"),
    HCSA_cluster = factor(
      HCSA_cluster,
      levels = c("Insignificant", "Hot spot", "Cold spot")
    ),
    .before = 1
  )

Note

Notice that a new class called Insignificant has been added for records whereby p_sim > 0.05.

Now, we can plot the HCSA map by using the code chunk below.

HCSA_map <- tm_shape(HCSA_fdw) + 
  tm_polygons(
    fill = "HCSA_cluster",
    fill.scale = tm_scale_categorical(
      values = c(
        "grey80",      # Insignificant
        "red",        # Low-Low
        "blue"          # High-High
      )
    ),
    fill.legend = tm_legend(
      title = "HSCA Cluster",
      position = tm_pos_in("left", "bottom"))
  ) +
  tm_borders() +
  tm_title("HCSA Clusters (p < 0.05)")
HCSA_map

For effective interpretation, it is better to plot both the local Gi* values map and its corresponding HCSA map next to each other.

The code chunk below will be used to create such visualisation.

tmap_arrange(Gi_star_map, HCSA_map, asp=1, ncol=2)

Question: What statistical observations can you draw from the LISA map above?