18 Market Area Analysis for Retail and Service Locations using Huff model

Published

November 9, 2025

Modified

November 15, 2025

18.1 Overview

Market area analysis (also known as Trade Area Analysis) is the process of defining and studying the geographical area from which a business draws its customers. It involves analyzing data on consumer demographics, spending habits, and competitors to understand market viability, identify opportunities, and inform decisions about location, marketing, and expansion. It often combines empirical observation, like customer spotting, with mathematical models to identify market segments and understand customer behavior.

Huff model is a geospatial analysis method commonly used in market area analysis to predict the probability that a consumer from a specific area will patronize a particular store, based on the store’s attractiveness and distance relative to competitors. It calculates a store’s market share by considering factors like size or sales for attractiveness and distance or travel time, with the probability decreasing as distance or competition increases. This tool helps businesses assess potential revenue, define trade areas, and make informed decisions about site selection

18.1.1 Learning Outcome

In this hands-on exercise, you will gain hands-on experience on how to perform market area analysis by using Huff model of MCI packages.

By the end of this hands-on exercise, you will be able:

to import GIS data into R and save them as simple feature data frame by using appropriate functions of sf package of R;
to import aspatial data into R and save them as simple feature data frame by using appropriate functions of sf package of R;
to clean and tidy the data sets so that they are ready for the analysis,
to compute distance matrix between the origins (i.e. villages) and destinations (i.e stores),
to computer interaction probability using Huff model, and
to visualise the market area by using tmap and ggplot2 packages.

18.1.2 Setting the scene

A fast-food franchise is a business model in which an individual (the franchisee) acquires the rights to operate a restaurant under an established brand owned by a franchisor. This arrangement typically involves franchise fees and adherence to standardized operational procedures. Key considerations prior to investment include the scale of initial capital outlay, the strength of the brand, and the strategic importance of store location. Capital investment requirements vary significantly, generally ranging from approximately $200,000 to more than $2 million. Major global fast-food franchise brands include McDonald’s, Burger King, Pizza Hut, KFC, and Taco Bell.

In this exercise, you will assume the role of a data analyst within the business development team of a fast-food franchise. Your objective is to delineate the trade areas of existing stores in Taichung City and to produce corresponding spatial maps to support market analysis and strategic decision-making.

18.1.3 The data

Four data sets will be used in this hands-on exercise, they are:

Geospatial data
- STORES_TN, store locations data in ESRI shapefile format.
- VILLAGE_TN, village boundary data in ESRI shapefile format.
Aspatial data
- pop2019.csv, population data of Taichung City in 2019. H_CNT, P_CNT, M_CNT, and F_CNT indicate Number of household, Total population, Male population and Female population respectively.
- Store_sales_data.csv provides store sales data, and

18.2 Getting Started

Before we getting started, it is important for us to install the necessary R packages and launch them into RStudio environment.

The R packages need for this exercise are as follows:

sf for handling geospatial data and performing geoprocessing function such as computing distance matrix,
MCI for performing Huff and MCI modelling,
tidyverse for performing data science tasks including visualisation,
ggstatsplot, an R package that extends the ggplot2 visualization package to combine data visualization with statistical analysis, creating plots that include statistical test results directly, and
tmap for plotting cartographic quality thematic maps.

The code chunk below installs and launches these R packages into RStudio environment.

pacman::p_load(sf, tidyverse, MCI, tmap, ggstatsplot)

18.3 Importing Data

18.3.1 Importing geospatial data

First, st_read() of sf package will be used to import the three geospatial data sets into R environment by using the code chunk below.

stores <- st_read(
  dsn = "chap18/data/geospatial",
  layer = "STORES_TC")

Reading layer `STORES_TC' from data source 
  `C:\tskam\r4gdsa\chap18\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 28 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 202702.9 ymin: 2662196 xmax: 232773.2 ymax: 2693557
Projected CRS: TWD97 / TM2 zone 121

town <- st_read(
  dsn = "chap18/data/geospatial",
  layer = "TOWN_TC")

Reading layer `TOWN_TC' from data source `C:\tskam\r4gdsa\chap18\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 28 features and 7 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 194797.1 ymin: 2654887 xmax: 240450.2 ymax: 2703838
Projected CRS: TWD97 / TM2 zone 121

village <- st_read(
  dsn = "chap18/data/geospatial",
  layer = "VILLAGE_TC")

Reading layer `VILLAGE_TC' from data source 
  `C:\tskam\r4gdsa\chap18\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 622 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 194797.1 ymin: 2654887 xmax: 240450.2 ymax: 2703838
Projected CRS: TWD97 / TM2 zone 121

The output print above shown that the three data sets are in TWD97 / TM2 zone 121 (i.e epsg: 3826) projected coordinates systems.

Important

For any geospatial analysis, it is important to check and confirm that all the geospatial data are in the similar projected coodinates system.

tmap_mode("view")
tm_shape(town) +
  tm_polygons() +
tm_shape(village) +
  tm_polygons() + 
tm_shape(stores) +
  tm_dots()

tmap_mode("plot")

18.3.2 Importing aspatial data

Next, read_csv() of readr package will be used to import the two aspatial data sets into R environment.

pop2019 <- read_csv(
  "chap18/data/aspatial/pop2019.csv")

store_sales <- read_csv(
  "chap18/data/aspatial/Store_sales_data.csv")

summary(store_sales)

  Store Code          POD code             Bills            Amount       
 Length:5499        Length:5499        Min.   :   1.0   Min.   :      0  
 Class :character   Class :character   1st Qu.:  37.0   1st Qu.:  26702  
 Mode  :character   Mode  :character   Median : 186.0   Median : 139643  
                                       Mean   : 331.9   Mean   : 236787  
                                       3rd Qu.: 457.0   3rd Qu.: 331952  
                                       Max.   :7764.0   Max.   :6076060  
    Ave Bill      
 Min.   :    0.0  
 1st Qu.:  661.3  
 Median :  730.5  
 Mean   :  825.1  
 3rd Qu.:  862.2  
 Max.   :24356.0

18.4 Data Preparation

18.4.1 Aggregating store sales

Code chunk below is used to aggregate the store sales data at Store Code.

store_sales_aggre <- store_sales %>%
  group_by(`Store Code`) %>%
  summarise(
    "Total_Bill" = sum(Bills, na.rm = TRUE),
    "Total_Sales" = sum(Amount, na.rm = TRUE))

18.4.2 Relational join

In the code chunk below, left_join() of dplyr package is used to append data from store_sales data.frame onto stores sf data.frame. At the same time a new field called Store_size has been created.

stores <- stores %>%
  left_join(store_sales_aggre,
            by = join_by(
              Store_CD == `Store Code`)) %>%
  mutate(Store_size = 100)

Next, summary() is used to check if the are any missing values in the data.frame.

summary(stores)

  Store_Name         Store_Addr          Store_CD           Total_Bill   
 Length:28          Length:28          Length:28          Min.   : 1767  
 Class :character   Class :character   Class :character   1st Qu.: 5698  
 Mode  :character   Mode  :character   Mode  :character   Median : 8634  
                                                          Mean   : 8021  
                                                          3rd Qu.:10349  
                                                          Max.   :14468  
  Total_Sales               geometry    Store_size 
 Min.   :1475963   POINT        :28   Min.   :100  
 1st Qu.:4044821   epsg:3826    : 0   1st Qu.:100  
 Median :6116336   +proj=tmer...: 0   Median :100  
 Mean   :5542852                      Mean   :100  
 3rd Qu.:6959911                      3rd Qu.:100  
 Max.   :8234234                      Max.   :100

18.4.3 Tidying pop2019 data.frame

We are planning to append data in pop2019 data.frame onto village sf data.frame. Before we can do so, it is wise to check in to values in the join field are the same.

glimpse(pop2019)

Rows: 625
Columns: 11
$ COUNTY_ID <dbl> 66000, 66000, 66000, 66000, 66000, 66000, 66000, 66000, 6600…
$ COUNTY    <chr> "臺中市", "臺中市", "臺中市", "臺中市", "臺中市", "臺中市", "臺中市", "臺中市", "臺中市…
$ TOWN_ID   <dbl> 66000110, 66000110, 66000110, 66000110, 66000110, 66000110, …
$ TOWN      <chr> "大甲區", "大甲區", "大甲區", "大甲區", "大甲區", "大甲區", "大甲區", "大甲區", "大甲區…
$ V_ID      <chr> "66000110-002", "66000110-009", "66000110-021", "66000110-00…
$ VILLAGE   <chr> "大甲里", "中山里", "太白里", "孔門里", "文曲里", "文武里", "日南里", "平安里", "江南里…
$ H_CNT     <dbl> 454, 1781, 406, 383, 355, 2071, 2039, 785, 461, 559, 769, 80…
$ P_CNT     <dbl> 1255, 5252, 1471, 1001, 1264, 6687, 6739, 2642, 1618, 1977, …
$ M_CNT     <dbl> 626, 2533, 758, 538, 640, 3244, 3406, 1310, 838, 983, 1273, …
$ F_CNT     <dbl> 629, 2719, 713, 463, 624, 3443, 3333, 1332, 780, 994, 1239, …
$ INFO_TIME <chr> "108Y12M", "108Y12M", "108Y12M", "108Y12M", "108Y12M", "108Y…

glimpse(village)

Rows: 622
Columns: 11
$ VILLCODE   <chr> "66000110029", "66000120001", "66000120002", "66000120003",…
$ COUNTYNAME <chr> "臺中市", "臺中市", "臺中市", "臺中市", "臺中市", "臺中市", "臺中市", "臺中市", "臺中…
$ TOWNNAME   <chr> "大甲區", "清水區", "清水區", "清水區", "清水區", "清水區", "清水區", "清水區", "清水…
$ VILLNAME   <chr> "建興里", "鰲峰里", "靈泉里", "清水里", "文昌里", "南寧里", "西寧里", "北寧里", "中興…
$ VILLENG    <chr> "Jianxing Vil.", "Aofeng Vil.", "Lingquan Vil.", "Qingshui …
$ COUNTYID   <chr> "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",…
$ COUNTYCODE <chr> "66000", "66000", "66000", "66000", "66000", "66000", "6600…
$ TOWNID     <chr> "B11", "B12", "B12", "B12", "B12", "B12", "B12", "B12", "B1…
$ TOWNCODE   <chr> "66000110", "66000120", "66000120", "66000120", "66000120",…
$ NOTE       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ geometry   <MULTIPOLYGON [m]> MULTIPOLYGON (((211953.9 27..., MULTIPOLYGON (…

The plot above shown that the village ID (i.e. V_ID) in pop2019 data.frame is not in the same structure as the village code in village sf data.frame.

Code chunk below will be used to remove the “-” sign from the values.

pop2019 <- pop2019 %>%
  mutate(V_ID = str_remove_all(
    V_ID, "-"))

Now, we are ready to append data in pop2019 data.frame onto village sf data.frame by using left_join() of dplyr package.

village <- village %>%
  left_join(pop2019,
            by = join_by(
              VILLCODE == V_ID))

18.4.4 Check for missing or zero values

Code chunk below is used to check in any records with missing values.

summary(village)

   VILLCODE          COUNTYNAME          TOWNNAME           VILLNAME        
 Length:622         Length:622         Length:622         Length:622        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
                                                                            
   VILLENG            COUNTYID          COUNTYCODE           TOWNID         
 Length:622         Length:622         Length:622         Length:622        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
                                                                            
   TOWNCODE             NOTE             COUNTY_ID        COUNTY         
 Length:622         Length:622         Min.   :66000   Length:622        
 Class :character   Class :character   1st Qu.:66000   Class :character  
 Mode  :character   Mode  :character   Median :66000   Mode  :character  
                                       Mean   :66000                     
                                       3rd Qu.:66000                     
                                       Max.   :66000                     
                                       NA's   :5                         
    TOWN_ID            TOWN             VILLAGE              H_CNT     
 Min.   :6.6e+07   Length:622         Length:622         Min.   : 180  
 1st Qu.:6.6e+07   Class :character   Class :character   1st Qu.: 724  
 Median :6.6e+07   Mode  :character   Mode  :character   Median :1335  
 Mean   :6.6e+07                                         Mean   :1592  
 3rd Qu.:6.6e+07                                         3rd Qu.:2195  
 Max.   :6.6e+07                                         Max.   :7599  
 NA's   :5                                               NA's   :5     
     P_CNT           M_CNT          F_CNT       INFO_TIME        
 Min.   :  487   Min.   : 257   Min.   : 230   Length:622        
 1st Qu.: 2139   1st Qu.:1095   1st Qu.:1053   Class :character  
 Median : 3862   Median :1911   Median :1934   Mode  :character  
 Mean   : 4545   Mean   :2234   Mean   :2311                     
 3rd Qu.: 6151   3rd Qu.:3022   3rd Qu.:3171                     
 Max.   :18652   Max.   :8741   Max.   :9911                     
 NA's   :5       NA's   :5      NA's   :5                        
          geometry  
 MULTIPOLYGON :622  
 epsg:3826    :  0  
 +proj=tmer...:  0

The output print above revealed that the are five missing values in H_CNT, P_CNT, M_CNT and F_CNT fields.

Hence, code chunk below is used to remove the five records with missing values from village sf data.frame.

village <- village %>%
  filter(!is.na(P_CNT))

any(is.na(village$H_CNT))

[1] FALSE

any(is.na(village$P_CNT))

[1] FALSE

any(is.na(village$M_CNT))

[1] FALSE

any(is.na(village$F_CNT))

[1] FALSE

18.5 Computing Distance Matrix

Instead of calculate the distance between the villages and stores, we will calculate the centroid of each village using the code chunk below. Then, compute the distance between the centroids of the villages and the stores.

village_center <- st_point_on_surface(village)
distmat <- st_distance(village_center, stores)

Next, the code chunk below is used to convert the distance matrix into long data.frame format

rownames(distmat) <- village$`VILLCODE`
colnames(distmat) <- stores$`Store_CD`

distmat_long <- distmat %>%
  as.data.frame() %>%
  rownames_to_column(var = "VILLCODE") %>%
  pivot_longer(
    cols = -VILLCODE,
    names_to = "Store_CD",
    values_to = "distance"
  ) %>%
  mutate(distance = as.numeric(distance))

18.6 Preparing Interaction Matrix

Code chunk below will be used to build the interaction matrix required by MCI package.

interaction_matrix <- distmat_long %>%
  left_join(
    stores %>% 
      st_drop_geometry() %>% 
      select(Store_CD, Store_size),
    by = "Store_CD") %>%
  relocate(
    distance, .after = last_col())

summary(interaction_matrix)

   VILLCODE           Store_CD           Store_size     distance      
 Length:17276       Length:17276       Min.   :100   Min.   :  108.5  
 Class :character   Class :character   1st Qu.:100   1st Qu.: 6678.7  
 Mode  :character   Mode  :character   Median :100   Median :11965.3  
                                       Mean   :100   Mean   :12762.7  
                                       3rd Qu.:100   3rd Qu.:18122.3  
                                       Max.   :100   Max.   :41395.4

18.7 Computing Huff model

huff_share <- huff.shares(
  interaction_matrix, "VILLCODE", "Store_CD", 
  "Store_size", "distance")

18.7.1 Trade area analysis for a single store

First, let us select market share data of a store by usign Store_CD field (i.e. TG).

market_share <- huff_share %>%
  filter(Store_CD == "TG")

Next, trade areas of the selected will be derived by using the code chun below.

Trade_area <- village %>%
  left_join(market_share)

Before plotting the trade area map, code chunk below is used to extract the selected store from stores sf data.frame.

selected_store <- stores %>%
  filter(Store_CD == "TG")

Now, use the code chunk below to plot the trade areas of the selected store.

tm_shape(Trade_area) +
  tm_polygons(fill = "p_ij",
              fill.scale = tm_scale_intervals(
                style = "quantile", 
                n = 10,
                values = "brewer.blues"),
              fill.legend = tm_legend(
                title = "Patronage Probability")) +
  tm_shape(selected_store) +
    tm_dots(size = 0.3,
            fill = "red",
            col = "black") +
  tm_title("Distribution of Patronage Probability of Store BT by village") +
  tm_layout(frame = TRUE) +
  tm_borders(fill_alpha = 0.5) +
  tm_compass(type="8star", size = 2) +
  tm_scalebar() +
  tm_grid(alpha =0.2)

18.7.2 Store sales analysis

One of the usage of Huff model is to estimate the store sales by using village population.

Firstly, let us append P_CNT field from village sf.data.frame to huff_share data.frame by using the code chunk below.

huff_share_pp <- huff_share %>%
  left_join(village %>% 
      st_drop_geometry() %>% 
      select(VILLCODE, P_CNT),
    by = "VILLCODE") %>%
  relocate(
    distance, .after = last_col())

Next, shares.total() of MCI package will be used to calculate the total sales by using the code chunk below.

huff_total <- shares.total(
  huff_share_pp, "VILLCODE", "Store_CD",
  "p_ij", "P_CNT")

Code chunk below is used to append the estimate store sales (Ej) onto stores sf data.frame.

store_sales_est <- stores %>%
  left_join(huff_total,
            by = join_by(
              "Store_CD" == "suppliers_single"))

Now, we can visualise how well the estimate sales as compare to the actual sales by using the code chunk below.

store_sales_nongeom <- store_sales_est %>% st_drop_geometry()

ggscatterstats(
  store_sales_nongeom,
  x = sum_E_j,
  y = Total_Sales)

huff_share_wider <- huff_share %>%
  select(VILLCODE, Store_CD, p_ij) %>%
    pivot_wider(names_from = Store_CD, 
                values_from = p_ij)