The impact of scale on spatial connections: an exploratory analysis

Quantifying the intensity of spatial connections has been a crucial topic in many research fields, such as urban transportation, migration, and trade. Researchers have proposed various models, such as the gravity model and the radiation model, to quantify the magnitude of spatial connections. Traditionally, modeling the connections (relatedness) between spatial entities has been limited to the physical space, but with the rapid growth of information technologies, the scope of spatial connections has extended to the virtual space. However, one topic that has not been fully studied is how spatial scale may affect spatial connections in the virtual space and how this influence can be reflected in spatial decay models. In this study, we used two types of datasets (mass media and social media data) to explore the impact of scale on fitting the distance decay coefficient. The results confirmed that spatial scale can affect the magnitude of distance decay effects in datasets with different characteristics.


Introduction
Modeling the intensity of spatial connections has been a crucial topic in many research fields, such as urban transportation, migration, and trade (Rodrigue, Comtois, and Slack 2013; Lewer and Van den Berg 2008; Chen, Gong, and Xie 2017). Spatial connections measure how two geographic entities relate to each other and cover a broader range of connections than "spatial interactions" (Yuan, Liu, and Wei 2017). From a geographic perspective, a crucial element that affects the magnitude of spatial connections is the distance decay effect, which is best summarized by Tobler's (1970) first law of geography (TFL): "near things are more related than distant things." Researchers have proposed various models, such as the gravity model and the radiation model (Rodrigue, Comtois, and Slack 2013; Hardy, Frew, and Goodchild 2012; Simini et al. 2012), to quantify the magnitude of spatial connections; among these, the gravity model is commonly used because of its computational simplicity and the extensibility of its parameters (Sgrignoli et al. 2015; Allen and Arkolakis 2014; Anderson 2011).
Traditionally, spatial interactions or connections occur due to the physical movement of objects or people, such as the transport of goods from one city to another. In the information age, the causes of spatial connections go beyond physical movement (Liu et al. 2014b; Hu 2018).
With the rapid growth of information and communication technologies (ICTs), the scope of spatial connections has expanded from the physical space to the virtual space. For example, Liu et al. (2014b) investigated the relatedness between Chinese provinces based on their co-occurrence on webpages. Yuan, Liu, and Wei (2017) classified the ties and connections between China and other countries based on international news reports. These studies provided valuable insight into various types of spatial connections in the big data era.
However, one topic that has not been fully studied is how spatial scale may affect spatial connections in the virtual space and how this influence can be reflected in spatial decay models. In geography, the "scale effect" refers to the fact that analytical results may vary with the choice of analytical units (Batty 2008; Chen et al. 2019; Brockmann, Hufnagel, and Geisel 2006). For example, extracting crime hotspots at the census block group level may yield different results from extracting them at the census tract level (Chainey, Tompson, and Uhlig 2008). This scale effect in spatial analysis is often referred to as the Modifiable Areal Unit Problem (MAUP) (Openshaw and Vanderknaap 1983; Jelinski and Wu 1996). The scale effect can potentially influence how two geographic entities are related to each other, in both the physical space and the virtual space. For example, previous studies have used online news reports to explore the distance decay effect between Chinese provinces (Yuan 2017); however, the distance friction coefficient may differ when a gravity model is fitted to the relatedness of places at different spatial scales (e.g., at the country level or the city level).
In this study, we use two types of datasets to explore the impact of scale on fitting the distance decay coefficient. The first dataset is an open-source dataset, the Global Database of Events, Language, and Tone (GDELT), a CAMEO-coded dataset updated daily that captures over a quarter-billion news event records worldwide dating back to 1979 (Schrodt 2012; Yonamine 2013; Ward et al. 2013; Leetaru and Schrodt 2013). Spatial connections based on GDELT data are defined as the frequency of "co-occurrence" of two places. For example, if countries A and B frequently appear in the same news record, then the two countries have a strong spatial connection. The second dataset was collected from the social media platforms Flickr and Weibo and includes geotagged posts or photos from platform users. The spatial connection between places A and B based on the social media data is defined as the likelihood of a user visiting both places. For example, if more users visited both countries A and B, then the two countries have a stronger spatial connection (Yuan, Liu, and Wei 2017). We fitted gravity models to both datasets and calculated the distance friction coefficient β at two spatial scales: 1) the country level, which measures the spatial connections between countries, and 2) the provincial level, which measures the connections between Chinese provinces. Our hypothesis is that spatial scale affects the magnitude of the distance decay coefficient.

Related Work
With the development of information and communication technologies (ICTs), various new datasets, strategies, and analytical methods have been developed to examine traditional geographic laws and principles (Yuan, Raubal, and Liu 2012; Gaster 1996). ICTs provide a wide range of spatio-temporal data sources and can capture real-time geographic patterns more efficiently and effectively (Song et al. 2010; Yuan 2009; Miller 2009; Zook, Kraak, and Ahas 2015; Yuan et al. 2019).
As one of the most fundamental theories in geography, Tobler's first law of geography (1970) states that near things are more related than things far apart (Sui 2004). Researchers have applied various models, such as the gravity model and the radiation model, to quantify how physical distance affects the connections (relatedness) between two geographic entities. With the rapid growth of ICT datasets, the exploration of spatial connections goes beyond the physical environment and extends into the virtual space (Miller 2004; Hecht and Moxley 2009). For example, Liu et al. (2014a) used gravity models to explore the relatedness between Chinese provinces based on the co-occurrence of province names on webpages. Their assumption was that more co-occurrences of two province names imply a stronger connection. Hardy, Frew, and Goodchild (2012) applied an invariant exponential gravity model to investigate how distance affects the generation of volunteered geographic information (VGI) on Wikipedia. Their results show that distance is a crucial factor in the diffusion of information in the virtual space. Yuan, Liu, and Wei (2017) examined the magnitude of spatial connections in three datasets: mass media news reports, location-based social media (LBSM) check-ins, and airline transportation data. The results show that, at the country level, mass media data exhibit a stronger distance decay effect than social media data but a weaker effect than air transportation data. Although many previous studies have explored the impact of distance on spatial connections in the age of instant access, one crucial factor has yet to be taken into consideration: scale. Researchers often need to aggregate the raw data or divide the study area into spatial units, and the size of the spatial unit inevitably affects the level of detail captured in an analysis (Zhang et al. 2018; Chen, Gong, and Xie 2017).

Data
As mentioned in the Introduction, we used two types of data in this study. The first dataset, GDELT, consists of over a quarter-billion news event records dating back to 1979 and is updated daily (Yuan, Liu, and Wei 2017). The data include various attributes, such as the date, time, actors, and approximate location of each event. For example, in a news report on the 47th G7 summit held in Cornwall, England, Boris Johnson told the BBC after meeting U.S. President Joe Biden, "The alliance between the U.S. and the U.K. should be known as the 'indestructible relationship.'" In this news report, Actor 1 is "the United States government" and Actor 2 is "the United Kingdom government." Table 1 shows the associated geographic locations of Actor 1, Actor 2, and the action itself.

The second dataset consists of data from two social media platforms, Flickr and Weibo (a Chinese microblogging site). The Flickr data were collected globally from 2008 to 2012 and include more than 30 million geotagged images. The Weibo data were obtained from the official Weibo application programming interface (API) between May 1 and May 20, 2014, and cover three million users. Each record captures the geographic coordinates (e.g., volunteered geographic information from the built-in positioning module of smartphones), date, time, and user identification (ID).

Methodology
To explore the connections between geographic entities, we construct a gravity model as follows:

$$I_{ij} = K \frac{P_i P_j}{D_{ij}^{\beta}} \qquad (1)$$

where $P_i$ and $P_j$ are the "conceptual sizes" (relative importance) of geographic entities i and j, $D_{ij}$ is the distance separating the geographic centroids of i and j, $I_{ij}$ denotes the interaction/connection between i and j (Austin 1963), and K is a constant. The distance friction coefficient β represents the role of distance. We construct four gravity models to investigate the role of the friction of distance in determining the spatial connections in the aforementioned datasets. The specific parameters of the four models are defined as follows:
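To make Equation 1 concrete, the following is a minimal sketch that evaluates the gravity model for one pair of places. The text does not say how centroid distance is measured, so this sketch assumes, hypothetically, that $D_{ij}$ is the great-circle (haversine) distance in kilometers between centroids given as latitude/longitude in degrees. The function names `haversine_km` and `gravity` are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed distance model)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two centroids given in degrees.

    Hypothetical choice of D_ij; the study does not specify its metric.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def gravity(p_i, p_j, d_ij, beta, k=1.0):
    """Predicted connection I_ij = K * P_i * P_j / D_ij ** beta (Equation 1)."""
    if d_ij <= 0:
        raise ValueError("D_ij must be positive; self-pairs (i == j) are excluded")
    return k * p_i * p_j / d_ij ** beta


# Toy usage with made-up sizes: a larger beta shrinks I_ij faster with distance.
d = haversine_km(39.9, 116.4, 31.2, 121.5)  # two illustrative centroids
weak_decay = gravity(100, 200, d, beta=0.2)
strong_decay = gravity(100, 200, d, beta=1.75)
```

Holding $P_i$, $P_j$, and K fixed, the only thing that differs between `weak_decay` and `strong_decay` is β, which is why β is the quantity compared across datasets and scales.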

Model 1 - GDELT (country-level):
$I_{ij}$ - The frequency of "co-occurrence" of countries i and j in news records.
$P_i$ - The total occurrence of country i in news records.
$P_j$ - The total occurrence of country j in news records.

Model 2 - GDELT (provincial-level):
$I_{ij}$ - The frequency of "co-occurrence" of provinces i and j in news records.
$P_i$ - The total occurrence of province i in news records.
$P_j$ - The total occurrence of province j in news records.

Model 3 - Flickr (country-level):
$I_{ij}$ - The number of unique users who have uploaded images in both countries i and j.
$P_i$ - The number of unique users who have uploaded images in country i.
$P_j$ - The number of unique users who have uploaded images in country j.

Model 4 - Weibo (provincial-level):
$I_{ij}$ - The number of unique users who have checked in at locations in both provinces i and j.
$P_i$ - The number of unique users who have checked in at locations in province i.
$P_j$ - The number of unique users who have checked in at locations in province j.
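The two families of definitions above (record co-occurrence for Models 1 and 2, unique-user overlap for Models 3 and 4) can be computed with simple counting. The following is a minimal sketch under stated assumptions: each news record or post has already been mapped to a country or province (the geocoding step is not shown), a place is counted at most once per news record, and the function names `news_cooccurrence` and `user_overlap` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def news_cooccurrence(records):
    """Models 1-2 (GDELT): P_i and I_ij from the places named in news records.

    records: iterable where each item holds the place names (countries or
    provinces) attached to one news record. Assumption: a place is counted
    at most once per record.
    """
    occurrence = Counter()    # P_i: number of records mentioning place i
    cooccurrence = Counter()  # I_ij: number of records mentioning both i and j
    for places in records:
        unique = sorted(set(places))
        occurrence.update(unique)
        for a, b in combinations(unique, 2):
            cooccurrence[(a, b)] += 1  # pair keys are sorted alphabetically
    return occurrence, cooccurrence


def user_overlap(posts):
    """Models 3-4 (Flickr/Weibo): P_i and I_ij from (user_id, place) pairs.

    P_i counts unique users with at least one post in place i;
    I_ij counts unique users with posts in both i and j.
    """
    places_by_user = defaultdict(set)
    for user_id, place in posts:
        places_by_user[user_id].add(place)  # repeat posts collapse to one visit
    users_per_place = Counter()
    shared_users = Counter()
    for places in places_by_user.values():
        users_per_place.update(places)
        for a, b in combinations(sorted(places), 2):
            shared_users[(a, b)] += 1
    return users_per_place, shared_users


# Toy usage with made-up records (not data from this study).
occ, co = news_cooccurrence([["USA", "GBR"], ["USA", "GBR", "FRA"], ["FRA"]])
users, shared = user_overlap([
    ("u1", "Beijing"), ("u1", "Shanghai"), ("u1", "Beijing"),
    ("u2", "Beijing"), ("u3", "Shanghai"), ("u3", "Beijing"),
])
```

In the toy social media input, user u1 posts twice in Beijing but is counted once, so Beijing has three unique users, Shanghai has two, and two users (u1 and u3) connect the pair.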
Based on the preceding definitions, we estimated the best-fitting coefficient β for each of the four models. The parameters were fitted through Poisson regression. Because the coefficient of determination ($R^2$) is scale-free, the constant K does not affect our models; in the log-linear form of Equation 1, K only shifts the intercept. Table 2 shows the fitted β values for both datasets. The larger β is, the stronger the distance decay effect is for that dataset. As can be seen, the mass media data from GDELT and the crowd-sourced LBSM data show opposite patterns. For GDELT data, news reports show a stronger distance decay effect at the country scale than at the provincial scale. This is potentially because international news is more influenced by distance than domestic news in China is. For social media data, country-level analysis shows a much weaker distance decay effect than provincial-level analysis. It is possible that when traveling internationally, users are less constrained by distance and focus more on the popularity of a destination.
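A Poisson fit of Equation 1 works on the log scale: $\log E[I_{ij}] = \log K + \log(P_i P_j) - \beta \log D_{ij}$. The following is a minimal sketch of that fit using Newton (Fisher-scoring) iterations in NumPy, not the authors' code. It assumes, as Equation 1 is written, that the exponents on $P_i$ and $P_j$ are fixed at 1, so $\log(P_i P_j)$ enters as an offset; a variant that estimates those exponents freely would add two columns. The name `fit_gravity_poisson` is hypothetical.

```python
import numpy as np


def fit_gravity_poisson(I, P_i, P_j, D, n_iter=100, tol=1e-10):
    """Estimate K and beta in I_ij = K * P_i * P_j / D_ij**beta by Poisson MLE.

    Log link: log E[I_ij] = log K + log(P_i * P_j) - beta * log D_ij.
    log(P_i * P_j) is a fixed offset (exponents on P_i and P_j held at 1,
    an assumption of this sketch).
    """
    I = np.asarray(I, dtype=float)
    offset = np.log(np.asarray(P_i, dtype=float) * np.asarray(P_j, dtype=float))
    # Columns: intercept (log K) and -log D, so the second coefficient is beta.
    X = np.column_stack([np.ones_like(I), -np.log(np.asarray(D, dtype=float))])

    def loglik(coef):
        eta = X @ coef + offset
        return np.sum(I * eta - np.exp(eta))  # Poisson log-likelihood up to a constant

    # Warm start: ordinary least squares on log(I + 0.5), which tolerates zero counts.
    coef = np.linalg.lstsq(X, np.log(I + 0.5) - offset, rcond=None)[0]
    ll = loglik(coef)
    for _ in range(n_iter):
        mu = np.exp(X @ coef + offset)
        # Newton step: (X' W X)^{-1} X' (I - mu) with W = diag(mu).
        step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (I - mu))
        t = 1.0
        while loglik(coef + t * step) < ll and t > 1e-8:
            t /= 2  # step halving keeps the log-likelihood from decreasing
        coef = coef + t * step
        ll = loglik(coef)
        if np.max(np.abs(t * step)) < tol:
            break
    log_k, beta = coef
    return float(np.exp(log_k)), float(beta)
```

One reason to prefer Poisson regression over ordinary least squares on $\log I_{ij}$ is that pairs with zero co-occurrences or zero shared users remain valid observations instead of producing $\log 0$. Because K sits in the intercept, rescaling K leaves the fitted β unchanged.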
In previous studies, researchers extracted β values for different types of movements and interactions: 0.2 for co-occurrences of Chinese province names on the web (Liu et al. 2014b), 1.59 for banknote circulation (Brockmann and Theis 2008), and 1.75 for human mobility extracted from mobile phone records (Gonzalez, Hidalgo, and Barabasi 2008). There is a general consensus that geographic entities experience a weaker distance decay effect in the virtual space (e.g., on social media) than in the physical environment. This study provides a new perspective on this problem and demonstrates that a single, unified answer may not be sufficient when exploring spatial decay in the information age; it is necessary to consider the impact of other factors, such as scale, on spatial connections.
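One way to read these β values is to ask how much of a connection survives when distance grows. From Equation 1, with K, $P_i$, and $P_j$ held fixed, scaling $D_{ij}$ by a factor r multiplies $I_{ij}$ by $r^{-\beta}$. The short sketch below applies this to the β values quoted above; the helper name `decay_factor` is hypothetical.

```python
def decay_factor(beta, distance_ratio=2.0):
    """Multiplicative change in I_ij when D_ij is scaled by distance_ratio.

    From Equation 1 with K, P_i and P_j held fixed:
    I(r * D) / I(D) = r ** (-beta).
    """
    return distance_ratio ** -beta


# beta values quoted in the text from earlier studies
previous_betas = {
    "Chinese province names on the web": 0.2,
    "banknote circulation": 1.59,
    "mobile-phone human mobility": 1.75,
}
for label, beta in previous_betas.items():
    print(f"{label}: doubling distance keeps {decay_factor(beta):.0%} of I_ij")
```

Doubling distance leaves roughly 87% of the connection when β = 0.2 but only about 30% when β = 1.75, which illustrates how large the gap between virtual and physical distance decay can be.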

Conclusion
The development of information technologies has introduced exciting changes and new challenges in various research fields. In this study, we examined the impact of spatial scale on distance decay effects. The results confirmed that spatial scale can affect the magnitude of distance decay effects in datasets with different characteristics. The consensus in previous studies is that the distance decay effect is weaker in the virtual space than in the physical space; however, this study shows that the conclusion may not be as simple as it seems and may be worth deeper investigation. Many factors play into how distance affects a given type of spatial connection, and spatial scale is a factor that should not be overlooked. In the next step, we will extend the analysis to other countries and include more datasets to test the robustness of the results. Future studies could also examine how demographic variables, such as income, affect the magnitude of spatial connections.