Analysis of Tourists’ Image of Seoul with Geotagged Photos using Convolutional Neural Networks

In this study we analyze the urban image of Seoul as perceived by tourists, using photos uploaded to Flickr, a Social Network Service (SNS) platform on which people can share geotagged photos. We first categorize the photos uploaded to the site by tourists and then perform image mining using a Convolutional Neural Network (CNN), a type of artificial neural network used in deep learning. We find that tourists are interested in old palaces, historical monuments, stores, food, and similar subjects, which can be considered the signature sightseeing elements of Seoul. These key elements differ across the major sightseeing attractions within Seoul. The purpose of this study is twofold. First, we analyze the image of Seoul by applying image mining to the photos uploaded to Flickr by tourists. Second, we derive significant sightseeing factors for each region of attraction that tourists prefer to visit within Seoul.


Introduction
Today people share posts such as texts, images, and videos via Social Network Services (SNS) regardless of time and location. Moreover, the geotagged photos that tourists upload reflect their perceptions and actions and reveal the images that they hold of sightseeing attractions (Donaire et al., 2014). As the images of tourist sites are closely associated with tourists' attraction and intention, they serve as a reference for other tourists who plan to travel to those sites (Park et al., 2012). In addition, because sharing images on SNS continually produces and reproduces tourist images, we can ascertain perceptions and trends of representative sightseeing elements and locations by analyzing the images uploaded to SNS. Furthermore, such analysis contributes to basic research on tourism in relation to discovering, developing, and improving sightseeing attractions (Saito et al., 2018). Because geotagged photos contain locational information, they can be combined with existing methods of spatial data analysis to extract more information and support a broader scope of analysis. Flickr data are especially useful because information on location and time is automatically recorded in the photo metadata. However, previous studies that have used geotagged SNS data have mostly explored the locations that users occupied (Kádár, 2014), patterns of movement (Yuan and Medel, 2016; Zheng et al., 2012), and analysis of uploaded texts (Jang and Cho, 2016; Hong and Shin, 2016; Kagaya and Aizawa, 2015; Kaneko and Yanai, 2013; Kisilevich et al., 2013; Okuyama and Yanai, 2013). By contrast, studies that have used the photos uploaded by tourists are rare.
This study aims to identify representative images and elements of sightseeing attractions by analyzing the photos uploaded to Flickr by tourists in Seoul, applying image mining based on deep learning. In Part 2 we review research related to image data mining. In Part 3 we discuss the collection of Flickr data, the identification of tourists, the extraction of important tourist locations, and the methods of image data mining. In Part 4 we compare the tourists' image of Seoul with the image of important tourist locations by applying the methods described in Part 3. In Part 5 we summarize the results and list future tasks. For our analysis we use Python version 3.6 and TensorFlow, an open source machine learning library.

Research on Image Data Mining via Convolutional Neural Network (CNN)
Image data mining is the process of extracting information or knowledge from image data (Deepak et al., 2012). Recently, with the growth in the volume of image data and the improvement of training algorithms, image data mining techniques based on artificial neural networks have been applied to fields such as medicine, environmental studies, information science, and computer graphics (Géron, 2017). The Convolutional Neural Network (CNN), a type of artificial neural network, was developed on the basis of neurological knowledge of the visual cortex of humans and animals (Ciresan et al., 2011). As CNNs have been shown to be effective in distinguishing and categorizing photo images, they are now used in most image data mining research. A CNN is basically composed of three types of layers: convolutional layers, pooling layers, and fully connected layers. A variety of models can be produced by changing the CNN configuration, and a CNN is trained by scanning images for their characteristic features. Studies that have performed image data mining on SNS images include the following. Kaneko and Yanai (2013) detected photos of events such as festivals, sports games, earthquakes, and fires by analyzing geotagged photos on Twitter. Okuyama and Yanai (2013) extracted the locations that tourists visit from photos on Flickr and then selected representative images of those locations. These studies applied the Speeded-Up Robust Features (SURF) technique, one of various image data mining techniques. More recently, CNNs have come to be used as an image mining method. Jang and Cho (2016) proposed a method of automatically extracting tags from images posted on Instagram. Hong and Shin (2016) proposed a method of recommending followers (information providers) by categorizing the images posted by Instagram users and extracting the categories with the largest numbers of uploaded images.
Kagaya and Aizawa (2015) distinguished the images that actually contained food from those that did not among the photos returned by a search for "#food" on Instagram. In addition, CNNs have been used in the field of medicine to categorize the images produced. Krishnan et al. (2018)

Data collection and extraction of important touristic locations
In this study we use 86,304 records uploaded to Flickr and collected via its open API, restricted to the spatiotemporal extent of Seoul: latitude 37.4° to 37.8° and longitude 126.8° to 127.2°, between January 1, 2015 and December 31, 2017. These 86,304 records were uploaded by 1,974 users. To distinguish tourists from residents, we divided the 1,974 users into 868 users who had specified their place of residence and 1,106 users who had not specified a place of residence, had specified only the Republic of Korea, or had not provided enough information to determine an exact place of residence. We reduced the 868 users to 689 tourists by eliminating the 179 users who had specified Seoul as their place of residence. The 1,106 users for whom we could not discern whether they resided in Seoul were categorized as residents of Seoul if the interval between the first and the last photo they posted during the study period exceeded 30 days; using this procedure we concluded that 319 were residents of Seoul and 787 were tourists. In total, 1,476 users were categorized as tourists: 689 users who had specified their place of residence and 787 users who had not. Finally, we analyzed the image of Seoul based on the 39,157 Flickr records uploaded by these 1,476 tourists. We then extracted 11 Regions of Attraction (RoA) from the 39,157 Flickr records uploaded by tourists using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm (Kim et al., 2018). Information on each RoA is shown in Table 1 and Figure 1. Figure 2 illustrates the analytical method and procedure of this study.
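The tourist identification rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function name, the encoding of the residence field (a city name, or None when no usable residence is given), and the sample users are all hypothetical. Only the 30-day threshold and the treatment of Seoul come from the text.

```python
from datetime import date

# Threshold taken from the text: an interval of more than 30 days between a
# user's first and last photo marks the user as a resident of Seoul.
RESIDENT_SPAN_DAYS = 30


def classify_user(residence, photo_dates):
    """Return "tourist" or "resident" for one Flickr user.

    residence: a city name if the profile gives a usable place of residence,
               otherwise None (hypothetical encoding, not a Flickr API field).
    photo_dates: dates of the user's photos within the study period.
    """
    if residence is not None:
        # Users with a stated residence are residents only if it is Seoul.
        return "resident" if residence == "Seoul" else "tourist"
    # No usable residence: fall back to the span between first and last photo.
    span_days = (max(photo_dates) - min(photo_dates)).days
    return "resident" if span_days > RESIDENT_SPAN_DAYS else "tourist"


# Hypothetical users for illustration only.
users = {
    "u1": ("Seoul", [date(2016, 3, 1)]),
    "u2": ("Tokyo", [date(2016, 3, 1), date(2016, 9, 1)]),
    "u3": (None, [date(2017, 5, 2), date(2017, 5, 9)]),
    "u4": (None, [date(2015, 1, 10), date(2015, 4, 10)]),
}
labels = {uid: classify_user(res, dates) for uid, (res, dates) in users.items()}
print(labels)
```

Note that in this sketch the 30-day rule is applied only to users without a usable residence, as in the text; a user who states a residence outside Seoul is treated as a tourist regardless of how long their photos span.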

Image data collection, pre-processing, and image data mining
We performed data mining on 38,691 photos, after eliminating 465 images that had been deleted from the 39,156 images posted by the 1,476 tourists. For the photo data mining we applied the Inception v3 model of GoogLeNet, one of various CNN models. Inception v3 is a model trained on a 1,000-category subset of the ImageNet image data set, which comprises 14,197,122 images in total. The images in ImageNet are divided into 27 primary categories and 1,000 secondary categories. When an image is categorized with the Inception v3 model, the model returns the name of the category, among the 1,000 categories, that most resembles the input image, together with its accuracy value. In addition to GoogLeNet, other variations of convolutional neural networks include LeNet-5, AlexNet, and ResNet. The Inception module, a subnetwork included in GoogLeNet, has a deep structure and allows GoogLeNet to use parameters more effectively than other models (Géron, 2017). Among the various GoogLeNet models that use the Inception module, Inception v3 not only has a low error rate but also has widely available source code. As the Inception v3 model runs on TensorFlow, the photos must be pre-processed into an appropriate format before the photo data are analyzed. As the data crawled from Flickr's API are in the form of image URLs, we downloaded the images in BMP format and then converted them into 299 × 299 RGB images, which can be used in the Inception v3 model. When each image is assigned to one of 1,000 categories by the Inception v3 model in TensorFlow, it is not easy to derive meaning by comparing all 1,000 categories. Moreover, the 27 primary categories in ImageNet were not easily applicable to the field of tourism. Given these constraints, we generated 14 new categories suitable for the field of tourism based on the values resulting from the categorization of the 38,691 images.
The 14 categories were created by referring to the categories of major activities in the survey of the current state of foreign tourists conducted by the Korea Tourism Organization in 2017. These categories are as follows: "food," "entertainment," "shopping," "transportation," "cityscape," "facilities," "residence," "natural views/flora and fauna," "people," "religion," "clothing," "palace/historical monuments/cultural properties," "objects/miscellaneous," and "exhibits/sculptures" (

Images of Seoul as seen by tourists in Seoul
When the 38,691 photos uploaded by tourists in Seoul were categorized, they fell into 858 of the 1,000 ImageNet categories. The categories that account for 1% or more of the photos are shown in Figure 3. Looking at the categories in detail, "palace" mostly contains images of front gates, "bell cote" and "tile roof" contain images of roof tiles, and "patio, terrace" contains images of interior gardens. From these we can get an idea of the representative images of palaces that tourists have in mind when visiting Seoul. Among the food-related categories, "plate" includes traditional Korean cuisine, sashimi, and pasta; "restaurant" includes barbecue houses, cafés, and interiors; "food market" includes images of supermarkets, street markets, traditional markets, and street food; "hot pot" includes images of soup; and "menu" includes menu lists. "Toyshop" contains images not only of actual toy stores but also of character merchandise and the interiors of various shops, such as variety stores and hardware stores. "Movie theater" includes images of shop exteriors, such as clothing stores and restaurants. "Stage" includes images of building interiors and images that emphasize equipment. Both "taxi" and "traffic light" include images of streets, cars parked along the road, and neon signage on the outside of buildings. "Prison" and "monastery" include images of crowded residential areas, museums, and the like. "Lakeside" includes images of natural views with not only lakes or rivers but also trees or sky, while "pier" includes images of rivers, streams, ponds, college campuses, and so on. In sum, we can deduce that tourists' perception of Seoul consists of palaces, food, buildings, and facilities.
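The selection of categories with a share of 1% or more can be sketched as follows. This is an illustrative sketch, not the study's code: it assumes that each photo has already been reduced to a single top-1 label, and the function name and the toy label counts are hypothetical.

```python
from collections import Counter


def category_shares(top1_labels, min_share=0.01):
    """Share of photos per category, keeping categories at or above min_share.

    top1_labels: one predicted category name per photo (assumed input format).
    Returns a dict ordered from the largest share to the smallest.
    """
    counts = Counter(top1_labels)
    total = len(top1_labels)
    shares = {label: n / total for label, n in counts.items()}
    kept = {label: s for label, s in shares.items() if s >= min_share}
    return dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))


# Toy labels for illustration only; these are not counts from the study.
labels = ["palace"] * 120 + ["plate"] * 60 + ["restaurant"] * 19 + ["pier"] * 1
shares = category_shares(labels)
n_categories = len(set(labels))  # plays the role of the 858 observed categories
print(n_categories, shares)
```

In this toy input "pier" has a share of 0.5% and is therefore dropped, while the other three categories are kept in descending order of share.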
Because the ImageNet data set used to train the Inception v3 model was not collected for the analysis of Seoul tourist photos, the actual classification accuracy could not be confirmed from the accuracy value returned by the model. To check the classification, we inspected the photographs directly and calculated the classification accuracy for each category whose share of photos was 1% or more. The results are shown at the bottom of each category photo in Figure 3. The categories with a classification accuracy of more than 80% are "plate," "bell cote," "palace," "terrace," and "hot pot." The categories with a classification accuracy of less than 20% are "movie theater," "prison," "monastery," "taxi," and "toyshop." Figure 4 shows an example of misclassification for each category. "Palace," which had a relatively high classification accuracy, includes European-style buildings such as the War Memorial hall, City Hall, and Seoul Station. "Bell cote" includes buildings shaped like a bell tower and buildings photographed from below. In the case of "prison," where the classification accuracy was low, the photographs posted by tourists on Flickr show low-rise multi-family houses with many windows; they were classified as "prison" presumably because they resemble the prison photos in ImageNet's training data. In addition, Flickr images showing the exterior of a building were categorized as "movie theater." These images were probably categorized this way because the movie theater images in ImageNet's training data mostly consist of building exteriors, although in Korea a movie theater rarely occupies an entire building. Similarly, although monasteries are rarely seen in Seoul, Ewha Womans University buildings and photographs of buildings blended with trees were classified as "monastery."
Given these factors, there appears to be a need to produce categories and a data set that consider the characteristics of the relevant tourist attractions and locations.
Table 3 shows the results of assigning the 1,000 categories to the 14 primary categories for analysis by subject. Figure 5 shows the results of extracting the top five primary categories and examining their secondary categories. We can see that tourists who come to Seoul are generally interested in palaces, historical monuments, cultural properties, objects, food, facilities, natural views, and flora and fauna. More specifically, within the category "palace/historical monuments/cultural properties," "palace" and "bell cote" contain images of palaces, tile-roofed houses, and Korean-style houses, "patio, terrace" contains images of courtyards, and "tile roof" contains images of rafters. From this we can deduce that a considerable number of tourists consider palaces and traditional houses to be representative images of Seoul. "Umbrella," a subcategory of "objects/miscellaneous," includes images not only of actual umbrellas but also of silhouettes that resemble the shape of an umbrella. Similarly, while "tray" contains some images of food on a tray, it mostly contains images of objects that resemble a tray, and "book jacket" mostly contains images of historical monuments and exhibits. As mentioned before, this is probably due to the lack of adequate categories for the images taken by tourists. "Plate," a subcategory of "food," has numerous images of food such as traditional Korean cuisine and sashimi, and "restaurant" mostly contains images of restaurants and coffee shops. "Food market" contains images of large supermarkets and traditional street markets, and "hot pot" contains images of food such as rice cake in hot sauce, soups, and teppanyaki.
We can interpret these findings as reflecting tourists' interest in iconic dishes of Seoul that are available only in Korea. "Pier," a subcategory of "facilities," contains images of Cheonggye Stream and the ECC building of Ewha Womans University, and "planetarium" contains images of landmarks such as Dongdaemun Design Plaza, while the subcategories of "natural views/flora and fauna" mostly contain images of the sky, the Han River, and mountains.
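The assignment of ImageNet labels to the 14 primary categories described in this section can be sketched as a lookup table followed by a count. The sketch below is illustrative only: the table lists just the label-to-category pairs named in the text above and is not the full assignment of Table 3, and the "unmapped" fallback and the function name are assumptions introduced for the example.

```python
from collections import Counter

# Partial lookup built only from label/category pairs named in the text.
# The complete 1,000-label assignment (Table 3) is not reproduced here.
HERITAGE = "palace/historical monuments/cultural properties"
PRIMARY_OF = {
    "palace": HERITAGE,
    "bell cote": HERITAGE,
    "patio, terrace": HERITAGE,
    "tile roof": HERITAGE,
    "umbrella": "objects/miscellaneous",
    "tray": "objects/miscellaneous",
    "book jacket": "objects/miscellaneous",
    "plate": "food",
    "restaurant": "food",
    "food market": "food",
    "hot pot": "food",
    "pier": "facilities",
    "planetarium": "facilities",
}


def primary_counts(top1_labels, mapping=PRIMARY_OF):
    """Count photos per primary category.

    Labels absent from the partial table are counted under "unmapped",
    a placeholder used only in this sketch.
    """
    counts = Counter()
    for label in top1_labels:
        counts[mapping.get(label, "unmapped")] += 1
    return counts


# Toy labels for illustration only.
sample = ["palace", "bell cote", "plate", "hot pot", "pier", "umbrella", "taxi"]
counts = primary_counts(sample)
print(counts.most_common())
```

Keeping the assignment in a plain table makes it straightforward to revise individual labels, for example when manual inspection shows that a label such as "book jacket" mostly contains monuments and exhibits.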

Comparison of image by RoA
We grouped the photos by the 11 RoA in Seoul to compare their different characteristics. Table 4 shows the number and proportion of photos in each of the 11 RoA. The RoA that includes Jongro and Namsan has 20,987 photos, which make up 54.2% of all the photos, and the RoA of Shinchon and Hongdae has 2,584 photos, which make up 6.7%. The numbers of photos uploaded for the other locations were generally similar to one another. Figures 6 and 7 show the results of dividing the photos of each RoA into the 1,000 categories. The photos of Jongro and Namsan show specific elements such as palace facades, palace gates, walls, and other structures, while the photos of the War Memorial and the National Museum of Korea include various kinds of cultural properties and historical monuments. For Shinchon, Hongdae, and Itaewon, there are many photos that emphasize not only food itself but also the interiors of restaurants and other shops; for Itaewon in particular, there are many photos of alcohol, such as beers and cocktails. The photos of Samsung Station, Bongeunsa Temple, Coex Mall, Jamsil, Gangnam Station, Apgujeong, and Garosu-gil include various stores and sculptures. More specifically, there were photos of temples for Samsung Station, Bongeunsa Temple, and Coex Mall, ponds and amusement parks for Jamsil, cityscapes for Gangnam Station, and food for Garosu-gil and Apgujeong. Meanwhile, the photos of Yeouido include not only food and restaurants but also the Han River. Figure 8 shows the results of assigning the 1,000 categories to the 14 primary categories for every RoA. We can see that tourists who visit Jongro, Namsan, the War Memorial of Korea, and the National Museum of Korea usually think of "palace/historical monuments/cultural properties," "facilities," and "objects/miscellaneous."
As the images of the National Museum of Korea categorized as "objects/miscellaneous" are mostly of historical monuments or cultural properties, we can see that tourists who visit these RoA share images of palaces, historical monuments, and cultural properties. Meanwhile, tourists who visit Shinchon, Hongdae, Itaewon, Gangnam Station, Garosu-gil, and Apgujeong have images of "food," those who visit Samsung Station, Bongeunsa Temple, Coex Mall, Jamsil, and Yeouido have images of "facilities," and those who visit Garosu-gil, Jamsil, Gangnam Station, Itaewon, Shinchon, Hongdae, and Apgujeong have images of "shopping." While the images of Gangnam Station are related to "cityscape," the images of Jongro, Namsan, Samsung Station, Bongeunsa Temple, Coex Mall, and Yeouido are related to "natural views/flora and fauna." Figure 9 shows a map with the 14 primary categories and representative photos of all RoA.
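The comparison by RoA amounts to computing, for each RoA, the share of its photos that falls into each primary category. The following is a minimal sketch of that computation, assuming each photo has already been tagged with an RoA name and a primary category; the function name and the toy records are hypothetical and are not results from the study.

```python
from collections import Counter, defaultdict


def roa_profiles(photos):
    """Share of each primary category within each RoA.

    photos: iterable of (roa_name, primary_category) pairs (assumed format).
    Returns {roa_name: {category: share}}, categories ordered by frequency.
    """
    counts = defaultdict(Counter)
    for roa, category in photos:
        counts[roa][category] += 1
    profiles = {}
    for roa, per_category in counts.items():
        total = sum(per_category.values())
        profiles[roa] = {
            category: n / total for category, n in per_category.most_common()
        }
    return profiles


# Toy records for illustration only.
photos = (
    [("Itaewon", "food")] * 3
    + [("Itaewon", "shopping")]
    + [("Jongro and Namsan", "palace/historical monuments/cultural properties")] * 3
    + [("Jongro and Namsan", "natural views/flora and fauna")]
)
profiles = roa_profiles(photos)
for roa, profile in profiles.items():
    print(roa, profile)
```

Normalizing within each RoA, rather than over all photos, keeps a large RoA such as Jongro and Namsan from dominating the comparison.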

Summary and Conclusion
In this study we analyzed tourists' images of Seoul using the photos that visitors uploaded to Flickr, an SNS platform, from January 1, 2015 to December 31, 2017. By analyzing these photos we found that tourists have a strong image of palaces and historical monuments, followed by the iconic cuisine of Seoul (food, restaurants, etc.). These characteristics also differed from one RoA to another. The images that tourists hold of Jongro and Namsan are of palaces and cultural properties, while the images of Shinchon, Hongdae, Itaewon, Yeouido, Garosu-gil, and Apgujeong are of food and restaurants. For the War Memorial of Korea and the National Museum of Korea, there were many images of monuments that could be photographed on site as well as images of artifacts on display in the museum. Moreover, there was a combination of images of facilities, temples, and cultural properties around Samsung Station, and images of toyshops around Jamsil. Through this study we were able to verify which images tourists have of Seoul and its various RoA. However, we also identified issues that must be addressed in future research. First, applying the ImageNet data set to Korea has limitations because the data set was developed abroad. It was not possible to accurately categorize certain iconic landmarks of Korea (e.g., Namsan Tower, Dongdaemun Design Plaza, etc.) or traditional elements that are not widely known (e.g., lamplight, Hanbok, etc.) because of the lack of suitable categories for these images. Photographs related to palaces and Hanok villages were also scattered across categories such as "palace," "bell cote," and "terrace." Second, we analyzed images of Seoul uploaded during a three-year period and found that the number of users was small relative to the many images available for the study.
As a result, the number of images within a specific category may be overestimated when one user uploads multiple similar images. Given these factors, future studies need to train the model with a separate data set based on the images uploaded by visitors to Seoul. There is also a need to prevent overestimation of the number of images in a specific category by eliminating redundant images uploaded by the same user.
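One simple way to limit the overestimation described above is sketched below: each user contributes at most one count per category. This is only one possible approach, stated here as an assumption; it is not a method used in this study, it does not detect visual similarity between images, and the function name and toy records are hypothetical.

```python
from collections import Counter


def user_level_counts(records):
    """Count each category at most once per user.

    records: iterable of (user_id, category) pairs, one per photo
             (assumed format). Repeated pairs are collapsed to one.
    """
    unique_pairs = set(records)
    return Counter(category for _user, category in unique_pairs)


# Toy records for illustration only: user u1 uploads three "palace" photos.
records = [
    ("u1", "palace"), ("u1", "palace"), ("u1", "palace"),
    ("u2", "palace"), ("u2", "plate"),
    ("u3", "plate"),
]
raw = Counter(category for _user, category in records)
deduplicated = user_level_counts(records)
print(raw, deduplicated)
```

In the toy input the raw counts favor "palace" four to two, while the user-level counts are two to two, which shows how a single prolific user can shift the apparent ranking of categories.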