Georeferencing skills and student profile. Results from a multivariate regression model

Current IT tools offer increased opportunities to organize professional and recreational activities through interactive maps that are easily accessible to users. An experiment was conducted to understand whether the great potential offered by new technologies is matched by users' ability to produce good-quality spatial data. The goal was to assess whether users' knowledge of GIS affected the quality of their mapping activity, using an online map-based survey in which interactive maps were embedded in an online questionnaire. Attention was focused on university students as a target group of potential users of these tools, considering the different skills acquired during their studies: from theoretical geography courses to theoretical and practical GIS courses (dedicated labs). The experiment involved more than 200 students of the University of Udine during the academic year 2019-2020. Within this framework, a further study was developed to investigate the factors that play a role in students' ability to complete the proposed exercises. The analysis was based on a multiple regression model in which the number of completed exercises is the dependent variable and the student profile characteristics (gender, type of student, knowledge of GIS and other IT skills) are the independent variables. The estimated models pointed to students' willingness and in-depth knowledge of GIS as the main factors affecting their ability to complete the proposed exercises. Weaker effects were associated with gender and residence.


Introduction
In the framework of studies aimed at supporting land and landscape analysis for participatory planning (Brown and Kyttä, 2014; Guaran and Pascolini, 2019; Zaccomer, 2019), research on landscapes of risk and degradation in the Friuli Venezia Giulia region (North-East Italy) has been launched at the University of Udine. The main tools were online questionnaires and interactive mapping (see an example in Figure 1) (Amaduzzi et al., 2019; Bressan and Amaduzzi, 2020; Bressan, Guaran, Zaccomer, 2021). The focus was on both the potential of Volunteered Geographic Information (Goodchild, 2007; Borruso, 2010; Yan et al., 2020) and the characteristics of the data collected (Fonte et al., 2017; Bressan, Zaccomer and Grassetti, 2020). In particular, the interest lies in the intrinsic quality of the data (Bressan, 2021) and in users' ability to correctly recognize and geolocate specific places (i.e. to assign locations to geographical objects) using online interactive maps. As part of this research, and following the ideas of Poplin (2015), an interactive map-based survey experiment was developed¹. In this experiment, participants were asked to map specific places in the Friuli Venezia Giulia (FVG) region, with the aim of studying their ability and accuracy in solving the exercises. The experiment involved 217 students of the University of Udine during the academic year 2019-20 and made it possible: i) to detect the student profile (demographic characteristics, geographical knowledge of the FVG region, computer and GIS skills); ii) to measure the completeness and accuracy of the spatial data locations. In this paper we present a further development of that study, aimed at investigating the factors that play a role in students' ability to complete the proposed exercises. Our hypothesis is that students' propensity to complete the exercises is influenced by factors related both to their willingness to do the task and to their previous knowledge of tools and technology.
We believe that knowledge of Geographic Information Systems (GIS) could play an important role. This is because tools based on interactive maps often present operational difficulties that could easily be overcome with a basic knowledge of GIS, such as that usually provided by university courses. This contribution therefore represents a first attempt to investigate students' ability to complete exercises that involve geolocating places through interactive maps.

Questionnaire and sample
The online questionnaire consisted of two sections. The first contained questions on personal data, university studies, geographical knowledge of the FVG region, computer skills, specific GIS knowledge and, finally, students' opinions on and use of georeferencing options on social networks. The second section contained the interactive maps and the georeferencing exercises, which asked participants to geolocate five places. The first three places were indicated by the researchers (Miramare Castle, near Trieste; the Tagliamento River; and the Marano-Grado Lagoon). The last two exercises were left to the user's choice and asked for places of decay or beauty in the FVG region. The user interface was similar to the one in Figure 1. The results of the first three exercises were assessed in Zaccomer and Bressan (2020). In this step of the work we investigate which factors may influence the ability to complete the exercises, since some students, after completing the first part of the online questionnaire, did not go on to the second section with the georeferencing exercises. The sample consisted of students attending six courses at the University of Udine (with no students in common). The selected students had three different levels of exposure to GIS: Lev. I) no specific knowledge; Lev. II) theoretical knowledge only; Lev. III) both theoretical knowledge and practical skills developed in GIS labs (with licensed and Open Source GIS software). The data collected were analysed with a statistical software package². The univariate descriptive analysis (for more details see Zaccomer and Bressan, 2020) highlighted a prevalence of female students (64.5%) and of students living in the FVG region (78.8%); students were distributed across the three levels of GIS knowledge as follows: Lev. I 29.5%, Lev. II 45.6% and Lev. III 24.9%. Because of the Covid-19 pandemic, only the first-semester courses were held in person, while the second-semester courses were held online.
To ensure conditions as homogeneous as possible, each session was preceded by the same presentation and conducted by the same researcher with the same online questionnaire. Of the questionnaires, 46.1% were completed in the classroom and the remaining 53.9% at home.

Dataset
The data collected were entered into a dataset and analysed using multivariate regression analysis. Qualitative variables were transformed into dummy variables, a common solution in econometric estimation for introducing qualitative variables into a regression model (Saber and Lee, 2003). To take into account the difference in operational conditions due to the Covid-19 restrictions (lessons in the classroom or online), the variable X1 was introduced into the dataset. The dummy variables X2 to X4 refer to the demographic profile of the student, while the variable X5 relates to geographical knowledge of the FVG region. The latter was measured as the number of municipalities known by the student (excluding the municipality in which they live).
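The dummy coding described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the category labels are hypothetical, and the mapping of X19, X20 and X21 to Lev. I, II and III is an assumption. It shows one qualitative variable (GIS level) becoming three 0/1 indicators, plus the filling-mode dummy X1.

```python
# Illustrative dummy coding; category labels and variable mapping are assumptions.
GIS_LEVELS = ("I", "II", "III")  # hypothetical labels for Lev. I, II, III


def to_dummies(value, categories):
    """Return one 0/1 indicator per category, with 1 where value matches."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


def encode_student(gis_level, filled_in_classroom):
    """Build X1 (filling mode) and X19-X21 (GIS level, assumed order) for one student."""
    x19, x20, x21 = to_dummies(gis_level, GIS_LEVELS)
    return {"X1": int(filled_in_classroom), "X19": x19, "X20": x20, "X21": x21}


print(encode_student("III", True))  # {'X1': 1, 'X19': 0, 'X20': 0, 'X21': 1}
```

When a model contains a constant, at most two of the three level dummies can enter it together, since the three always sum to one (the "dummy variable trap").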
Variables X6 to X15 measure both IT knowledge and specific skills in GIS software. X6 and X7 concern students' experience of university or extra-university computer courses, while X8 and X9 are numerical variables recording how many hours a day the respondents spend in front of a PC or smartphone (X8) and surfing the Internet (X9). The dummy variables X10 to X13 capture specific computer skills: being a user of any GIS software, having experience in website development, managing a blog, and participating in other collaborative experiences with free online content. This information was summarized in numerical form by summing the related variables into X14 (skills in the use of GIS³) and X15 (other computer skills). Note that X14 is a finer measure than the dummy variable X10.
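The construction of the summary scores X14 and X15 as sums of 0/1 items can be sketched as below. The four GIS item names are hypothetical (the underlying dummies are not listed), and treating X15 as the sum of X11 to X13 is an assumption based on the skills described above.

```python
def sum_score(answers, items):
    """Add up 0/1 answers for the listed items to get a count of skills."""
    return sum(int(answers[item]) for item in items)


# Hypothetical names: the four GIS dummies behind X14 are not listed in the paper.
GIS_ITEMS = ("gis_item_1", "gis_item_2", "gis_item_3", "gis_item_4")
# Assumption: X15 aggregates the website, blog and collaborative-content dummies.
OTHER_IT_ITEMS = ("X11", "X12", "X13")

student = {"gis_item_1": 1, "gis_item_2": 0, "gis_item_3": 1, "gis_item_4": 0,
           "X11": 1, "X12": 0, "X13": 1}
x14 = sum_score(student, GIS_ITEMS)       # 2 GIS skills out of 4
x15 = sum_score(student, OTHER_IT_ITEMS)  # 2 other IT skills out of 3
print(x14, x15)
```

A count from 0 to 4 distinguishes students that a single yes/no indicator would lump together, which is why such a score is finer than one dummy.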
The dummy variables X16 to X18 relate to students' behaviour and opinions concerning the usefulness of georeferencing their photographs and their position on social media. Knowledge of GIS issues is identified by the dummy variables X19 to X21, one for each of the three levels, while the last variable in the dataset, X22, expresses the number of free exercises completed by the student. This variable is used as a proxy for the student's interest and hence for their willingness to participate in the experiment. Note that students did not join the experiment voluntarily: it was proposed during the lessons of their university courses. In the analysis we assume that students who did not complete the second part of the questionnaire did not want to tackle the more challenging aspects of mapping. Conversely, those who completed all the exercises had no problem mapping both the three places proposed by the researchers and other places of decay or beauty they knew.

Descriptive analysis of the dependent variable
The description of the dependent variable starts from the analysis of its composition: its first component is represented by the three "guided exercises", where the [...]. 23.5% of the students involved did not complete a single exercise (Table 2). If the average number of exercises carried out is calculated considering only the 166 students who completed at least one exercise, the success rate rises to 2.25, showing that those who decided to tackle the assigned task largely completed it. In fact, 58% of the students involved completed at least 2 of the 3 exercises.
³ X14 was obtained as the sum of four different collected dummy variables (not listed in Table 1).
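The relation between the average over all students and the average over those who completed at least one exercise can be checked with a short calculation. This sketch assumes that the 2.25 refers to the three guided exercises, that all 217 students are counted, and it uses the rounded published figures, so the result is approximate.

```python
n_students = 217
share_none = 0.235             # students with no completed exercise (reported)
mean_among_completers = 2.25   # reported average, rounded

n_none = round(n_students * share_none)   # 51 students
n_completers = n_students - n_none         # 166, matching the text
# Students with no exercises contribute zero, so the overall mean is the
# completers' mean scaled by their share of the whole sample.
overall_mean = mean_among_completers * n_completers / n_students
print(n_completers, round(overall_mean, 2))  # 166 1.72
```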
In addition to the three exercises requested by the researchers, students could mark on the interactive map up to two further positions linked to places of beauty or decay. More than two out of three students (68.7%) completed this task, as reported in Table 3. Interestingly, only 17 students mapped these places despite not having identified the three places indicated by the researchers. For this small group of students, we can assume either that they were interested but failed to identify the required places exactly, or that they were more interested in indicating places of landscape decay and beauty in the FVG region.

Correlation analysis
Because of the large number of variables, it was necessary to check for multicollinearity among them before applying the regression model, in order to avoid instability in the estimated parameters. The correlation matrix was therefore studied considering both the quantitative and the dummy variables in the dataset. The bivariate Bravais-Pearson correlation coefficient was used to assess the existence and direction of relationships. For pairs of dummy variables, other measures based on contingency tables were also applied (e.g. chi-square, Kendall's tau-b and tau-c, Somers' d).
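For two dummy variables, the Bravais-Pearson coefficient coincides with the phi coefficient of their 2x2 contingency table, and for such a table Pearson's chi-square (without continuity correction) equals n times phi squared. The sketch below checks this on toy data, not on the survey data.

```python
import math


def pearson(x, y):
    """Bravais-Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def table_2x2(x, y):
    """Counts (n11, n10, n01, n00) of the joint values of two 0/1 variables."""
    pairs = list(zip(x, y))
    return (pairs.count((1, 1)), pairs.count((1, 0)),
            pairs.count((0, 1)), pairs.count((0, 0)))


def phi_from_table(n11, n10, n01, n00):
    """Phi coefficient computed from the cells of a 2x2 table."""
    num = n11 * n00 - n10 * n01
    den = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return num / den


# Toy data: two dummy variables observed on 8 hypothetical students.
x = [1, 1, 1, 0, 0, 0, 1, 0]
y = [1, 1, 0, 0, 0, 1, 1, 0]
r = pearson(x, y)
phi = phi_from_table(*table_2x2(x, y))
chi_square = len(x) * phi ** 2
print(r, phi, chi_square)  # 0.5 0.5 2.0
```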
The correlation matrix was constructed including the dependent variable, in order to obtain first evidence of which independent variables are most related to it. Considering correlations significant at the 5% level (two-tailed), the total number of completed exercises (including voluntary ones) was positively correlated with the questionnaire filling mode X1 ("in classroom", 0.226), gender X2 ("male", 0.180), the third level of GIS knowledge and practical skills X21 (0.172), the number of free exercises X22 (0.710) and geographical knowledge of the FVG region X5 (0.173). These variables can therefore be expected to play a primary role in the definition of the statistical model.
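The 5% two-tailed significance of a correlation coefficient r on n observations is usually assessed with t = r·sqrt(n − 2)/sqrt(1 − r²). The sketch below assumes that all 217 students enter each correlation (no missing values) and replaces the Student t critical value with its large-sample normal approximation, which is adequate when n − 2 is in the hundreds.

```python
import math
from statistics import NormalDist


def r_is_significant(r, n, alpha=0.05):
    """Two-tailed test of r = 0; returns (significant?, t statistic)."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    # Assumption: normal approximation of the t critical value (large n).
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(t) > crit, t


def min_significant_r(n, alpha=0.05):
    """Smallest |r| that is significant, obtained by solving |t| = crit for r."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / math.sqrt(n - 2 + z * z)


print(r_is_significant(0.172, 217))
print(round(min_significant_r(217), 3))
```

Under these assumptions, any |r| above roughly 0.13 is significant at 5% with 217 observations, so even the smallest correlations listed above pass the test.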
The analysis of the independent variables highlights some relationships linked with multicollinearity. First of all, it is not surprising that geographical knowledge of the FVG region X5 is positively related to residence in the FVG region X4 (0.270). Furthermore, the variable X1 (questionnaire filling mode "in classroom") is related to being a GIS user X10 (0.501), to the number of GIS skills X14 (0.455) and to the third level of GIS knowledge and practical skills X21 (0.623). Likewise, overall computer skills X15 are related to individual skills such as having built an Internet site X11 (0.895), managing a blog X12 (0.590) and participating in collaborative network activities X13 (0.383). Different results were obtained for the variables measuring, respectively, the number of hours a day spent in front of a computer or smartphone X8 and the hours a day spent surfing the Internet X9. The answers to these two questions are closely linked to each other (0.720), and also to the experience of having developed Internet sites X11 (0.302 and 0.307, respectively) and to computer skills in general X15 (0.250 and 0.290), but not to GIS skills X14. On the one hand, these variables should give an idea of students' aptitude for using modern digital tools; on the other, it is reasonable to consider that they may be linked to recreational activities, in which case they measure leisure rather than technical skills. Because of this mixed meaning, in further developments of the research these questions will need to distinguish between the use of IT for fun, for study and for work. Finally, the variables related to students' behaviour and opinions on the use of georeferencing in social networks are not correlated with the other independent variables.
They are, however, correlated with one another, in particular the georeferencing of photographs X16 with the sharing of geographic position on social media X17 (0.215), and the latter with the opinion of those who consider it useful to share their own position online X18 (0.166). The study of the correlation matrix confirms the existence of multicollinearity among the independent variables, so a selection of variables is needed to estimate a model without instability in the parameter estimates.
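A simple screening of the correlations reported above can flag the pairs most likely to cause instability. The 0.6 threshold is an assumption (rules of thumb vary, 0.7 and 0.8 are also common), and only the coefficients quoted in the text are used, not the full matrix.

```python
# Selected pairwise correlations quoted in the text (not the full matrix).
REPORTED = {
    ("X4", "X5"): 0.270, ("X1", "X14"): 0.455, ("X1", "X21"): 0.623,
    ("X11", "X15"): 0.895, ("X12", "X15"): 0.590, ("X13", "X15"): 0.383,
    ("X8", "X9"): 0.720, ("X8", "X11"): 0.302, ("X9", "X11"): 0.307,
    ("X8", "X15"): 0.250, ("X9", "X15"): 0.290,
    ("X16", "X17"): 0.215, ("X17", "X18"): 0.166,
}


def flag_collinear(pairs, threshold=0.6):
    """Return (var_a, var_b, r) for pairs with |r| >= threshold, strongest first."""
    hits = [(a, b, r) for (a, b), r in pairs.items() if abs(r) >= threshold]
    return sorted(hits, key=lambda h: abs(h[2]), reverse=True)


for a, b, r in flag_collinear(REPORTED):
    print(f"{a}-{b}: {r}")
```

Pairwise screening only detects two-variable overlaps. Collinearity involving three or more regressors, such as a sum score and its components, is usually checked with variance inflation factors as well.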

Results and discussion
As is well known, a multiple linear regression model makes it possible to understand the role played by a set of independent variables (X1, X2, …, XK) on a single dependent variable Y (Saber and Lee, 2003). The general form of the model with a constant can be written as:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_K X_K + \varepsilon \qquad (1)$$

where βk, k = 0, …, K, are the parameters to be estimated (constant included) by the least squares method, while ε is the stochastic component that makes (1) a statistical model. As previously mentioned, the existence of multicollinearity requires a selection of variables. Stepwise selection was used, i.e. a progressive ascending selection in which, at each step, both the variables not yet in the model and those already entered are re-examined, so that a variable entered earlier can later be removed (IBM, 2017). This procedure does not guarantee the best possible model, but yields the model that is best according to the stopping rule of the procedure used. The model is valid when it passes both the overall test based on Fisher-Snedecor's F and the set of Student's t tests on each parameter. Finally, the residuals of the model should be randomly distributed. The stepwise selection yielded a model with, in addition to the constant, four regressors: X2, the gender variable (dummy variable whose unit value indicates males); X4, which identifies students residing in the FVG region; X21, which identifies the highest level of GIS knowledge and skills; and, finally, X22, which measures the number of exercises voluntarily performed by students. Formally, the model can therefore be written as:

$$Y = \beta_0 + \beta_2 X_2 + \beta_4 X_4 + \beta_{21} X_{21} + \beta_{22} X_{22} + \varepsilon \qquad (2)$$

As for the characteristics of the estimated model, at the fourth iteration of the procedure it has R = 0.745, R² = 0.555 and adjusted R² = 0.546. The Analysis of Variance (ANOVA) for the regression gave F = 58.360 with a significance of F equal to zero to three decimal places. This means that the model is statistically significant as a whole.
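The estimation and selection steps described above can be sketched with NumPy. This is a simplified illustration, not the software procedure cited (IBM, 2017). OLS is fitted by least squares, and R², adjusted R² and the overall F follow the standard formulas. Selection uses partial F statistics with F-to-enter 3.84 and F-to-remove 2.71: these thresholds are assumptions, since the paper does not report the criteria used. All data are synthetic, and the variable names `a`, `b`, `c` are hypothetical.

```python
import numpy as np


def design(cols, data):
    """Design matrix: a column of ones (the constant) plus the chosen regressors."""
    n = len(next(iter(data.values())))
    return np.column_stack([np.ones(n)] + [data[c] for c in cols])


def rss(X, y):
    """Least-squares coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), beta


def fit_stats(X, y):
    """Coefficients, R^2, adjusted R^2 and overall F (k regressors + constant)."""
    n, p = X.shape
    k = p - 1
    rss_val, beta = rss(X, y)
    r2 = 1 - rss_val / float(((y - y.mean()) ** 2).sum())
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f = (r2 / k) / ((1 - r2) / (n - k - 1))
    return beta, r2, adj_r2, f


def partial_f(small, large, data, y):
    """F for the variable(s) in `large` but not in `small` (one variable here)."""
    rss_s, _ = rss(design(small, data), y)
    X_l = design(large, data)
    rss_l, _ = rss(X_l, y)
    n, p = X_l.shape
    return (rss_s - rss_l) / (rss_l / (n - p))


def stepwise(data, y, f_in=3.84, f_out=2.71, max_steps=50):
    """Forward entry with a removal check on variables already entered."""
    selected = []
    for _ in range(max_steps):
        candidates = [c for c in data if c not in selected]
        if not candidates:
            break
        scores = {c: partial_f(selected, selected + [c], data, y) for c in candidates}
        best = max(scores, key=scores.get)
        if scores[best] < f_in:
            break                      # stopping rule: nothing worth entering
        selected.append(best)
        while selected:                # re-examine variables already in the model
            drop = {c: partial_f([s for s in selected if s != c], selected, data, y)
                    for c in selected}
            worst = min(drop, key=drop.get)
            if drop[worst] >= f_out:
                break
            selected.remove(worst)
    return selected


# Synthetic example: y depends on a and b, while c is pure noise.
rng = np.random.default_rng(0)
n = 200
data = {name: rng.normal(size=n) for name in ("a", "b", "c")}
y = 1 + 2 * data["a"] + 0.5 * data["b"] + rng.normal(scale=0.5, size=n)
chosen = stepwise(data, y)
beta, r2, adj_r2, f = fit_stats(design(chosen, data), y)
print(chosen, np.round(beta, 2), round(r2, 3), round(adj_r2, 3), round(f, 1))
```

Because the entry threshold is higher than the removal threshold, a variable that has just entered cannot be removed immediately. The `max_steps` guard protects against enter/remove cycles.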
Since the model contains several regressors, the overall F test alone is not sufficient, so each estimated parameter was also tested. The results are reported in Table 5.
Table 5. Estimated coefficients of the model (2)
Finally, the Durbin-Watson test was applied to rule out first-order autocorrelation of the residuals. The value obtained was 1.888 which, compared with the critical values of the Savin-White (1997) table for models with an intercept⁴, rules out first-order autocorrelation of the residuals. The statistics of the standardized residuals (mean zero and unit variance) showed a satisfactory skewness (0.045), while the kurtosis was rather high in absolute value (-1.004). The Kolmogorov-Smirnov and Shapiro-Wilk tests were therefore applied, and they highlighted problems with the normality of the residuals. To complete the data processing, a further model (3) was estimated: the best model with only two regressors, chosen among the four variables of model (2). The estimated model (3) and its coefficients are reported in Table 6.
Table 6. Estimated coefficients of the model (3)
The regression residuals of model (3) are quite similar to those of model (2), obtained by stepwise selection. Model (2) confirms the starting hypothesis of the study, since both elements of the hypothesis are included: the number of voluntary mapping exercises carried out (X22) and the highest level (Lev. III) of GIS knowledge, characterized by theoretical and practical skills (X21). A demographic variable, with a positive value for male students (X2), and residence in the FVG region (X4) are also included.
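The residual diagnostics mentioned above can be illustrated with plain Python. The Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by their sum of squares: values near 2 indicate no first-order autocorrelation, while values toward 0 or 4 indicate positive or negative autocorrelation. The skewness and excess kurtosis below are simple moment estimators. Statistical packages usually apply small-sample corrections, so their figures can differ slightly. The residuals used here are toy values.

```python
def durbin_watson(resid):
    """Sum of squared successive differences over the residual sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den


def skew_and_excess_kurtosis(x):
    """Moment-based skewness and excess kurtosis (0 for a normal distribution)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


print(durbin_watson([1, -1, 1, -1]))       # 3.0: strong negative autocorrelation
print(skew_and_excess_kurtosis([-1, 1]))   # (0.0, -2.0): symmetric, very flat
```

A negative excess kurtosis, like the -1.004 reported above, describes a distribution flatter than the normal. With cross-sectional survey data, the Durbin-Watson value also depends on the order in which respondents are stored.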
The inclusion of gender (X2) and residence (X4) in the estimated model is not surprising, since: 1) male students are generally more numerous in the scientific faculties of the University of Udine, whose courses offer greater exposure to GIS technology; 2) as seen in the correlation analysis, residence is linked to greater geographical knowledge of the region. While resident students report knowing about 5.5 municipalities besides the one in which they live, non-resident students report no knowledge of the territory outside the town where the university is based. The model obtained through stepwise selection explains 55.5% of the total variability, but the first two variables alone already explain 53.4%.
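Whether the two additional regressors add a significant share of explained variability (55.5% versus 53.4%) can be assessed with the standard F test for nested models. This worked example assumes that all 217 students enter both models, which the text does not state.

```python
def nested_f(r2_full, r2_reduced, n, k_full, q):
    """F for dropping q regressors from a model with k_full regressors plus a constant."""
    return ((r2_full - r2_reduced) / q) / ((1 - r2_full) / (n - k_full - 1))


f = nested_f(r2_full=0.555, r2_reduced=0.534, n=217, k_full=4, q=2)
print(round(f, 2))  # 5.0
```

Under that assumption, the value is above the 5% critical value of F with (2, 212) degrees of freedom (roughly 3.04). This would indicate a small but statistically detectable contribution of the two additional variables.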

Conclusion
The results confirm the research hypothesis, showing that students' ability to geolocate places through interactive maps is mainly influenced by factors related both to their willingness to do the task and to their previous knowledge of GIS technology. However, we emphasize that this is only a first, exploratory multivariate study. More generally, the interactive map-based experiment reported here should be considered a first exploratory step, aimed at gathering early evidence in order to develop more solid research in the future. Further developments of the research will require improving the questionnaire by adding specific questions that explore students' interest in this kind of technology and tools in more depth. It will also be necessary to increase the number of students and selected courses, in order to make the sample more representative of the entire student community. A weakness of this step of the research stems from the Covid-19 pandemic restrictions, which did not allow equal operational conditions; this will be addressed in future experiments. Finally, future research should adopt more refined statistical models that improve both the percentage of variability explained and the normality of the residuals.

Acknowledgements
This contribution is a continuation of the work done for the departmental project called PaRiDe "Landscapes of risk and degradation: from perception to representation and territorialisation. Interdisciplinary knowledge and awareness in support of territorial government policies" and the subsequent project called PaRiDe2 "Using volunteered geographic information as a tool for mapping and analysing the landscapes of degradation and risk in the Friuli Venezia Giulia region", both conducted at the University of Udine, Italy. We would like to thank our colleagues Mauro Pascolini, Andrea Guaran, Salvatore Amaduzzi and especially Giorgia Bressan.