The present paper provides a basic analysis of selected historical data on Thailand. The dataset under analysis was compiled from the World Bank, an open database of historical country-level metrics on economic performance, land use, climate change, population change, and education. Three indicators were selected for quantitative analysis: GDP, total population, and the number of international tourist arrivals. Since Thailand’s economy is known to depend largely on tourism, it was assumed that there would be a significant correlation between GDP and the number of international tourist arrivals. Additionally, it was expected that GDP would be highly dependent on the total population of the country. Thus, the primary aim of the present paper was to test whether there were significant correlations between different pairs of variables using Stata software. The paper uses data from a ten-year period (2010-2019) to assess these assumptions.
Dataset Compilation and Description
To create the final dataset for analysis, the data was downloaded from the World Bank (2021). Data on three variables were downloaded and compiled into a single Excel spreadsheet. The data was then cleaned and saved as a CSV file for further use in Stata. All observations before 2010 were deleted to minimize the size of the dataset.
The final dataset included four variables: year, total population, GDP, and the number of international tourists. Each variable had 10 yearly observations (2010-2019), for a total of 40 data points. The CSV file was imported using the following command:
- import delimited "C:\Users\suria\Desktop\Thailand Data.csv", encoding(UTF-8) clear
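As an alternative to trimming the spreadsheet by hand, the same restriction to 2010-2019 could be applied inside Stata after the import. The sketch below is an illustration, not the procedure followed in this paper. It assumes that the CSV file sits in the current working directory and that the year column is imported under the name year, as the rename command used later suggests.

```stata
* Sketch only: assumes "Thailand Data.csv" is in the current working directory
import delimited "Thailand Data.csv", encoding(UTF-8) clear

* Keep only the years analysed in the paper
keep if inrange(year, 2010, 2019)

* Check the result: one row per year is expected
describe
count
```

Filtering in Stata leaves the downloaded file untouched, so the full series remains available if the period of analysis changes.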
After the dataset was imported, the variables were renamed Y, X1, X2, and X3, respectively, using the rename command. Each variable was also given a descriptive label and stored as type “long.” Below are the Stata commands used to perform these actions for the variable “year”:
- rename year Y
- recast long Y
- label variable Y "Year of Observation"
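The same steps for the remaining three variables might look like the sketch below. The imported names population, gdp, and tourists are hypothetical, since the paper reports only the name of the year column, and the label texts are illustrative. One further assumption concerns GDP: Stata's long type holds integers only up to roughly 2.1 billion, so values in the hundreds of billions of dollars would not fit, and double is used for X2 here.

```stata
* Hypothetical imported names: population, gdp, tourists
rename population X1
rename gdp X2
rename tourists X3

* long holds integers up to roughly 2.1 billion, enough for X1 and X3
recast long X1 X3

* GDP in USD exceeds that range, so double is assumed here instead
recast double X2

label variable X1 "Total population"
label variable X2 "GDP (USD)"
label variable X3 "International tourist arrivals"
```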
Descriptive statistics are helpful for transforming large amounts of data into meaningful information. They include measures of central tendency, such as mean, median, and mode; measures of dispersion, such as variance, standard deviation, and range; and measures of shape, such as skewness. The present paper uses mean, standard deviation, minimum, and maximum to describe the variables. The descriptive statistics were obtained using the summarize command in Stata, as shown below:
- summarize Y X1 X2 X3
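The other measures mentioned above (median, variance, range, and skewness) are not part of the default summarize output. A minimal sketch of how they could be requested is given below. It was not part of the original analysis and assumes the variables have already been renamed X1, X2, and X3.

```stata
* Mean, median, dispersion and skewness in one table, one row per variable
tabstat X1 X2 X3, statistics(mean median sd variance range skewness) columns(statistics)

* Full detail (percentiles, skewness, kurtosis) for a single variable
summarize X2, detail
```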
Table 1 below shows the Stata output produced by this command. The output was copied as a table and formatted in Word.
Table 1. Descriptive Statistics
According to Table 1, the mean of Y was 2014.5, the standard deviation was 3.03, the minimum value was 2010, and the maximum value was 2019. The mean of X1 (population) was 68.5 million, the standard deviation was 825,839, the minimum value was 67.2 million, and the maximum value was 69.6 million. Since the population grew throughout the period, this implies that it increased by 2.4 million between 2010 and 2019. The mean of X2 (GDP) was $426 billion, the standard deviation was $60.9 billion, the minimum was $341 billion, and the maximum was $544 billion, which implies that GDP increased by $203 billion between 2010 and 2019. The mean of X3 (number of international tourist arrivals) was 25.9 million, the standard deviation was 8,170,631, the minimum was 14.2 million, and the maximum was 38.2 million. Thus, the number of tourist arrivals increased by 24 million between 2010 and 2019.
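The increases quoted above are differences between the maximum and the minimum, which equal the change over the period only if the minimum falls in 2010 and the maximum in 2019. A short sketch of how the change could be computed directly from the first and last rows is shown below. It assumes one observation per year and the variable names Y, X1, X2, and X3.

```stata
* Make sure rows run from 2010 (first) to 2019 (last)
sort Y

* Absolute change between the first and last year
display "Population change: " X1[_N] - X1[1]
display "GDP change (USD):  " X2[_N] - X2[1]
display "Tourist change:    " X3[_N] - X3[1]

* Relative change in percent, e.g. for GDP
display "GDP growth (%):    " 100 * (X2[_N] - X2[1]) / X2[1]
```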
Scatterplots help to show whether two variables are related. When one of the variables is time, a scatterplot shows the trend in the other variable. To check for patterns in the three selected variables between 2010 and 2019, three scatterplots were created, with Y (year) on the x-axis and X1, X2, and X3 on the y-axis.
Figure 1 shows changes in Thailand’s population between 2010 and 2019. As can be observed in the graph, the total population of the country grew approximately linearly over the period, without visible deviation from the pattern. The scatterplots were created using the following commands:
- twoway (scatter X1 Y)
- twoway (scatter X2 Y)
- twoway (scatter X3 Y)
After the commands were executed, every graph was edited to include a title.
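Titles and axis labels can also be set directly in the command rather than added afterwards. The sketch below shows one possible way to do this for the population plot. The title text, axis titles, and output file name are hypothetical and are not taken from the original figures.

```stata
* Scatterplot of population against year with a title and axis labels
twoway (scatter X1 Y), title("Total Population of Thailand, 2010-2019") xtitle("Year") ytitle("Total population") xlabel(2010(1)2019)

* Save the graph to a file (hypothetical file name)
graph export "population_scatter.png", replace
```

Setting the options in the command makes the figure reproducible from the do-file without manual editing.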
Figure 2 provides a scatterplot of GDP in USD against time in years. The graph demonstrates that Thailand experienced a significant drop in GDP growth in 2014 and 2015. By the end of the observed period, GDP was growing at a steady pace. However, it should be noted that the present dataset does not include information for 2020, which may prove to be another year of negative economic growth due to the COVID-19 pandemic.
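The slowdown visible in the graph could be checked numerically by computing year-on-year growth. The sketch below was not part of the original analysis. It assumes the variable names Y and X2, and the name gdp_growth is introduced here for illustration only.

```stata
* Declare Y as the yearly time variable so the lag operator L. can be used
tsset Y, yearly

* Year-on-year GDP growth in percent; missing for 2010, which has no prior year
generate gdp_growth = 100 * (X2 - L.X2) / L.X2

list Y X2 gdp_growth
```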
Figure 3 provides a scatterplot of changes in international tourist arrivals between 2010 and 2019. The graph demonstrates that the number of international tourists grew steadily throughout the period, with the only exception of 2015, when it showed a visible drop.
Correlation analysis helps to understand whether variables are related to one another. Pearson’s r is a coefficient used to quantify the linear correlation between two variables. The coefficient varies between -1 and 1, where 1 is a perfect positive linear correlation and -1 is a perfect negative linear correlation. If Pearson’s r is equal to 0, there is no linear correlation between the variables. Table 2 below is a correlation matrix for the three X variables.
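For reference, the coefficient described above can be written out as follows. The symbols are introduced here for illustration: x_i and y_i are the paired observations of two variables, \bar{x} and \bar{y} are their means, and n is the number of observations (10 in this dataset).

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```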
Table 2. Correlation Matrix
The table was created using the following command:
- correlate X1 X2 X3
According to Table 2, all the variables are strongly correlated with each other. This suggests that both total population and the number of international tourist arrivals are candidate explanatory variables for changes in Thailand’s GDP, although correlation alone does not establish causation. It should be noted that the number of tourist arrivals had a stronger correlation with GDP (r=0.99) than the total population did (r=0.89). The relationship between tourism and GDP is almost perfectly linear, which is consistent with tourism being a significant part of Thailand’s economy.
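The correlate command reports the coefficients only. Since the stated aim was to test whether the correlations are significant, the p-values could be obtained with pwcorr, as sketched below. This command was not part of the original analysis.

```stata
* Pairwise correlations with p-values (sig) and number of observations (obs)
pwcorr X1 X2 X3, sig obs
```

With only 10 observations per variable, the p-values should be read with caution.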
Statistical analysis of data can be used for different purposes depending on the needs. The present paper demonstrated that descriptive statistics can be used to summarize data, that scatterplots help to reveal the pattern of the relationship between two variables, and that correlation analysis quantifies the relationships between variables. The analysis suggested that tourism is a significant part of Thailand’s economy, which implies that changes in the number of international tourist arrivals are likely to be reflected in the country’s GDP.
World Bank (2021). Thailand. Web.