The evaluation of (big) data integration methods in tourism

Author(s): Marek Cierpiał-Wolan, Galja Stateva
Subject(s): Tourism
Published by: Główny Urząd Statystyczny
Keywords: data integration methods; tourism survey frame; web scraping

Summary/Abstract: In view of the many dynamic changes taking place in the modern world as a result of the COVID-19 pandemic, the migration crisis, armed conflicts and other events, providing high-quality information almost in real time is a major challenge for official statistics. In this context, the integration of data from multiple sources, in particular big data, is a prerequisite. The main aim of the study discussed in the article is to characterise and evaluate selected methods of data integration in tourism statistics: Natural Language Processing (NLP); a machine learning algorithm, namely K-Nearest Neighbours (K-NN) using TF-IDF and N-gram techniques; and Fuzzy Matching, which belongs to the group of probabilistic methods. In tourism surveys, data acquired through web scraping deserve special attention. For this reason, the analysed methods were used to combine data from booking portals (Booking.com, Hotels.com and Airbnb.com) with a tourism survey frame. The study is based on data for Poland and Bulgaria, downloaded between April and July 2023. An attempt was also made to determine how far the data obtained by web scraping tourism portals improved the quality of the frame. The study showed that Fuzzy Matching based on the Levenshtein algorithm combined with Vincenty's formula was the most effective of the tested methods. In addition, data integration made it possible to significantly improve the quality of the tourism survey frame in 2023 (the number of new accommodation establishments rose by 1.1% in Poland and by 1.4% in Bulgaria).
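
To illustrate the general idea of pairing a string-similarity score with a geographic distance check when linking portal listings to frame records, a minimal Python sketch follows. It is not the authors' implementation: the record layout (`name`, `lat`, `lon`), the function names and both thresholds are hypothetical, and the haversine great-circle formula is used here as a simpler stand-in for Vincenty's ellipsoidal formula named in the abstract.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Normalised similarity in [0, 1] after trimming and lower-casing."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a sphere (stand-in for Vincenty)."""
    radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(h))


def is_match(frame_rec: dict, portal_rec: dict,
             min_similarity: float = 0.8,      # hypothetical threshold
             max_distance_m: float = 200.0) -> bool:  # hypothetical threshold
    """Accept a link only if the names are similar AND the locations are close."""
    similar = name_similarity(frame_rec["name"], portal_rec["name"]) >= min_similarity
    close = haversine_m(frame_rec["lat"], frame_rec["lon"],
                        portal_rec["lat"], portal_rec["lon"]) <= max_distance_m
    return similar and close
```

Requiring both conditions is one possible design choice: the name check guards against distinct establishments that share a building, while the distance check guards against identically named establishments in different towns. Portal listings that match no frame record under such a rule would be candidates for new frame entries.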

  • Issue Year: 69/2023
  • Issue No: 12
  • Page Range: 25-48
  • Page Count: 24
  • Language: English