Applying Machine Learning and Geolocation Techniques to Social Media Data (Twitter) to Develop a Resource for Urban Planning

With all the recent attention focused on big data, it is easy to overlook that basic vital statistics remain difficult to obtain in most of the world. This project set out to test whether an openly available dataset (Twitter) could be transformed i...


Bibliographic Details
Main Authors: Milusheva, Sveta; Marty, Robert; Bedoya, Guadalupe; Williams, Sarah; Resor, Elizabeth; Legovini, Arianna
Format: Working Paper
Language: English
Published: World Bank, Washington, DC 2020
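Subjects: MACHINE LEARNING; BIG DATA; URBAN PLANNING; ROAD SAFETY; SDGs; GEOGRAPHIC INFORMATION SYSTEM; SOCIAL MEDIA; GEOSPATIAL ANALYSIS; SPATIAL CLUSTERING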
Online Access: http://documents.worldbank.org/curated/en/407261607111342557/Applying-Machine-Learning-and-Geolocation-Techniques-to-Social-Media-Data-Twitter-to-Develop-a-Resource-for-Urban-Planning
http://hdl.handle.net/10986/34910
id okr-10986-34910
recordtype oai_dc
spelling okr-10986-34910 2022-09-20T00:09:56Z
2020-12-10T15:01:46Z
2020-12
CC BY 3.0 IGO
http://creativecommons.org/licenses/by/3.0/igo
Publications & Research :: Policy Research Working Paper
repository_type Digital Repository
institution_category Foreign Institution
institution Digital Repositories
building World Bank Open Knowledge Repository
collection World Bank
language English
topic MACHINE LEARNING
BIG DATA
URBAN PLANNING
ROAD SAFETY
SDGs
GEOGRAPHIC INFORMATION SYSTEM
SOCIAL MEDIA
GEOSPATIAL ANALYSIS
SPATIAL CLUSTERING
geographic_facet Africa
Africa Eastern and Southern (AFE)
Kenya
relation Policy Research Working Paper;No. 9488
description With all the recent attention focused on big data, it is easy to overlook that basic vital statistics remain difficult to obtain in most of the world. This project set out to test whether an openly available dataset (Twitter) could be transformed into a resource for urban planning and development. The hypothesis is tested by creating road traffic crash location data, which are scarce in most resource-poor environments but essential for addressing the number one cause of mortality for children over age five and young adults. The research project scraped 874,588 traffic-related tweets in Nairobi, Kenya, applied a machine learning model to capture the occurrence of a crash, and developed an improved geoparsing algorithm to identify its location. The project geolocated 32,991 crash reports on Twitter for 2012-20 and clustered them into 22,872 unique crashes to produce one of the first crash maps for Nairobi. A motorcycle delivery service was dispatched in real time to verify a subset of crashes, showing 92 percent accuracy. Using a spatial clustering algorithm, the project identified the portions of the road network (less than 1 percent) where 50 percent of the geolocated crashes occurred. Even with limitations in the representativeness of the data, the results can provide urban planners with useful information to target road safety improvements where resources are limited.
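The description mentions clustering geolocated crash reports into unique crashes but does not say how. As a rough illustration only, the sketch below merges reports that fall close together in both space and time, using a union-find structure. The `Report` type, the `cluster_reports` function, and the 500 m / 120 minute thresholds are all hypothetical choices made for this example; they are not taken from the working paper.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Report:
    """One geolocated crash report (hypothetical structure)."""
    lat: float     # latitude in decimal degrees
    lon: float     # longitude in decimal degrees
    minute: float  # minutes since an arbitrary reference time


def haversine_m(a: Report, b: Report) -> float:
    """Great-circle distance between two reports, in metres."""
    earth_radius_m = 6_371_000.0
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_m * asin(sqrt(h))


def cluster_reports(reports, max_m=500.0, max_min=120.0):
    """Group reports into crashes; thresholds are illustrative assumptions."""
    parent = list(range(len(reports)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison is O(n^2); fine for a sketch, not for large data.
    for i in range(len(reports)):
        for j in range(i + 1, len(reports)):
            close_in_time = abs(reports[i].minute - reports[j].minute) <= max_min
            if close_in_time and haversine_m(reports[i], reports[j]) <= max_m:
                parent[find(i)] = find(j)

    groups = {}
    for i, report in enumerate(reports):
        groups.setdefault(find(i), []).append(report)
    return list(groups.values())
```

Under these assumed thresholds, two reports a few dozen metres and 30 minutes apart would be treated as one crash, while a report at the same spot ten hours later would count as a separate crash.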
format Working Paper
author Milusheva, Sveta
Marty, Robert
Bedoya, Guadalupe
Williams, Sarah
Resor, Elizabeth
Legovini, Arianna
author_sort Milusheva, Sveta
title Applying Machine Learning and Geolocation Techniques to Social Media Data (Twitter) to Develop a Resource for Urban Planning
publisher World Bank, Washington, DC
publishDate 2020
url http://documents.worldbank.org/curated/en/407261607111342557/Applying-Machine-Learning-and-Geolocation-Techniques-to-Social-Media-Data-Twitter-to-Develop-a-Resource-for-Urban-Planning
http://hdl.handle.net/10986/34910
_version_ 1764481913914916864