The effective use of social media data for applications such as urban and regional studies or public health monitoring often requires reliable location information. Yet, in the case of Twitter, only 2% of tweets carry post-related geo-annotations. This paper addresses this shortcoming by presenting a classification framework that infers tweet locations with high precision, relying solely on textual and contextual information provided within the post itself. The approach has worldwide coverage and adapts its spatial granularity to the post volume within the respective area, ranging from country-level predictions in less active regions to state-level predictions in high-activity zones. We perform extensive evaluations to examine how various factors affect the algorithm’s performance across different regions of the world. Furthermore, our analyses suggest that discarding a small percentage of tweets containing few hints on location significantly increases overall model quality.
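The two ideas summarized above, volume-dependent label granularity and discarding tweets with few location hints, can be illustrated with a minimal Python sketch. It is not the framework's implementation: the function names, the `MIN_TWEETS_PER_STATE` threshold, and the use of a per-prediction confidence score as a proxy for "few hints on location" are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical threshold: a state keeps its own class only if it has at
# least this many training tweets; otherwise it falls back to its country.
MIN_TWEETS_PER_STATE = 3


def build_labels(samples, min_tweets=MIN_TWEETS_PER_STATE):
    """Map each (country, state) pair to a class label of adaptive granularity.

    samples: iterable of (country, state) pairs, one per training tweet.
    """
    counts = Counter(samples)
    labels = {}
    for (country, state), n in counts.items():
        # High-activity areas get state-level labels, the rest country-level.
        labels[(country, state)] = f"{country}/{state}" if n >= min_tweets else country
    return labels


def accuracy_with_rejection(predictions, min_confidence):
    """Return (accuracy, coverage) after dropping low-confidence predictions.

    predictions: list of (predicted_label, true_label, confidence) tuples.
    Confidence is assumed to be a score in [0, 1] produced by the classifier.
    """
    kept = [(p, t) for p, t, c in predictions if c >= min_confidence]
    if not kept:
        return 0.0, 0.0
    correct = sum(1 for p, t in kept if p == t)
    return correct / len(kept), len(kept) / len(predictions)
```

Under these assumptions, raising `min_confidence` trades coverage (the share of tweets that receive a prediction) for accuracy on the tweets that remain, which is the kind of trade-off the final sentence of the abstract refers to.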