Over the past decade, social media has become a central medium for people to connect, interact, and exchange information. Every day, billions of brief and often informal posts circulate across these platforms, carrying opinions, sentiments, and personal details to audiences ranging from close friends to the broader public. The ease of participation encourages users to publish and consume content with minimal reflection on reliability or long-term visibility. This continuous flow of real-time discourse offers deep insights into the topics that occupy public attention and how society perceives them. While this stream of user-generated content has proven indispensable for emergency coordination, public health outreach, and community engagement, it is equally exploited for targeted advertising, polarizing political campaigns, and orchestrated disinformation. To maximize its societal benefits while mitigating its misuse, we require a detailed understanding of how messages are disseminated in social networks. Consequently, the goal of this thesis is to identify, understand, and predict the information flows within social networks. To this end, we pose research questions at three scales that together cover the relevant diffusion dynamics: the post level, the user level, and the network level. For each scale, we develop practical methods that enable reliable detection and forecasting while circumventing existing data-processing challenges in the respective fields. Our contributions at the post level begin with a systematic review of existing methods for processing unstructured social media texts, highlighting their limitations when applied to noisy and multilingual corpora. 
To close this gap, we propose a lightweight framework that fuses domain-specific anchor guidance with character- and subword-based embeddings, enabling multilingual topic discovery, sentiment classification, and temporal shift detection on noisy and code-mixed corpora. Together, these advances create a robust semantic layer that supports the subsequent user- and network-level analyses presented in this thesis. We then turn to the user level and critically assess prevailing techniques for influence estimation. These typically rely on fully observable social-graph data, which is often concealed by privacy settings and API limits. To overcome this constraint, we devise heuristics that reconstruct active follower and interaction links from observable traffic and develop alternative methods for determining user popularity and influence, based on temporal network motifs and user interest trajectories. We show that content-based features can serve as behavioral proxies for structural features and discuss practical limits of user-centric information propagation due to network resistance. Finally, our contributions at the network level integrate the semantic and influence layers into a toolkit for detecting, modeling, and quantifying information diffusion. Specifically, we present a classifier that infers the geographic origin of social media posts to group spatially related posts, and outline strategies for placing a post so as to maximize its reach and its potential to spread across communities. We then introduce a real-time method to identify emerging information flows and forecast their potential impact and longevity using micro- and macro-level estimators of community engagement. 
In summary, this thesis provides a comprehensive framework for identifying, understanding, and predicting information flows on social networks, ranging from a multilingual framework for short-text analysis, through graph-agnostic measures of user influence, to real-time models for trend detection and prediction of diffusion dynamics. Our work contributes to the areas of natural language processing, influence theory, and diffusion theory, and aims to make social media platforms safer, more transparent, and more resilient places for digital interactions.