The Research Problem
The project addresses how menopause-related social media narratives can be analysed responsibly to generate insights for well-being and health communication. Many women share first-person experiences online, covering emotions, coping strategies, and stigma. However, these narratives are rarely reused in research because of significant ethical, legal, and practical constraints.
These constraints include privacy and platform-compliance risks, the potential for re-identification and harm to authors, and the reproduction of stigmatising or prescriptive health language. In addition, most existing resources are English-centric, which marginalises Portuguese- and Spanish-speaking communities.
As a result, valuable narratives remain underused. This limits evidence-based support for women’s health and slows progress toward Sustainable Development Goals (SDGs) 3 (Good Health and Well-being), 5 (Gender Equality), and 10 (Reduced Inequalities). The central research question is whether privacy-preserving synthetic narratives can serve as a valid and ethical alternative to real social media data while maintaining analytical accuracy, safety, fairness, and multilingual transferability.
Research Design
The research design is structured around three hypotheses: (H1) synthetic narratives that preserve the discourse patterns of real posts enable AI models to detect well-being themes with accuracy comparable to models trained on small, de-identified real datasets; (H2) models and generation routines developed in English can be transferred to Portuguese and Spanish with minimal adaptation and limited labelled data; and (H3) synthetic or mixed datasets reduce privacy and compliance risks while maintaining respectful, non-clinical language.
Methodologically, the project integrates human annotation, synthetic text generation, explainable AI modelling, and systematic evaluation. Work proceeds in five stages: (i) ethical and legal set-up, including secure data storage and a Data Protection Impact Assessment (DPIA); (ii) curation and de-identification of a small set of public posts, with human labelling of emotions, coping strategies, stigma signals, and targets; (iii) generation of controlled first-person synthetic narratives conditioned on these labels; (iv) training of explainable AI models on real, synthetic, and mixed datasets; and (v) evaluation of performance, privacy leakage, safety, and fairness, followed by cross-linguistic testing of transferability to Portuguese and Spanish.
The project operates through an interdisciplinary partnership of institutions within the World Universities Network (WUN), led by the University of Exeter. Work is organised into specialised working groups supported by regular technical meetings, interdisciplinary reviews, and hybrid workshops.
Project Objectives
The project aims to establish and validate a responsible framework for analysing sensitive health narratives using synthetic data, with menopause as a pilot case. Its objectives are to (1) validate the use of synthetic narratives for analysing well-being, emotions, coping strategies, and stigma; (2) develop privacy-preserving, explainable AI tools suitable for sensitive first-person narratives; (3) demonstrate multilingual transferability across English, Portuguese, and Spanish; (4) deliver reusable research outputs, including synthetic datasets, annotation guidelines, reproducible code, and evaluation benchmarks; and (5) support improved health communication and awareness aligned with SDGs 3, 5, and 10. Support from the WUN Research Development Fund enables international collaboration, interdisciplinary integration, and pilot validation, and positions the project as a foundation for larger future funding proposals.