Abstract
The emergence of the Smart Cities paradigm and the rapid expansion and integration of Internet of Things (IoT) technologies within this context have created unprecedented opportunities for high-resolution behavioral analytics, urban optimization, and context-aware services. However, this same proliferation intensifies privacy risks, particularly those arising from cross-modal data linkage across heterogeneous sensing platforms. To address these challenges, this paper introduces a comprehensive, statistically grounded framework for generating synthetic, multimodal IoT datasets tailored to Smart City research. The framework produces behaviorally plausible synthetic data suitable for preliminary privacy risk assessment and as a benchmark for future re-identification studies, as well as for evaluating algorithms in mobility modeling, urban informatics, and privacy-enhancing technologies. As part of our approach, we formalize probabilistic methods for synthesizing three heterogeneous and operationally relevant data streams—cellular mobility traces, payment terminal transaction logs, and Smart Retail nutrition records—capturing the behaviors of a large number of synthetically generated urban residents over a 12-week period. The framework integrates spatially explicit merchant selection using K-Dimensional (KD)-tree nearest-neighbor algorithms, temporally correlated anchor-based mobility simulation reflective of daily urban rhythms, and dietary-constraint filtering to preserve ecological validity in consumption patterns. In total, the system generates approximately 116 million mobility pings, 5.4 million transactions, and 1.9 million itemized purchases, yielding a reproducible benchmark for evaluating multimodal analytics, privacy-preserving computation, and secure IoT data-sharing protocols. To show the validity of this dataset, the underlying distributions of these residents were successfully validated against reported distributions in published research. We present preliminary uniqueness and cross-modal linkage indicators; comprehensive re-identification benchmarking against specific attack algorithms is planned as future work. This framework can be easily adapted to various scenarios of interest in Smart Cities and other IoT applications. By aligning methodological rigor with the operational needs of Smart City ecosystems, this work fills critical gaps in synthetic data generation for privacy-sensitive domains, including intelligent transportation systems, urban health informatics, and next-generation digital commerce infrastructures.
| Original language | English |
|---|---|
| Article number | 692 |
| Journal | Electronics (Switzerland) |
| Volume | 15 |
| Issue number | 3 |
| DOIs | |
| State | Published - Feb 2026 |
Funding
During the preparation of this manuscript, the authors used Google’s Gemini 3 Pro Image for the sole purpose of generating the high-level illustration at the beginning of this paper. The authors have reviewed the result and confirmed its applicability. No GenAI was used for conducting this research or for the writing of this manuscript. Notice: This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The publisher, by accepting this article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan ( https://www.energy.gov/doe-public-access-plan , accessed on 4 February 2026). This research was funded by the Advanced Telecommunications Engineering Lab (TEL) at the University of Nebraska–Lincoln under TEL’s Young Faculty Mentor Grant program.
Keywords
- IoT
- cross-modal linkage
- multimodal data synthesis
- privacy risk assessment
- smart cities
Fingerprint
Dive into the research topics of 'Heterogeneous Multi-Domain Dataset Synthesis to Facilitate Privacy and Risk Assessments in Smart City IoT'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver