The Resolution Hypothesis: How Temporal Scale Shapes Health Predictions

1. Introduction

Wearable photoplethysmography (PPG) has quietly become one of the richest sensing technologies in modern digital health. A single optical waveform—collected passively from the wrist—encodes an extraordinary range of biological information: cardiac mechanics, vascular tone, autonomic balance, respiratory coupling, and even long-term behavioral rhythms.

Yet despite this complexity, most machine learning pipelines treat PPG as if meaningful information exists at a single, fixed time scale. Signals are resampled, windowed, compressed, and flattened—implicitly assuming that one temporal resolution is sufficient for every task.

That assumption is increasingly difficult to justify.

An arrhythmia may manifest as subtle beat-to-beat changes measured in milliseconds. Sleep transitions, by contrast, emerge over minutes through gradual shifts in variability and autonomic tone. These observations motivate what we call the resolution hypothesis:

Different health outcomes depend on different temporal resolutions of the same physiological signal.

Rather than viewing resolution as a preprocessing choice, we propose treating it as a structural dimension of representation learning. Models that collapse all temporal scales into a single embedding risk discarding clinically meaningful structure. In contrast, architectures that preserve multiple resolutions align more naturally with human physiology.

In this article, we argue that hierarchical convolutional models—particularly U-Net–style architectures—provide a principled way to operationalize the resolution hypothesis. Compared to transformer-based models that rely on global attention, U-Nets capture long-range dependencies through progressive aggregation, while preserving fine-scale detail—often at a fraction of the computational cost.

Using PPG as a case study, we demonstrate how different health outcomes align with different layers of a multi-resolution hierarchy.

2. Methods

Consider a PPG sequence represented as a discrete-time signal:

x \in \mathbb{R}^{C \times L}

where C denotes the number of channels and L the number of time steps.

The key design question is how a model aggregates information across time.
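To make the representation concrete, here is a synthetic example, assuming a single-channel recording at 64 Hz (a plausible wearable sampling rate, not a value from the paper). The waveform is a toy stand-in for PPG, not real sensor data:

```python
import numpy as np

# Hypothetical example: 30 seconds of single-channel PPG sampled at 64 Hz.
fs = 64              # assumed sampling rate; real devices vary
C, L = 1, 30 * fs
t = np.arange(L) / fs

# Synthetic stand-in for a PPG waveform: a ~1.2 Hz cardiac component
# plus a slow baseline drift.
x = np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.sin(2 * np.pi * 0.2 * t)
x = x.reshape(C, L)
print(x.shape)  # (1, 1920)
```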

Transformers: Explicit Global Interactions

Transformer architectures rely on self-attention mechanisms to compute pairwise interactions between all time steps. While expressive, this process scales quadratically with sequence length, O(L^2), making long-context modeling computationally expensive. Moreover, attention mechanisms do not inherently encode assumptions about the hierarchical organization of physiological signals.
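The quadratic cost is easy to see in a toy sketch (not any particular model): naive self-attention materializes an L × L score matrix, so its size grows with the square of the sequence length:

```python
import numpy as np

def attention_scores(L, d=8, seed=0):
    """Naive self-attention scores for a random sequence of length L."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal((L, d))  # queries
    k = rng.standard_normal((L, d))  # keys
    return q @ k.T / np.sqrt(d)      # shape (L, L): quadratic in L

print(attention_scores(256).size)   # 65536 entries
print(attention_scores(1024).size)  # 1048576 -- 16x more for 4x the length
```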

Hierarchical Convolutions: Implicit Context Through Depth

Hierarchical convolutional encoders take a different approach. Each layer applies local convolutions followed by downsampling (via striding), progressively expanding the effective receptive field.

If k denotes the kernel size and s_i the stride of layer i, the receptive field at depth d grows approximately as:

R_d = R_{d-1} + (k - 1)\prod_{i=1}^{d-1} s_i

With increasing depth, representations integrate information across exponentially larger temporal spans—without explicitly computing global attention.
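The recurrence above can be unrolled directly. The sketch below assumes a uniform kernel size of 3 and stride of 2 at every layer (illustrative defaults, not the paper's configuration), and shows the receptive field roughly doubling with each level of depth:

```python
def receptive_fields(depth, k=3, s=2):
    """Unroll R_d = R_{d-1} + (k - 1) * prod(strides of earlier layers)."""
    R, jump = 1, 1  # jump = product of strides applied so far
    fields = []
    for _ in range(depth):
        R = R + (k - 1) * jump
        jump *= s
        fields.append(R)
    return fields

print(receptive_fields(8))  # [3, 7, 15, 31, 63, 127, 255, 511]
```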

U-Net Architecture

The U-Net architecture pairs this encoder with a symmetric decoder and skip connections. Shallow layers retain high-resolution waveform morphology. Deeper layers encode slower dynamics such as:

  • Heart rate trends

  • Autonomic modulation

  • Circadian rhythms

Skip connections ensure that global context enhances—rather than overwrites—fine-scale information.

The resulting representation is not a single latent vector but a structured hierarchy of embeddings indexed by temporal resolution.
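The encoder half of this hierarchy can be sketched in a deliberately simplified form (a moving average standing in for learned convolutions; this is not the paper's model). Each level smooths locally and halves the temporal resolution, while the skip list preserves an embedding at every scale for the decoder to reuse:

```python
import numpy as np

def downsample(x, k=3, s=2):
    """Local aggregation (moving average) followed by strided subsampling."""
    kernel = np.ones(k) / k
    smoothed = np.convolve(x, kernel, mode="same")
    return smoothed[::s]

def encode(x, depth=3):
    """Return the coarsest representation plus one skip per resolution."""
    skips = []
    for _ in range(depth):
        skips.append(x)      # kept for the decoder's skip connections
        x = downsample(x)
    return x, skips

x = np.sin(np.linspace(0, 20 * np.pi, 1024))
bottom, skips = encode(x)
print([s.size for s in skips], bottom.size)  # [1024, 512, 256] 128
```

The output is exactly the "structured hierarchy of embeddings indexed by temporal resolution" described above: one array per scale, rather than a single flattened vector.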

Self-Supervised Training

We train the model using masked autoencoding: randomly masking segments of PPG and reconstructing them from context. This objective encourages learning of both:

  • Local continuity

  • Long-range temporal structure

More importantly, intermediate activations become resolution-specific probes. By training linear classifiers on different layers, we can ask a critical question:

At what temporal scale does predictive information emerge for each health outcome?
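The probing procedure itself is simple to sketch. The toy below fabricates embeddings whose label information degrades with layer-dependent noise (the layer names, noise levels, and data are purely illustrative), then fits a ridge-regression probe on each and compares accuracies:

```python
import numpy as np

def fit_linear_probe(Z, y, l2=1e-3):
    """Ridge-regression probe with a bias column; returns training accuracy."""
    Zb = np.hstack([Z, np.ones((Z.shape[0], 1))])
    w = np.linalg.solve(Zb.T @ Zb + l2 * np.eye(Zb.shape[1]), Zb.T @ y)
    preds = (Zb @ w > 0.5).astype(int)
    return float((preds == y).mean())

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)          # binary outcome labels
accuracies = {}
for name, noise in [("fine", 4.0), ("mid", 1.0), ("coarse", 0.3)]:
    # Toy embeddings: the label plus layer-dependent noise.
    Z = y[:, None] + noise * rng.standard_normal((200, 16))
    accuracies[name] = fit_linear_probe(Z, y)
print(accuracies)
```

In this fabricated setup the "coarse" probe wins by construction; the point of the method is that on real PPG tasks, the winning layer differs by outcome.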

Computational Efficiency

Hierarchical convolutions operate in linear time and memory, O(L), making them suitable for long sequences and on-device deployment. Rather than attending globally, the model builds context progressively through depth.

In effect, global awareness emerges naturally from hierarchical aggregation.

3. Results

When applied to PPG-based health prediction tasks, multi-resolution representations reveal clear task-specific structure.

Cardiovascular Outcomes

Hypertension and arrhythmia detection show peak performance at deeper layers—corresponding to coarser temporal resolutions. These outcomes reflect sustained trends and rhythm patterns rather than isolated beats.

Sleep Staging

Sleep classification similarly favors intermediate-to-coarse resolutions. Transitions between wakefulness, REM, and non-REM sleep are encoded in slower oscillations of heart rate variability and vascular tone.

Fine-grained morphology alone carries limited signal.

Laboratory Abnormalities

In contrast, laboratory measures such as hemoglobin levels or electrolyte imbalances are most predictable at fine temporal resolutions. These tasks appear sensitive to subtle, high-frequency distortions in PPG morphology—patterns that disappear with aggressive downsampling.

This finding provides particularly strong support for the resolution hypothesis in areas where physiological intuition is less obvious.

Efficiency and Performance

Across tasks, hierarchical convolutional models achieve competitive or superior performance relative to transformer-based alternatives—while being significantly smaller and more computationally efficient.

This suggests that architectural alignment with signal structure can substitute for brute-force capacity. If physiology is hierarchically organized, models should be too.

4. Discussion and Conclusion

The central message is straightforward:

Resolution matters.

Physiological signals are inherently multi-scale. Different health outcomes interrogate different temporal slices of that spectrum. Treating resolution as a primary axis of representation learning offers both practical and scientific advantages.

Practical Implications

  • Enables compact, on-device models

  • Supports continuous, privacy-preserving health monitoring

  • Reduces reliance on computationally expensive attention mechanisms

Scientific Implications

Interpretability shifts from asking which feature matters to asking:

At what temporal scale does information emerge?

This reframing provides a powerful lens for understanding disease processes and physiological variation.

Architectural Insight

U-Net–style architectures encode long-range dependencies implicitly through receptive field expansion, while preserving fine detail through skip connections. This is not merely an efficiency gain—it reflects inductive bias.

When the structure of a model mirrors the structure of the signal, learning becomes more natural and representations become more meaningful.

Looking Ahead

The resolution hypothesis invites a broader rethink of foundation models in digital health. Rather than compressing signals into monolithic embeddings, we should expose and interrogate their internal hierarchies.

Different populations, diseases, and physiological processes may “live” at different temporal resolutions. Models that respect this heterogeneity will be better positioned to transform wearable data into actionable clinical insights.

The future of health AI may not depend on bigger models—but on better alignment with the structure of biology itself.

Acknowledgements

This work was conducted by the Digital Health Team at Samsung Research America.

Contributing researchers include:
Simon A. Lee, Cyrus Tanade, Hao Zhou, Juhyeon Lee, Megha Thukral, Minji Han, Rachel Choi, Md Sazzad Hissain Khan, Baiying Lu, Migyeong Gwak, Mehrab Bin Morshed, Viswam Nathan, Md Mahbubur Rahman, Li Zhu, Subramaniam Venkatraman, and Sharanya Arcot Desai.

We are deeply grateful to the study participants who generously contributed their data and made this research possible.