Circadify
Technical · 11 min read

rPPG Integration for Telemedicine Software: Architecture Guide

A deep technical architecture guide covering client-side rPPG capture pipelines, data payload structures, server-side integration patterns, and scaling considerations for telemedicine platforms.

Circadify Team

Integrating remote photoplethysmography into a telemedicine platform is fundamentally an architecture problem. The signal processing science is solved. The clinical validation exists. What remains is fitting a real-time physiological measurement pipeline into the constraints of a browser-based video call application that must work reliably across devices, networks, and clinical contexts.

This guide provides the technical architecture for that integration, from raw camera frames to structured clinical data in your backend systems.

Client-Side Capture Pipeline

The rPPG capture pipeline runs entirely in the browser. This is a deliberate architectural choice: on-device processing keeps raw facial video local to the patient's device, eliminates round-trip latency for signal analysis, and reduces your server-side compute and compliance burden. The pipeline consists of four stages executed in sequence: the early stages run on every captured frame, while the later stages operate on the accumulated signal window.

Stage 1: Face Detection and Tracking

The first stage identifies and tracks the patient's face within the camera frame. This uses a lightweight face detection model compiled to WebAssembly, optimized for real-time performance at 15-30 fps on consumer hardware.

Face tracking is continuous, not per-frame detection. After initial detection, the tracker predicts face position in subsequent frames using motion estimation, falling back to full detection only when the prediction confidence drops below threshold. This reduces per-frame compute by roughly 60 percent compared to running full detection on every frame.

The tracker outputs a bounding box and facial landmarks. The landmarks are critical because they define the regions of interest for signal extraction.
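The track-or-detect decision can be sketched as follows. The function names (`trackFace`, `detectFace`), the state shape, and the 0.5 confidence threshold are illustrative assumptions, not the SDK's actual API:

```typescript
// Sketch of the track-or-detect loop. trackFace, detectFace, and the
// threshold value are hypothetical stand-ins for the SDK internals.
interface FaceState {
  box: { x: number; y: number; w: number; h: number };
  landmarks: Array<{ x: number; y: number }>;
  confidence: number;
}

const TRACK_CONFIDENCE_THRESHOLD = 0.5; // assumed value

function nextFaceState(
  prev: FaceState | null,
  trackFace: (prev: FaceState) => FaceState,
  detectFace: () => FaceState
): FaceState {
  // No prior state: run full detection.
  if (prev === null) return detectFace();
  // Try the cheap motion-based tracker first.
  const tracked = trackFace(prev);
  // Fall back to full detection only when tracking confidence drops.
  return tracked.confidence >= TRACK_CONFIDENCE_THRESHOLD
    ? tracked
    : detectFace();
}
```

The expensive detector runs only on the first frame and on tracking failures, which is where the per-frame compute savings come from.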

Stage 2: Region of Interest (ROI) Extraction

Not all facial skin is equally useful for rPPG. The forehead, cheeks, and nose bridge provide the strongest signals because they have dense capillary networks close to the skin surface and relatively flat geometry that reduces specular reflection artifacts.

The ROI extraction stage uses the facial landmarks from stage 1 to define precise polygonal regions on the forehead and both cheeks. These regions are dynamically adjusted each frame to follow facial movement, ensuring consistent signal extraction even as the patient shifts position during the call.

Within each ROI, the pipeline computes the spatial average of pixel values across the red, green, and blue color channels. The green channel carries the strongest PPG signal due to hemoglobin absorption characteristics, but all three channels are retained because multi-channel analysis enables noise cancellation and SpO2 estimation.

The output of this stage is a time series of RGB channel averages for each ROI, sampled at the camera's frame rate.
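The per-ROI spatial averaging reduces to a loop over pixel data. A minimal sketch for a rectangular ROI over RGBA pixel data (as returned by `CanvasRenderingContext2D.getImageData`); a real polygonal ROI would mask pixels against the landmark polygon first:

```typescript
// Spatial RGB average over a rectangular ROI of row-major RGBA data.
function roiChannelMeans(
  data: Uint8ClampedArray, // RGBA bytes, 4 per pixel
  frameWidth: number,
  roi: { x: number; y: number; w: number; h: number }
): { r: number; g: number; b: number } {
  let r = 0, g = 0, b = 0;
  const n = roi.w * roi.h;
  for (let row = roi.y; row < roi.y + roi.h; row++) {
    for (let col = roi.x; col < roi.x + roi.w; col++) {
      const i = (row * frameWidth + col) * 4; // byte offset of this pixel
      r += data[i];
      g += data[i + 1];
      b += data[i + 2]; // alpha at i + 3 is ignored
    }
  }
  return { r: r / n, g: g / n, b: b / n };
}
```

Calling this per ROI per frame yields the RGB time series that stage 3 consumes.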

Stage 3: Signal Processing

This is where raw pixel data becomes physiological measurement. The signal processing stage applies several transformations to the RGB time series:

Detrending. Slow drifts in the signal caused by lighting changes, auto-exposure adjustments, and gradual head movement are removed using an adaptive smoothness prior detrending algorithm.

Motion artifact removal. Patient movement introduces noise that can overwhelm the PPG signal. The pipeline uses independent component analysis (ICA) across the multiple ROIs and color channels to separate the cardiac signal from motion artifacts. Because blood flow is synchronized across all facial regions while motion artifacts are spatially variant, ICA can effectively isolate the pulse wave.

Bandpass filtering. The detrended and artifact-reduced signal is bandpass filtered to the physiologically plausible heart rate range (typically 40-180 BPM, corresponding to 0.67-3.0 Hz). This eliminates both low-frequency respiratory modulation and high-frequency sensor noise.

Peak detection. Individual heartbeats are identified through peak detection on the filtered signal. The inter-beat intervals (IBIs) form the basis for heart rate and HRV calculations.
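A minimal local-maximum peak detector and the IBI conversion can be sketched as below. Production detectors add refractory periods and adaptive thresholds; this version only illustrates the data flow from filtered samples to intervals:

```typescript
// Naive peak detection: a sample above threshold that exceeds both
// neighbors is treated as a heartbeat.
function detectPeaks(signal: number[], threshold: number): number[] {
  const peaks: number[] = [];
  for (let i = 1; i < signal.length - 1; i++) {
    if (
      signal[i] > threshold &&
      signal[i] > signal[i - 1] &&
      signal[i] >= signal[i + 1]
    ) {
      peaks.push(i);
    }
  }
  return peaks;
}

// Convert peak sample indices to inter-beat intervals in milliseconds,
// given the effective capture frame rate.
function interBeatIntervalsMs(peakIndices: number[], fps: number): number[] {
  const ibis: number[] = [];
  for (let i = 1; i < peakIndices.length; i++) {
    ibis.push(((peakIndices[i] - peakIndices[i - 1]) / fps) * 1000);
  }
  return ibis;
}
```

At 30 fps, peaks 30 frames apart yield 1000 ms intervals, i.e. 60 BPM.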

SpO2 estimation. The ratio of pulsatile signal amplitude between the red and blue (or green) channels correlates with arterial oxygen saturation. The pipeline computes this ratio and maps it to SpO2 using a calibration model derived from clinical validation against pulse oximetry.
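The ratio-of-ratios computation itself is simple; the clinical work is in the calibration. In this sketch the linear coefficients are placeholders, not real calibration values, and the channel amplitudes (`ac`, pulsatile; `dc`, baseline) are assumed to come from the filtered signals:

```typescript
// Ratio of ratios: pulsatile (AC) over baseline (DC) amplitude per
// channel, compared between channels.
function ratioOfRatios(
  red: { ac: number; dc: number },
  blue: { ac: number; dc: number }
): number {
  return red.ac / red.dc / (blue.ac / blue.dc);
}

// Placeholder linear calibration SpO2 ≈ a − b·R, clamped to a
// physiologically plausible range. Real coefficients come from
// validation against pulse oximetry.
function spo2FromRatio(r: number, a = 110, b = 25): number {
  return Math.min(100, Math.max(70, a - b * r));
}
```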

Stage 4: Vital Signs Computation

From the processed signal and detected peaks, the pipeline computes the final vital signs:

  • Heart rate: Median inter-beat interval converted to BPM, with outlier rejection for missed or false-positive peaks
  • HRV metrics: SDNN, RMSSD, pNN50 (time-domain), and LF/HF ratio (frequency-domain) computed from the IBI series
  • SpO2: Calibrated estimate from the multi-channel ratio analysis
  • Respiratory rate: Extracted from the amplitude modulation envelope of the PPG signal, which varies with breathing
  • Stress index: A composite score derived from HRV frequency-domain analysis, reflecting sympathetic-parasympathetic balance
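The time-domain HRV metrics listed above follow directly from the IBI series. These are the standard definitions (SDNN as the standard deviation of intervals, RMSSD over successive differences, pNN50 as the share of successive differences exceeding 50 ms):

```typescript
// Time-domain HRV metrics from an IBI series in milliseconds.
function sdnn(ibis: number[]): number {
  const mean = ibis.reduce((s, x) => s + x, 0) / ibis.length;
  const variance = ibis.reduce((s, x) => s + (x - mean) ** 2, 0) / ibis.length;
  return Math.sqrt(variance);
}

function rmssd(ibis: number[]): number {
  let sum = 0;
  for (let i = 1; i < ibis.length; i++) sum += (ibis[i] - ibis[i - 1]) ** 2;
  return Math.sqrt(sum / (ibis.length - 1));
}

function pnn50(ibis: number[]): number {
  let count = 0;
  for (let i = 1; i < ibis.length; i++) {
    if (Math.abs(ibis[i] - ibis[i - 1]) > 50) count++;
  }
  return (count / (ibis.length - 1)) * 100;
}
```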

Each vital includes a confidence score based on signal quality metrics: signal-to-noise ratio, motion artifact level, and the consistency of peak detection across the measurement window.

Data Payload Structure

When a capture session completes, the SDK returns a structured JSON payload. Understanding this structure is essential for backend integration.

{
  "session_id": "vs_a1b2c3d4e5f6",
  "capture_start": "2026-04-03T14:23:01.442Z",
  "capture_end": "2026-04-03T14:23:31.891Z",
  "duration_ms": 30449,
  "device": {
    "user_agent": "Mozilla/5.0 ...",
    "camera_resolution": "1280x720",
    "effective_fps": 28.4
  },
  "signal_quality": {
    "overall": 0.87,
    "snr_db": 4.2,
    "motion_artifact_ratio": 0.08,
    "face_tracking_stability": 0.94
  },
  "vitals": {
    "heart_rate": {
      "value": 72,
      "unit": "bpm",
      "confidence": 0.93,
      "method": "peak_detection_median"
    },
    "hrv": {
      "sdnn_ms": 48.3,
      "rmssd_ms": 34.1,
      "pnn50_pct": 18.7,
      "lf_hf_ratio": 1.42,
      "confidence": 0.81,
      "unit": "ms"
    },
    "spo2": {
      "value": 97,
      "unit": "percent",
      "confidence": 0.78,
      "method": "ratio_of_ratios"
    },
    "respiratory_rate": {
      "value": 16,
      "unit": "breaths_per_min",
      "confidence": 0.74,
      "method": "amplitude_modulation"
    },
    "stress_index": {
      "value": 42,
      "unit": "index_0_100",
      "confidence": 0.82,
      "method": "hrv_frequency_domain"
    }
  }
}

The payload is compact, typically under 2 KB. It contains no image data, no video frames, and no biometric identifiers. This is a deliberate design choice that simplifies data handling and minimizes the compliance surface area for your platform.
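For backend handlers, the payload shape can be expressed as TypeScript types. The interfaces below are inferred from the sample above, not a published schema, and a cast after `JSON.parse` gives no runtime guarantees, so real code should still validate the shape:

```typescript
// Types inferred from the sample payload; treat as a sketch.
interface VitalMeasurement {
  value: number;
  unit: string;
  confidence: number;
  method: string;
}

interface SignalQuality {
  overall: number;
  snr_db: number;
  motion_artifact_ratio: number;
  face_tracking_stability: number;
}

interface VitalsPayload {
  session_id: string;
  capture_start: string; // ISO 8601 timestamps
  capture_end: string;
  duration_ms: number;
  signal_quality: SignalQuality;
  vitals: {
    heart_rate: VitalMeasurement;
    spo2: VitalMeasurement;
    respiratory_rate: VitalMeasurement;
    // hrv and stress_index omitted here for brevity
  };
}

function parseVitalsPayload(json: string): VitalsPayload {
  // The cast documents intent only; validate before trusting the data.
  return JSON.parse(json) as VitalsPayload;
}
```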

Server-Side Integration Patterns

Your backend needs to receive, validate, store, and serve vitals data. There are three common integration patterns, and the right choice depends on your existing architecture.

Pattern 1: REST API Endpoint

The simplest approach is a dedicated REST endpoint that receives the vitals payload when a capture session completes.

POST /api/v1/visits/{visit_id}/vitals
Authorization: Bearer {session_token}
Content-Type: application/json

{vitals payload}

Your backend validates the payload, checks that the visit ID is active and belongs to the authenticated patient, applies your signal quality threshold (for example, rejecting measurements with overall quality below 0.6), and persists the data. The provider's dashboard polls or subscribes for updates to display new vitals as they arrive.

This pattern works well for platforms that capture vitals at discrete points during the visit rather than continuously.
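The server-side acceptance rule from the paragraph above can be isolated as a pure function. The shape is reduced to the fields the rule needs, and the 0.6 default mirrors the example threshold; your tenant configuration would supply the real value:

```typescript
// Quality gate applied before persisting a vitals payload.
interface IncomingVitals {
  signal_quality: { overall: number };
}

function shouldPersist(
  payload: IncomingVitals,
  minOverallQuality = 0.6
): { accept: boolean; reason?: string } {
  if (payload.signal_quality.overall < minOverallQuality) {
    return { accept: false, reason: "signal_quality_below_threshold" };
  }
  return { accept: true };
}
```

Keeping the rule pure makes it trivial to unit test and to vary per tenant.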

Pattern 2: WebSocket Channel

If your platform already maintains a WebSocket connection for real-time features like chat, presence indicators, or signaling, you can multiplex vitals data onto the same channel.

This enables streaming vitals updates during the capture window. Rather than waiting for the full 30-second measurement to complete, the SDK can push preliminary estimates every 5 seconds, giving the provider a real-time view of the patient's heart rate as it converges to a stable measurement.

The message format follows a typed envelope pattern:

{
  "type": "vitals_update",
  "visit_id": "visit_xyz",
  "status": "measuring",
  "preliminary": {
    "heart_rate": { "value": 74, "confidence": 0.71 }
  },
  "elapsed_ms": 15200,
  "remaining_ms": 14800
}
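In TypeScript, the typed envelope is naturally a discriminated union on `type`, which lets one WebSocket handler route vitals traffic alongside existing message kinds. The handler names here are hypothetical:

```typescript
// Discriminated union for multiplexed channel messages. The "chat"
// variant stands in for whatever message types the channel already carries.
type ChannelMessage =
  | { type: "chat"; visit_id: string; text: string }
  | {
      type: "vitals_update";
      visit_id: string;
      status: "measuring" | "complete";
      preliminary?: { heart_rate: { value: number; confidence: number } };
      elapsed_ms: number;
      remaining_ms: number;
    };

function routeMessage(msg: ChannelMessage): string {
  switch (msg.type) {
    case "chat":
      return "chat_handler";
    case "vitals_update":
      // Preliminary updates refresh the provider's live display;
      // completed measurements go to persistent storage.
      return msg.status === "measuring" ? "live_vitals_panel" : "vitals_store";
  }
}
```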

Pattern 3: Webhook Delivery

For platforms that process visit data asynchronously or integrate with external systems, a webhook pattern lets you route vitals data to multiple consumers. When vitals are captured, your backend emits a webhook event to registered endpoints.

This is particularly useful for EHR integration, population health analytics, or triggering clinical workflows based on vital sign thresholds (for example, alerting a care team when a patient's SpO2 drops below 92 percent during a visit).
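The threshold check that decides whether to emit such an alert event can be sketched as below. The 92 percent threshold comes from the example above; the confidence floor is an assumption, included because acting on low-confidence readings would generate false alarms:

```typescript
// Decide whether an SpO2 reading should trigger a care-team alert.
// Thresholds are illustrative, not clinical guidance.
interface VitalReading {
  value: number;
  confidence: number;
}

function spo2Alert(
  spo2: VitalReading,
  threshold = 92,
  minConfidence = 0.7
): boolean {
  // Only alert on readings confident enough to act on.
  return spo2.confidence >= minConfidence && spo2.value < threshold;
}
```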

FHIR Resource Mapping

If your platform communicates with EHR systems via FHIR, rPPG vitals map to standard FHIR R4 Observation resources. Each vital sign becomes a separate Observation linked to the Encounter (visit) resource.

The relevant LOINC codes are:

  • Heart rate: 8867-4 (Heart rate)
  • SpO2: 2708-6 (Oxygen saturation in Arterial blood)
  • Respiratory rate: 9279-1 (Respiratory rate)
  • HRV (SDNN): 80404-7 (R-R interval.standard deviation)

The Observation resource should include the method extension indicating rPPG-derived measurement and the confidence score as a data quality component. This gives receiving EHR systems the context to handle rPPG-derived vitals appropriately in clinical decision support rules.
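A builder for the heart-rate Observation might look like the following. The resource skeleton and LOINC/UCUM codes are standard FHIR R4; the omission of the rPPG method extension is deliberate, since its canonical URL would be specific to your implementation guide:

```typescript
// Build a FHIR R4 Observation for an rPPG-derived heart rate,
// linked to the patient and the visit's Encounter.
function heartRateObservation(
  patientId: string,
  encounterId: string,
  bpm: number,
  effective: string // ISO 8601 timestamp of the measurement
): object {
  return {
    resourceType: "Observation",
    status: "final",
    category: [
      {
        coding: [
          {
            system: "http://terminology.hl7.org/CodeSystem/observation-category",
            code: "vital-signs",
          },
        ],
      },
    ],
    code: {
      coding: [
        { system: "http://loinc.org", code: "8867-4", display: "Heart rate" },
      ],
    },
    subject: { reference: `Patient/${patientId}` },
    encounter: { reference: `Encounter/${encounterId}` },
    effectiveDateTime: effective,
    valueQuantity: {
      value: bpm,
      unit: "beats/minute",
      system: "http://unitsofmeasure.org",
      code: "/min",
    },
  };
}
```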

WebRTC Considerations for Simultaneous Video and Vitals

Running rPPG capture alongside a WebRTC video call introduces specific architectural considerations that affect both measurement quality and call performance.

Camera Feed Branching

The fundamental requirement is that the rPPG SDK must process raw, uncompressed camera frames. WebRTC's video encoding pipeline applies compression, adaptive bitrate adjustments, and potentially resolution scaling that corrupt the subtle pixel-level signals rPPG depends on.

The correct architecture branches the camera feed before it enters the WebRTC pipeline. You obtain the raw MediaStream from getUserMedia, then create two consumers: the WebRTC peer connection gets one track for the video call, and the rPPG SDK gets a separate reference to process raw frames via a VideoFrameCallback or OffscreenCanvas.
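The branching pattern is a one-liner around `MediaStreamTrack.clone()`. The sketch below uses structural typing so it can be exercised outside a browser; in production `T` is a `MediaStreamTrack` obtained from `getUserMedia`:

```typescript
// Branch the camera track: the WebRTC peer connection consumes the
// original, and the rPPG pipeline gets an independent clone so the
// encoder path never interferes with the frames it reads.
function branchCameraFeed<T extends { clone(): T }>(
  track: T
): { callTrack: T; rppgTrack: T } {
  return { callTrack: track, rppgTrack: track.clone() };
}

// Browser usage (not executed here):
//   const stream = await navigator.mediaDevices.getUserMedia({ video: true });
//   const { callTrack, rppgTrack } = branchCameraFeed(stream.getVideoTracks()[0]);
//   pc.addTrack(callTrack, stream);                  // WebRTC side
//   const rppgStream = new MediaStream([rppgTrack]); // raw frames for the SDK
```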

CPU Budget Management

Both WebRTC encoding and rPPG processing consume CPU cycles. On lower-powered devices, running both simultaneously at maximum quality can cause frame drops in the video call or reduced rPPG accuracy.

The SDK includes an adaptive quality mode that monitors CPU utilization and adjusts its processing demands accordingly. It can reduce processing frame rate (for example, analyzing every other frame instead of every frame), switch to single-ROI mode instead of multi-ROI, or defer frequency-domain HRV analysis until the capture session completes and the results are assembled.
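One way to implement the frame-rate reduction is a stride-based scheduler that widens the analysis stride when per-frame processing overruns a budget and narrows it when there is headroom. The budget and stride limits below are illustrative, not the SDK's actual tuning:

```typescript
// Adaptive frame skipping driven by measured per-frame processing time.
class AdaptiveFrameScheduler {
  private stride = 1; // analyze every Nth frame
  private frameCount = 0;

  constructor(private budgetMs: number, private maxStride = 4) {}

  // Called once per captured frame; true means "analyze this frame".
  shouldProcess(): boolean {
    return this.frameCount++ % this.stride === 0;
  }

  // Called after each analyzed frame with its measured cost.
  reportProcessingTime(ms: number): void {
    if (ms > this.budgetMs && this.stride < this.maxStride) {
      this.stride++; // over budget: skip more frames
    } else if (ms < this.budgetMs / 2 && this.stride > 1) {
      this.stride--; // ample headroom: analyze more frames
    }
  }
}
```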

Your platform can also manage this by scheduling intensive vitals capture during pauses in the conversation rather than continuously during the entire visit.

Network Independence

A key advantage of on-device rPPG processing is that vitals capture is independent of network conditions. If the WebRTC connection degrades, the video call quality may drop, but the rPPG measurement continues using the local camera feed at full quality. This means vitals data is available even during network disruptions that would make cloud-based processing impossible.

Edge Computing vs. Cloud Processing Tradeoffs

The architecture decision between processing on the patient's device (edge) versus processing on your servers (cloud) has cascading implications.

On-device (edge) processing, which is Circadify's approach, offers lower latency (no upload round-trip), inherent privacy (raw video stays local), network independence, and no incremental server costs per measurement. The tradeoff is dependence on client device capability, though WebAssembly performance on modern browsers has largely eliminated this concern for devices manufactured in the last five years.

Cloud processing would theoretically allow more computationally intensive algorithms and centralized model updates, but it requires uploading facial video to your servers, introduces HIPAA compliance complexity for video handling, depends on network quality and bandwidth during the very moments when the network may already be stressed by the video call, and adds per-measurement infrastructure costs that scale with concurrent visits.

For telemedicine platforms, edge processing is the clear architectural winner. It aligns with the existing client-heavy architecture of browser-based video calls and avoids creating a new server-side processing bottleneck.

Scaling Considerations for Multi-Tenant Platforms

If your telemedicine platform serves multiple healthcare organizations, the vitals integration needs multi-tenancy support across several dimensions.

Data isolation. Vitals data must be strictly isolated per tenant, following the same patterns you use for visit records and patient data. If you use database-per-tenant or schema-per-tenant isolation, vitals data belongs in the same isolation boundary.

Configuration per tenant. Different organizations may want different signal quality thresholds, different vitals displayed to providers, or different capture workflows (automatic vs. provider-initiated). Your tenant configuration should include vitals-related settings that the SDK reads at initialization.

Feature flagging. Not all tenants will enable vitals capture simultaneously. Your feature flag system should control SDK loading so that tenants without vitals enabled do not incur the additional JavaScript bundle download.

Usage metering. If you price vitals as a premium feature, you need to meter capture sessions per tenant. The vitals payload includes a unique session ID and timestamp that feed into your billing pipeline.
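A minimal metering aggregation over those session IDs might look like this. The `tenant_id` field is an assumption about how your platform tags payloads; deduplicating by session ID guards against retried deliveries being double-billed:

```typescript
// Count unique capture sessions per tenant for a billing period.
interface MeteredSession {
  tenant_id: string;
  session_id: string;
}

function sessionsPerTenant(sessions: MeteredSession[]): Map<string, number> {
  const seen = new Map<string, Set<string>>();
  for (const s of sessions) {
    if (!seen.has(s.tenant_id)) seen.set(s.tenant_id, new Set());
    seen.get(s.tenant_id)!.add(s.session_id); // dedupe retried deliveries
  }
  const counts = new Map<string, number>();
  seen.forEach((ids, tenant) => counts.set(tenant, ids.size));
  return counts;
}
```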

Audit logging. Healthcare organizations require audit trails for clinical data access. Every vitals measurement, retrieval, and display event should be logged with the acting user, timestamp, and context, following the same audit patterns you apply to other clinical data.

Implementation Sequence

For engineering teams planning this integration, the recommended sequence is:

  1. Stand up the SDK in a sandbox environment and verify measurements against a reference pulse oximeter to build confidence in the data quality.
  2. Implement the camera feed branching to separate raw frames from the WebRTC pipeline.
  3. Build the backend endpoint and storage layer for vitals data.
  4. Integrate the provider-side display with your existing visit UI.
  5. Add FHIR mapping if you have EHR integrations.
  6. Implement multi-tenant configuration and metering.
  7. Load test concurrent vitals capture across your expected peak visit volume.

Each step builds on the previous one, and the architecture supports incremental delivery rather than requiring a big-bang release. You can ship vitals capture to a single tenant or provider group while the full multi-tenant infrastructure matures.

The result is a telemedicine platform that captures clinical-grade vital signs from a standard browser camera, processes them without ever exposing raw facial video to your servers, and delivers structured data through the same integration patterns your platform already uses for clinical information.

Tags: rPPG, architecture, telemedicine, integration