
WebRTC and rPPG: How Video Visit Infrastructure Enables Vitals

How WebRTC video infrastructure powers rPPG vital signs capture during telehealth visits, from frame extraction to real-time processing architecture.

telehealthvitals.com Research Team

Most telehealth platforms already run on WebRTC. It handles the video calls, manages the peer connections, adapts to spotty Wi-Fi. What fewer platform teams realize is that the same video infrastructure already streaming between patient and provider contains the raw signal needed to measure vital signs. No additional hardware. No second camera feed. The video frames are already there.

Remote photoplethysmography (rPPG) works by analyzing tiny color changes in facial skin caused by blood flow. Each heartbeat pushes blood through capillaries near the surface, and that creates measurable shifts in how light reflects off the face. A 2024 review published in Open Access Library Journal by Ali S. Salim and Abdul Sattar M. Khidhir at the University of Sulaimani found that modern rPPG methods can extract heart rate from standard RGB video with a mean absolute error below 2 BPM under controlled lighting. The raw material for all of this is video frames, which WebRTC is already delivering.

"The video stream that powers a telehealth consultation contains the same optical data needed for contactless vital signs extraction. The infrastructure overlap is nearly complete." — Dr. Daniel McDuff, formerly of Microsoft Research, in his 2023 work on camera-based physiological sensing

How WebRTC video visit infrastructure maps to rPPG requirements

WebRTC was designed for real-time communication, not medical signal processing. But the technical overlap between what WebRTC provides and what rPPG needs is surprisingly close. The gap is smaller than most engineering teams expect.

Here is where the two technologies align and where they diverge:

| Requirement | What WebRTC provides | What rPPG needs | Gap |
|---|---|---|---|
| Frame rate | 15-30 fps adaptive | 20+ fps for reliable signal | Minimal — default settings usually sufficient |
| Resolution | 640x480 to 1080p, negotiated | At least 640x480 facial region | No gap at standard settings |
| Color depth | 8-bit RGB (YUV converted) | RGB pixel values from skin regions | Conversion step needed from YUV to RGB |
| Latency | Sub-200ms peer-to-peer | 10-30 second rolling window | Different requirement — rPPG buffers frames, not streams them |
| Compression | VP8/VP9/H.264/AV1 | Uncompressed preferred, compressed workable | Compression artifacts add noise but modern algorithms handle it |
| Network adaptation | Bitrate and resolution scaling | Stable frame rate preferred | Adaptive bitrate can disrupt signal — needs management |
| Encryption | SRTP/DTLS mandatory | N/A if processing client-side | No gap — frames are decrypted before rendering |

The interesting thing is that compression, which most engineers flag as the primary concern, turns out to be manageable. A 2025 study published in Frontiers in Digital Health by researchers at the University of Bonn found that the plane orthogonal-to-skin (POS) method produces reliable rPPG signals even from H.264 compressed video, though signal quality does degrade at bitrates below 500 kbps. Most telehealth calls operate well above that threshold.
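To make the POS method mentioned above concrete, here is a minimal single-window sketch in JavaScript. It omits the overlap-add windowing and bandpass filtering a production pipeline would use, and the zero-crossing heart rate estimate is deliberately crude (a real implementation would use a spectral estimator); the function names are illustrative.

```javascript
// Minimal sketch of the POS (plane-orthogonal-to-skin) projection.
// Input: per-frame spatial means of the R, G, B channels over a facial ROI.
// Output: a zero-centered pulse signal whose dominant frequency tracks heart rate.
function posPulse(r, g, b) {
  const n = r.length;
  const mean = (a) => a.reduce((s, v) => s + v, 0) / a.length;
  const std = (a) => {
    const m = mean(a);
    return Math.sqrt(a.reduce((s, v) => s + (v - m) ** 2, 0) / a.length);
  };
  // Temporal normalization: divide each channel by its window mean.
  const mr = mean(r), mg = mean(g), mb = mean(b);
  const s1 = [], s2 = [];
  for (let i = 0; i < n; i++) {
    const cr = r[i] / mr, cg = g[i] / mg, cb = b[i] / mb;
    s1.push(cg - cb);          // first projection axis
    s2.push(cg + cb - 2 * cr); // second projection axis
  }
  // Alpha-tuned combination of the two projections, then zero-center.
  const alpha = std(s1) / (std(s2) || 1);
  const h = s1.map((v, i) => v + alpha * s2[i]);
  const mh = mean(h);
  return h.map((v) => v - mh);
}

// Crude heart-rate estimate: count zero crossings of the pulse signal.
function estimateBpm(pulse, fps) {
  let crossings = 0;
  for (let i = 1; i < pulse.length; i++) {
    if ((pulse[i - 1] < 0) !== (pulse[i] < 0)) crossings++;
  }
  const seconds = pulse.length / fps;
  return (crossings / 2 / seconds) * 60;
}
```

Feeding this 10 seconds of 30 fps channel means containing a synthetic 1.2 Hz pulse recovers an estimate close to 72 BPM, which is the kind of sanity check worth automating before testing against real video.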

The frame extraction pipeline

Getting rPPG-ready frames out of a WebRTC video stream involves intercepting the media pipeline at the right point. There are three main architectural approaches, each with trade-offs.

Client-side extraction using MediaStream API

The browser's MediaStreamTrack API lets you pull individual frames from a video track before or after it hits the <video> element. Using requestVideoFrameCallback() (supported in Chrome, Edge, and Safari as of 2025), developers can access each frame with precise timing metadata. The frame gets drawn to an offscreen canvas, and pixel data is extracted via getImageData().
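The per-frame pixel work reduces to averaging a rectangle of RGBA bytes. Here is a minimal sketch written as a pure function so it is not tied to the browser: in a real pipeline, data would be the Uint8ClampedArray returned by getImageData() inside a requestVideoFrameCallback() handler, and the ROI would come from a face detector. The function name and ROI shape are illustrative.

```javascript
// Average the R, G, B channels over a rectangular ROI in an RGBA byte
// array — the layout returned by CanvasRenderingContext2D.getImageData().
// In a browser this runs once per frame after the frame is drawn to an
// offscreen canvas; as a pure function it can also run in a Worker.
function roiChannelMeans(data, width, roi) {
  let r = 0, g = 0, b = 0, count = 0;
  for (let y = roi.y; y < roi.y + roi.h; y++) {
    for (let x = roi.x; x < roi.x + roi.w; x++) {
      const i = (y * width + x) * 4; // 4 bytes per pixel: R, G, B, A
      r += data[i];
      g += data[i + 1];
      b += data[i + 2];
      count++;
    }
  }
  return { r: r / count, g: g / count, b: b / count };
}
```

Appending one such triple per frame to a rolling buffer produces exactly the channel traces that rPPG algorithms like POS consume.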

This approach keeps everything on the patient's device. No video frames leave the browser for vitals processing. From a privacy and HIPAA standpoint, that matters. The downside is that you are constrained by the patient's device CPU and GPU, and older phones may struggle with simultaneous video encoding and rPPG computation.

Server-side extraction via SFU media pipeline

Selective Forwarding Units (SFUs) like Janus, mediasoup, or LiveKit already handle WebRTC media routing for multi-party calls. Some telehealth platforms route video through an SFU rather than using direct peer-to-peer connections, especially when recording or transcription is involved.

In this architecture, the SFU can fork the video stream and send a copy to an rPPG processing service. The advantage: processing happens on controlled infrastructure with predictable compute resources. The cost: you are now sending patient video to a server, which adds compliance requirements around data handling, encryption at rest, and retention policies.

Hybrid approach with on-device preprocessing

A middle path that is gaining traction: run face detection and region-of-interest extraction on the client, then send only the aggregated RGB channel averages (not video frames) to a server for signal processing. This reduces bandwidth to a few kilobytes per second and sidesteps most privacy concerns since raw facial video never leaves the device.
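To make the bandwidth claim concrete, here is one hypothetical wire format for those aggregated samples: 20 bytes per frame, about 600 bytes per second at 30 fps before transport overhead. The layout is an assumption for illustration, not a standard.

```javascript
// Hypothetical per-frame sample for the hybrid architecture: a capture
// timestamp plus three ROI channel means, 20 bytes total. No pixels
// ever leave the device.
function encodeSample(tMs, r, g, b) {
  const buf = new ArrayBuffer(20);
  const view = new DataView(buf);
  view.setFloat64(0, tMs);  // capture timestamp in ms
  view.setFloat32(8, r);    // spatial mean, red channel
  view.setFloat32(12, g);   // spatial mean, green channel
  view.setFloat32(16, b);   // spatial mean, blue channel
  return buf;
}

// Server-side counterpart: recover the sample for signal processing.
function decodeSample(buf) {
  const view = new DataView(buf);
  return {
    tMs: view.getFloat64(0),
    r: view.getFloat32(8),
    g: view.getFloat32(12),
    b: view.getFloat32(16),
  };
}
```

Carrying the timestamp explicitly matters because WebRTC frame delivery is not perfectly regular, and the server-side signal processing needs real sample times.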

A 2025 paper from researchers at the National Tsing Hua University in Taiwan demonstrated that transmitting compressed ROI color signals instead of full frames reduced bandwidth requirements by 98% while maintaining heart rate estimation accuracy within 1.5 BPM of full-frame processing.

What WebRTC's adaptive bitrate does to rPPG signals

This is where things get tricky. WebRTC constantly adjusts video quality based on network conditions. When a patient's connection degrades, WebRTC may drop from 30 fps to 15 fps, reduce resolution, or increase compression. All three affect rPPG signal quality differently.

Frame rate drops are the most disruptive. rPPG algorithms work by tracking subtle color changes across consecutive frames. According to the 2024 comprehensive review by Salim and Khidhir, most deep learning-based rPPG methods require a minimum of 20 fps to produce reliable heart rate estimates. Below that, the temporal resolution of the blood volume pulse signal degrades and error rates climb.

Resolution reductions matter less, as long as the facial region remains at least 100x100 pixels. Most rPPG methods only analyze a small patch of the forehead or cheeks, so a 480p stream provides more than enough spatial data.

Compression increases (lower bitrate) add noise to the color channels but don't typically make the signal unrecoverable. The POS algorithm and newer deep learning approaches like PhysNet and EfficientPhys have shown robustness to compression artifacts at bitrates commonly used in video calls, according to research from the 2025 rPPG survey published in Frontiers in Digital Health.

The practical mitigation: steer WebRTC's adaptation away from frame rate drops. The RTCRtpSender.setParameters() API exposes a degradationPreference field that can be set to "maintain-framerate", telling the encoder to reduce resolution rather than drop frames when bandwidth tightens, and getUserMedia constraints can request a minimum capture frame rate. Keeping the stream at or above 20 fps and roughly 800 kbps keeps rPPG viable without meaningfully affecting call quality.

Integration architecture for telehealth platforms

For platform engineering teams evaluating how to add rPPG to an existing WebRTC telehealth stack, the architecture decision comes down to where processing happens and how vital signs data flows into the clinical workflow.

| Architecture pattern | Processing location | Latency to first reading | Privacy profile | Infrastructure cost |
|---|---|---|---|---|
| Fully client-side | Patient's browser/device | 15-30 seconds | Best — no video leaves device | Lowest — uses patient hardware |
| SFU fork to cloud service | Cloud GPU instance | 15-30 seconds | Requires BAA, encryption at rest | Moderate — cloud compute per session |
| Hybrid ROI streaming | Client preprocessing + cloud signal analysis | 20-40 seconds | Good — only aggregated signals transmitted | Low — minimal bandwidth and compute |
| Edge processing at provider site | On-premise GPU | 15-30 seconds | Good — stays within network | High — requires on-prem infrastructure |

Most telehealth companies building this today are choosing either fully client-side or the hybrid approach. The SFU fork model works but introduces compliance overhead that many teams want to avoid.

How the data flows into EHR systems

Vital signs captured during a telehealth visit need to end up in the patient's chart. The standard path is HL7 FHIR Observation resources. A camera-based heart rate reading becomes a FHIR Observation with:

  • A LOINC code (8867-4 for heart rate)
  • The measured value and units
  • A timestamp from the measurement window
  • A method code indicating camera-based/rPPG measurement
  • A reference to the encounter (the telehealth visit)

This is the same structure used for any vital signs entry. EHR systems like Epic, Cerner, and athenahealth accept FHIR Observation writes through their standard APIs. The camera-based measurement method is what is new — the data format is not.
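Assembled, a camera-based heart rate reading might be serialized roughly like this. The structural fields follow FHIR R4 vital-signs conventions; the free-text method and the helper name are illustrative, since a production integration would agree on a method coding with the EHR vendor.

```javascript
// Build a FHIR R4 Observation for a camera-based heart rate reading.
function heartRateObservation(bpm, measuredAt, encounterId) {
  return {
    resourceType: "Observation",
    status: "final",
    category: [{
      coding: [{
        system: "http://terminology.hl7.org/CodeSystem/observation-category",
        code: "vital-signs",
      }],
    }],
    code: {
      coding: [{
        system: "http://loinc.org",
        code: "8867-4",          // LOINC: heart rate
        display: "Heart rate",
      }],
    },
    effectiveDateTime: measuredAt, // end of the measurement window
    valueQuantity: {
      value: bpm,
      unit: "beats/minute",
      system: "http://unitsofmeasure.org",
      code: "/min",              // UCUM unit code
    },
    method: { text: "Remote photoplethysmography (camera-based)" },
    encounter: { reference: "Encounter/" + encounterId },
  };
}
```

Posting this resource through the EHR's FHIR API files the reading into the chart the same way a cuff or pulse oximeter reading would be filed.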

Current research on WebRTC-compatible rPPG

The academic research community has been moving toward models that work specifically with compressed, variable-quality video rather than lab-controlled conditions.

A 2025 paper published on arXiv by researchers at multiple institutions presented a comprehensive benchmark of rPPG methods tested against real-world video conditions including compression, motion, and variable lighting. Their findings showed that transformer-based architectures outperformed earlier CNN approaches by 15-20% on compressed video inputs, suggesting that newer models are increasingly robust to the kind of video WebRTC delivers.

The Nature-published study by researchers at National Yang Ming Chiao Tung University in 2024 introduced a machine learning pipeline that constructs rPPG waveforms from noisy video inputs, specifically targeting the challenges of non-contact measurement outside clinical environments. Their approach handled the frame-to-frame inconsistencies common in video calls better than previous methods.

Work from the WellFie study (published in medRxiv, 2023) tested smartphone-based rPPG in a clinical setting with 100 participants and found that video-based heart rate measurements correlated within 3 BPM of pulse oximeter readings for 90% of participants. While this study used a dedicated app rather than a WebRTC video call, the underlying signal processing challenges are comparable.

The future of WebRTC-based vital signs

The next generation of WebRTC itself is making rPPG integration easier. WebCodecs API, now stable in Chromium browsers, gives developers direct access to raw video frames without going through the canvas rendering path. This eliminates a processing step and reduces latency for frame extraction.
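A sketch of what that path looks like: in a browser, the readable stream would typically come from new MediaStreamTrackProcessor({ track }).readable, which yields WebCodecs VideoFrame objects with no canvas round trip. The consumption loop below is plain ReadableStream code, and the callback name is illustrative.

```javascript
// Consume a stream of video frames one at a time. In a browser,
// `readable` comes from MediaStreamTrackProcessor and each value is a
// VideoFrame; the loop itself works on any ReadableStream.
async function consumeFrames(readable, onFrame) {
  const reader = readable.getReader();
  let processed = 0;
  for (;;) {
    const { done, value: frame } = await reader.read();
    if (done) break;
    onFrame(frame); // e.g. hand off to the rPPG ROI averager
    // VideoFrames hold GPU/decoder memory and must be closed promptly
    // (guarded so plain test objects also work).
    if (typeof frame.close === "function") frame.close();
    processed++;
  }
  return processed;
}
```

Because this runs off the rendering path, it can live in a Worker and process frames without blocking the call UI.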

WebGPU, shipping in Chrome and Edge since 2023 and now in Safari Technology Preview, enables GPU-accelerated neural network inference directly in the browser. Running an rPPG deep learning model in WebGPU rather than in JavaScript or WebAssembly can reduce processing time per frame from 30ms to under 5ms on modern hardware.

The W3C WebRTC Working Group's work on Encoded Transforms (formerly Insertable Streams) opens another path: processing encoded video frames in a Worker thread before they are decoded. This could allow rPPG preprocessing to happen in parallel with video rendering without competing for main-thread resources.

Together, these browser APIs are closing the gap between "WebRTC can theoretically support rPPG" and "rPPG runs seamlessly inside any WebRTC video call."

Frequently asked questions

Does WebRTC video compression destroy the rPPG signal?

Not at typical telehealth bitrates. Research shows that rPPG signals remain extractable from H.264 and VP9 compressed video at bitrates above 500 kbps. Most telehealth calls run at 1-3 Mbps, well above the threshold. Signal quality does degrade at very low bitrates, but algorithms like POS and PhysNet are designed to handle compression noise.

What frame rate does rPPG need from a WebRTC stream?

Most rPPG algorithms require at least 20 fps for reliable vital signs estimation. WebRTC typically delivers 25-30 fps under normal network conditions. The concern is when network congestion causes WebRTC to drop frame rate adaptively — setting a minimum frame rate constraint in the WebRTC configuration prevents this from breaking the rPPG pipeline.

Can rPPG run entirely in the browser without sending video to a server?

Yes. Client-side JavaScript rPPG implementations using WebAssembly or WebGPU can process video frames locally in real time on modern devices. This is the most privacy-friendly architecture since no facial video leaves the patient's device. The trade-off is that older or lower-end devices may not have enough processing power for simultaneous video calling and rPPG computation.

How long does it take to get a vital signs reading during a video visit?

Typically 15 to 30 seconds of stable video. rPPG algorithms need a rolling window of frames to extract the blood volume pulse signal. The patient needs to be relatively still and facing the camera during this window, which fits naturally into the beginning of a telehealth consultation when introductions are happening.

Companies like Circadify are building rPPG solutions designed specifically for integration into existing telehealth and video call infrastructure, turning the WebRTC streams that platforms already operate into a source of clinical vital signs data without requiring patients to own or operate any additional hardware.
