Sleep & Tech

Apple, Google, Amazon, and Others: Comparison Study on the Accuracy of 11 Commercial Sleep Tracking Devices

by Asleep

    💡 [Editor’s Note] 

    A sleep tracker is a tool that anyone can easily buy, without a prescription, to conveniently measure their sleep stages every night at home, allowing them to monitor their own sleep. It measures and records various changes that occur in your body during sleep, such as how long it takes you to fall asleep in bed, whether there were any sleep disturbances such as snoring, tossing and turning or waking up, and how efficient your sleep was. In this way, we can actively strive for a healthier life by monitoring the one-third of our day that is spent sleeping.

    With the remarkable advances in technology, we now have a variety of sleep trackers to choose from based on our individual situations and lifestyles. However, it has become difficult to determine which sleep tracker will really benefit us.
    With so many sleep trackers on the market, which one is right for me? What criteria should we use to choose a sleep tracker?

    Asleep has thoroughly tested and compared the 11 most popular sleep trackers in the world under the same conditions, taking into account the strengths and weaknesses of each one. Our aim is to find out which sleep tracker is best suited to help you identify and improve your sleep problems.

    Recently, Asleep, in collaboration with research teams from Bundang Seoul National University Hospital and Stanford University Sleep Center, published research comparing the performance of 11 well-known domestic and international sleep measurement devices. This is the first time that international sleep trackers including Apple Watch, Google Fitbit, Samsung Galaxy Watch, Amazon Halo and others have been compared simultaneously in the same usage environment.

    Increased objectivity in sleep tracker comparison research environment

    Sleep trackers have been compared in the past based on their basic performance. However, many existing reviews are often based on the subjective experiences of individual editors from different media who randomly select and test sleep trackers. It is also often difficult to identify the specific testing conditions under which these reviews were conducted. To truly compare sleep trackers in a 'proper' way, it is crucial to meticulously establish standardised procedures for comparison. In this study, two methods were used to establish criteria for evaluating the accuracy of sleep trackers and to ensure objectivity in the experimental conditions.

    <1> Ensuring diversity in participating research groups

    Previous sleep tracker comparison studies have often involved a limited number of participants from a single institution. In this study, however, participants were recruited from two different institutions. This comparative study involved 75 participants, 39 males and 36 females, and was conducted at institutions including Bundang Seoul National University Hospital. The researchers noted, “The experiment participants were recruited considering the gender ratio and BMI.”

    It's worth noting that university hospitals tend to have a higher proportion of male patients with more serious illnesses and older age. On the other hand, primary care clinics have a higher proportion of female patients and a more diverse range of patient ages compared to university hospitals. Recruiting participants from these two different institutions with different characteristics was a major effort to ensure the objectivity of the research.

    <2> Comparison with the gold standard of polysomnography (PSG)

    Before comparing the accuracy of sleep trackers, it is important to determine how accurate each individual sleep tracker is. The reference standard for this should be polysomnography (PSG), which provides the most comprehensive and accurate analysis of sleep states. In this study, the 75 participants wore the sleep trackers simultaneously while undergoing PSG. The accuracy of each sleep tracker was analysed by comparing it to the PSG. A total of 3890 hours of sleep session records based on sleep measurement device standards and 543 hours of polysomnography records were utilized.

    Including as many commercial sleep trackers as possible

    The sleep trackers used in this study were as follows.

    💡 11 commercial sleep trackers used in the comparison study
    Wearable devices that can be worn on the body, such as watches or rings
    Google Pixel Watch, Galaxy Watch 5, Fitbit Sense 2, Apple Watch 8, Oura Ring

    Nearable devices you can keep near where you sleep
    Withings Sleep Tracking Mat, Google Nest Hub 2, Amazon Halo Rise

    Airable devices that can be used in the form of smartphone applications
    SleepRoutine, SleepScore, Pillow

    Currently, commercially available sleep trackers employ various methods and operating principles for measuring physiological signals, and they can be broadly classified into three categories based on their accessibility and convenience from the user's perspective: wearables, nearables, and airables.

    For more detailed information, refer to the article: 'Wearables vs. Nearables vs. Airables' (link)

    Wearable sleep trackers involve direct attachment to the body, such as watches, rings, earphones, and other forms, to measure sleep-related data. They operate on principles such as LED light, acceleration sensors, brainwaves, body temperature, and more. One drawback is the inconvenience of having to wear the device on the body while sleeping.

    Nearable sleep trackers do not require direct attachment to the body. Instead, they are placed near the sleeping area or under the mattress to monitor sleep. They work by detecting respiration using methods such as radar or pressure sensors on the mattress. While they have the advantage of being non-contact, measurement results can vary depending on the placement, angle and distance of the device.

    Airable sleep trackers are application-based sleep trackers that can be used anytime, anywhere with a simple software update, as long as you have a smartphone. They measure sleep states by using the smartphone's built-in microphone to detect breathing sounds, or by using the smartphone's accelerometer, gyroscope or ultrasound sensor. Unlike wearable and nearable devices, you don't need to buy a separate device to track your sleep.

    Sleep tracker accuracy comparison for the 4 stages of sleep

    The process of sleep can be broadly divided into four stages: Wakefulness, Light sleep, Deep sleep and REM (rapid eye movement) sleep. One of the key functions of sleep trackers is to provide information about each of these stages. In order to properly assess sleep quality, it's crucial that a tracker can monitor all four stages accurately. (Link to 'Sleep stage: what are they and why are they important?' content)

    The research results showed that SleepRoutine, one of the Airable devices, was the only tracker that accurately and evenly monitored all sleep stages (Wake, Light, Deep, REM).

    According to the research findings, based on the F1 score standard in the measurement of four stages of sleep, the SleepRoutine app scored the highest with 0.6863. It was followed by Amazon Halo Rise with 0.6242, and Fitbit Sense 2 with 0.5814.

    📍What is F1 Score?

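    The F1 score is a measure of accuracy that reflects how well each sleep stage is identified, not just the overall share of time labelled correctly. For each stage, it combines precision (how often the tracker is right when it reports that stage) and recall (how much of that stage the tracker actually catches). As a result, a tracker only scores well if it also correctly detects stages that occur less frequently during sleep, such as wakefulness or deep sleep.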
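
    To make the difference between simple accuracy and the F1 score concrete, here is a minimal Python sketch. It assumes the reported score is a macro-averaged F1 (the plain mean of the per-stage F1 scores) and uses a made-up ten-epoch night. The function names and the sample labels are illustrative and are not taken from the study.

```python
STAGES = ["Wake", "Light", "Deep", "REM"]


def accuracy(reference, predicted):
    """Share of epochs where the tracker label matches the reference label."""
    matches = sum(1 for r, p in zip(reference, predicted) if r == p)
    return matches / len(reference)


def per_stage_f1(reference, predicted, stages=STAGES):
    """Return {stage: F1} for two equal-length lists of epoch labels."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    scores = {}
    for stage in stages:
        pairs = list(zip(reference, predicted))
        tp = sum(1 for r, p in pairs if r == stage and p == stage)
        fp = sum(1 for r, p in pairs if r != stage and p == stage)
        fn = sum(1 for r, p in pairs if r == stage and p != stage)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the stage never appears.
        scores[stage] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(reference, predicted, stages=STAGES):
    """Unweighted mean of the per-stage F1 scores."""
    scores = per_stage_f1(reference, predicted, stages)
    return sum(scores.values()) / len(stages)


# Hypothetical night: mostly Light sleep, with one epoch of each other stage.
reference = ["Wake"] + ["Light"] * 7 + ["Deep", "REM"]
# Hypothetical tracker that labels every epoch as Light.
predicted = ["Light"] * 10

print("accuracy:", accuracy(reference, predicted))
print("macro F1:", round(macro_f1(reference, predicted), 4))
```

    In this toy night, a tracker that calls every epoch Light still reaches 70% simple accuracy, but its macro F1 falls to about 0.21 because it never detects Wake, Deep or REM.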

    [Table 1] Results of measuring the performance of each sleep tracker for the 4 stages of sleep

    The results of the study, which assessed how accurately the sleep trackers monitored each stage of sleep, can be easily understood by looking at the confusion matrices below (Figure 2). Figure 2 presents the confusion matrices for the sleep stages of the 11 sleep trackers, providing a clear visual representation of prediction biases and misclassification.

    So how did the other sleep trackers perform? Looking at the tables above, you can see that, with the exception of SleepRoutine, the other devices show darker shading in certain cells among the four cells that make up the diagonal. This means that they were more likely to identify specific sleep stages accurately, rather than identifying all four sleep stages evenly and accurately.
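
    The reading of the diagonal described above can be sketched in Python. This is an illustration only: it assumes each row of the matrix is a PSG (reference) stage, each column is the stage reported by the tracker, and each row is normalised so that the diagonal gives the share of that stage identified correctly. The labels are invented for the example and are not study data.

```python
STAGES = ["Wake", "Light", "Deep", "REM"]


def confusion_matrix(reference, predicted, stages=STAGES):
    """Count epochs by (reference stage, predicted stage)."""
    index = {stage: i for i, stage in enumerate(stages)}
    counts = [[0] * len(stages) for _ in stages]
    for ref, pred in zip(reference, predicted):
        counts[index[ref]][index[pred]] += 1
    return counts


def row_normalize(counts):
    """Turn each row into proportions of that reference stage."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


# Hypothetical ten-epoch night scored by PSG and by a tracker.
psg = ["Wake", "Wake", "Light", "Light", "Light",
       "Light", "Deep", "Deep", "REM", "REM"]
tracker = ["Light", "Wake", "Light", "Light", "Light",
           "Deep", "Deep", "Light", "REM", "REM"]

counts = confusion_matrix(psg, tracker)
proportions = row_normalize(counts)

# The diagonal holds the share of each PSG stage the tracker got right.
diagonal = {stage: proportions[i][i] for i, stage in enumerate(STAGES)}
for stage, share in diagonal.items():
    print(f"{stage}: {share:.2f}")
```

    A tracker that monitors all four stages evenly would show similar values along the whole diagonal, while a tracker that favours one stage shows one large diagonal value and weaker ones elsewhere.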

    In particular, wearable devices tend to measure signals more stably because they are in direct contact with the body. As a result, they are relatively good at accurately detecting sleep stages with less heart rate variability, such as Deep sleep and Light sleep. In the actual research results, wearable devices worn on the wrist or finger, such as the Google Pixel Watch, the Fitbit Sense 2 and the Oura Ring, performed relatively well in accurately identifying the Deep sleep stage.

    The same is true for Light sleep. The Google Pixel Watch, Fitbit Sense 2 and Galaxy Watch 5 showed high accuracy in monitoring Light sleep. It's worth noting that all of these devices detect sleep stages by measuring heart rate variability with a photoplethysmography (PPG) sensor, which measures the pulse wave in the blood vessels using the green LED light emitted by watches and rings.

    The accuracy of REM sleep monitoring was good across all three types of sleep trackers: Wearables, Nearables and Airables. REM sleep, often associated with vivid dreaming, is characterised by reduced body movement but increased brain activity compared to normal. Breathing and heart rate become noticeably irregular during this stage. These characteristics make REM sleep relatively easy to monitor using different methods. As a result, all sleep trackers performed well in detecting this stage.

    You can confirm the same information in the hypnogram below (Figure 3). By comparing the similarity of the hypnograms for each sleep stage between the top PSG (polysomnography) hypnogram and the hypnograms for each sleep tracker, you can gauge their accuracy. Again, you can see that the SleepRoutine hypnogram is closest to the PSG hypnogram. SleepRoutine accurately matched each sleep stage, including Wake (yellow line), to the PSG monitoring.

    SleepRoutine: Best at detecting wakefulness

    In addition to assessing the four sleep stages, SleepRoutine was the best of the 11 sleep trackers at detecting wakefulness during sleep. According to the research, SleepRoutine achieved the highest accuracy in detecting wakefulness, along with the highest accuracy in detecting the four sleep stages (macro F1 score).

    The F1 score for measuring awakenings during sleep showed that SleepRoutine was the highest with 0.7065, followed by Amazon Halo Rise with 0.5967. The Apple Watch 8 had the third-highest F1 score with 0.5493. When compared to the gold standard of PSG, it can be seen that SleepRoutine shows the most similar pattern.

    For most sleep trackers other than SleepRoutine and Amazon Halo Rise, wake detection was generally poor, with accuracy rates of only 20% to 30%. The ability to correctly detect wakefulness was significantly lower than for the other sleep stages. This can also be seen in the confusion matrices (Figure 2). Many sleep trackers mistakenly identified wakefulness as light sleep. Wearables primarily misclassified wake as light, while Nearables strongly misclassified REM as light. Airables, on the other hand, showed relatively more confusion between the light and deep stages.

    For a sleep tracker to be of practical benefit in maintaining or improving sleep quality, it needs to be convenient to use on a daily basis and, most importantly, the sleep data it provides needs to be accurate. After all, the intervention services required may vary depending on the user's sleep condition.

    In particular, for a sleep tracker to be reliable, it must not miss periods of wakefulness when I clearly wake up during sleep, nor consistently label transitions between sleep stages as waking. A sleep tracker that does not accurately record the time it takes to fall asleep, so that lying awake in bed for a long time is not registered as wakefulness, would be difficult to use. If accurate wake tracking is taken as a basic premise, and the tracker can also consistently and accurately track the other sleep stages, users can have more confidence in it.

    It's worth noting that the nearable device, the Amazon Halo Rise, also performed well at detecting wakefulness. However, this sleep tracker is not the most user-friendly. It is sensitive to the placement and height of the device. For accurate measurements, the device needs to be positioned about 30-50 cm from the left chest during sleep. On the other hand, SleepRoutine, which uses the smartphone's microphone to measure respiratory sounds, is less constrained by device placement or orientation. The capabilities of smartphone microphones are constantly improving, allowing the user's breathing sounds to be detected over a wider range.

    [Table] Results of measuring the performance of each sleep tracker for awakenings during sleep

    Why wake detection is so important

    In order for us to say that we've had a "good night's sleep", several conditions need to be met. Put simply, these include:
    <1> Total sleep time: the total number of hours slept
    <2> Sleep efficiency: the time actually spent asleep compared to the time spent in bed
    <3> Sleep cycle: passing through the stages of Wake, Light sleep, Deep sleep and REM sleep in balance, completing full sleep cycles
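
    As a rough illustration of how these figures follow from a hypnogram, the Python sketch below derives total sleep time, sleep efficiency and the number of awakenings from a list of stage labels. It assumes one label per 30-second epoch and treats the whole recording as time in bed. The function name, the epoch length and the sample night are assumptions made for illustration only.

```python
EPOCH_SECONDS = 30  # assumption: one stage label per 30-second epoch


def sleep_summary(hypnogram, epoch_seconds=EPOCH_SECONDS):
    """Summarise a night from a list of per-epoch stage labels."""
    time_in_bed_min = len(hypnogram) * epoch_seconds / 60
    sleep_epochs = sum(1 for stage in hypnogram if stage != "Wake")
    total_sleep_min = sleep_epochs * epoch_seconds / 60
    efficiency = total_sleep_min / time_in_bed_min if time_in_bed_min else 0.0
    # An awakening is counted at each transition from any sleep stage to Wake.
    awakenings = sum(
        1
        for prev, cur in zip(hypnogram, hypnogram[1:])
        if prev != "Wake" and cur == "Wake"
    )
    return {
        "time_in_bed_min": time_in_bed_min,
        "total_sleep_min": total_sleep_min,
        "sleep_efficiency": efficiency,
        "awakenings": awakenings,
    }


# Hypothetical 8-hour night: 20 minutes to fall asleep, one 10-minute awakening.
psg_night = (
    ["Wake"] * 40 + ["Light"] * 400 + ["Wake"] * 20
    + ["Deep"] * 200 + ["REM"] * 300
)
# Hypothetical tracker that mislabels every Wake epoch as Light.
missed_wake_night = ["Light" if s == "Wake" else s for s in psg_night]

reference_summary = sleep_summary(psg_night)
tracker_summary = sleep_summary(missed_wake_night)
print(reference_summary)
print(tracker_summary)
```

    In this made-up night the reference shows 450 minutes of sleep out of 480 minutes in bed (a sleep efficiency of about 94%) with one awakening, while the tracker that misses wakefulness reports 100% efficiency and no awakenings. This is why undetected wake epochs directly inflate total sleep time and sleep efficiency.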

    Most sleep trackers provide these three types of data. Among them, the four-stage hypnogram of sleep is commonly offered by the 11 sleep trackers used in this study. However, as you've seen, devices that accurately monitor each stage of sleep equally are quite rare.

    As mentioned earlier, a sleep tracker that accurately detects 'wakefulness' can be the first step in treating insomnia by determining whether a user's sleep efficiency is good or bad. It's not just about the subjective feeling of "I slept well" or "I didn't sleep well", but about knowing the "objective sleep duration". This knowledge can also help a medical professional guide you to the right cognitive-behavioural therapy.

    The weather service needs to predict 'rainy days' accurately, not just 'sunny days', to increase reliability. Similarly, for sleep trackers used during sleep, it's important to go beyond just saying "what kind of sleep you had" and provide information about "how many times you woke up" during sleep. Knowing how many times you woke up during the night is crucial to finding the right amount of sleep and effectively managing your sleep condition. If you're wondering which sleep tracker to choose to help you find your ideal amount of sleep and manage your sleep state, it's important to check that the device accurately detects waking events.