
About the ISRUC-Sleep dataset

The Institute of Systems and Robotics from Coimbra, Portugal (ISRUC) publishes a full sleep dataset under open-access terms [1]. A dedicated website lists the available files and documentation, which anyone can download and use freely. As with many sleep resources such as SleepEDF [2] or Sleepdata.org [3] [4], signals, scoring and metadata are distributed in separate files and different formats.

This article provides an easy workflow to download, read and visualise the ISRUC sleep dataset. Examples are implemented using the R programming language [5] and the SleepR library [6].

Downloading data

Three subgroups of recordings, of different sizes and characteristics, are available:

  • Subgroup 1: 100 records from 100 subjects, many with sleep apnea.
  • Subgroup 2: 8 subjects with 2 records each, to study changes between records.
  • Subgroup 3: 10 records from 10 healthy subjects.

Record archives contain signals in EDF format [7] as well as scoring in Excel files. Metadata, or biodata, are also distributed in Excel files. A SleepR function downloads and expands the archives from all 3 subgroups.

# Install latest SleepR version.
devtools::install_github("boupetch/sleepr")
# Download ISRUC dataset

Reading and plotting sleep stages

Reading scored stages provides an easy way to take a first look at the data. For analysis purposes, sleep records are traditionally split into 30-second epochs. Each epoch is scored as one of 5 stages, following the American Academy of Sleep Medicine (AASM) manual [8].

  • AWA: Wake, the wake stage.
  • REM: Rapid Eye Movement (REM) stage, or paradoxical sleep.
  • N1: Non-REM Sleep 1, a transitional stage between sleep and wake.
  • N2: Non-REM Sleep 2, the most frequent sleep stage.
  • N3: Non-REM Sleep 3, deep sleep.
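As a minimal sketch of the epoch splitting described above, the base R helper below cuts a record of a given duration into consecutive 30-second epochs. The name `split_epochs` and its arguments are illustrative and are not part of SleepR.

```r
# Hypothetical helper (not from SleepR): split a record into fixed-length epochs.
split_epochs <- function(duration_s, epoch_s = 30) {
  n <- floor(duration_s / epoch_s)      # an incomplete trailing epoch is dropped
  begin <- (seq_len(n) - 1) * epoch_s   # start of each epoch, in seconds
  data.frame(epoch = seq_len(n), begin = begin, end = begin + epoch_s)
}

epochs <- split_epochs(8 * 3600)  # an 8-hour record
nrow(epochs)                      # 960 epochs
```

Each row can then receive one of the five stage labels listed above.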

A hypnogram visualizes these stages over the course of the night. Traditionally, REM sleep is colored in red. The following code chunk plots the hypnogram of the first record of the first subgroup of the ISRUC database. The patient here suffers from obstructive sleep apnea, hence the fragmented sleep with many wake epochs.

# Read the scored events of the first record of the first subgroup.
hypnogram <- sleepr::read_events_isruc("1/1", 1)


Figure 1: Hypnogram, subgroup 1, subject 1, record 1.
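To illustrate how such a figure is built, here is a rough base R sketch that draws a hypnogram from a vector of stage labels. It is not SleepR's own plotting code: the input vector is synthetic, and the vertical ordering of the stages is one common choice among several.

```r
# Assumed input: one stage label per 30-second epoch (synthetic data).
stages <- c("AWA", "N1", "N2", "N3", "N2", "REM", "REM", "AWA", "N2")

# Order stages from deep sleep (bottom) to wake (top) for the y axis.
stage_order <- c("N3", "N2", "N1", "REM", "AWA")
level <- match(stages, stage_order)
minutes <- (seq_along(stages) - 1) * 0.5  # epoch start, in minutes

plot(minutes, level, type = "s", yaxt = "n",
     xlab = "Time (min)", ylab = "Stage")
axis(2, at = seq_along(stage_order), labels = stage_order, las = 1)

# Overlay REM epochs in red, following the usual convention.
rem <- stages == "REM"
segments(minutes[rem], level[rem], minutes[rem] + 0.5, level[rem],
         col = "red", lwd = 3)
```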

Unfortunately, a few files are incorrectly formatted and cannot be read without correction. The script used here excludes these files and the incorrect rows.
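One possible way to skip unreadable files is to wrap the reading function in `tryCatch`. The sketch below assumes a generic `reader` function passed as an argument. `read_all` and `fake_reader` are hypothetical names used for the demonstration, not SleepR functions.

```r
# Hypothetical wrapper: apply `reader` to each path, returning NULL on failure.
read_all <- function(paths, reader) {
  results <- lapply(paths, function(p) {
    tryCatch(reader(p), error = function(e) {
      message("Skipping ", p, ": ", conditionMessage(e))
      NULL
    })
  })
  names(results) <- paths
  Filter(Negate(is.null), results)  # keep only the records that were read
}

# Stand-in reader for the demonstration: fails on one path.
fake_reader <- function(p) {
  if (p == "bad") stop("malformed file")
  data.frame(path = p)
}

records <- read_all(c("a", "bad", "b"), fake_reader)
names(records)  # "a" "b"
```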

Many other events can be read from the scoring files. The provided documentation lists available data.

Database metadata and statistics

Three metadata files, one per subgroup, contain information about the subject, the record and its recording conditions.

metadata <- sleepr::read_isruc_metadata("./")

From the metadata files, the distributions of available stages, record durations and subject ages can be plotted across the whole database. This information helps assess the quality of the database.

Stages distribution

Figure 2: Sleep stages distribution by subgroup.

A sleep database analysis starts by plotting the stage distribution, that is, the number of epochs per sleep stage. Usually, N2 accounts for the most epochs. A large number of wake epochs may indicate records that run too long and require further investigation.
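Counting epochs per stage needs little more than `table()`. The sketch below uses a short synthetic vector of stage labels rather than ISRUC data.

```r
# Synthetic hypnogram: one stage label per epoch (illustrative values).
stages <- c("AWA", "N1", "N2", "N2", "N3", "N2", "REM", "N2", "AWA")

# Fix the level order so absent stages still appear with a zero count.
stages <- factor(stages, levels = c("AWA", "REM", "N1", "N2", "N3"))
counts <- table(stages)
counts
round(prop.table(counts), 2)  # share of each stage
barplot(counts, xlab = "Stage", ylab = "Epochs")
```

Adding a second factor, such as the subgroup, to `table()` would give the per-subgroup breakdown.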

Records duration

Figure 3: Records duration by subgroup.

Record durations should be consistent across the database. Visualizing the distribution highlights outliers that require attention.
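Outliers can also be flagged numerically. The sketch below applies Tukey's interquartile-range rule to record durations derived from epoch counts. The counts are synthetic, and the 1.5 multiplier is the conventional default rather than a value taken from this analysis.

```r
# Synthetic epoch counts per record (illustrative values, not ISRUC data).
n_epochs <- c(r1 = 950, r2 = 880, r3 = 910, r4 = 1500, r5 = 930, r6 = 900)
duration_h <- n_epochs * 30 / 3600  # 30-second epochs to hours

# Flag records outside 1.5 times the interquartile range (Tukey's rule).
q <- quantile(duration_h, c(0.25, 0.75))
fence <- 1.5 * diff(q)
outliers <- duration_h[duration_h < q[1] - fence | duration_h > q[2] + fence]
names(outliers)
```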

Subjects age

Figure 4: Subject age by subgroup.

Sleep analysis must take subject age into account, as sleep evolves throughout the lifespan [9]. Over-representation of one age class will emphasize features specific to that class. Conversely, widely scattered ages can make the analysis too broad.
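A quick check for over-represented age classes is to bin ages and cross-tabulate them by subgroup. The values, column names and class boundaries below are assumptions made for the example, not taken from the ISRUC metadata.

```r
# Synthetic metadata (illustrative values, not ISRUC data).
meta <- data.frame(
  subgroup = c(1, 1, 1, 2, 2, 3, 3),
  age      = c(64, 52, 38, 47, 71, 30, 41)
)

# Bin ages into classes and count subjects per class and subgroup.
meta$age_class <- cut(meta$age, breaks = c(0, 40, 60, Inf),
                      labels = c("<=40", "41-60", ">60"))
age_table <- table(meta$subgroup, meta$age_class)
age_table
```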


[1] S. Khalighi, T. Sousa, J.M. Santos, U. Nunes, ISRUC-Sleep: A comprehensive public dataset for sleep researchers, Computer Methods and Programs in Biomedicine. 124 (2016) 180–192. doi:10.1016/j.cmpb.2015.10.013.

[2] B. Kemp, A. Zwinderman, B. Tuk, H. Kamphuisen, J. Oberye, Analysis of a sleep-dependent neuronal feedback loop: the slow-wave microcontinuity of the EEG, IEEE Transactions on Biomedical Engineering. 47 (2000) 1185–1194. doi:10.1109/10.867928.

[3] D.A. Dean, A.L. Goldberger, R. Mueller, M. Kim, M. Rueschman, D. Mobley, S.S. Sahoo, C.P. Jayapandian, L. Cui, M.G. Morrical, S. Surovec, G.-Q. Zhang, S. Redline, Scaling Up Scientific Discovery in Sleep Medicine: The National Sleep Research Resource, Sleep. 39 (2016) 1151–1164. doi:10.5665/sleep.5774.

[4] G.-Q. Zhang, L. Cui, R. Mueller, S. Tao, M. Kim, M. Rueschman, S. Mariani, D. Mobley, S. Redline, The National Sleep Research Resource: towards a sleep data commons, Journal of the American Medical Informatics Association. (2018). doi:10.1093/jamia/ocy064.

[5] K. Hornik, The Comprehensive R Archive Network, Wiley Interdisciplinary Reviews: Computational Statistics. 4 (2012) 394–398. doi:10.1002/wics.1212.

[6] P. Bouchequet, SleepR, (2018). https://github.com/boupetch/sleepr.

[7] B. Kemp, J. Olivan, European data format ’plus’ (EDF+), an EDF alike standard format for the exchange of physiological data, Clinical Neurophysiology. 114 (2003) 1755–1761. doi:10.1016/S1388-2457(03)00123-8.

[8] R.B. Berry, R. Brooks, C.E. Gamaldo, S.M. Harding, C.L. Marcus, B.V. Vaughn, The AASM Manual for the Scoring of Sleep and Associated Events, 2013. doi:10.1017/CBO9781107415324.004.

[9] M.M. Ohayon, M.A. Carskadon, C. Guilleminault, M.V. Vitiello, Meta-analysis of quantitative sleep parameters from childhood to old age in healthy individuals: Developing normative sleep values across the human lifespan, Sleep. 27 (2004) 1255–1273. doi:10.1093/sleep/27.7.1255.