
About the ISRUC-Sleep dataset

The Institute of Systems and Robotics at the University of Coimbra (ISRUC) publishes a comprehensive sleep dataset under open-access terms [1]. A dedicated website lists the available files and documentation, which anyone can download and use freely. As with many sleep resources, such as SleepEDF [2] or Sleepdata.org [3] [4], signals, scoring and metadata are distributed in separate files and in different formats.

This article suggests a few methods to download, read and visualise the ISRUC-Sleep dataset. Examples are implemented in the R programming language [5].

Downloading data

Signals and scoring

Record archives contain signals in EDF format [6] as well as scoring in Excel files. Three subgroups of recordings, of different sizes and characteristics, are available:

  • Subgroup 1: 100 records from 100 subjects, many with sleep apnea.
  • Subgroup 2: 8 subjects with 2 records each, to study changes between records.
  • Subgroup 3: 10 records from 10 healthy subjects.

The following R routine downloads and expands archives from the 3 subgroups.

target <- "../../sources/isruc-sleep-dataset/"

# Subgroups populations
pops <- list(c(1:100),c(1:8),c(1:10))

for (subgroup in c(1:3)){
  pop <- pops[[subgroup]]
  dir <- paste0(target,subgroup,"/")
  if(!dir.exists(dir)){
    # Also create missing parent directories
    dir.create(dir, recursive = TRUE)
  }
  # Subgroup folders are suffixed I, II or III
  subchars <- paste(rep("I",subgroup),
                    collapse = "")
  url <- paste0(
    "http://sleeptight.isr.uc.pt/",
    "ISRUC_Sleep/ISRUC_Sleep/subgroup",
    subchars,"/"
  )
  for (i in pop){
    filename <- paste0(i,".rar")
    filePath <- paste0(dir,filename)
    # Skip archives that are already on disk
    if (!file.exists(filePath)){
      furl <- paste0(url,filename)
      # Binary mode keeps archives intact on Windows
      download.file(url = furl,
                    destfile = filePath,
                    mode = "wb")
      # Requires the unrar command-line tool
      system(paste0("unrar x ",filePath,
                    " ",dir))
    }
  }
}
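The URL pattern used above can be isolated in a small helper, which makes it easy to check an address before launching a long download. This is only a minimal sketch: `isruc_archive_url` is a hypothetical name, not part of any package, and the base URL is simply the one used in the routine above.

```r
# Hypothetical helper: build the archive URL for one subject.
# Base URL and folder naming are copied from the download routine.
isruc_archive_url <- function(subgroup, subject) {
  stopifnot(subgroup %in% 1:3)
  # Subgroup folders are suffixed I, II or III
  subchars <- strrep("I", subgroup)
  paste0("http://sleeptight.isr.uc.pt/",
         "ISRUC_Sleep/ISRUC_Sleep/subgroup",
         subchars, "/", subject, ".rar")
}

# Address of the archive for subgroup 2, subject 3
isruc_archive_url(2, 3)
```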

Metadata

Metadata, or biodata, is distributed in Excel files. Again, a small R routine can easily download these files.

for(i in c(1:3)){
  subchars = paste(rep("I",i),
                   collapse = "")
  path <- paste0(target,i,"/metadata.xlsx")
  url <- paste0("http://sleeptight.isr.uc.pt/",
                "ISRUC_Sleep/ISRUC_Sleep/",
                "Details/Details_subgroup_",
                subchars,"_Submission.xlsx")
  if(!file.exists(path)){
    # Binary mode keeps Excel files intact on Windows
    download.file(url,destfile = path,mode = "wb")
  }
}

Reading data

Once all the data has been downloaded, it can be read. Signals, scoring and metadata can be linked using subgroup, subject and record identifiers.
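As an illustration of that linkage, the sketch below joins two toy tables on their shared identifiers with base R's `merge`. All values are made up, and the column names (`subject`, `age`) are assumptions chosen for the example; the real files use their own headers.

```r
# Toy tables; values are made up for illustration only
stages_toy <- data.frame(
  subgroup = c(1, 1, 3),
  subject  = c(1, 1, 2),
  record   = c(1, 1, 1),
  epoch    = c(1, 2, 1),
  stage    = c("W", "N1", "W")
)
meta_toy <- data.frame(
  subgroup = c(1, 3),
  subject  = c(1, 2),
  age      = c(64, 30)  # hypothetical ages
)

# Join scoring and metadata on the shared identifiers
linked <- merge(stages_toy, meta_toy,
                by = c("subgroup", "subject"))
linked
```

Joining on both `subgroup` and `subject` matters because subject numbers restart at 1 in every subgroup.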

Reading stages

Reading scored stages provides an easy way to take a first look at the data. For analysis purposes, sleep records are traditionally split into 30-second epochs. Each epoch is scored as one of five stages, following the American Academy of Sleep Medicine (AASM) manual [7].

  • W: Wake, the wake stage.
  • REM: Rapid Eye Movement stage, or paradoxical sleep.
  • N1: Non-REM Sleep 1, a transitional stage between sleep and wake.
  • N2: Non-REM Sleep 2, the most common sleep stage.
  • N3: Non-REM Sleep 3, deep sleep.
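Because every epoch lasts 30 seconds, epoch counts convert directly into durations. The sketch below uses a made-up sequence of stages and assumes epochs are numbered from 1; both are assumptions for illustration, not properties checked against the dataset.

```r
epoch_length <- 30  # seconds per epoch

# Made-up sequence of scored epochs
toy_stages <- c("W", "W", "N1", "N2", "N2", "N2", "N3", "R")

# Minutes spent in each stage
minutes_per_stage <- table(toy_stages) * epoch_length / 60
minutes_per_stage

# Start time of each epoch in seconds,
# assuming epochs are numbered from 1
epoch_start <- (seq_along(toy_stages) - 1) * epoch_length
epoch_start
```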

The following code chunk reads all the Excel files containing scored sleep stages.

library(magrittr)

# Record identifiers per subgroup
# (subgroup 2 has two records per subject)
records <- list(1,c(1:2),c(1))

stages <- data.frame("epoch"=numeric(),
                     "stage"=character(),
                     "subgroup"=numeric(),
                     "subject"=numeric(),
                     "record"=numeric())
for(i in seq_along(pops)){
  for(j in pops[[i]]){
    # Loop over the records of the current subgroup only
    for(k in records[[i]]){
      if(i == 2){
        scoringPath <- paste0(target,i,"/",j,"/",k,"/",k,"_1.xlsx")
      } else{
        scoringPath <- paste0(target,i,"/",j,"/",j,"_1.xlsx")
      }
      # Read inside the loop so that every record is loaded
      rStages <- readxl::read_xlsx(scoringPath)
      # Skip files without the expected header
      if("Epoch" %in% colnames(rStages)){
        rStages <- rStages %>%
          dplyr::rename(epoch = Epoch,stage = Stage) %>%
          dplyr::select(epoch,stage) %>%
          dplyr::mutate(subgroup = i,
                        subject = j,
                        record = k)
        stages <- rbind(stages,rStages)
      }
    }
  }
}

Unfortunately, a few files are incorrectly formatted and cannot be read without correction. The script above excludes these files and incorrect rows.
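If a file cannot be opened at all, the reading call stops the whole loop. One possible guard, sketched here under assumptions, wraps the reader in `tryCatch` so that a failure yields `NULL` instead of an error. `read_or_null` is a hypothetical helper name, and `read.csv` only stands in for `readxl::read_xlsx` to keep the example free of package dependencies.

```r
# Hypothetical wrapper: return NULL instead of stopping on a bad file
read_or_null <- function(path, reader) {
  tryCatch(
    suppressWarnings(reader(path)),
    error = function(e) NULL
  )
}

# A missing file yields NULL rather than an error
res <- read_or_null("does-not-exist.csv", read.csv)
is.null(res)
```

In the scoring loop, the result could then be tested with `is.null()` before the column check.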

Many other events can be read from the scoring files. The provided documentation lists available data.

Displaying a hypnogram

A hypnogram allows a quick visualization of a sleep record. The graph displays sleep stages, from wake to N3, over the course of the night.

stages %>% 
  dplyr::mutate(stage = 
    factor(stage,
           levels = c("N3","N2","N1","R","W"))) %>%
  dplyr::filter(subject == 1, subgroup == 1,
                record == 1, !is.na(stage)) %>%
  ggplot2::ggplot(
    ggplot2::aes(x=epoch,y=stage, group=1)) +
  ggplot2::geom_line() + ggplot2::theme_bw() +
  ggplot2::xlab("Epoch number") + ggplot2::ylab("")

Figure 1: Hypnogram, subgroup 1, subject 1, record 1.

Metadata

Three metadata files, one for each subgroup, contain information about the subject, the record and the recording conditions.

library(magrittr)
metadata <- readxl::read_xlsx(
  paste0(target,"1/metadata.xlsx"),skip=2) %>%
  dplyr::mutate(subgroup = 1,
                Subject = as.character(Subject)) %>%
  dplyr::bind_rows(
    readxl::read_xlsx(
      paste0(target,"2/metadata.xlsx"),skip=2) %>%
      dplyr::mutate(subgroup = 2,
                    Age = as.character(Age)) %>%
      dplyr::filter(!is.na(Subject))
  ) %>%
  dplyr::bind_rows(
    readxl::read_xlsx(
      paste0(target,"3/metadata.xlsx"),skip=2) %>%
      dplyr::mutate(subgroup = 3) %>%
      dplyr::mutate(Subject = as.character(Subject),
                    Age = as.character(Age))
  )

The combined metadata and scoring files make it possible to plot the distributions of sleep stages, record durations and subject ages across the whole database. This information helps assess the database quality.
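Before plotting, the stage distribution can be tabulated per subgroup. The sketch below is one way to do it with base R, using a made-up stand-in for the `stages` table built earlier; the values are illustrative and this is not necessarily how the figures in this article were produced.

```r
# Made-up stand-in for the `stages` table built earlier
toy <- data.frame(
  subgroup = c(1, 1, 1, 1, 3, 3),
  stage    = c("W", "N2", "N2", "R", "W", "N3")
)

# Share of epochs per stage within each subgroup
# (each row sums to 1)
stage_share <- prop.table(
  table(toy$subgroup, toy$stage), margin = 1)
round(stage_share, 2)
```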

Figure 2: Sleep stages distribution by subgroup.

Figure 3: Records duration by subgroup.

Figure 4: Subject age by subgroup.


References

[1] S. Khalighi, T. Sousa, J.M. Santos, U. Nunes, ISRUC-Sleep: A comprehensive public dataset for sleep researchers, Computer Methods and Programs in Biomedicine. 124 (2016) 180–192. doi:10.1016/j.cmpb.2015.10.013.

[2] B. Kemp, A. Zwinderman, B. Tuk, H. Kamphuisen, J. Oberye, Analysis of a sleep-dependent neuronal feedback loop: the slow-wave microcontinuity of the EEG, IEEE Transactions on Biomedical Engineering. 47 (2000) 1185–1194. doi:10.1109/10.867928.

[3] D.A. Dean, A.L. Goldberger, R. Mueller, M. Kim, M. Rueschman, D. Mobley, S.S. Sahoo, C.P. Jayapandian, L. Cui, M.G. Morrical, S. Surovec, G.-Q. Zhang, S. Redline, Scaling Up Scientific Discovery in Sleep Medicine: The National Sleep Research Resource, Sleep. 39 (2016) 1151–1164. doi:10.5665/sleep.5774.

[4] G.-Q. Zhang, L. Cui, R. Mueller, S. Tao, M. Kim, M. Rueschman, S. Mariani, D. Mobley, S. Redline, The National Sleep Research Resource: towards a sleep data commons, Journal of the American Medical Informatics Association. (2018). doi:10.1093/jamia/ocy064.

[5] K. Hornik, The Comprehensive R Archive Network, Wiley Interdisciplinary Reviews: Computational Statistics. 4 (2012) 394–398. doi:10.1002/wics.1212.

[6] B. Kemp, J. Olivan, European data format ’plus’ (EDF+), an EDF alike standard format for the exchange of physiological data, Clinical Neurophysiology. 114 (2003) 1755–1761. doi:10.1016/S1388-2457(03)00123-8.

[7] R.B. Berry, R. Brooks, C.E. Gamaldo, S.M. Harding, C.L. Marcus, B.V. Vaughn, The AASM Manual for the Scoring of Sleep and Associated Events, 2013. doi:10.1017/CBO9781107415324.004.