Database Open Access
neuroQWERTY MIT-CSXPD Dataset
Published: Dec. 20, 2016. Version: 1.0.0
L. Giancardo, A. Sánchez-Ferro, T. Arroyo-Gallego, I. Butterworth, C. S. Mendoza, P. Montero, M. Matarazzo, J. A. Obeso, M. L. Gray, R. San José Estépar. Computer keyboard interaction as an indicator of early Parkinson's disease. Scientific Reports 6, 34468; doi: 10.1038/srep34468 (2016)
Please include the standard citation for PhysioNet:
Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PCh, Mark RG, Mietus JE, Moody GB, Peng C-K, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals (2000). Circulation. 101(23):e215-e220.
The neuroQWERTY MIT-CSXPD database contains keystroke logs collected from 85 subjects with and without Parkinson's disease (PD). The data were collected and analyzed to show that routine interaction with computer keyboards can be used to detect motor signs in the early stages of PD.
The subjects were recruited from two movement disorder units in Madrid (Spain) following the institutional protocols approved by the Massachusetts Institute of Technology, USA (Committee on the Use of Humans as Experimental Subjects approval no. 1402006203), Hospital 12 de Octubre, Spain (no. CEIC:14/090) and Hospital Clinico San Carlos, Spain (no. 14/136-E).
Each data file contains the timing information recorded while the subject typed in a standard word processor on a Lenovo G50-70 laptop (i3-4005U, 4 GB of memory, 15-inch screen) running Manjaro Linux. Subjects were instructed to type as they normally would at home and were free to correct typing mistakes only if they wished. The key acquisition software had a temporal resolution of 3 ms (mean) with a standard deviation of 0.28 ms.
The database consists of two datasets, each collected in a separate experiment:
- PD_MIT-CS1PD - 31 subjects: 13 healthy controls and 18 PD patients. Subjects were asked to visit a movement disorder unit twice to complete the study, so each subject's data are stored in two CSV files.
- PD_MIT-CS2PD - 54 subjects: 30 healthy controls and 24 PD patients. Subjects were asked to visit a movement disorder unit once to complete the study.
In addition to the raw typing data, clinical evaluations were performed on each subject, including the UPDRS and finger tapping tests. See the referenced publication for details.
The data from each of the two experiments are stored in a separate subdirectory. Each dataset contains a subject summary CSV file, GT_DataPD_MIT-CSXPD.csv, which lists for each subject:
- pID - Patient ID
- gt - Ground truth label indicating whether the subject has PD
- updrs108 - Unified Parkinson's Disease Rating Scale part III (UPDRS-III) score
- afTap - Alternating finger tapping result
- sTap - Single key tapping result
- nqScore - neuroQWERTY index (nQi)
- Typing speed
- file_n - The csv file(s) containing the patient's typing data
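The summary file can be read with Python's standard csv module. The sketch below assumes the file has a header row using the column names listed above, and that the typing-file columns are named file_1, file_2, and so on (a hypothetical reading of "file_n"). The encoding of the gt label is not specified here, so labels are counted as raw strings. The helper names are illustrative only and are not part of nqDataLoader.py.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List


def parse_summary(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse summary-file lines into one dict per subject, keyed by header name."""
    return list(csv.DictReader(lines))


def load_summary(path: str) -> List[Dict[str, str]]:
    """Read a GT_DataPD_MIT-CSXPD.csv summary file from disk."""
    with open(path, newline="") as f:
        return parse_summary(f)


def typing_files(row: Dict[str, str]) -> List[str]:
    """Return the non-empty file_* entries for one subject, in header order.

    Assumes (hypothetically) that the columns are named file_1, file_2, ...
    CS1PD subjects should list two files (two visits), CS2PD subjects one.
    """
    return [
        (row[k] or "").strip()
        for k in row
        if isinstance(k, str) and k.startswith("file_") and (row[k] or "").strip()
    ]


def count_labels(rows: List[Dict[str, str]]) -> Counter:
    """Count subjects per ground-truth label, using the raw gt strings."""
    return Counter((row["gt"] or "").strip() for row in rows)
```

Counting labels per dataset is a quick way to confirm the 13/18 and 30/24 control/PD splits described above once the real files are loaded.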
Each keystroke CSV file has four columns:
- The key pressed.
- The hold duration in seconds.
- The key release time in seconds from time 0.
- The key press time in seconds from time 0.
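As a minimal sketch, the four columns can be parsed with Python's csv module and used to derive simple timing features. The code assumes comma-separated rows in the column order above and skips any row whose timing fields are not numeric (for example a header line, if one is present). It checks that the stored hold duration matches release time minus press time within an arbitrary tolerance, and computes press-to-press intervals between consecutive keystrokes. The Keystroke type, the function names, and the tolerance are hypothetical and do not come from nqDataLoader.py.

```python
import csv
from typing import Iterable, List, NamedTuple


class Keystroke(NamedTuple):
    key: str        # column 1: the key pressed
    hold: float     # column 2: hold duration (s)
    release: float  # column 3: release time (s) from time 0
    press: float    # column 4: press time (s) from time 0


def parse_keystrokes(lines: Iterable[str]) -> List[Keystroke]:
    """Parse (key, hold, release, press) rows, skipping non-numeric rows."""
    keystrokes = []
    for row in csv.reader(lines):
        if len(row) < 4:
            continue  # blank or malformed line
        try:
            hold, release, press = [float(x) for x in row[1:4]]
        except ValueError:
            continue  # e.g. a header row, if the file has one
        keystrokes.append(Keystroke(row[0], hold, release, press))
    return keystrokes


def hold_is_consistent(k: Keystroke, tol: float = 1e-3) -> bool:
    """True if the stored hold duration equals release - press within tol seconds."""
    return abs((k.release - k.press) - k.hold) <= tol


def press_intervals(keystrokes: List[Keystroke]) -> List[float]:
    """Seconds between consecutive key presses, ordered by press time."""
    ordered = sorted(keystrokes, key=lambda k: k.press)
    return [b.press - a.press for a, b in zip(ordered, ordered[1:])]
```

For a file on disk, call parse_keystrokes on the file object inside a `with open(path, newline="") as f:` block. Press intervals include pauses between words and sentences, so any analysis built on them would need its own outlier handling.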
The neuroQWERTY.zip file contains all of the data along with the scripts described below.
The nqDataLoader.py Python module contains functions used to filter anomalous results and load the data from the CSV files. The readme.ipynb IPython notebook uses these functions and demonstrates how to load and display the data.
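nqDataLoader.py is the reference for how the authors filter anomalous data, and its functions are not reproduced here. Purely to illustrate the idea, the sketch below drops keystrokes whose hold duration falls outside a plausible range and then summarizes the remaining hold times by their median. The (0, 5] second bounds and the function names are hypothetical placeholders, not values or APIs taken from the module or the publication.

```python
from statistics import median
from typing import List, Optional, Sequence, Tuple

# One keystroke as (key, hold, release, press), matching the CSV column order.
Row = Tuple[str, float, float, float]


def filter_holds(rows: Sequence[Row], min_hold: float = 0.0,
                 max_hold: float = 5.0) -> List[Row]:
    """Keep rows with min_hold < hold <= max_hold (seconds).

    The default bounds are hypothetical, not nqDataLoader's thresholds.
    """
    return [r for r in rows if min_hold < r[1] <= max_hold]


def median_hold(rows: Sequence[Row]) -> Optional[float]:
    """Median hold duration in seconds, or None if no rows remain."""
    holds = [r[1] for r in rows]
    return median(holds) if holds else None
```

A median is used here because it is less sensitive than the mean to any unusually long holds that survive filtering.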
These datasets were collected as part of the neuroQWERTY project at the Massachusetts Institute of Technology with financial support from the Comunidad de Madrid, Fundacion Ramon Areces and The Michael J. Fox Foundation for Parkinson's Research (grant number 10860). We thank the M + Vision faculty for their guidance in developing this project. We also thank our many clinical collaborators at MGH in Boston and at Hospital 12 de Octubre, Hospital Clinico and Centro Integral en Neurociencias HM CINAC in Madrid for their insightful contributions.
Anyone can access the files, as long as they conform to the terms of the specified license.
License (for files):
Open Data Commons Attribution License v1.0
Total uncompressed size: 7.3 MB.