Literature searches and data extraction

The content of MDSGene is based on genetic, phenotypic, and clinical data extracted from the relevant literature following systematic screens of different resources (Lill et al, 2016; Mashychev et al, in preparation; Hartmann et al, in preparation). Eligible articles must be written in English and published in peer-reviewed journals. They are identified through systematic PubMed searches based on standardized search terms comprising the names of the genes (including aliases) and the disease/syndrome of interest. Demographic, clinical, and genetic data are extracted according to a standardized data extraction protocol. Diagnoses displayed in MDSGene follow the recent recommendations of the International Parkinson and Movement Disorder Society (MDS) Task Force on Nomenclature of Genetic Movement Disorders (Marras et al, 2016). Whenever necessary, mutations are remapped to human genome build 19 (hg19), and mutation identifiers are renamed according to the Human Genome Variation Society (HGVS) nomenclature.
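
The way gene names, aliases, and disease terms can be combined into one standardized search term is sketched below. This is only an illustration: the function name `build_pubmed_query` is hypothetical, and the example terms are placeholders chosen for the sketch, not the actual search terms used for MDSGene.

```python
def build_pubmed_query(gene_names, disease_terms):
    """Combine gene names/aliases and disease terms into one boolean query string.

    Hypothetical helper for illustration; not the MDSGene search protocol itself.
    """
    if not gene_names or not disease_terms:
        raise ValueError("both gene names and disease terms are required")

    def or_group(terms):
        # Quote multi-word terms so they are searched as phrases.
        quoted = [f'"{term}"' if " " in term else term for term in terms]
        return "(" + " OR ".join(quoted) + ")"

    # Any gene name or alias must co-occur with any disease term.
    return or_group(gene_names) + " AND " + or_group(disease_terms)


# Illustrative terms only (assumed for this sketch).
query = build_pubmed_query(
    ["PRKN", "parkin", "PARK2"],
    ["Parkinson disease", "parkinsonism"],
)
print(query)
```

Grouping all aliases with OR before joining them to the disease terms with AND keeps the search sensitive to older gene names while still restricting hits to the phenotype of interest.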

Pathogenicity scoring

Potential pathogenicity of reported mutations is classified as “possible”, “probable”, or “definite” based on the following criteria: i) co-segregation with disease in the reported pedigrees and/or the number of reported mutation carriers, ii) frequency in ~65,000 ethnically diverse individuals from the ExAC (Exome Aggregation Consortium) browser, iii) CADD (“Combined Annotation Dependent Depletion”) score as an in-silico measure of the deleteriousness of genetic variants (Kircher et al, 2014), and iv) reported molecular evidence from in-vivo and/or in-vitro studies. Each evidence domain is divided into four categories, and each category contributes a specific number of “points”, weighted by domain (see Table 1). The evidence domains “co-segregation with disease” and “presence of mutation-specific positive functional data” receive the strongest weights in the pathogenicity grading. Finally, points are summed across the four domains and pathogenicity is graded as follows: benign (<5 points), possibly pathogenic (5-9 points), probably pathogenic (10-14 points), definitely pathogenic (>14 points). Reported genetic variants that are classified as benign or as risk polymorphisms using this scoring algorithm are not included in MDSGene.
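
The summation and grading described above can be sketched as follows, using the point values from Table 1. This is a minimal sketch, not the MDSGene implementation: all function names are hypothetical, the segregation category is passed in directly as a level from 0 (least) to 3 (highest), and the handling of values that fall between the printed ranges (for example a CADD score of 14.5 or a frequency of 0.0095) is an assumption.

```python
# Points per evidence level (index 0=Least, 1=Suggestive, 2=Strong, 3=Highest),
# copied from Table 1.
SEGREGATION_POINTS = (0, 2, 3, 6)
FREQUENCY_POINTS = (0, 1, 2, 3)
CADD_POINTS = (0, 1, 3, 5)
FUNCTIONAL_POINTS = (0, 2, 4, 6)


def frequency_level(exac_frequency):
    """Map an ExAC frequency to an evidence level (boundaries are assumptions)."""
    if exac_frequency >= 0.01:
        return 0
    if exac_frequency >= 0.001:
        return 1
    if exac_frequency >= 0.0001:
        return 2
    return 3


def cadd_level(cadd_score):
    """Map a CADD score to an evidence level (boundaries are assumptions)."""
    if cadd_score < 10:
        return 0
    if cadd_score < 15:
        return 1
    if cadd_score <= 20:
        return 2
    return 3


def functional_level(positive_studies, null_allele=False):
    """Map the number of positive functional studies to an evidence level."""
    if positive_studies > 2:
        return 3
    if positive_studies == 2 or null_allele:
        return 2
    if positive_studies == 1:
        return 1
    return 0


def pathogenicity_score(segregation_level, exac_frequency, cadd_score,
                        positive_studies, null_allele=False):
    """Sum the points of the four evidence domains."""
    if segregation_level not in range(4):
        raise ValueError("segregation_level must be 0, 1, 2, or 3")
    return (SEGREGATION_POINTS[segregation_level]
            + FREQUENCY_POINTS[frequency_level(exac_frequency)]
            + CADD_POINTS[cadd_level(cadd_score)]
            + FUNCTIONAL_POINTS[functional_level(positive_studies, null_allele)])


def pathogenicity_grade(points):
    """Translate a point total into the grade used in the text."""
    if points < 5:
        return "benign"
    if points <= 9:
        return "possibly pathogenic"
    if points <= 14:
        return "probably pathogenic"
    return "definitely pathogenic"


# More than two families, very rare variant, CADD > 20, one positive study:
# 6 + 3 + 5 + 2 = 16 points.
points = pathogenicity_score(3, 0.00005, 25.0, positive_studies=1)
print(points, pathogenicity_grade(points))
```

With these weights the maximum attainable total is 20 points (6 + 3 + 5 + 6), so a variant needs strong evidence in several domains to reach the “definitely pathogenic” grade.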

Table 1. Pathogenicity scoring scheme implemented in MDSGene

Evidence level | Segregation | Frequency (ExAC) | In-silico prediction (CADD score) | Functional studies
Least | Only a single heterozygous patient (0 points) | ≥0.01 (0 points) | <10 (0 points) | Only negative reports or absence of studies (0 points)
Suggestive | Homozygous patient or ≥2 single heterozygous patients or 1 family (i.e. ≥2 affected mutation carriers) (2 points) | 0.001-0.009 (1 point) | 10-14 (1 point) | 1 positive study (2 points)
Strong | 2 families (3 points) | 0.0001-0.0009 (2 points) | 15-20 (3 points) | 2 positive studies or null allele (4 points)
Highest | >2 families (6 points) | <0.0001 (3 points) | >20 (5 points) | >2 positive studies (6 points)

Database implementation

Similar to other genetic databases developed and maintained by our group (e.g. PDGene, available at www.pdgene.org) (Lill et al, 2012), MDSGene was implemented using a relational database schema that allows efficient management and querying of the available data.
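
To illustrate what a relational layout for such data could look like, the sketch below uses Python's built-in `sqlite3` module. The table names, column names, and inserted rows are all hypothetical placeholders invented for this example; the actual MDSGene schema is not described here.

```python
import sqlite3

# Hypothetical schema: publications, patients, mutations, and a link table
# that records which patient carries which mutation.
SCHEMA = """
CREATE TABLE publication (
    pmid INTEGER PRIMARY KEY
);
CREATE TABLE mutation (
    mutation_id   INTEGER PRIMARY KEY,
    gene          TEXT NOT NULL,
    hgvs_name     TEXT NOT NULL,
    pathogenicity TEXT
);
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    pmid       INTEGER REFERENCES publication(pmid),
    aao        INTEGER,   -- age at onset
    ethnicity  TEXT
);
CREATE TABLE patient_mutation (
    patient_id  INTEGER REFERENCES patient(patient_id),
    mutation_id INTEGER REFERENCES mutation(mutation_id),
    zygosity    TEXT,     -- e.g. 'het', 'hom', 'comp. het.'
    PRIMARY KEY (patient_id, mutation_id)
);
"""


def build_demo_db():
    """Create an in-memory database filled with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO publication (pmid) VALUES (1)")
    conn.executemany(
        "INSERT INTO mutation VALUES (?, ?, ?, ?)",
        [(1, "GENE_A", "c.1A>G", "probable"),
         (2, "GENE_B", "c.2C>T", "definite")],
    )
    conn.executemany(
        "INSERT INTO patient VALUES (?, ?, ?, ?)",
        [(1, 1, 35, "C"), (2, 1, 42, "A"), (3, 1, 28, "H")],
    )
    conn.executemany(
        "INSERT INTO patient_mutation VALUES (?, ?, ?)",
        [(1, 1, "het"), (2, 1, "hom"), (3, 2, "het")],
    )
    return conn


def carriers_per_gene(conn):
    """Count distinct mutation carriers per gene."""
    rows = conn.execute(
        """
        SELECT m.gene, COUNT(DISTINCT pm.patient_id)
        FROM mutation AS m
        JOIN patient_mutation AS pm ON pm.mutation_id = m.mutation_id
        GROUP BY m.gene
        ORDER BY m.gene
        """
    )
    return rows.fetchall()


print(carriers_per_gene(build_demo_db()))
```

Keeping patients, mutations, and publications in separate tables joined by keys means each fact is stored once, and summaries such as carriers per gene reduce to a single join.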

Abbreviations/conventions used throughout MDSGene

AAO = age at onset
A = Asian (ethnicity)
AM = Native American (ethnicity)
B = Arab (ethnicity)
C = Caucasian (ethnicity)
comp. het. = compound heterozygous
D = African descent (ethnicity)
het = heterozygous
H = Hispanic (ethnicity)
hom = homozygous
I = Indian (ethnicity)
JA = Jewish (Ashkenazi; ethnicity)
JO = Jewish (non-Ashkenazi/mixed/other; ethnicity)
N = number
n.g. = not given
O = other/mixed (ethnicity)
PARK = Parkinson’s disease
PFBC = primary familial brain calcification
SD = standard deviation

Note that the full names corresponding to the official gene symbols can be found in the Entrez Gene database. In addition, country names are abbreviated according to the official 3-letter codes recommended by the International Organization for Standardization (ISO 3166-1 alpha-3).

References

Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 2014;46:310–5.

Lill CM, Roehr JT, McQueen MB, Kavvoura FK, Bagade S, Schjeide B-MM, Schjeide LM, Meissner E, Zauft U, Allen NC, Liu T, Schilling M, Anderson KJ, Beecham G, Berg D, Biernacka JM, Brice A, DeStefano AL, Do CB, Eriksson N, Factor SA, Farrer MJ, Foroud T, Gasser T, Hamza T, Hardy JA, Heutink P, Hill-Burns EM, Klein C, Latourelle JC, Maraganore DM, Martin ER, Martinez M, Myers RH, Nalls MA, Pankratz N, Payami H, Satake W, Scott WK, Sharma M, Singleton AB, Stefansson K, Toda T, Tung JY, Vance J, Wood NW, Zabetian CP, 23andMe Genetic Epidemiology of Parkinson’s Disease Consortium, International Parkinson’s Disease Genomics Consortium, Parkinson’s Disease GWAS Consortium, Wellcome Trust Case Control Consortium 2, Young P, Tanzi RE, Khoury MJ, Zipp F, Lehrach H, Ioannidis JPA, Bertram L. Comprehensive research synopsis and systematic meta-analyses in Parkinson’s disease genetics: The PDGene database. PLoS Genet 2012;8:e1002548.

Lill CM, Mashychev A, Hartmann C, Lohmann K, Marras C, Lang AE, Klein C, Bertram L. Launching the Movement Disorders Society Genetic Mutation Database (MDSGene). Mov Disord 2016;31(5):607-9.

Marras C, Lang A, van de Warrenburg BP, Sue CM, Tabrizi SJ, Bertram L, Mercimek-Mahmutoglu S, Ebrahimi-Fakhari D, Warner TT, Durr A, Assmann B, Lohmann K, Kostic V, Klein C. Recommendations of the International Parkinson and Movement Disorder Society Task Force on Nomenclature of Genetic Movement Disorders. Mov Disord 2016;31(4):436-57.