Correlated expression analysis of genes implicated in schizophrenia: Identification of putative disease-related pathways

Schizophrenia (SCZ) is a severe psychiatric disorder affecting 0.7% of the population [1]. When inadequately treated, subjects with SCZ experience symptoms that render them dysfunctional and unable to discern aspects of reality. Despite the fact that the majority of subjects with SCZ are sporadic cases and do not have a known family history of SCZ, a family history is one of the largest risk factors for developing SCZ [2–4]. A large genome-wide association study (GWAS) recently pinpointed 108 significant loci within the human genome that contribute to SCZ pathogenesis [5]. While some loci include genes that have been previously implicated in SCZ, the majority, due to the unbiased nature of the genetic investigation, include genes with unknown relevance to SCZ. This investigation is based on the premise that: 1) at least one of the genes at the 108 loci contribute to SCZ etiology; 2) some of the genes contributing to SCZ etiology are in a common pathway; and 3) some genes in a common pathway will have correlated gene expression. Gene expression data available in the gene expression omnibus (GEO) was used to identify correlated expression among the 369 genes (853 isoforms) found at the 108 loci associated with SCZ. Expression data came from bone marrow CD34+ selected cells isolated from 66 individuals (GSE4619). First, correlation among genes related to the DRD2 pathway was analyzed to test the hypothesis that some SCZ genes are in a common pathway and have correlated expression. Then, two pathways were generated based on correlated expression of genes at the 108 loci. One pathway consisted of the largest number of genes with correlated expression (56) and included four genes from the DRD2 pathway and seven of the 33 genes that were previously implicated in SCZ. The second pathway, a novel pathway of 12 genes, was constructed by excluding both the 33 genes that were previously implicated in SCZ and other genes that exhibited significantly correlated expression with these 33 genes. Correlated expression analysis among SCZassociated genes at the 108 loci provides evidence implicating those genes with correlated expression in SCZ pathogenesis. In addition, these analyses will facilitate pathway identification creating starting points for targeted experiments to verify or refute the hypothetical pathways generated here. Ultimately identifying the genes and pathways at the 108 loci involved in SCZ genesis will inform novel pharmaceutical development for treatment and prevention of SCZ. Focal points:

A B S T R A C T Schizophrenia (SCZ) is a severe psychiatric disorder affecting 0.7% of the population [1]. When inadequately treated, subjects with SCZ experience symptoms that render them dysfunctional and unable to discern aspects of reality. Despite the fact that the majority of subjects with SCZ are sporadic cases and do not have a known family history of SCZ, a family history is one of the largest risk factors for developing SCZ [2][3][4]. A large genome-wide association study (GWAS) recently pinpointed 108 significant loci within the human genome that contribute to SCZ pathogenesis [5]. While some loci include genes that have been previously implicated in SCZ, the majority, due to the unbiased nature of the genetic investigation, include genes with unknown relevance to SCZ. This investigation is based on the premise that: 1) at least one of the genes at the 108 loci contribute to SCZ etiology; 2) some of the genes contributing to SCZ etiology are in a common pathway; and 3) some genes in a common pathway will have correlated gene expression. Gene expression data available in the gene expression omnibus (GEO) was used to identify correlated expression among the 369 genes (853 isoforms) found at the 108 loci associated with SCZ. Expression data came from bone marrow CD34+ selected cells isolated from 66 individuals (GSE4619). First, correlation among genes related to the DRD2 pathway was analyzed to test the hypothesis that some SCZ genes are in a common pathway and have correlated expression. Then, two pathways were generated based on correlated expression of genes at the 108 loci. One pathway consisted of the largest number of genes with correlated expression (56) and included four genes from the DRD2 pathway and seven of the 33 genes that were previously implicated in SCZ. The second pathway, a novel pathway of 12 genes, was constructed by excluding both the 33 genes that were previously implicated in SCZ and other genes that exhibited significantly correlated expression with these 33 genes. Correlated expression analysis among SCZassociated genes at the 108 loci provides evidence implicating those genes with correlated expression in SCZ pathogenesis. In addition, these analyses will facilitate pathway identification creating starting points for targeted experiments to verify or refute the hypothetical pathways generated here. Ultimately identifying the genes and pathways at the 108 loci involved in SCZ genesis will inform novel pharmaceutical development for treatment and prevention of SCZ. Focal points: • Bench 1. Genes at the 108 loci implicated in SCZ genesis in the largest genome wide association study to date were grouped according to correlated expression to identify genes potentially involved in the same pathways.

Introduction
Schizophrenia (SCZ) is characterized by the hallmark symptom, psychosis -positive symptoms-which may include hallucinations, delusions, incoherent speech and disorganized thoughts resulting in a failure to discern aspects of reality [6]. In addition, SCZ may be expressed through a combination of -negative symptoms-that may include social withdrawal and lack of both motivation and engagement as well as -cognitive symptoms-that may include diminished attention, difficulties understanding information and poor decisionmaking (diminished executive function) [6]. Despite the success of antipsychotics in the treatment of psychosis, SCZ remains a severely disabling disorder. With medication, 60% of patients with SCZ improve long-term, though the majority still remain disabled with only 10-20% of people with SCZ finding competitive employment [7][8][9]. Relapse overwhelms support from family and friends, disorients patients, increases the risk of accidental injury or death, and may advance brain pathology [10,11]. Relapse is a barrier to successful treatment and is often caused by medication noncompliance due to both intolerable side effects and lack of efficacy [8,[10][11][12][13]. Novel therapeutic interventions are desperately needed in order to address the unmet needs of people with SCZ.
Psychotic symptoms are controlled with antipsychotics primarily through antagonist blockage at dopamine D2-like receptors (D2, D3 and D4) in the striatum, specifically the ventral striatum, including the nucleus accumbens [14][15][16][17][18]. Despite the overwhelming pharmacological evidence supporting dopamine D2 receptor's integral role in SCZ, genetic evidence for the dopamine D2 receptor (DRD2) gene's involvement in SCZ genesis was inconsistent until recently [19][20][21][22]. With recent advances in genomic technologies along with larger sample sizes, there has been a renewed interest in studying the genetic risks and basis for SCZ. Large genome-wide association studies (GWAS) in large samples have identified 108 common single nucleotide polymorphisms (SNPs) loci variations that show statistical associations with SCZ, including an association with DRD2 [5].
These 108 loci implicated in SCZ are expected to inform development of novel therapeutic targets for pharmaceutical intervention. However, these loci individually impart negligible genetic risk thereby limiting their contribution to predicting the likelihood of a person developing SCZ, often an immediate benefit of genetic discovery. Instead, despite small genetic influence on risk, effective targeting of some of the genes conveying disease susceptibility at these loci may in fact benefit most subjects with SCZ, as is the case with DRD2. The DRD2 association with SCZ was actually the T allele at the rs2514218 (AT) locus with an OR of 1.073 [5]. The function of rs2514218 is unknown because it maps 5′, outside of the coding region of the DRD2 at 11q23.2 and is in linkage disequilibrium with many other genetic variants. However, it has been associated with antipsychotic treatment response and impaired striatal functioning [23][24][25]. With the goal of facilitating discovery of new therapeutic targets, the next steps are to understand the function of the genes at these 108 loci and what pathways they contribute to. The underlying premises of this investiga-tion are that: 1) some genes involved in the same disorder may lie in the same pathway; and 2) some genes within the same pathway may have correlated gene expression.
These studies can ultimately highlight new genetic and protein pathways that may overcome the challenges of current medications used for treatment. At present, SCZ treatment remains mechanistically based on antipsychotics developed over a half-century ago. Mapping out potential SCZ pathways from the 108 loci and their associated genes may jump-start the development of more specific and targeted treatments that may provide novel ways to treat symptoms of SCZ. Based on correlated expression of 369 genes at the 108 loci, this paper presents two possible new pathways in relation to: 1) 33 genes previously implicated in SCZ (i.e. DRD2); 2) genes at loci that have been implicated in SCZ after the initial publication; and 3) genes at loci where function is unknown.

GEO data set
The "CD34+ cells isolated from bone marrow" data set GDS 2118 (GSE4619; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=GSE4619) was chosen due to the large sample size and the expectation that many genes would be expressed at detectable levels in stem cells. Pellagatti et al., 2006 provides detailed methods for acquisition and processing of the CD34+ dataset. Briefly, bone marrow samples were obtained from 55 patients with myelodysplastic syndrome (MDS) and 11 healthy controls, from which CD34+ cells were isolated using magnetic cell separation columns. Total RNA was isolated using TRIZOL® (Invitrogen, Paisley, United Kingdom) then amplified (2-Cycle cDNA Synthesis) and labeled (2-Cycle Target Labeling and Control Reagent packages (Affymetrix, Santa Clara, CA) before hybridization to GeneChip Human Genome U133 Plus 2.0 arrays (Affymetrix Santa Clara, CA) [26]. Affymetirx CEL files were processed with Robust MultiChip Analysis (RMA). Expression data was calculated and scaled using GeneChip Operating Software (GCOS).

Retrieving the data
The retrieval method used the query data set for GSE4619 (55 patients and 11 controls). This method consisted of the code written in MATLAB using the Bioinformatics Toolbox (v.7.14. R2015a, MathWorks, Natik, MA, USA) Geo_geneExtract (Github:https:// github.com/neuropil/Geo_geneExtract/), first retrieving ID numbers of isoforms associated with each gene found in the 108 loci. Isoforms are different transcripts from the same gene. First, each gene was identified in the GEO database. Genes at the 108 loci were identified in the Supplementary Table 3: Bioinformatics summary data for 108 genome-wide significant loci [5]. Then finding the expression value for each isoform ID in each of the sample sets found in GEO. Each isoform correlates with an expression value that was then put into a spreadsheet demonstrating that expression value with the sample, gene name, associated loci, and isoform ID. To access this data set: (a) go to the GEO home page, (b) enter the data set number in the GEO accession box, (c) Click 'GO' (d) the samples can be accessed under the sample heading on this page.

Processing the data
To process these data to find statistically significant results, we established stringent criteria to highlight the most important correlations. The statistical methodology applied was Spearman correlations on the pairs of gene expressions. A Benjamini-Hochberg-Yekutieli procedure was applied to control the false discovery rate (FDR) at 0.05. In addition, we used SAS to organize data, apply thresholds, and facilitate reporting summary statistics used to generate pathways.

Identifying genes in the DRD2 pathway
In order find evidence supporting the theories that some genes contributing to SCZ will be part of the same pathway and that some of the genes in the same pathway will have correlated gene expression, genes from the 108 loci were sought that may be part of the DRD2 pathway and the correlation of their expression was examined. Using protein sequence and a functional summary databases, we included and expanded upon the proteins and genes in the DRD2 pathway identified from Beaulieu et al. (2009) [27].

Identifying the gene isoforms with the largest numbers of correlations
To link genes to a pathway, significant correlations had to meet the following criteria: FDR (alpha =0.05), absolute value of rho (ρ) ≥│0.682│ (the top 10% of correlation values), and not involve isoforms from the same gene. Through this process, the gene with the highest number of correlations with other genes at the 108 loci was identified. The top correlated gene isoform and all the genes isoforms correlated to it were included in the pathway.

Developing the novel pathway
The goal was to identify, among the candidate genes at the 108 loci, a pathway of genes not previously implicated in SCZ according to the 2014 GWAS study. [5] The pathway was based on correlated gene isoform expression of gene isoforms at the 108 loci. Selected correlations must be: 1) statistically significant per FDR (alpha =0.05); 2) have a correlation of greater than or equal in absolute value of 0.5 (any higher threshold would have produced too few genes to construct pathways from); and 3) exclude genes (isoforms) previously implicated in SCZ or genes (isoforms) correlated with 33 previously known genes (rho (ρ) ≥│0.5│). Depiction of pathways generated based on correlated gene expression were constructed in QIAGEN's Ingenuity® Pathway Analysis (IPA Qiagen, Redwood City, www.qiagen.com/ingenuity).

Characteristics of genes at the 108 loci
According to the original publication, there were 35 genes found within the 108 loci that had previously been implicated in SCZ genesis with known functions. Of the 35 genes previously known, only 33 were on the gene chip and were included in the analysis. These 33 genes were considered "previously known" and were highlighted in the pathway generated with the largest number of correlations [5]. In addition, account for recent progress in characterizing other genes at the 108 loci, a list of "recently implicated" genes was created by searching PubMed and manuscripts that cited the GWAS study using the "cited by" feature. Genetic replication or independent studies were not included.

Genes and isoforms
Of 456 genes from all 108 loci, 91 genes were not included on the gene expression chip (Human Genome U133 Plus 2.0 arrays; Affymetrix Santa Clara, CA) and were not included in the analysis (91 genes from the HLA locus mapping to chromosome 6: 28303247-28712247 (from hg19) plus 365 genes from the Supplementary Table 3: Bioinformatic summary data for 108 genome-wide significant loci) [5]. Data from 369 genes with a total of 853 isoforms mapping to the 108 loci were used for 351,143 pair-wise correlations between isoforms not from the same locus or the same gene. Of these, 37,012 correlations were significant (rho (ρ) ≥│0.423│) among 812 isoforms of 364 genes.

DRD2 pathway correlated expression
Correlations within the DRD2 pathway were sought in order to validate the approach of using correlated gene expression to indicate which genes may be involved in the same schizophrenia pathway. Based on characterization of the DRD2 pathway [28], six genes at the 108 loci where analyzed for correlation with DRD2 ( Fig. 1, Table 1). Nuclear factors of activated T-cells, NFATC3, the protein, is regulated by glycogen synthase kinase-3 (GSK3) through phosphorylation along with β-catenin [29]. GSK3β and β-catenin are key molecules altered by antipsychotic suppression of the dopamine D2 receptor [28]. β-catenin, α-catenin (gene symbol CTNNA1) and δ-catenin (gene symbol CTNND1) are in the catenin family of genes. CTNND1 and CTNNA1 are each found at one of the 108 loci. While, it is unclear if CTNND1 functions as part of the DRD2 pathway, CTNNA1 interacts with βarrestin-2, the first step in the antipsychotic arm of the DRD2 pathway. AKT3 is a gene at one of the 108 loci that may be part of the DRD2 pathway. Due to its strong homology with AKT1, it may have a redundant function with AKT1 involving regulation of GSK3β.  [28]. Three arms are shown including the Ca+ release, cAMP generation, and ARRB2 (beta-arrestin) that is considered the arm related to antipsychotic action. Genes found at one of the 108 loci are in bold. *Significant correlations between DRD2 expression and NFATC, CTNNA1, CTNND1, and GRIN2A. Correlation with AKT3 (in the same gene family as AKT1 that is implicated in the DRD2 pathway) and PPP1R1B were not significant. These findings support the premise that some genes that contribute to schizophrenia etiology are in the same pathway and that some will have correlated gene expression. GRIN2A, an NMDA glutamine receptor also regulates GSK3β through AKT1 [30]. Finally, PPP1R13B interacts with PPP1CA (protein PP1A), a protein that is downstream of protein phosphatase 1 regulatory subunit 1B (PPP1R1B). PPP1R1B is regulated through both the calcium release and the cAMP arms of the DRD2 pathway ( Fig. 1).
Correlations between DRD2 and six other DRD2 pathway genes at the 108 loci are found in Table 1. Ten significant correlations were identified between expression of four DRD2 isoforms and six NFATC3 isoforms (Human Genome U133 Plus 2.0 arrays (Affymetrix Santa Clara, CA). Two significant correlations were identified among 24 examined between DRD2 isoforms and the six CTNNA1 isoforms. Five significant correlations were identified among 16 examined   between DRD2 isoforms and the four isoforms of CTNND1. Expression of the four DRD2 isoforms was not correlated with expression levels of six AKT3 isoforms or one PPP1R13B isoform. Two significant correlations were identified among 12 correlations examined between the three GRIN2A isoforms and the isoforms of DRD2. Concordant regulation at both the protein and gene expression level was identified between DRD2 and NFARC3. The finding of significant negative correlations in gene expression between DRD2 and NFATC3 is consistent with the regulation of proteins within the known DRD2 pathway. Increased activity with DRD2 agonist, dopamine, would result in β-arrestin/PP2A recruitment to the membrane and inhibition of AKT1 then lack of inhibition of GSK3β which would then inhibit NFATC (Fig. 1). Conversely, regulation of gene expression (positive correlation) compared to the negative correlation between protein activity was discordant between DRD2 with both CTNNA1, and CTNND1. It is important to note that lack of a significant correlation in gene expression does not mean the genes are not in the same pathway; proteins like GSK3β and AKT1 are regulated by phosphorylation in response to cellular signaling before gene expression is altered.

Identifying the SCZ candidate gene isoform with the largest numbers of correlations; identifying genes within the hypothetical TAC3 pathway
The top 10% of the significant correlations (rho (ρ) ≥│0.682│) resulted in 195 genes with 1391 correlations (1866 isoforms). Among the 195 genes (293 isoforms), TAC3 had the largest number of correlations with 66 other gene isoforms (55 genes; Fig. 2; Table 2). Together, the 56 genes (TAC3 and the genes correlated with it) map to 35 distinct loci along with the SNP used in the GWAS study (Table 2). Twenty-one loci included multiple genes allowing for the possibility that more than one gene may contribute to SCZ genesis at each locus. Based on the correlations of these 56 genes with each other, a hypothetical pathway was generated. The number of correlations among the genes within the TAC3 pathway were included in Table 2 and all correlations were displayed as connections between 56 genes (Fig. 2). Out of these 56 genes, four genes (CTNND1, DRD2, GRIN2A and NFATC3) were part of the DRD2 pathway (Fig. 1). Seven genes (FXR1, PTN, TLE1, DRD2, CHRNA3, GRIN2A, and KCNB1) were among 33 previously known genes implicated in SCZ. [5] Fourteen genes overlap with genes that have been characterized further as a result of three investigations following up on the 2014 GWAS findings [31][32][33]. In the first investigation, ten of the TAC3 pathway genes were among 108 genes at the 108 loci implicated in prenatal development of the dorsal lateral prefrontal cortex (DLPFC) compared to overall postnatal expression in DLPFC [33]. In the second investigation, three genes correlated with TAC3 (RGS6 (regulator of G-protein signaling), GRIN2A, and KCNB1 (a potassium channel)) were among four genes identified as enriched in cortex during several stages of development compared to other brain regions (amygdala, cerebellum, hippocampus, striatum and thalamus) using temporal-spatial expression analysis of 349 genes at the 108 SCZ risk loci (Table 2) [31]. The third investigation resulted in identification of five variants among the 108 SCZ-associated loci that may directly contribute to functional changes in gene expression in five genes (CNTN4, CLCN3, FURIN, TSNARE1, and SNAP91). Of the five genes implicated, TSNARE1 was the only TAC3 pathway gene (rho (ρ) ≥│0.682│ (the top 10% of correlation values)). However, with including all significant correlations (rho(ρ) =0.4306) the SNAP91 gene was correlated with 367 other genes and CNTN4 with 65 other genes. With including all significant correlations, SNAP91 and CNTN4 were correlated with TAC3, DRD2, GRIN2A and NFATC expression (data not shown). [31] In addition, FURIN expression was correlated with 65 other genes (including AKT3, CACNA1C, and CTNNA1); and CLCN3 with 10 genes (data not shown). A fourth study was included in the analysis that implicates HLADPA1 at the HLA locus through cis brain expression quantitative trait locus. [34] Using all significant correlations (rho(ρ)≥│0.4306│) HLADPA1 is correlated with 83 other genes including: 1) TAC3; 2) CTNNA1; and 3) 38 other genes from the HLA locus.
TAC3 isoform (219992_at) correlations between genes. ρ=correlations; # other genes= the number of correlations with other genes; Bold= four genes implicated in the DRD2 pathway; *= seven known genes implicated in SCZ etiology or treatment before the 2014 GWAS study came out; Italics=12 genes cited in the literature that reference the 2014 GWAS study [5].

For generation of a novel pathway
Using the threshold of significant correlations based on FDR (p≤0.05) at rho (ρ)≥│0.5│ there were 18,668 correlations among  isoforms that were used for further analysis. "Previously known" genes and isoforms were eliminated along with any isoforms that were correlated with them. There were 168 significant pair-wise correlations left between isoforms from 100 genes with the largest pathway consisting of 12 correlated genes ( Fig. 3; Table 3 [33]. Out of the 12 correlated genes, four of the genes (IMMP2L, SHMT2, HSPA9, and PBRM1) are related to the mitochondria. Previous studies have shown that mitochondrial impairment may affect bioenergetics in the developing brain and alter critical neuronal processes leading to neurodevelopmental abnormalities, mitochondrial deficits, altered redox balance, and chronic low-grade inflammation-all potential factors contributing to schizophrenia [35].

Conclusions
In conclusion, we identified four genes (NFATC3, CTNNA1, CTNND1 and GRIN2A) that have been implicated in the DRD2 pathway with significantly correlated expression with DRD2 in the data set analyzed, which supports the theory that some genes that contribute to SCZ genesis will be in the same pathway and that some of those genes will have correlated gene expression. The genes at the 108 loci are all candidate genes and the challenge is determining which of the gene candidates contribute to SCZ genesis. Using the approach of identifying SCZ genes at the 108 loci associated with correlated expression resulted in identifying a putative pathway (TAC3) with 56 different genes. The fact that many of the genes in the TAC3 pathway had correlated expression with seven genes known to contribute to SCZ and twelve genes that have been recently implicated in SCZ supports the hypothesis the 56 genes within the TAC3 pathway may contribute to SCZ and may be generally part of a large pathway. In addition, a completely novel pathway was identified with 12 genes. The analyses presented here were hypothesis generating and both putative pathways as well as many other pathways based on correlated expression need to be further implicated through other approaches.

Executive summary
• Genetic studies such as genome-wide association studies (GWAS) are powerful because they rely on objective measures such as the frequency of an allele at distinct genetic loci in patients versus control subjects.
• For complex traits like SCZ, each locus implicated through GWAS includes multiple candidate genes.
• Correlated expression between candidate genes from the 108 loci is expected to provide evidence supporting a candidate gene's role in SCZ genesis. Genes that are at the 108 loci, but do not actually contribute to SCZ genesis are less likely to be correlated with genes known to be involved in SCZ.
• Because objective measures are employed in genetic studies many of the genes implicated in SCZ through GWAS have unknown functions.
• Correlated expression between candidate genes from the 108 loci is expected to generate hypotheses regarding pathways that contribute to SCZ genesis.
• Understanding which genes contribute to SCZ genesis and what pathways they are involved in will inform development of novel interventions.
• For complex traits like SCZ, associated genes even with low odds ratios are integral to understanding the fundamental etiology of disease and inform interventions.