A novel method and software tool, GSA-SNP2, was presented for pathway enrichment and network analysis of genome-wide association study (GWAS) summary data (P-values). By incorporating the random set model and SNP-count-adjusted gene scores, it provides high power, decent type I error control and fast computation, which may help to identify new drug targets and to gain new understanding of disease and of therapies to treat it.
Each person’s genome is a unique combination of DNA sequences that helps to determine who we are and accounts for individual differences, including susceptibility to disease and diverse phenotypes. This genetic variation among humans includes single nucleotide polymorphisms (SNPs). SNPs that correlate with a specific disease could serve as predictive biomarkers to help develop new drugs. Statistical analysis of genome-wide association study (GWAS) summary data makes it possible to identify disease-associated SNPs.
Conventional single nucleotide polymorphism (SNP) detection technology has been unable to identify all possible SNPs despite the massive amounts of money and time invested in it. Most conventional methods are designed to control false positives so that results can be interpreted correctly, but excessive filtering limits their usefulness in drug development, which makes enhanced statistical power key to practical statistical algorithms.
The GSA-SNP2 algorithm was developed to improve statistical power while maintaining accurate control of false positives. This was accomplished by applying a monotone cubic spline trend curve to the gene scores within a competitive pathway analysis approach developed for gene expression data.
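The idea of adjusting gene scores with a monotone trend curve can be illustrated with a short sketch. It is not the GSA-SNP2 implementation: the function names, the knot values and the example genes are hypothetical, the raw score is assumed to be something like -log10 of a gene's best SNP P-value, and the adjustment is assumed to be a simple subtraction of the trend value at the gene's SNP count. The curve is a cubic Hermite interpolant whose knot slopes are chosen by a harmonic-mean rule, one common way to keep a cubic spline monotone.

```python
from bisect import bisect_right


def knot_slopes(xs, ys):
    """Slopes at the knots; the harmonic-mean rule keeps the curve monotone."""
    d = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
    m = [d[0]] + [0.0] * (len(xs) - 2) + [d[-1]]
    for i in range(1, len(xs) - 1):
        if d[i - 1] * d[i] > 0:  # same sign on both sides; otherwise stay flat
            m[i] = 2 * d[i - 1] * d[i] / (d[i - 1] + d[i])
    return m


def trend(x, xs, ys, m):
    """Evaluate the cubic Hermite trend curve at x (constant outside the knots)."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1          # index of the interval containing x
    h = xs[i + 1] - xs[i]
    t = (x - xs[i]) / h                  # position within the interval, 0..1
    return ((2 * t**3 - 3 * t**2 + 1) * ys[i]
            + (t**3 - 2 * t**2 + t) * h * m[i]
            + (-2 * t**3 + 3 * t**2) * ys[i + 1]
            + (t**3 - t**2) * h * m[i + 1])


def adjust_scores(genes, xs, ys):
    """Subtract the trend value at each gene's SNP count from its raw score."""
    m = knot_slopes(xs, ys)
    return {g: score - trend(count, xs, ys, m)
            for g, (count, score) in genes.items()}


# Hypothetical knots: SNP counts and the typical raw gene score at each count.
KNOT_COUNTS = [1, 10, 100, 1000]
KNOT_SCORES = [1.0, 1.5, 2.5, 3.0]
# Hypothetical genes: name -> (SNP count, raw gene score).
GENES = {"GENE_A": (10, 4.0), "GENE_B": (1000, 3.2), "GENE_C": (55, 2.0)}
ADJUSTED = adjust_scores(GENES, KNOT_COUNTS, KNOT_SCORES)
```

Genes that contain many SNPs tend to receive larger raw scores by chance alone, so subtracting a non-decreasing trend in SNP count removes that advantage before pathways are scored. In this toy data, GENE_B has a high raw score but ends up only slightly above the trend once its 1000 SNPs are taken into account.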
GSA-SNP2 provides improved type I error control by using SNP-count-adjusted gene scores while maintaining high statistical power. It also provides local and global protein interaction networks in the associated pathways, which may facilitate integrated pathway and network analysis of genome-wide association study data. The algorithm is expected to visualize protein interaction networks within and across significant pathways so that core subnetworks can be prioritized for further study.
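As a rough illustration of how adjusted gene scores could feed a competitive pathway test, the sketch below computes a random set Z-score: the mean score of a pathway's genes is compared with the mean expected for a gene set of the same size drawn at random, without replacement, from all scored genes. The variance formula is the standard finite-population result; whether GSA-SNP2 uses exactly this form is an assumption, and the function names, gene names and scores are hypothetical.

```python
from math import erfc, sqrt
from statistics import fmean, pvariance


def random_set_z(scores, gene_set):
    """Z-score of a gene set's mean score under random sampling of genes."""
    all_scores = list(scores.values())
    members = [scores[g] for g in gene_set if g in scores]
    n_all, n_set = len(all_scores), len(members)
    if n_set == 0 or n_set >= n_all:
        raise ValueError("gene set must be a non-empty proper subset")
    mu = fmean(all_scores)
    sigma2 = pvariance(all_scores, mu)   # population variance of all gene scores
    # Variance of the mean of n_set genes drawn without replacement from n_all.
    var_mean = (sigma2 / n_set) * (n_all - n_set) / (n_all - 1)
    return (fmean(members) - mu) / sqrt(var_mean)


def upper_tail_p(z):
    """One-sided P-value from the standard normal distribution."""
    return 0.5 * erfc(z / sqrt(2))


# Hypothetical adjusted gene scores and a hypothetical two-gene pathway.
SCORES = {"G1": 2.0, "G2": 1.0, "G3": 0.0, "G4": 0.0, "G5": -1.0, "G6": -2.0}
PATHWAY = {"G1", "G2"}
Z = random_set_z(SCORES, PATHWAY)
P = upper_tail_p(Z)
```

The test is competitive because the pathway is judged against the other genes rather than against a null of no association, so a pathway only stands out when its genes score higher than the genome-wide background.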