Last updated: 2021-09-09

Checks: 6 1

Knit directory: ctwas_applied/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20210726) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 59e5f4d. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Unstaged changes:
    Modified:   analysis/ukb-d-30500_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30500_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30600_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30600_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30610_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30610_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30620_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30620_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30630_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30630_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30640_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30640_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30650_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30650_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30660_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30660_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30670_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30670_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30680_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30690_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30690_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30700_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30700_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30710_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30710_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30720_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30720_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30730_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30740_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30740_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30750_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30750_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30760_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30760_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30770_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30770_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30780_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30780_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30790_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30800_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30800_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30810_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30820_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30820_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30830_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30830_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30840_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30840_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30850_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30850_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30860_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30860_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30870_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30870_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30880_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30880_irnt_Whole_Blood.Rmd
    Modified:   analysis/ukb-d-30890_irnt_Liver.Rmd
    Modified:   analysis/ukb-d-30890_irnt_Whole_Blood.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/ukb-d-30890_irnt_Whole_Blood.Rmd) and HTML (docs/ukb-d-30890_irnt_Whole_Blood.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd cbf7408 wesleycrouse 2021-09-08 adding enrichment to reports
html cbf7408 wesleycrouse 2021-09-08 adding enrichment to reports
Rmd 4970e3e wesleycrouse 2021-09-08 updating reports
html 4970e3e wesleycrouse 2021-09-08 updating reports
Rmd dfd2b5f wesleycrouse 2021-09-07 regenerating reports
html dfd2b5f wesleycrouse 2021-09-07 regenerating reports
Rmd 61b53b3 wesleycrouse 2021-09-06 updated PVE calculation
html 61b53b3 wesleycrouse 2021-09-06 updated PVE calculation
Rmd 837dd01 wesleycrouse 2021-09-01 adding additional fixedsigma report
Rmd d0a5417 wesleycrouse 2021-08-30 adding new reports to the index
Rmd 0922de7 wesleycrouse 2021-08-18 updating all reports with locus plots
html 0922de7 wesleycrouse 2021-08-18 updating all reports with locus plots
html 1c62980 wesleycrouse 2021-08-11 Updating reports
Rmd 5981e80 wesleycrouse 2021-08-11 Adding more reports
html 5981e80 wesleycrouse 2021-08-11 Adding more reports
Rmd da9f015 wesleycrouse 2021-08-07 adding more reports
html da9f015 wesleycrouse 2021-08-07 adding more reports

Overview

These are the results of a ctwas analysis of the UK Biobank trait Vitamin D (quantile) using Whole_Blood gene weights.

The GWAS was conducted by the Neale Lab, and the biomarker traits we analyzed are discussed here. Summary statistics were obtained from IEU OpenGWAS using GWAS ID: ukb-d-30890_irnt. Results were obtained from from IEU rather than Neale Lab because they are in a standardard format (GWAS VCF). Note that 3 of the 34 biomarker traits were not available from IEU and were excluded from analysis.

The weights are mashr GTEx v8 models on Whole_Blood eQTL obtained from PredictDB. We performed a full harmonization of the variants, including recovering strand ambiguous variants. This procedure is discussed in a separate document. (TO-DO: add report that describes harmonization)

LD matrices were computed from a 10% subset of Neale lab subjects. Subjects were matched using the plate and well information from genotyping. We included only biallelic variants with MAF>0.01 in the original Neale Lab GWAS. (TO-DO: add more details [number of subjects, variants, etc])

Weight QC

TO-DO: add enhanced QC reporting (total number of weights, why each variant was missing for all genes)

qclist_all <- list()

qc_files <- paste0(results_dir, "/", list.files(results_dir, pattern="exprqc.Rd"))

for (i in 1:length(qc_files)){
  load(qc_files[i])
  chr <- unlist(strsplit(rev(unlist(strsplit(qc_files[i], "_")))[1], "[.]"))[1]
  qclist_all[[chr]] <- cbind(do.call(rbind, lapply(qclist,unlist)), as.numeric(substring(chr,4)))
}

qclist_all <- data.frame(do.call(rbind, qclist_all))
colnames(qclist_all)[ncol(qclist_all)] <- "chr"

rm(qclist, wgtlist, z_gene_chr)

#number of imputed weights
nrow(qclist_all)
[1] 11095
#number of imputed weights by chromosome
table(qclist_all$chr)

   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 
1129  747  624  400  479  621  560  383  404  430  682  652  192  362  331 
  16   17   18   19   20   21   22 
 551  725  159  911  313  130  310 
#proportion of imputed weights without missing variants
mean(qclist_all$nmiss==0)
[1] 0.762776

Load ctwas results

#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size #check PVE calculation

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type == "gene", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
sqlite <- RSQLite::dbDriver("SQLite")
db = RSQLite::dbConnect(sqlite, paste0("/project2/compbio/predictdb/mashr_models/mashr_", weight, ".db"))
query <- function(...) RSQLite::dbGetQuery(db, ...)
gene_info <- query("select gene, genename from extra")
gene_info <- query("select gene, genename, gene_type from extra")
RSQLite::dbDisconnect(db)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$id, match, gene_info$gene), c("genename", "gene_type")])

#add z score to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

#load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd")) #for new version, stored after harmonization
z_snp <- readRDS(paste0(results_dir, "/", trait_id, ".RDS")) #for old version, unharmonized

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,] #subset snps to those included in analysis, note some are duplicated, need to match which allele was used
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)] #for duplicated snps, this arbitrarily uses the first allele
ctwas_snp_res$z_flag <- as.numeric(ctwas_snp_res$id %in% z_snp$id[duplicated(z_snp$id)]) #mark the unclear z scores, flag=1

#formatting and rounding for tables
ctwas_gene_res$z <- round(ctwas_gene_res$z,2)
ctwas_snp_res$z <- round(ctwas_snp_res$z,2)
ctwas_gene_res$susie_pip <- round(ctwas_gene_res$susie_pip,3)
ctwas_snp_res$susie_pip <- round(ctwas_snp_res$susie_pip,3)
ctwas_gene_res$mu2 <- round(ctwas_gene_res$mu2,2)
ctwas_snp_res$mu2 <- round(ctwas_snp_res$mu2,2)
ctwas_gene_res$PVE <- signif(ctwas_gene_res$PVE, 2)
ctwas_snp_res$PVE <- signif(ctwas_snp_res$PVE, 2)

#merge gene and snp results with added information
ctwas_gene_res$z_flag=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos"))]
first_cols <- c("genename", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

report_cols_snps <- c("id", report_cols[-1])

#get number of SNPs from s1 results; adjust for thin
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

Check convergence of parameters

library(ggplot2)
library(cowplot)

********************************************************
Note: As of version 1.0.0, cowplot does not change the
  default ggplot2 theme anymore. To recover the previous
  behavior, execute:
  theme_set(theme_cowplot())
********************************************************
load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))
  
df <- data.frame(niter = rep(1:ncol(group_prior_rec), 2),
                 value = c(group_prior_rec[1,], group_prior_rec[2,]),
                 group = rep(c("Gene", "SNP"), each = ncol(group_prior_rec)))
df$group <- as.factor(df$group)

df$value[df$group=="SNP"] <- df$value[df$group=="SNP"]*thin #adjust parameter to account for thin argument

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Prior mean") +
  theme_cowplot()

df <- data.frame(niter = rep(1:ncol(group_prior_var_rec), 2),
                 value = c(group_prior_var_rec[1,], group_prior_var_rec[2,]),
                 group = rep(c("Gene", "SNP"), each = ncol(group_prior_var_rec)))
df$group <- as.factor(df$group)
p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Prior variance") +
  theme_cowplot()

plot_grid(p_pi, p_sigma2)

Version Author Date
dfd2b5f wesleycrouse 2021-09-07
#estimated group prior
estimated_group_prior <- group_prior_rec[,ncol(group_prior_rec)]
names(estimated_group_prior) <- c("gene", "snp")
estimated_group_prior["snp"] <- estimated_group_prior["snp"]*thin #adjust parameter to account for thin argument
print(estimated_group_prior)
        gene          snp 
0.0019923611 0.0002010424 
#estimated group prior variance
estimated_group_prior_var <- group_prior_var_rec[,ncol(group_prior_var_rec)]
names(estimated_group_prior_var) <- c("gene", "snp")
print(estimated_group_prior_var)
      gene        snp 
157.189551   6.999876 
#report sample size
print(sample_size)
[1] 329247
#report group size
group_size <- c(nrow(ctwas_gene_res), n_snps)
print(group_size)
[1]   11095 8697330
#estimated group PVE
estimated_group_pve <- estimated_group_prior_var*estimated_group_prior*group_size/sample_size #check PVE calculation
names(estimated_group_pve) <- c("gene", "snp")
print(estimated_group_pve)
      gene        snp 
0.01055352 0.03717423 
#compare sum(PIP*mu2/sample_size) with above PVE calculation
c(sum(ctwas_gene_res$PVE),sum(ctwas_snp_res$PVE))
[1] 0.01238507 0.32669942

Genes with highest PIPs

#distribution of PIPs
hist(ctwas_gene_res$susie_pip, xlim=c(0,1), main="Distribution of Gene PIPs")

Version Author Date
dfd2b5f wesleycrouse 2021-09-07
#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
      genename region_tag susie_pip     mu2     PVE      z
2073   SULT2A1      19_33     1.000  357.93 1.1e-03 -19.30
8347  HSD17B13       4_59     0.941   30.22 8.6e-05   4.70
4564     PSRC1       1_67     0.874   49.00 1.3e-04   6.56
3811    INSIG2       2_69     0.874   38.97 1.0e-04  -5.95
2410       MLX      17_25     0.783   44.84 1.1e-04  -6.51
2496      ZPR1      11_70     0.767   85.44 2.0e-04   8.92
2720     MED23       6_87     0.677   53.08 1.1e-04   6.97
4249      LRP3      19_23     0.606   32.59 6.0e-05  -4.40
8634     DHCR7      11_40     0.519 1482.51 2.3e-03  36.46
5772       LPP      3_115     0.506   37.63 5.8e-05   4.45
4084     PSMA1      11_11     0.500 1232.13 1.9e-03 -17.47
6264     PDE3B      11_11     0.500 1232.13 1.9e-03 -17.47
8633   NADSYN1      11_40     0.481 1482.36 2.2e-03  36.46
5318      USP3      15_29     0.480   49.83 7.3e-05  -7.06
1043    NFE2L1      17_28     0.343   27.56 2.9e-05  -4.22
7089      USP1       1_39     0.339   76.99 7.9e-05  -8.54
10765  ZDHHC18       1_18     0.319   35.28 3.4e-05  -4.52
4394      UTP3       4_49     0.221   34.98 2.3e-05   4.63
3191    AKR1A1       1_28     0.209   35.40 2.3e-05  -4.82
4151      LDLR       19_9     0.194   59.83 3.5e-05   3.77

Genes with largest effect sizes

#plot PIP vs effect size
plot(ctwas_gene_res$susie_pip, ctwas_gene_res$mu2, xlab="PIP", ylab="mu^2", main="Gene PIPs vs Effect Size")

Version Author Date
dfd2b5f wesleycrouse 2021-09-07
#genes with 20 largest effect sizes
head(ctwas_gene_res[order(-ctwas_gene_res$mu2),report_cols],20)
      genename region_tag susie_pip     mu2     PVE      z
8634     DHCR7      11_40     0.519 1482.51 2.3e-03  36.46
8633   NADSYN1      11_40     0.481 1482.36 2.2e-03  36.46
9889    CYP2R1      11_11     0.000 1261.34 0.0e+00  35.17
4084     PSMA1      11_11     0.500 1232.13 1.9e-03 -17.47
6264     PDE3B      11_11     0.500 1232.13 1.9e-03 -17.47
7886    TRIM68       11_3     0.000  503.24 0.0e+00  -0.85
4083     COPB1      11_11     0.000  403.08 0.0e+00  -6.44
2073   SULT2A1      19_33     1.000  357.93 1.1e-03 -19.30
9362    OR51E1       11_3     0.000   91.12 0.0e+00   1.13
2496      ZPR1      11_70     0.767   85.44 2.0e-04   8.92
7089      USP1       1_39     0.339   76.99 7.9e-05  -8.54
3102     DOCK7       1_39     0.003   66.73 6.8e-07  -7.91
9151     STAP2       19_4     0.153   63.01 2.9e-05  -4.11
3099   ARHGEF2       1_77     0.004   61.79 6.7e-07  -5.14
7922    ZNF701      19_37     0.050   61.05 9.3e-06  -4.07
574       CA11      19_33     0.004   60.81 6.6e-07   3.80
4151      LDLR       19_9     0.194   59.83 3.5e-05   3.77
10443   ZNF665      19_37     0.041   59.81 7.5e-06  -3.57
4434     SYT11       1_77     0.026   58.93 4.7e-06  -5.79
9500  TRAPPC6B      14_11     0.002   57.65 4.0e-07  -7.19

Genes with highest PVE

#genes with 20 highest pve
head(ctwas_gene_res[order(-ctwas_gene_res$PVE),report_cols],20)
      genename region_tag susie_pip     mu2     PVE      z
8634     DHCR7      11_40     0.519 1482.51 2.3e-03  36.46
8633   NADSYN1      11_40     0.481 1482.36 2.2e-03  36.46
4084     PSMA1      11_11     0.500 1232.13 1.9e-03 -17.47
6264     PDE3B      11_11     0.500 1232.13 1.9e-03 -17.47
2073   SULT2A1      19_33     1.000  357.93 1.1e-03 -19.30
2496      ZPR1      11_70     0.767   85.44 2.0e-04   8.92
4564     PSRC1       1_67     0.874   49.00 1.3e-04   6.56
2720     MED23       6_87     0.677   53.08 1.1e-04   6.97
2410       MLX      17_25     0.783   44.84 1.1e-04  -6.51
3811    INSIG2       2_69     0.874   38.97 1.0e-04  -5.95
8347  HSD17B13       4_59     0.941   30.22 8.6e-05   4.70
7089      USP1       1_39     0.339   76.99 7.9e-05  -8.54
5318      USP3      15_29     0.480   49.83 7.3e-05  -7.06
4249      LRP3      19_23     0.606   32.59 6.0e-05  -4.40
5772       LPP      3_115     0.506   37.63 5.8e-05   4.45
4151      LDLR       19_9     0.194   59.83 3.5e-05   3.77
10765  ZDHHC18       1_18     0.319   35.28 3.4e-05  -4.52
1043    NFE2L1      17_28     0.343   27.56 2.9e-05  -4.22
9151     STAP2       19_4     0.153   63.01 2.9e-05  -4.11
3191    AKR1A1       1_28     0.209   35.40 2.3e-05  -4.82

Genes with largest z scores

#genes with 20 largest z scores
head(ctwas_gene_res[order(-abs(ctwas_gene_res$z)),report_cols],20)
     genename region_tag susie_pip     mu2     PVE      z
8633  NADSYN1      11_40     0.481 1482.36 2.2e-03  36.46
8634    DHCR7      11_40     0.519 1482.51 2.3e-03  36.46
9889   CYP2R1      11_11     0.000 1261.34 0.0e+00  35.17
2073  SULT2A1      19_33     1.000  357.93 1.1e-03 -19.30
4084    PSMA1      11_11     0.500 1232.13 1.9e-03 -17.47
6264    PDE3B      11_11     0.500 1232.13 1.9e-03 -17.47
2496     ZPR1      11_70     0.767   85.44 2.0e-04   8.92
7089     USP1       1_39     0.339   76.99 7.9e-05  -8.54
3102    DOCK7       1_39     0.003   66.73 6.8e-07  -7.91
5633     CRNN       1_74     0.000   34.86 1.2e-08   7.29
9500 TRAPPC6B      14_11     0.002   57.65 4.0e-07  -7.19
5318     USP3      15_29     0.480   49.83 7.3e-05  -7.06
2720    MED23       6_87     0.677   53.08 1.1e-04   6.97
3270     ARG1       6_87     0.154   49.91 2.3e-05   6.75
4564    PSRC1       1_67     0.874   49.00 1.3e-04   6.56
2410      MLX      17_25     0.783   44.84 1.1e-04  -6.51
4083    COPB1      11_11     0.000  403.08 0.0e+00  -6.44
9893    ZBTB6       9_63     0.031   43.82 4.2e-06  -6.18
205   RABGAP1       9_63     0.024   44.61 3.2e-06  -6.15
5007    BUD13      11_70     0.000   26.05 2.6e-08  -6.04

Comparing z scores and PIPs

#set nominal signifiance threshold for z scores
alpha <- 0.05

#bonferroni adjusted threshold for z scores
sig_thresh <- qnorm(1-(alpha/nrow(ctwas_gene_res)/2), lower=T)

#Q-Q plot for z scores
obs_z <- ctwas_gene_res$z[order(ctwas_gene_res$z)]
exp_z <- qnorm((1:nrow(ctwas_gene_res))/nrow(ctwas_gene_res))

plot(exp_z, obs_z, xlab="Expected z", ylab="Observed z", main="Gene z score Q-Q plot")
abline(a=0,b=1)

Version Author Date
dfd2b5f wesleycrouse 2021-09-07
#plot z score vs PIP
plot(abs(ctwas_gene_res$z), ctwas_gene_res$susie_pip, xlab="abs(z)", ylab="PIP")
abline(v=sig_thresh, col="red", lty=2)

Version Author Date
dfd2b5f wesleycrouse 2021-09-07
#proportion of significant z scores
mean(abs(ctwas_gene_res$z) > sig_thresh)
[1] 0.00396575
#genes with most significant z scores
head(ctwas_gene_res[order(-abs(ctwas_gene_res$z)),report_cols],20)
     genename region_tag susie_pip     mu2     PVE      z
8633  NADSYN1      11_40     0.481 1482.36 2.2e-03  36.46
8634    DHCR7      11_40     0.519 1482.51 2.3e-03  36.46
9889   CYP2R1      11_11     0.000 1261.34 0.0e+00  35.17
2073  SULT2A1      19_33     1.000  357.93 1.1e-03 -19.30
4084    PSMA1      11_11     0.500 1232.13 1.9e-03 -17.47
6264    PDE3B      11_11     0.500 1232.13 1.9e-03 -17.47
2496     ZPR1      11_70     0.767   85.44 2.0e-04   8.92
7089     USP1       1_39     0.339   76.99 7.9e-05  -8.54
3102    DOCK7       1_39     0.003   66.73 6.8e-07  -7.91
5633     CRNN       1_74     0.000   34.86 1.2e-08   7.29
9500 TRAPPC6B      14_11     0.002   57.65 4.0e-07  -7.19
5318     USP3      15_29     0.480   49.83 7.3e-05  -7.06
2720    MED23       6_87     0.677   53.08 1.1e-04   6.97
3270     ARG1       6_87     0.154   49.91 2.3e-05   6.75
4564    PSRC1       1_67     0.874   49.00 1.3e-04   6.56
2410      MLX      17_25     0.783   44.84 1.1e-04  -6.51
4083    COPB1      11_11     0.000  403.08 0.0e+00  -6.44
9893    ZBTB6       9_63     0.031   43.82 4.2e-06  -6.18
205   RABGAP1       9_63     0.024   44.61 3.2e-06  -6.15
5007    BUD13      11_70     0.000   26.05 2.6e-08  -6.04

Locus plots for genes and SNPs

ctwas_gene_res_sortz <- ctwas_gene_res[order(-abs(ctwas_gene_res$z)),]

n_plots <- 5
for (region_tag_plot in head(unique(ctwas_gene_res_sortz$region_tag), n_plots)){
  ctwas_res_region <-  ctwas_res[ctwas_res$region_tag==region_tag_plot,]
  start <- min(ctwas_res_region$pos)
  end <- max(ctwas_res_region$pos)
  
  ctwas_res_region <- ctwas_res_region[order(ctwas_res_region$pos),]
  ctwas_res_region_gene <- ctwas_res_region[ctwas_res_region$type=="gene",]
  ctwas_res_region_snp <- ctwas_res_region[ctwas_res_region$type=="SNP",]
  
  #region name
  print(paste0("Region: ", region_tag_plot))
  
  #table of genes in region
  print(ctwas_res_region_gene[,report_cols])
  
  par(mfrow=c(4,1))
  
  #gene z scores
  plot(ctwas_res_region_gene$pos, abs(ctwas_res_region_gene$z), xlab="Position", ylab="abs(gene_z)", xlim=c(start,end),
   ylim=c(0,max(sig_thresh, abs(ctwas_res_region_gene$z))),
   main=paste0("Region: ", region_tag_plot))
  abline(h=sig_thresh,col="red",lty=2)
  
  #significance threshold for SNPs
  alpha_snp <- 5*10^(-8)
  sig_thresh_snp <- qnorm(1-alpha_snp/2, lower=T)
  
  #snp z scores
  plot(ctwas_res_region_snp$pos, abs(ctwas_res_region_snp$z), xlab="Position", ylab="abs(snp_z)",xlim=c(start,end),
   ylim=c(0,max(sig_thresh_snp, max(abs(ctwas_res_region_snp$z)))))
  abline(h=sig_thresh_snp,col="purple",lty=2)
  
  #gene pips
  plot(ctwas_res_region_gene$pos, ctwas_res_region_gene$susie_pip, xlab="Position", ylab="Gene PIP", xlim=c(start,end), ylim=c(0,1))
  abline(h=0.8,col="blue",lty=2)
  
  #snp pips
  plot(ctwas_res_region_snp$pos, ctwas_res_region_snp$susie_pip, xlab="Position", ylab="SNP PIP", xlim=c(start,end), ylim=c(0,1))
  abline(h=0.8,col="blue",lty=2)
}
[1] "Region: 11_40"
          genename region_tag susie_pip     mu2     PVE     z
8634         DHCR7      11_40     0.519 1482.51 2.3e-03 36.46
8633       NADSYN1      11_40     0.481 1482.36 2.2e-03 36.46
11821     KRTAP5-9      11_40     0.000    6.41 5.0e-09 -0.37
6699       FAM86C1      11_40     0.000   10.57 1.1e-08 -1.16
4999        RNF121      11_40     0.000    8.47 5.8e-09  1.87
11802 RP11-849H4.2      11_40     0.000    5.00 3.4e-09 -0.19
4991        IL18BP      11_40     0.001   21.56 8.6e-08 -1.62
9668        LRTOMT      11_40     0.000   10.48 7.1e-09 -2.06
4992         NUMA1      11_40     0.000   11.41 1.2e-08  1.56
2526       ANAPC15      11_40     0.000    6.97 4.9e-09  0.77
2527         FOLR3      11_40     0.000   10.18 7.7e-09 -1.76
2525         FOLR1      11_40     0.001   17.41 5.4e-08  0.80
7579         FOLR2      11_40     0.005   31.34 4.4e-07  2.16
7580        INPPL1      11_40     0.000    5.03 3.4e-09 -0.12
7028          CLPB      11_40     0.000    5.07 3.5e-09  0.02
11293    LINC01537      11_40     0.000   10.05 7.4e-09  2.22

Version Author Date
dfd2b5f wesleycrouse 2021-09-07
[1] "Region: 11_11"
          genename region_tag susie_pip     mu2    PVE      z
10464         FAR1      11_11       0.0   21.74 0.0000  -2.05
4083         COPB1      11_11       0.0  403.08 0.0000  -6.44
4084         PSMA1      11_11       0.5 1232.13 0.0019 -17.47
6264         PDE3B      11_11       0.5 1232.13 0.0019 -17.47
9889        CYP2R1      11_11       0.0 1261.34 0.0000  35.17
10103         INSC      11_11       0.0   10.36 0.0000  -1.15
11818 RP11-531H8.1      11_11       0.0   31.29 0.0000  -3.75
11810 RP11-531H8.2      11_11       0.0   39.36 0.0000   4.53

Version Author Date
dfd2b5f wesleycrouse 2021-09-07
[1] "Region: 19_33"
      genename region_tag susie_pip    mu2     PVE      z
2050     PRKD2      19_33     0.000   5.68 2.2e-10  -0.41
1257     STRN4      19_33     0.000   6.51 2.7e-10   0.49
9389      FKRP      19_33     0.000   7.15 3.2e-10   0.69
400      AP2S1      19_33     0.000   5.59 2.2e-10   0.37
6825  ARHGAP35      19_33     0.000   5.47 2.1e-10  -0.10
5502      SAE1      19_33     0.000   5.22 1.9e-10   0.52
2055      BBC3      19_33     0.000  20.35 3.0e-09  -1.90
2053     CCDC9      19_33     0.000  13.11 1.3e-09   1.17
11894   INAFM1      19_33     0.000   8.63 4.4e-10   0.65
4639     C5AR2      19_33     0.000   5.97 2.4e-10  -0.22
4635     DHX34      19_33     0.000   5.94 2.4e-10   0.46
2077     MEIS3      19_33     0.000   6.46 2.5e-10  -0.17
2074      NAPA      19_33     0.000   5.75 2.3e-10  -0.24
3238    ZNF541      19_33     0.000   6.95 2.6e-10  -0.37
572    GLTSCR1      19_33     0.000   7.37 3.4e-10  -0.49
294       EHD2      19_33     0.000  14.66 1.6e-09  -0.96
2066   GLTSCR2      19_33     0.000  10.25 5.9e-10  -0.50
2073   SULT2A1      19_33     1.000 357.93 1.1e-03 -19.30
2089   PLA2G4C      19_33     0.000   5.93 2.2e-10   0.88
2086      LIG1      19_33     0.000   7.87 3.8e-10   0.19
9808  C19orf68      19_33     0.000   6.86 3.0e-10   0.60
2091     CABP5      19_33     0.000  10.68 6.5e-10   1.42
2085     CARD8      19_33     0.000  11.56 8.3e-10   1.39
5501      EMP3      19_33     0.000   7.18 3.2e-10   0.82
2084   CCDC114      19_33     0.000   5.27 2.0e-10  -0.48
2081     GRWD1      19_33     0.000   7.77 3.6e-10  -1.12
2080     CYTH2      19_33     0.000   8.50 4.4e-10   0.95
9493    KCNJ14      19_33     0.000   5.59 2.1e-10   0.12
5504     LMTK3      19_33     0.000   5.38 2.0e-10  -0.27
1173   SULT2B1      19_33     0.000  11.79 9.2e-10   0.96
573      SPHK2      19_33     0.000  10.91 6.8e-10   1.44
574       CA11      19_33     0.004  60.81 6.6e-07   3.80
5503      NTN5      19_33     0.000   5.15 1.9e-10  -0.08
9034    MAMSTR      19_33     0.000  29.10 1.3e-08   2.35
2097    RASIP1      19_33     0.000  26.07 8.4e-09  -2.10

Version Author Date
dfd2b5f wesleycrouse 2021-09-07
[1] "Region: 11_70"
     genename region_tag susie_pip   mu2     PVE     z
5007    BUD13      11_70     0.000 26.05 2.6e-08 -6.04
2496     ZPR1      11_70     0.767 85.44 2.0e-04  8.92
3237    APOA1      11_70     0.001 10.15 1.5e-08  1.22
6898     SIK3      11_70     0.000  5.92 6.0e-09  0.14
8030 PAFAH1B2      11_70     0.001 21.79 3.3e-08 -0.86
6104    TAGLN      11_70     0.000  8.19 7.5e-09  2.06
6902    PCSK7      11_70     0.001 14.53 2.9e-08 -2.30
7873   RNF214      11_70     0.001 17.02 5.6e-08 -1.19
9915    BACE1      11_70     0.000 10.95 1.2e-08  3.02
2530   CEP164      11_70     0.001 15.12 3.6e-08  1.53
5018    FXYD2      11_70     0.002 21.17 9.9e-08  1.74
5017    FXYD6      11_70     0.002 25.28 1.7e-07 -2.05

Version Author Date
dfd2b5f wesleycrouse 2021-09-07
[1] "Region: 1_39"
     genename region_tag susie_pip   mu2     PVE     z
7088    TM2D1       1_39     0.001  6.08 1.9e-08  0.37
4449     PATJ       1_39     0.001  6.32 2.0e-08 -0.46
7089     USP1       1_39     0.339 76.99 7.9e-05 -8.54
3102    DOCK7       1_39     0.003 66.73 6.8e-07 -7.91
3822    ATG4C       1_39     0.001 10.33 4.6e-08  1.31

Version Author Date
dfd2b5f wesleycrouse 2021-09-07

SNPs with highest PIPs

#snps with PIP>0.8 or 20 highest PIPs
head(ctwas_snp_res[order(-ctwas_snp_res$susie_pip),report_cols_snps],
max(sum(ctwas_snp_res$susie_pip>0.8), 20))
                id region_tag susie_pip     mu2     PVE      z
35277  rs140655928       1_74     1.000   44.21 1.3e-04   9.17
35366  rs115288876       1_74     1.000  104.31 3.2e-04  12.20
35535   rs61813920       1_74     1.000   48.54 1.5e-04   9.77
73583    rs1042034       2_13     1.000   33.99 1.0e-04  -5.24
75319     rs780093       2_16     1.000   54.90 1.7e-04   7.97
225050 rs189319377       4_49     1.000   39.57 1.2e-04  -6.98
225199    rs222042       4_52     1.000  453.30 1.4e-03   4.62
225201    rs843005       4_52     1.000 1579.06 4.8e-03  59.21
225202   rs7697091       4_52     1.000 2017.45 6.1e-03 -64.16
561121  rs12282912       11_3     1.000 1171.13 3.6e-03   2.65
561126 rs759352454       11_3     1.000 1154.25 3.5e-03   2.92
565265 rs576218922      11_11     1.000  500.09 1.5e-03 -35.89
565284  rs10766194      11_11     1.000 1617.70 4.9e-03 -41.01
565288   rs7129781      11_11     1.000  949.01 2.9e-03 -20.22
583355   rs3740776      11_42     1.000   36.62 1.1e-04   6.23
632606   rs6538696      12_57     1.000  161.23 4.9e-04  15.75
657309 rs775834524      13_25     1.000 1174.60 3.6e-03  -2.53
716351   rs2070895      15_27     1.000   96.36 2.9e-04 -10.63
752338  rs11542462      16_46     1.000   34.19 1.0e-04  -6.06
838982   rs6123359      20_32     1.000   58.27 1.8e-04   8.76
838988   rs2585441      20_32     1.000   70.61 2.1e-04   9.56
838989  rs12480880      20_32     1.000   59.72 1.8e-04   9.97
893819 rs561086564      19_33     1.000  235.44 7.2e-04   2.81
893870  rs75778010      19_33     1.000  191.71 5.8e-04   2.70
35475   rs11205006       1_74     0.999   55.39 1.7e-04 -11.74
716345  rs62000868      15_27     0.999   44.61 1.4e-04  -7.33
839011   rs6068816      20_32     0.999   36.41 1.1e-04  -7.56
859473   rs2072193      22_11     0.998   29.81 9.0e-05   5.74
632610   rs7308827      12_57     0.997   62.99 1.9e-04 -11.80
843372   rs2823025       21_2     0.995   27.57 8.3e-05  -5.33
225215  rs35096193       4_52     0.994  152.34 4.6e-04  25.11
224979 rs112345622       4_49     0.979   26.55 7.9e-05   4.88
759491   rs4968186       17_7     0.974   25.99 7.7e-05   5.18
451826   rs4738679       8_45     0.964   27.14 7.9e-05   5.22
329172  rs75080831       6_19     0.958   25.04 7.3e-05  -4.95
565693 rs117731662      11_12     0.958   26.87 7.8e-05  -5.55
86718    rs2862874       2_39     0.956   27.58 8.0e-05  -5.01
764298 rs112178027      17_18     0.953   26.54 7.7e-05  -5.24
223960    rs445908       4_48     0.952   44.12 1.3e-04  -6.66
59159   rs11122453      1_117     0.940   25.44 7.3e-05   5.06
322677   rs6925468        6_7     0.929   23.48 6.6e-05  -4.68
893742 rs191652498      19_33     0.924   70.69 2.0e-04   2.43
818716   rs2972561      19_31     0.906   24.30 6.7e-05  -4.79
54245   rs79687284      1_108     0.900   23.72 6.5e-05  -4.64
494898  rs11143795       9_34     0.898   24.54 6.7e-05   4.85
816186   rs1137844      19_24     0.897   24.16 6.6e-05  -4.71
565260  rs11023314      11_11     0.892  823.26 2.2e-03  12.46
637857  rs55947413      12_69     0.892   23.04 6.2e-05   4.46
443078  rs10086575       8_26     0.880   23.71 6.3e-05   4.67
683333   rs2144530      14_11     0.873  117.56 3.1e-04 -11.93
597633   rs2847500      11_72     0.862   27.26 7.1e-05  -5.37
579618 rs575513603      11_36     0.859   25.69 6.7e-05   4.77
743886    rs821840      16_31     0.858   44.48 1.2e-04  -6.90
201276  rs78649910        4_4     0.856   30.30 7.9e-05  -5.87
300492   rs1966478       5_72     0.849   24.49 6.3e-05  -4.73
287261   rs6887019       5_45     0.848   24.18 6.2e-05  -4.66
35748   rs61803125       1_75     0.837   24.74 6.3e-05   4.69
747689 rs139861017      16_37     0.815   24.73 6.1e-05   4.71
468612  rs12056610       8_78     0.813   74.46 1.8e-04  -8.77
564769 rs560227606      11_10     0.810   25.07 6.2e-05  -4.56
513801 rs115478735       9_70     0.801   24.60 6.0e-05  -4.60
865471   rs5767072      22_22     0.801   24.63 6.0e-05   4.55

SNPs with largest effect sizes

#plot PIP vs effect size
plot(ctwas_snp_res$susie_pip, ctwas_snp_res$mu2, xlab="PIP", ylab="mu^2", main="SNP PIPs vs Effect Size")

Version Author Date
dfd2b5f wesleycrouse 2021-09-07
#SNPs with 50 largest effect sizes
head(ctwas_snp_res[order(-ctwas_snp_res$mu2),report_cols_snps],50)
                id region_tag susie_pip     mu2     PVE      z
225202   rs7697091       4_52     1.000 2017.45 6.1e-03 -64.16
565284  rs10766194      11_11     1.000 1617.70 4.9e-03 -41.01
225201    rs843005       4_52     1.000 1579.06 4.8e-03  59.21
225200    rs705119       4_52     0.000 1456.00 2.2e-13  56.46
657309 rs775834524      13_25     1.000 1174.60 3.6e-03  -2.53
561121  rs12282912       11_3     1.000 1171.13 3.6e-03   2.65
657307   rs7999449      13_25     0.559 1160.51 2.0e-03  -2.79
657299   rs7337153      13_25     0.579 1160.03 2.0e-03  -2.81
657300   rs9527399      13_25     0.219 1156.60 7.7e-04   2.80
657303   rs9597193      13_25     0.208 1156.54 7.3e-04   2.80
657304   rs9537143      13_25     0.197 1156.54 6.9e-04   2.79
657302   rs9527401      13_25     0.206 1156.53 7.2e-04   2.79
657297   rs9527398      13_25     0.178 1155.93 6.3e-04   2.80
657298   rs9537125      13_25     0.178 1155.93 6.2e-04   2.80
657295   rs9537123      13_25     0.173 1155.83 6.1e-04   2.79
561126 rs759352454       11_3     1.000 1154.25 3.5e-03   2.92
581864   rs2276362      11_40     0.001 1151.31 3.8e-06  36.48
581860  rs12790010      11_40     0.001 1150.81 3.7e-06  36.47
581862  rs11233747      11_40     0.001 1150.52 3.7e-06  36.47
581859  rs12789751      11_40     0.001 1150.12 3.7e-06  36.46
581868  rs11233789      11_40     0.001 1147.36 3.7e-06  36.41
225208 rs114289627       4_52     0.008 1142.39 2.7e-05 -55.56
657288   rs3013347      13_25     0.006 1132.57 2.0e-05  -2.63
657289   rs2937326      13_25     0.006 1132.53 1.9e-05  -2.63
657290   rs9597179      13_25     0.001 1128.46 2.8e-06   2.57
657314   rs9537159      13_25     0.000 1107.37 7.1e-07  -2.51
657320    rs539380      13_25     0.000 1106.19 4.7e-07  -2.51
657291   rs9537116      13_25     0.000 1105.22 6.9e-08   2.43
657313  rs35800055      13_25     0.000 1103.60 1.7e-07   2.49
657312  rs67100646      13_25     0.000 1103.57 1.8e-07   2.50
657310   rs4536353      13_25     0.000 1103.53 1.7e-07   2.49
657311   rs4296148      13_25     0.000 1103.40 1.6e-07   2.49
657317   rs7994036      13_25     0.000 1103.28 1.8e-07   2.50
657315   rs9597201      13_25     0.000 1103.23 1.7e-07   2.50
657319   rs9537174      13_25     0.000 1103.14 1.7e-07   2.49
565285   rs7125781      11_11     0.000 1087.09 0.0e+00 -33.70
657286   rs3105089      13_25     0.002 1061.11 8.0e-06  -3.11
657283   rs2315887      13_25     0.005 1056.52 1.8e-05  -3.22
657284   rs2315886      13_25     0.005 1056.52 1.8e-05  -3.22
657285   rs3124374      13_25     0.004 1056.29 1.2e-05  -3.19
657275   rs2315898      13_25     0.008 1056.11 2.6e-05  -3.26
657281   rs3124402      13_25     0.012 1055.13 3.8e-05  -3.29
657277   rs3105045      13_25     0.004 1055.08 1.4e-05  -3.22
657278   rs2315895      13_25     0.004 1055.07 1.4e-05  -3.22
657279   rs3124405      13_25     0.004 1055.05 1.4e-05  -3.22
561086  rs11500272       11_3     0.000 1054.45 1.3e-09   3.09
657273   rs7317475      13_25     0.002 1052.73 5.4e-06  -3.15
657262   rs1960704      13_25     0.002 1051.10 7.5e-06  -3.19
657270    rs520268      13_25     0.002 1051.09 7.2e-06  -3.19
657267    rs616312      13_25     0.002 1050.96 6.6e-06  -3.18

SNPs with highest PVE

#SNPs with 50 highest pve
head(ctwas_snp_res[order(-ctwas_snp_res$PVE),report_cols_snps],50)
                id region_tag susie_pip     mu2     PVE      z
225202   rs7697091       4_52     1.000 2017.45 0.00610 -64.16
565284  rs10766194      11_11     1.000 1617.70 0.00490 -41.01
225201    rs843005       4_52     1.000 1579.06 0.00480  59.21
561121  rs12282912       11_3     1.000 1171.13 0.00360   2.65
657309 rs775834524      13_25     1.000 1174.60 0.00360  -2.53
561126 rs759352454       11_3     1.000 1154.25 0.00350   2.92
565288   rs7129781      11_11     1.000  949.01 0.00290 -20.22
565260  rs11023314      11_11     0.892  823.26 0.00220  12.46
657299   rs7337153      13_25     0.579 1160.03 0.00200  -2.81
657307   rs7999449      13_25     0.559 1160.51 0.00200  -2.79
225196  rs16846876       4_52     0.728  720.42 0.00160 -52.15
565265 rs576218922      11_11     1.000  500.09 0.00150 -35.89
225199    rs222042       4_52     1.000  453.30 0.00140   4.62
657300   rs9527399      13_25     0.219 1156.60 0.00077   2.80
657303   rs9597193      13_25     0.208 1156.54 0.00073   2.80
657302   rs9527401      13_25     0.206 1156.53 0.00072   2.79
893819 rs561086564      19_33     1.000  235.44 0.00072   2.81
657304   rs9537143      13_25     0.197 1156.54 0.00069   2.79
657297   rs9527398      13_25     0.178 1155.93 0.00063   2.80
657298   rs9537125      13_25     0.178 1155.93 0.00062   2.80
657295   rs9537123      13_25     0.173 1155.83 0.00061   2.79
893870  rs75778010      19_33     1.000  191.71 0.00058   2.70
632606   rs6538696      12_57     1.000  161.23 0.00049  15.75
225215  rs35096193       4_52     0.994  152.34 0.00046  25.11
225183  rs34186014       4_52     0.183  702.54 0.00039 -51.73
581876  rs78434528      11_40     0.635  192.47 0.00037  -8.60
35366  rs115288876       1_74     1.000  104.31 0.00032  12.20
683333   rs2144530      14_11     0.873  117.56 0.00031 -11.93
716351   rs2070895      15_27     1.000   96.36 0.00029 -10.63
565268  rs73418613      11_11     0.108  818.41 0.00027  12.14
838988   rs2585441      20_32     1.000   70.61 0.00021   9.56
581843   rs7105151      11_40     0.133  486.51 0.00020  24.33
893742 rs191652498      19_33     0.924   70.69 0.00020   2.43
225182  rs11732044       4_52     0.087  696.51 0.00019 -51.61
632610   rs7308827      12_57     0.997   62.99 0.00019 -11.80
468612  rs12056610       8_78     0.813   74.46 0.00018  -8.77
838982   rs6123359      20_32     1.000   58.27 0.00018   8.76
838989  rs12480880      20_32     1.000   59.72 0.00018   9.97
35475   rs11205006       1_74     0.999   55.39 0.00017 -11.74
75319     rs780093       2_16     1.000   54.90 0.00017   7.97
35535   rs61813920       1_74     1.000   48.54 0.00015   9.77
716345  rs62000868      15_27     0.999   44.61 0.00014  -7.33
35277  rs140655928       1_74     1.000   44.21 0.00013   9.17
223960    rs445908       4_48     0.952   44.12 0.00013  -6.66
225050 rs189319377       4_49     1.000   39.57 0.00012  -6.98
743886    rs821840      16_31     0.858   44.48 0.00012  -6.90
811056  rs73004951      19_15     0.743   53.78 0.00012   7.89
583355   rs3740776      11_42     1.000   36.62 0.00011   6.23
839011   rs6068816      20_32     0.999   36.41 0.00011  -7.56
73583    rs1042034       2_13     1.000   33.99 0.00010  -5.24

SNPs with largest z scores

#SNPs with 50 largest z scores
head(ctwas_snp_res[order(-abs(ctwas_snp_res$z)),report_cols_snps],50)
                id region_tag susie_pip     mu2     PVE      z
225202   rs7697091       4_52     1.000 2017.45 6.1e-03 -64.16
225201    rs843005       4_52     1.000 1579.06 4.8e-03  59.21
225200    rs705119       4_52     0.000 1456.00 2.2e-13  56.46
225208 rs114289627       4_52     0.008 1142.39 2.7e-05 -55.56
225196  rs16846876       4_52     0.728  720.42 1.6e-03 -52.15
225183  rs34186014       4_52     0.183  702.54 3.9e-04 -51.73
225182  rs11732044       4_52     0.087  696.51 1.9e-04 -51.61
225175  rs11733890       4_52     0.000  641.59 9.4e-07 -49.98
225190   rs1526692       4_52     0.000  859.79 1.2e-15 -46.84
225186   rs7682810       4_52     0.000  857.93 1.2e-15 -46.79
565284  rs10766194      11_11     1.000 1617.70 4.9e-03 -41.01
225210  rs34539583       4_52     0.000  291.35 1.3e-14  40.43
225197   rs1491711       4_52     0.000  536.97 2.9e-15  39.27
581864   rs2276362      11_40     0.001 1151.31 3.8e-06  36.48
581860  rs12790010      11_40     0.001 1150.81 3.7e-06  36.47
581862  rs11233747      11_40     0.001 1150.52 3.7e-06  36.47
581859  rs12789751      11_40     0.001 1150.12 3.7e-06  36.46
581868  rs11233789      11_40     0.001 1147.36 3.7e-06  36.41
565265 rs576218922      11_11     1.000  500.09 1.5e-03 -35.89
565289  rs10766196      11_11     0.000  976.48 0.0e+00 -35.15
225225   rs2139641       4_52     0.000  437.07 4.1e-11  34.17
225227   rs4694436       4_52     0.000  436.80 4.0e-11  34.17
565285   rs7125781      11_11     0.000 1087.09 0.0e+00 -33.70
565262  rs10832297      11_11     0.000  826.40 0.0e+00 -33.59
565251  rs10832291      11_11     0.000  822.27 0.0e+00 -33.58
565252  rs78474492      11_11     0.000  821.98 0.0e+00 -33.56
565256   rs7119320      11_11     0.000  821.34 0.0e+00 -33.56
581884   rs7926180      11_40     0.001  799.77 3.0e-06  32.92
581866   rs1790344      11_40     0.001  888.15 2.7e-06 -31.64
581867   rs1790343      11_40     0.001  888.15 2.7e-06 -31.64
581865 rs142902943      11_40     0.001  887.91 2.7e-06 -31.63
581869   rs1790339      11_40     0.001  887.79 2.7e-06 -31.62
581871   rs4402322      11_40     0.001  887.37 2.7e-06 -31.62
581858   rs2002064      11_40     0.001  883.03 2.7e-06 -31.54
581853   rs1790328      11_40     0.001  828.59 2.6e-06 -30.67
581855   rs1792264      11_40     0.001  826.84 2.6e-06 -30.62
225171  rs28393895       4_52     0.000  259.36 1.4e-15  29.35
565216  rs12792120      11_11     0.000  621.43 0.0e+00 -28.89
565207  rs12804549      11_11     0.000  616.18 0.0e+00 -28.80
565212  rs12287212      11_11     0.000  615.27 0.0e+00 -28.78
225191   rs2201126       4_52     0.000  106.45 2.0e-17 -28.74
581879  rs10898191      11_40     0.001  589.60 2.0e-06 -28.55
565215  rs36093983      11_11     0.000  611.13 0.0e+00 -28.51
581886  rs79634040      11_40     0.001  572.45 1.9e-06 -28.17
581887  rs11234027      11_40     0.001  558.91 1.8e-06 -27.83
581885  rs28762388      11_40     0.001  549.99 1.7e-06 -27.58
225193  rs35937576       4_52     0.000  211.65 2.5e-17  27.03
225189  rs13111974       4_52     0.000  211.18 2.4e-17  26.99
225185    rs371397       4_52     0.000  209.16 2.5e-17 -26.89
225184  rs67769961       4_52     0.000  206.72 2.4e-17  26.71

Gene set enrichment for genes with PIP>0.8

#GO enrichment analysis
library(enrichR)
Welcome to enrichR
Checking connection ... 
Enrichr ... Connection is Live!
FlyEnrichr ... Connection is available!
WormEnrichr ... Connection is available!
YeastEnrichr ... Connection is available!
FishEnrichr ... Connection is available!
dbs <- c("GO_Biological_Process_2021", "GO_Cellular_Component_2021", "GO_Molecular_Function_2021")
genes <- ctwas_gene_res$genename[ctwas_gene_res$susie_pip>0.8]

#number of genes for gene set enrichment
length(genes)
[1] 4
if (length(genes)>0){
  GO_enrichment <- enrichr(genes, dbs)

  for (db in dbs){
    print(db)
    df <- GO_enrichment[[db]]
    df <- df[df$Adjusted.P.value<0.05,c("Term", "Overlap", "Adjusted.P.value", "Genes")]
    print(df)
  }
  
  #DisGeNET enrichment
  
  # devtools::install_bitbucket("ibi_group/disgenet2r")
  library(disgenet2r)
  
  disgenet_api_key <- get_disgenet_api_key(
                    email = "wesleycrouse@gmail.com", 
                    password = "uchicago1" )
  
  Sys.setenv(DISGENET_API_KEY= disgenet_api_key)
  
  res_enrich <-disease_enrichment(entities=genes, vocabulary = "HGNC",
                               database = "CURATED" )
  
  df <- res_enrich@qresult[1:10, c("Description", "FDR", "Ratio",  "BgRatio")]
  print(df)
  
  #WebGestalt enrichment
  library(WebGestaltR)
  
  background <- ctwas_gene_res$genename
  
  #listGeneSet()
  databases <- c("pathway_KEGG", "disease_GLAD4U", "disease_OMIM")
  
  enrichResult <- WebGestaltR(enrichMethod="ORA", organism="hsapiens",
                              interestGene=genes, referenceGene=background,
                              enrichDatabase=databases, interestGeneType="genesymbol",
                              referenceGeneType="genesymbol", isOutput=F)
  print(enrichResult[,c("description", "size", "overlap", "FDR", "database", "userId")])
}
Uploading data to Enrichr... Done.
  Querying GO_Biological_Process_2021... Done.
  Querying GO_Cellular_Component_2021... Done.
  Querying GO_Molecular_Function_2021... Done.
Parsing results... Done.
[1] "GO_Biological_Process_2021"
                                                                                            Term
1                                                          sterol metabolic process (GO:0016125)
2                                                     cholesterol metabolic process (GO:0008203)
3                                                         ethanol catabolic process (GO:0006068)
4                                                           SREBP signaling pathway (GO:0032933)
5                                             cellular response to sterol depletion (GO:0071501)
6                                                 primary alcohol catabolic process (GO:0034310)
7                      maintenance of protein localization in endoplasmic reticulum (GO:0035437)
8                                                       cellular response to sterol (GO:0036315)
9                                                                         sulfation (GO:0051923)
10 positive regulation of cyclin-dependent protein serine/threonine kinase activity (GO:0045737)
11                                               regulation of spindle organization (GO:0090224)
12                                                        ethanol metabolic process (GO:0006067)
13                             purine ribonucleoside bisphosphate metabolic process (GO:0034035)
14                  positive regulation of cyclin-dependent protein kinase activity (GO:1904031)
15                                                       lipid droplet organization (GO:0034389)
16                          3'-phosphoadenosine 5'-phosphosulfate metabolic process (GO:0050427)
17                                   positive regulation of lipid metabolic process (GO:0045834)
18            positive regulation of microtubule polymerization or depolymerization (GO:0031112)
19                                positive regulation of microtubule polymerization (GO:0031116)
20                                           secondary alcohol biosynthetic process (GO:1902653)
21                                                 cholesterol biosynthetic process (GO:0006695)
22                                positive regulation of lipid biosynthetic process (GO:0046889)
23                                       regulation of mitotic spindle organization (GO:0060236)
24                                         regulation of lipid biosynthetic process (GO:0046890)
25                                                        oxoacid metabolic process (GO:0043436)
26                                                      sterol biosynthetic process (GO:0016126)
27                                         regulation of microtubule polymerization (GO:0031113)
28                                      positive regulation of biosynthetic process (GO:0009891)
29                                                     microtubule bundle formation (GO:0001578)
30                                              secondary alcohol metabolic process (GO:1902652)
31                                    organic hydroxy compound biosynthetic process (GO:1901617)
32                                              mitotic metaphase plate congression (GO:0007080)
33                                                     steroid biosynthetic process (GO:0006694)
34                                                positive regulation of cell cycle (GO:0045787)
35                                    positive regulation of protein polymerization (GO:0032273)
36                                          purine ribonucleotide metabolic process (GO:0009150)
37          regulation of cyclin-dependent protein serine/threonine kinase activity (GO:0000079)
38                                                              response to insulin (GO:0032868)
39                         positive regulation of supramolecular fiber organization (GO:1902905)
40                                             mitotic sister chromatid segregation (GO:0000070)
41                                                        steroid metabolic process (GO:0008202)
42                  positive regulation of protein serine/threonine kinase activity (GO:0071902)
43                                    cellular response to peptide hormone stimulus (GO:0071375)
44                                               negative regulation of cell growth (GO:0030308)
45                                                    negative regulation of growth (GO:0045926)
46                                            cellular response to insulin stimulus (GO:0032869)
47                                                        regulation of cell growth (GO:0001558)
   Overlap Adjusted.P.value          Genes
1     2/70      0.002270902 INSIG2;SULT2A1
2     2/77      0.002270902 INSIG2;SULT2A1
3      1/7      0.014522738        SULT2A1
4      1/8      0.014522738         INSIG2
5      1/9      0.014522738         INSIG2
6     1/10      0.014522738        SULT2A1
7     1/12      0.014522738         INSIG2
8     1/16      0.014522738         INSIG2
9     1/17      0.014522738        SULT2A1
10    1/17      0.014522738          PSRC1
11    1/18      0.014522738          PSRC1
12    1/19      0.014522738        SULT2A1
13    1/19      0.014522738        SULT2A1
14    1/20      0.014522738          PSRC1
15    1/22      0.014522738       HSD17B13
16    1/25      0.014522738        SULT2A1
17    1/25      0.014522738       HSD17B13
18    1/26      0.014522738          PSRC1
19    1/28      0.014522738          PSRC1
20    1/34      0.014522738         INSIG2
21    1/35      0.014522738         INSIG2
22    1/35      0.014522738       HSD17B13
23    1/35      0.014522738          PSRC1
24    1/35      0.014522738       HSD17B13
25    1/35      0.014522738        SULT2A1
26    1/38      0.015157694         INSIG2
27    1/40      0.015362222          PSRC1
28    1/44      0.016084577       HSD17B13
29    1/45      0.016084577          PSRC1
30    1/49      0.016512774        SULT2A1
31    1/50      0.016512774         INSIG2
32    1/51      0.016512774          PSRC1
33    1/65      0.020089839         INSIG2
34    1/66      0.020089839          PSRC1
35    1/69      0.020398338          PSRC1
36    1/77      0.022117773        SULT2A1
37    1/82      0.022846560          PSRC1
38    1/84      0.022846560         INSIG2
39    1/91      0.024103153          PSRC1
40   1/102      0.025435818          PSRC1
41   1/104      0.025435818        SULT2A1
42   1/106      0.025435818          PSRC1
43   1/106      0.025435818         INSIG2
44   1/126      0.028847918          PSRC1
45   1/126      0.028847918          PSRC1
46   1/129      0.028886207         INSIG2
47   1/217      0.047244425          PSRC1
[1] "GO_Cellular_Component_2021"
                                                       Term Overlap
1                          spindle microtubule (GO:0005876)    1/61
2                                lipid droplet (GO:0005811)    1/77
3 intracellular non-membrane-bounded organelle (GO:0043232)  2/1158
  Adjusted.P.value          Genes
1       0.04955318          PSRC1
2       0.04955318       HSD17B13
3       0.04955318 PSRC1;HSD17B13
[1] "GO_Molecular_Function_2021"
                                                                                                Term
1                                                                     oxysterol binding (GO:0008142)
2                                                             sulfotransferase activity (GO:0008146)
3                                                                        sterol binding (GO:0032934)
4 oxidoreductase activity, acting on the CH-OH group of donors, NAD or NADP as acceptor (GO:0016616)
  Overlap Adjusted.P.value    Genes
1     1/7      0.008396047   INSIG2
2    1/40      0.023893772  SULT2A1
3    1/60      0.023893772   INSIG2
4    1/87      0.025931914 HSD17B13
HSD17B13 gene(s) from the input list not found in DisGeNET CURATEDPSRC1 gene(s) from the input list not found in DisGeNET CURATED
                        Description        FDR Ratio  BgRatio
1     Liver Cirrhosis, Experimental 0.02543268   2/2 774/9703
3                       Weight Gain 0.04183427   1/2 102/9703
2               Prostatic Neoplasms 0.12295901   1/2 616/9703
4    Malignant neoplasm of prostate 0.12295901   1/2 616/9703
NA                             <NA>         NA  <NA>     <NA>
NA.1                           <NA>         NA  <NA>     <NA>
NA.2                           <NA>         NA  <NA>     <NA>
NA.3                           <NA>         NA  <NA>     <NA>
NA.4                           <NA>         NA  <NA>     <NA>
NA.5                           <NA>         NA  <NA>     <NA>
******************************************

*                                        *

*          Welcome to WebGestaltR !      *

*                                        *

******************************************
Loading the functional categories...
Loading the ID list...
Loading the reference list...
Performing the enrichment analysis...
Warning in oraEnrichment(interestGeneList, referenceGeneList, geneSet,
minNum = minNum, : No significant gene set is identified based on FDR 0.05!
NULL

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] WebGestaltR_0.4.4 disgenet2r_0.99.2 enrichR_3.0       cowplot_1.0.0    
[5] ggplot2_3.3.3    

loaded via a namespace (and not attached):
  [1] bitops_1.0-6                matrixStats_0.57.0         
  [3] fs_1.3.1                    bit64_4.0.5                
  [5] doParallel_1.0.16           progress_1.2.2             
  [7] httr_1.4.1                  rprojroot_2.0.2            
  [9] GenomeInfoDb_1.20.0         doRNG_1.8.2                
 [11] tools_3.6.1                 utf8_1.2.1                 
 [13] R6_2.5.0                    DBI_1.1.1                  
 [15] BiocGenerics_0.30.0         colorspace_1.4-1           
 [17] withr_2.4.1                 tidyselect_1.1.0           
 [19] prettyunits_1.0.2           bit_4.0.4                  
 [21] curl_3.3                    compiler_3.6.1             
 [23] git2r_0.26.1                Biobase_2.44.0             
 [25] DelayedArray_0.10.0         rtracklayer_1.44.0         
 [27] labeling_0.3                scales_1.1.0               
 [29] readr_1.4.0                 apcluster_1.4.8            
 [31] stringr_1.4.0               digest_0.6.20              
 [33] Rsamtools_2.0.0             svglite_1.2.2              
 [35] rmarkdown_1.13              XVector_0.24.0             
 [37] pkgconfig_2.0.3             htmltools_0.3.6            
 [39] fastmap_1.1.0               BSgenome_1.52.0            
 [41] rlang_0.4.11                RSQLite_2.2.7              
 [43] generics_0.0.2              farver_2.1.0               
 [45] jsonlite_1.6                BiocParallel_1.18.0        
 [47] dplyr_1.0.7                 VariantAnnotation_1.30.1   
 [49] RCurl_1.98-1.1              magrittr_2.0.1             
 [51] GenomeInfoDbData_1.2.1      Matrix_1.2-18              
 [53] Rcpp_1.0.6                  munsell_0.5.0              
 [55] S4Vectors_0.22.1            fansi_0.5.0                
 [57] gdtools_0.1.9               lifecycle_1.0.0            
 [59] stringi_1.4.3               whisker_0.3-2              
 [61] yaml_2.2.0                  SummarizedExperiment_1.14.1
 [63] zlibbioc_1.30.0             plyr_1.8.4                 
 [65] grid_3.6.1                  blob_1.2.1                 
 [67] parallel_3.6.1              promises_1.0.1             
 [69] crayon_1.4.1                lattice_0.20-38            
 [71] Biostrings_2.52.0           GenomicFeatures_1.36.3     
 [73] hms_1.1.0                   knitr_1.23                 
 [75] pillar_1.6.1                igraph_1.2.4.1             
 [77] GenomicRanges_1.36.0        rjson_0.2.20               
 [79] rngtools_1.5                codetools_0.2-16           
 [81] reshape2_1.4.3              biomaRt_2.40.1             
 [83] stats4_3.6.1                XML_3.98-1.20              
 [85] glue_1.4.2                  evaluate_0.14              
 [87] data.table_1.14.0           foreach_1.5.1              
 [89] vctrs_0.3.8                 httpuv_1.5.1               
 [91] gtable_0.3.0                purrr_0.3.4                
 [93] assertthat_0.2.1            cachem_1.0.5               
 [95] xfun_0.8                    later_0.8.0                
 [97] tibble_3.1.2                iterators_1.0.13           
 [99] GenomicAlignments_1.20.1    AnnotationDbi_1.46.0       
[101] memoise_2.0.0               IRanges_2.18.1             
[103] workflowr_1.6.2             ellipsis_0.3.2