Last updated: 2023-03-21
Checks: 6 1
Knit directory: ctwas_applied/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of
the R Markdown file created these results, you’ll want to first commit
it to the Git repo. If you’re still working on the analysis, you can
ignore this warning. When you’re finished, you can run
wflow_publish
to commit the R Markdown file and build the
HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20210726)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version fe2dd30. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Untracked files:
Untracked: LDL_all_tissue_PIPs.csv
Untracked: gwas.RData
Untracked: ld_R_info.RData
Untracked: z_snp_pos_ebi-a-GCST004131.RData
Untracked: z_snp_pos_ebi-a-GCST004132.RData
Untracked: z_snp_pos_ebi-a-GCST004133.RData
Untracked: z_snp_pos_scz-2018.RData
Untracked: z_snp_pos_ukb-a-360.RData
Untracked: z_snp_pos_ukb-d-30780_irnt.RData
Unstaged changes:
Modified: analysis/multigroup_testing.Rmd
Modified: code/automate_Rmd.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/multigroup_testing.Rmd
)
and HTML (docs/multigroup_testing.html
) files. If you’ve
configured a remote Git repository (see ?wflow_git_remote
),
click on the hyperlinks in the table below to view the files as they
were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | fe2dd30 | wesleycrouse | 2023-03-21 | all tissue results! |
Rmd | 903d30e | wesleycrouse | 2023-02-04 | removing duplicated section |
html | 903d30e | wesleycrouse | 2023-02-04 | removing duplicated section |
Rmd | c688c02 | wesleycrouse | 2023-02-04 | updating reports |
html | c688c02 | wesleycrouse | 2023-02-04 | updating reports |
Rmd | 88f9698 | wesleycrouse | 2023-02-01 | multigroup simulation results |
html | 88f9698 | wesleycrouse | 2023-02-01 | multigroup simulation results |
Rmd | 6596a6d | wesleycrouse | 2023-01-19 | adjusting SBP figure and adding Venn |
Rmd | a1b9876 | wesleycrouse | 2023-01-18 | fixing error with combined pips |
html | a1b9876 | wesleycrouse | 2023-01-18 | fixing error with combined pips |
Rmd | c03b0a4 | wesleycrouse | 2023-01-10 | separate vs individual analyses |
html | c03b0a4 | wesleycrouse | 2023-01-10 | separate vs individual analyses |
Rmd | 05269c3 | wesleycrouse | 2023-01-10 | gene-level results for all cases |
html | 05269c3 | wesleycrouse | 2023-01-10 | gene-level results for all cases |
Rmd | e2d92de | wesleycrouse | 2023-01-09 | adding gene-level results |
html | 32cc717 | wesleycrouse | 2023-01-08 | fixing case 9+10 |
Rmd | e7e2b73 | wesleycrouse | 2023-01-06 | more results |
html | e7e2b73 | wesleycrouse | 2023-01-06 | more results |
Rmd | 9c0d0e7 | wesleycrouse | 2023-01-06 | adding more plots |
html | 9c0d0e7 | wesleycrouse | 2023-01-06 | adding more plots |
Rmd | 70f3460 | wesleycrouse | 2023-01-05 | names in legends |
html | 70f3460 | wesleycrouse | 2023-01-05 | names in legends |
Rmd | 9869b19 | wesleycrouse | 2023-01-05 | tinkering with figure legend |
html | 9869b19 | wesleycrouse | 2023-01-05 | tinkering with figure legend |
Rmd | d57dfc3 | wesleycrouse | 2023-01-05 | trying to fix legends |
html | d57dfc3 | wesleycrouse | 2023-01-05 | trying to fix legends |
html | 93168e6 | wesleycrouse | 2023-01-05 | fixing plots? |
Rmd | 18962d7 | wesleycrouse | 2023-01-04 | updating results |
html | 18962d7 | wesleycrouse | 2023-01-04 | updating results |
Rmd | d8772b8 | wesleycrouse | 2022-12-20 | updating multitrait report |
html | d8772b8 | wesleycrouse | 2022-12-20 | updating multitrait report |
Rmd | 506b1ae | wesleycrouse | 2022-12-16 | added case 5 |
html | 506b1ae | wesleycrouse | 2022-12-16 | added case 5 |
Rmd | 356cb9b | wesleycrouse | 2022-12-15 | multigroup testing |
html | 356cb9b | wesleycrouse | 2022-12-15 | multigroup testing |
Rmd | 9e7d4cd | wesleycrouse | 2022-12-15 | multigroup testing results |
html | 9e7d4cd | wesleycrouse | 2022-12-15 | multigroup testing results |
analysis_id <- "ukb-d-30780_irnt_multigroup"
trait_id <- "ukb-d-30780_irnt"
traits <- read.csv("/project2/mstephens/wcrouse/UKB_analysis_known_anno/ukbb_neale_v3_known_annotations.csv", head=F)
colnames(traits) <- c("trait_name", "ieu_id", "weight")
trait_name <- traits$trait_name[match(trait_id, traits$ieu_id)]
source("/project2/mstephens/wcrouse/ctwas_multigroup_testing/ctwas_config.R")
Compare with Case 2 (shared/separate prior parameters).
results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case3")
weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db"
weight <- unlist(strsplit(weight, ";"))
#load information for all genes
gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())
for (i in 1:length(weight)){
sqlite <- RSQLite::dbDriver("SQLite")
db = RSQLite::dbConnect(sqlite, weight[i])
query <- function(...) RSQLite::dbGetQuery(db, ...)
gene_info_current <- query("select gene, genename, gene_type from extra")
RSQLite::dbDisconnect(db)
gene_info_current$weight <- weight[i]
gene_info <- rbind(gene_info, gene_info_current)
}
gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})
gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")
#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))
#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")
#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))
sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))
#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size
#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)
#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)
ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])
#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z
z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]
#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA
ctwas_res <- rbind(ctwas_gene_res,
ctwas_snp_res[,colnames(ctwas_gene_res)])
#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)
#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])
library(ggplot2)
library(cowplot)
load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))
#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument
#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec
#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]
#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation
#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))
title_size <- 12
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi)) +
ggtitle("Proportion Causal") +
theme_cowplot()
p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(sigma^2)) +
ggtitle("Effect Size") +
theme_cowplot()
p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(h^2[G])) +
ggtitle("PVE") +
theme_cowplot()
p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
ggtitle("Enrichment") +
theme_cowplot()
p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
plot_grid(p_pi, p_sigma2, p_enrich, p_pve)
####################
#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
SNP Liver Adipose_Subcutaneous
9.014951e-05 7.943610e-03 1.653321e-02
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
SNP Liver Adipose_Subcutaneous
16.908160 48.882570 6.014357
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 126.1567
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
SNP Liver Adipose_Subcutaneous
8696600 11003 13082
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
SNP Liver Adipose_Subcutaneous
0.038577093 0.012433786 0.003785652
#total PVE
sum(estimated_group_pve)
[1] 0.05479653
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
SNP Liver Adipose_Subcutaneous
0.70400612 0.22690827 0.06908562
#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
genename group region_tag susie_pip mu2
4435 PSRC1 Liver 1_67 1.0000000 1669.97618
2454 ST3GAL4 Liver 11_77 1.0000000 170.97506
12008 HPR Liver 16_38 0.9999974 156.57752
3721 INSIG2 Liver 2_69 0.9999934 68.61282
19759 ZDHHC7 Adipose_Subcutaneous 16_49 0.9995748 28.12889
5991 FADS1 Liver 11_34 0.9993538 163.18332
12687 RP4-781K5.7 Liver 1_121 0.9943723 202.13879
5563 ABCG8 Liver 2_27 0.9943320 311.88467
7410 ABCA1 Liver 9_53 0.9925506 70.70277
1999 PRKD2 Liver 19_33 0.9847824 29.89388
8531 TNKS Liver 8_12 0.9825262 77.45465
13717 TPD52 Adipose_Subcutaneous 8_57 0.9814059 21.06905
9390 GAS6 Liver 13_62 0.9803514 73.89025
1597 PLTP Liver 20_28 0.9775544 59.24311
5544 CNIH4 Liver 1_114 0.9743418 41.39386
3755 RRBP1 Liver 20_13 0.9719348 32.36358
11790 CYP2A6 Liver 19_28 0.9693535 29.87336
7040 INHBB Liver 2_70 0.9651557 74.16214
18637 ABCA8 Adipose_Subcutaneous 17_39 0.9558067 27.28313
25057 SIPA1 Adipose_Subcutaneous 11_36 0.9482173 22.29943
6093 CSNK1G3 Liver 5_75 0.9459871 83.36709
8579 STAT5B Liver 17_25 0.9418229 30.95090
2092 SP4 Liver 7_19 0.9299248 102.14354
15027 PLPPR2 Adipose_Subcutaneous 19_10 0.9297349 26.56239
6391 TTC39B Liver 9_13 0.9260892 23.56077
14913 FCGRT Adipose_Subcutaneous 19_34 0.9226574 20.62923
16361 CCDC92 Adipose_Subcutaneous 12_75 0.9184631 25.75399
15242 CCNJ Adipose_Subcutaneous 10_61 0.9161743 20.53548
8865 FUT2 Liver 19_33 0.9059564 70.62362
17109 POR Adipose_Subcutaneous 7_48 0.8988319 31.79763
21459 PROCA1 Adipose_Subcutaneous 17_17 0.8980242 26.41207
3562 ACVR1C Liver 2_94 0.8864262 25.08450
10657 TRIM39 Liver 6_26 0.8823124 82.45322
4704 DDX56 Liver 7_33 0.8805617 57.91924
6957 USP1 Liver 1_39 0.8781087 255.86929
18314 FAM117B Adipose_Subcutaneous 2_120 0.8638575 36.68997
6100 ALLC Liver 2_2 0.8446148 28.32175
13346 SPHK2 Adipose_Subcutaneous 19_33 0.8363777 34.76342
24743 PCMTD2 Adipose_Subcutaneous 20_38 0.8314753 28.89377
14273 SCD Adipose_Subcutaneous 10_64 0.8185682 19.92754
18041 IL1RN Adipose_Subcutaneous 2_67 0.8157131 21.52025
18737 NTN5 Adipose_Subcutaneous 19_33 0.8143027 54.80574
PVE z
4435 4.859936e-03 -41.687336
2454 4.975687e-04 13.376072
12008 4.556680e-04 -17.962770
3721 1.996746e-04 -8.982702
19759 8.182541e-05 -5.184802
5991 4.745865e-04 12.926351
12687 5.849503e-04 -15.108415
5563 9.024969e-04 -20.293982
7410 2.042252e-04 7.982017
1999 8.567278e-05 5.072217
8531 2.214685e-04 11.038564
13717 6.017471e-05 -4.684363
9390 2.108090e-04 -8.923688
1597 1.685385e-04 -5.732491
5544 1.173728e-04 6.145535
3755 9.154065e-05 7.008305
11790 8.427263e-05 5.407028
7040 2.083051e-04 -8.518936
18637 7.589001e-05 4.800447
25057 6.153497e-05 -5.096493
6093 2.295092e-04 9.116291
8579 8.483262e-05 5.426252
2092 2.764261e-04 10.693191
15027 7.186983e-05 3.965665
6391 6.349836e-05 -4.334495
14913 5.539158e-05 -4.347956
16361 6.883773e-05 -5.328046
15242 5.475241e-05 -4.639318
8865 1.861991e-04 -11.927107
17109 8.317515e-05 6.044322
21459 6.902569e-05 5.543282
3562 6.470955e-05 -4.687370
10657 2.117144e-04 8.840164
4704 1.484236e-04 9.641861
6957 6.538630e-04 16.258211
18314 9.223799e-05 7.852653
6100 6.961439e-05 4.919066
13346 8.461459e-05 -8.721460
24743 6.991557e-05 -5.636854
14273 4.747105e-05 -4.541468
18041 5.108638e-05 4.455379
18737 1.298770e-04 11.132252
#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")
#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]
#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")
df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names
for (i in 1:nrow(df_gene)){
gene <- df_gene$genename[i]
ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}
df_gene <- cbind(df_gene, df_gene_pips)
#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]
#abbreviate column names
df_gene <- dplyr::rename(df_gene,
Adi_Sub="Adipose_Subcutaneous",
Brain_Cer="Brain_Cerebellum",
Adi_Vis="Adipose_Visceral_Omentum",
Blood="Whole_Blood")
df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]
#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
genename combined_pip Adi_Sub Liver
290 ACVR1C 1.2821451 3.957190e-01 0.886426187
12546 ST3GAL4 1.0408713 4.087127e-02 0.999999993
9618 PSRC1 1.0329403 3.294030e-02 1.000000000
1409 C10orf88 1.0324476 2.351908e-01 0.797256743
2582 CNIH4 1.0263117 5.196981e-02 0.974341842
4697 GAS6 1.0252303 4.487895e-02 0.980351368
5734 INHBB 1.0242873 5.913165e-02 0.965155688
14503 ZDHHC7 1.0232406 9.995748e-01 0.023665871
5464 HPR 1.0166104 1.661304e-02 0.999997354
2864 CSNK1G3 1.0136737 6.768661e-02 0.945987078
5750 INSIG2 1.0133149 1.332146e-02 0.999993445
6035 KDSR 1.0028532 5.540887e-01 0.448764468
3125 CYP2A6 1.0014824 3.212884e-02 0.969353536
4086 FADS1 0.9993538 NA 0.999353840
8854 PELO 0.9979494 2.326841e-01 0.765265259
11311 RRBP1 0.9945557 2.262086e-02 0.971934797
11157 RP4-781K5.7 0.9943723 NA 0.994372303
61 ABCG8 0.9943320 NA 0.994331953
26 ABCA1 0.9925506 NA 0.992550635
9167 PLTP 0.9900735 1.251913e-02 0.977554352
9473 PRKD2 0.9892652 4.482861e-03 0.984782375
13505 TPD52 0.9891353 9.814059e-01 0.007729362
13436 TNKS 0.9825262 NA 0.982526205
12471 SPTY2D1 0.9768431 4.886837e-01 0.488159443
12506 SRRT 0.9757710 4.263893e-01 0.549381704
12588 STAT5B 0.9748353 3.301246e-02 0.941822874
12353 SP4 0.9734963 4.357146e-02 0.929924805
13763 TTC39B 0.9716737 4.558453e-02 0.926089174
6154 KLHDC7A 0.9693547 6.579458e-01 0.311408861
31 ABCA8 0.9558067 9.558067e-01 NA
11812 SIPA1 0.9541853 9.482173e-01 0.005967923
2014 CCND2 0.9525749 7.250596e-01 0.227515311
9055 PKN3 0.9395463 4.786795e-01 0.460866780
9161 PLPPR2 0.9349051 9.297349e-01 0.005170169
14045 USP1 0.9339745 5.586579e-02 0.878108668
1978 CCDC92 0.9324492 9.184631e-01 0.013986025
4386 FCGRT 0.9253876 9.226574e-01 0.002730238
2025 CCNJ 0.9215620 9.161743e-01 0.005387656
569 ALLC 0.9204296 7.581480e-02 0.844614756
3070 CWF19L1 0.9152537 4.971167e-01 0.418137016
4613 FUT2 0.9059564 NA 0.905956407
9491 PROCA1 0.9052901 8.980242e-01 0.007265855
8749 PCMTD2 0.9050256 8.314753e-01 0.073550293
9282 POR 0.9045059 8.988319e-01 0.005673998
3290 DDX56 0.9043148 2.375309e-02 0.880561708
13607 TRIM39 0.8823125 9.626981e-08 0.882312357
4114 FAM117B 0.8638575 8.638575e-01 NA
12330 SORCS2 0.8570663 7.657866e-01 0.091279663
12414 SPHK2 0.8521159 8.363777e-01 0.015738193
12766 SYTL1 0.8431171 4.316092e-02 0.799956216
6610 LINC01184 0.8410288 6.047286e-01 0.236300159
1346 BRI3 0.8302506 6.192655e-02 0.768324031
11479 SCD 0.8185682 8.185682e-01 NA
5688 IL1RN 0.8157131 8.157131e-01 NA
8341 NTN5 0.8143027 8.143027e-01 NA
#store gene list
liver_and_adipose_genes <- df_gene$genename[df_gene$combined_pip>0.8]
Compare with Case 4 (Liver + Adipose/Cerebellum)
results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case5")
weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Brain_Cerebellum.db"
weight <- unlist(strsplit(weight, ";"))
#load information for all genes
gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())
for (i in 1:length(weight)){
sqlite <- RSQLite::dbDriver("SQLite")
db = RSQLite::dbConnect(sqlite, weight[i])
query <- function(...) RSQLite::dbGetQuery(db, ...)
gene_info_current <- query("select gene, genename, gene_type from extra")
RSQLite::dbDisconnect(db)
gene_info_current$weight <- weight[i]
gene_info <- rbind(gene_info, gene_info_current)
}
gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})
gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")
#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))
#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")
#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))
sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))
#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size
#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)
#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)
ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])
#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z
z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]
#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA
ctwas_res <- rbind(ctwas_gene_res,
ctwas_snp_res[,colnames(ctwas_gene_res)])
#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)
#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])
library(ggplot2)
library(cowplot)
load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))
#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument
#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec
#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]
#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation
#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))
title_size <- 12
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi)) +
ggtitle("Proportion Causal") +
theme_cowplot()
p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(sigma^2)) +
ggtitle("Effect Size") +
theme_cowplot()
p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(h^2[G])) +
ggtitle("PVE") +
theme_cowplot()
p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
ggtitle("Enrichment") +
theme_cowplot()
p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
plot_grid(p_pi, p_sigma2, p_enrich, p_pve)
####################
#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
SNP Liver Brain_Cerebellum
0.0001060227 0.0099041093 0.0120131986
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
SNP Liver Brain_Cerebellum
15.42194 44.38276 6.97870
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 81.65379
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
SNP Liver Brain_Cerebellum
8696600 11003 12270
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
SNP Liver Brain_Cerebellum
0.04138165 0.01407541 0.00299363
#total PVE
sum(estimated_group_pve)
[1] 0.05845069
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
SNP Liver Brain_Cerebellum
0.70797534 0.24080833 0.05121632
#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
genename group region_tag susie_pip mu2 PVE
4435 PSRC1 Liver 1_67 1.0000000 1663.38029 4.840741e-03
2454 ST3GAL4 Liver 11_77 1.0000000 171.63510 4.994895e-04
12008 HPR Liver 16_38 0.9999928 157.37765 4.579945e-04
3721 INSIG2 Liver 2_69 0.9999924 67.94668 1.977358e-04
5991 FADS1 Liver 11_34 0.9994062 162.52352 4.726923e-04
5563 ABCG8 Liver 2_27 0.9967677 310.79801 9.015555e-04
12687 RP4-781K5.7 Liver 1_121 0.9962045 202.36899 5.866955e-04
7410 ABCA1 Liver 9_53 0.9936425 70.41461 2.036166e-04
1999 PRKD2 Liver 19_33 0.9856859 30.10524 8.635767e-05
1597 PLTP Liver 20_28 0.9853706 60.30472 1.729303e-04
8531 TNKS Liver 8_12 0.9853313 76.52958 2.194481e-04
9390 GAS6 Liver 13_62 0.9847160 72.58384 2.080038e-04
5544 CNIH4 Liver 1_114 0.9829436 42.23904 1.208267e-04
3755 RRBP1 Liver 20_13 0.9780234 32.12821 9.144418e-05
7040 INHBB Liver 2_70 0.9711195 74.15764 2.095795e-04
17721 RAC1 Brain_Cerebellum 7_9 0.9663013 30.07597 8.457705e-05
11790 CYP2A6 Liver 19_29 0.9644192 31.75991 8.913852e-05
3247 KDSR Liver 18_35 0.9631438 24.75544 6.938763e-05
6093 CSNK1G3 Liver 5_75 0.9567889 84.68297 2.357939e-04
18334 ABCA8 Brain_Cerebellum 17_39 0.9544748 23.30104 6.472322e-05
24413 SIPA1 Brain_Cerebellum 11_36 0.9524771 24.52172 6.797134e-05
2092 SP4 Liver 7_19 0.9435848 101.94755 2.799484e-04
3300 C10orf88 Liver 10_77 0.9304447 36.98123 1.001365e-04
4704 DDX56 Liver 7_32 0.9292350 58.58285 1.584223e-04
6391 TTC39B Liver 9_13 0.9229868 23.53639 6.322016e-05
6778 PKN3 Liver 9_66 0.9203992 46.69957 1.250862e-04
8865 FUT2 Liver 19_33 0.9125557 98.15762 2.606776e-04
18390 P3H4 Brain_Cerebellum 17_25 0.9111093 21.08931 5.591819e-05
3562 ACVR1C Liver 2_94 0.9071780 24.93987 6.584261e-05
10657 TRIM39 Liver 6_25 0.9064821 81.99548 2.163064e-04
8579 STAT5B Liver 17_25 0.8983052 26.93111 7.040420e-05
6957 USP1 Liver 1_39 0.8836416 253.55963 6.520435e-04
1114 SRRT Liver 7_62 0.8723166 32.51974 8.255465e-05
9062 KLHDC7A Liver 1_13 0.8589053 21.72416 5.430111e-05
9072 SPTY2D1 Liver 11_13 0.8422930 32.59647 7.990135e-05
6100 ALLC Liver 2_2 0.8393685 27.74235 6.776668e-05
8418 POP7 Liver 7_62 0.8310774 41.08047 9.935672e-05
6220 PELO Liver 5_31 0.8295163 72.78264 1.757005e-04
21013 PROCA1 Brain_Cerebellum 17_17 0.8127077 29.05054 6.870826e-05
z
4435 -41.687336
2454 13.376072
12008 -17.962770
3721 -8.982702
5991 12.926351
5563 -20.293982
12687 -15.108415
7410 7.982017
1999 5.072217
1597 -5.732491
8531 11.038564
9390 -8.923688
5544 6.145535
3755 7.008305
7040 -8.518936
17721 5.713928
11790 5.407028
3247 -4.526287
6093 9.116291
18334 4.521803
24413 -5.477878
2092 10.693191
3300 -6.787850
4704 9.641861
6391 -4.334495
6778 -6.620563
8865 -11.927107
18390 4.891111
3562 -4.687370
10657 8.840164
8579 5.426252
6957 16.258211
1114 5.424996
9062 4.124187
9072 -5.557123
6100 4.919066
8418 -5.845258
6220 8.288398
21013 5.628065
#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")
#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]
#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")
df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names
for (i in 1:nrow(df_gene)){
gene <- df_gene$genename[i]
ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}
df_gene <- cbind(df_gene, df_gene_pips)
#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]
#abbreviate column names
df_gene <- dplyr::rename(df_gene,
Adi_Sub="Adipose_Subcutaneous",
Brain_Cer="Brain_Cerebellum",
Adi_Vis="Adipose_Visceral_Omentum",
Blood="Whole_Blood")
df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]
#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
genename combined_pip Liver Brain_Cer
296 ACVR1C 1.2524283 0.907177976 3.452503e-01
2564 CNIH4 1.0774810 0.982943607 9.453734e-02
12369 ST3GAL4 1.0517561 0.999999996 5.175609e-02
9504 PSRC1 1.0401559 1.000000000 4.015593e-02
11169 RRBP1 1.0343872 0.978023352 5.636382e-02
4669 GAS6 1.0304715 0.984716041 4.575543e-02
8955 PKN3 1.0206026 0.920399200 1.002034e-01
4065 FADS1 1.0205579 0.999406168 2.115178e-02
5417 HPR 1.0080726 0.999992791 8.079809e-03
5706 INSIG2 1.0079276 0.999992391 7.935239e-03
11559 SGMS1 0.9978836 0.789690387 2.081932e-01
9363 PRKD2 0.9973720 0.985685902 1.168607e-02
5987 KDSR 0.9972987 0.963143807 3.415493e-02
57 ABCG8 0.9967693 0.996767651 1.614202e-06
11006 RP4-781K5.7 0.9962045 0.996204548 NA
25 ABCA1 0.9936425 0.993642546 NA
2843 CSNK1G3 0.9920704 0.956788949 3.528140e-02
9066 PLTP 0.9894597 0.985370574 4.089157e-03
13228 TNKS 0.9853313 0.985331256 NA
9701 RAC1 0.9715526 0.005251282 9.663013e-01
5687 INHBB 0.9711195 0.971119518 NA
3130 CYP2A6 0.9644192 0.964419208 NA
12179 SP4 0.9644114 0.943584837 2.082654e-02
8758 PELO 0.9620666 0.829516273 1.325503e-01
11647 SIPA1 0.9589754 0.006498239 9.524771e-01
3289 DDX56 0.9579050 0.929235029 2.866996e-02
31 ABCA8 0.9544748 NA 9.544748e-01
1384 C10orf88 0.9502566 0.930444679 1.981197e-02
6104 KLHDC7A 0.9412314 0.858905318 8.232609e-02
1988 CCND2 0.9369342 0.416548920 5.203853e-01
5314 HIST1H2BH 0.9328322 0.451462861 4.813694e-01
13546 TTC39B 0.9229868 0.922986846 NA
12414 STAT5B 0.9210178 0.898305230 2.271252e-02
8477 P3H4 0.9202486 0.009139335 9.111093e-01
12333 SRRT 0.9138895 0.872316636 4.157289e-02
4577 FUT2 0.9125557 0.912555657 NA
5074 GSK3B 0.9109870 0.244388498 6.665985e-01
12597 SYTL1 0.9102862 0.711966328 1.983199e-01
569 ALLC 0.9093132 0.839368481 6.994473e-02
13392 TRIM39 0.9064892 0.906482144 7.101999e-06
12298 SPTY2D1 0.9012311 0.842293022 5.893807e-02
13820 USP1 0.8836416 0.883641569 NA
2464 CLDN23 0.8814676 0.782048359 9.941925e-02
3472 DNAJC13 0.8560228 0.062864965 7.931578e-01
9172 POP7 0.8310774 0.831077434 NA
1320 BRI3 0.8207590 0.792207382 2.855163e-02
9381 PROCA1 0.8204752 0.007767419 8.127077e-01
2357 CHKB 0.8057504 0.011039641 7.947108e-01
3530 DOPEY2 0.8056237 0.011200336 7.944234e-01
13849 USP47 0.8006220 0.082589523 7.180325e-01
Compare with Case 3 (Liver + Adipose) and Case 4 (Liver + Cerebellum)
results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case6")
weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db;/project2/compbio/predictdb/mashr_models/mashr_Brain_Cerebellum.db"
weight <- unlist(strsplit(weight, ";"))
#load information for all genes
gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())
for (i in 1:length(weight)){
sqlite <- RSQLite::dbDriver("SQLite")
db = RSQLite::dbConnect(sqlite, weight[i])
query <- function(...) RSQLite::dbGetQuery(db, ...)
gene_info_current <- query("select gene, genename, gene_type from extra")
RSQLite::dbDisconnect(db)
gene_info_current$weight <- weight[i]
gene_info <- rbind(gene_info, gene_info_current)
}
gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})
gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")
#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))
#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")
#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))
sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))
#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size
#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)
#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)
ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])
#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z
z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]
#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA
ctwas_res <- rbind(ctwas_gene_res,
ctwas_snp_res[,colnames(ctwas_gene_res)])
#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)
#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])
library(ggplot2)
library(cowplot)
load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))
#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument
#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec
#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]
#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation
#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))
title_size <- 12
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi)) +
ggtitle("Proportion Causal") +
theme_cowplot()
p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(sigma^2)) +
ggtitle("Effect Size") +
theme_cowplot()
p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(h^2[G])) +
ggtitle("PVE") +
theme_cowplot()
p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
ggtitle("Enrichment") +
theme_cowplot()
p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
plot_grid(p_pi, p_sigma2, p_enrich, p_pve)
####################
#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
SNP Liver Adipose_Subcutaneous
7.471396e-05 7.790609e-03 1.420501e-02
Brain_Cerebellum
8.161595e-03
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
SNP Liver Adipose_Subcutaneous
18.528817 49.651565 6.443604
Brain_Cerebellum
7.617439
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 61.28883
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
SNP Liver Adipose_Subcutaneous
8696600 11003 13082
Brain_Cerebellum
12270
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
SNP Liver Adipose_Subcutaneous
0.035036382 0.012386133 0.003484695
Brain_Cerebellum
0.002219979
#total PVE
sum(estimated_group_pve)
[1] 0.05312719
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
SNP Liver Adipose_Subcutaneous
0.65948120 0.23314113 0.06559155
Brain_Cerebellum
0.04178612
#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
genename group region_tag susie_pip mu2
12008 HPR Liver 16_38 1.0000000 585.42258
4435 PSRC1 Liver 1_67 1.0000000 1671.14146
2454 ST3GAL4 Liver 11_77 1.0000000 170.35065
3721 INSIG2 Liver 2_69 0.9999913 68.12597
19759 ZDHHC7 Adipose_Subcutaneous 16_49 0.9995468 28.36788
5991 FADS1 Liver 11_34 0.9991968 162.69551
7410 ABCA1 Liver 9_53 0.9930025 70.73631
12687 RP4-781K5.7 Liver 1_121 0.9928625 202.03345
5563 ABCG8 Liver 2_27 0.9907498 311.75406
9390 GAS6 Liver 13_62 0.9833506 74.42010
8531 TNKS Liver 8_12 0.9831763 78.01659
1999 PRKD2 Liver 19_33 0.9814805 29.95863
13717 TPD52 Adipose_Subcutaneous 8_57 0.9784216 21.34420
1597 PLTP Liver 20_28 0.9783201 57.26119
5544 CNIH4 Liver 1_114 0.9781234 42.39614
3755 RRBP1 Liver 20_13 0.9684850 32.38409
11790 CYP2A6 Liver 19_29 0.9674857 30.49600
7040 INHBB Liver 2_70 0.9665361 74.28468
32453 RAC1 Brain_Cerebellum 7_10 0.9659120 30.79984
6093 CSNK1G3 Liver 5_75 0.9485627 83.92815
2092 SP4 Liver 7_19 0.9291370 102.33080
16361 CCDC92 Adipose_Subcutaneous 12_75 0.9225948 26.75929
6391 TTC39B Liver 9_13 0.9169620 23.68758
15242 CCNJ Adipose_Subcutaneous 10_61 0.9162282 20.86439
8865 FUT2 Liver 19_33 0.9127901 69.39636
8579 STAT5B Liver 17_25 0.8896792 27.48607
14913 FCGRT Adipose_Subcutaneous 19_34 0.8872476 21.01284
3562 ACVR1C Liver 2_94 0.8863188 25.00159
10657 TRIM39 Liver 6_26 0.8833627 82.49146
4704 DDX56 Liver 7_33 0.8797603 57.71798
6957 USP1 Liver 1_39 0.8688417 255.37139
6100 ALLC Liver 2_2 0.8575725 28.18804
33122 P3H4 Brain_Cerebellum 17_25 0.8481087 22.42262
17109 POR Adipose_Subcutaneous 7_48 0.8433223 33.65969
24743 PCMTD2 Adipose_Subcutaneous 20_38 0.8346979 29.12070
18041 IL1RN Adipose_Subcutaneous 2_67 0.8236861 21.69476
PVE z
12008 1.703687e-03 -17.962770
4435 4.863327e-03 -41.687336
2454 4.957516e-04 13.376072
3721 1.982573e-04 -8.982702
19759 8.251833e-05 -5.184802
5991 4.730934e-04 12.926351
7410 2.044151e-04 7.982017
12687 5.837578e-04 -15.108415
5563 8.988691e-04 -20.293982
9390 2.129703e-04 -8.923688
8531 2.232229e-04 11.038564
1999 8.557048e-05 5.072217
13717 6.077518e-05 -4.684363
1597 1.630278e-04 -5.732491
5544 1.206814e-04 6.145535
3755 9.127355e-05 7.008305
11790 8.586334e-05 5.407028
7040 2.089477e-04 -8.518936
32453 8.657776e-05 5.713928
6093 2.316829e-04 9.116291
2092 2.766983e-04 10.693191
16361 7.184654e-05 -5.328046
6391 6.321096e-05 -4.334495
15242 5.563264e-05 -4.639318
8865 1.843435e-04 -11.927107
8579 7.116498e-05 5.426252
14913 5.425626e-05 -4.347956
3562 6.448784e-05 -4.687370
10657 2.120647e-04 8.840164
4704 1.477732e-04 9.641861
6957 6.457036e-04 16.258211
6100 7.034869e-05 4.919066
33122 5.534242e-05 4.891111
17109 8.260835e-05 6.044322
24743 7.073779e-05 -5.636854
18041 5.200402e-05 4.455379
#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")
#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]
#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")
df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names
for (i in 1:nrow(df_gene)){
gene <- df_gene$genename[i]
ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}
df_gene <- cbind(df_gene, df_gene_pips)
#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]
#abbreviate column names
df_gene <- dplyr::rename(df_gene,
Adi_Sub="Adipose_Subcutaneous",
Brain_Cer="Brain_Cerebellum",
Adi_Vis="Adipose_Visceral_Omentum",
Blood="Whole_Blood")
df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]
#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
genename combined_pip Adi_Sub Liver Brain_Cer
340 ACVR1C 1.3654339 2.993964e-01 0.886318770 1.797188e-01
2858 CNIH4 1.0895843 4.699022e-02 0.978123429 6.447065e-02
13917 ST3GAL4 1.0646736 3.245088e-02 0.999999992 3.222270e-02
5200 GAS6 1.0612617 4.218874e-02 0.983350622 3.572238e-02
9807 PELO 1.0383286 1.632128e-01 0.726817365 1.482984e-01
10622 PSRC1 1.0381159 2.756212e-02 1.000000000 1.055375e-02
1552 C10orf88 1.0320231 2.531858e-01 0.761060584 1.777679e-02
3167 CSNK1G3 1.0286374 5.708902e-02 0.948562669 2.298568e-02
12582 RRBP1 1.0264205 1.833534e-02 0.968485003 3.960011e-02
16026 ZDHHC7 1.0240935 9.995468e-01 0.024546612 NA
13837 SPTY2D1 1.0227135 4.843885e-01 0.492229758 4.609525e-02
6671 KDSR 1.0196759 5.307733e-01 0.469094971 1.980762e-02
6352 INSIG2 1.0180317 1.179547e-02 0.999991342 6.244901e-03
6331 INHBB 1.0153465 4.881036e-02 0.966536088 NA
4524 FADS1 1.0120152 NA 0.999196764 1.281842e-02
14949 TPD52 1.0042172 9.784216e-01 0.006575001 1.922055e-02
6800 KLHDC7A 1.0002542 6.270234e-01 0.325175417 4.805538e-02
6031 HPR 1.0000370 2.207853e-05 1.000000000 1.495800e-05
27 ABCA1 0.9930025 NA 0.993002550 NA
12404 RP4-781K5.7 0.9928625 NA 0.992862525 NA
13123 SIPA1 0.9920651 5.245679e-01 0.004317698 4.631795e-01
63 ABCG8 0.9907517 NA 0.990749844 1.825931e-06
10464 PRKD2 0.9892136 3.725731e-03 0.981480482 4.007413e-03
10143 PLTP 0.9888872 7.511620e-03 0.978320061 3.055470e-03
10851 RAC1 0.9869350 1.786359e-02 0.003159394 9.659120e-01
3491 CYP2A6 0.9839514 1.646572e-02 0.967485717 NA
14693 TMEM199 0.9833807 5.612285e-01 0.422152156 NA
14875 TNKS 0.9831763 NA 0.983176328 NA
13707 SP4 0.9778408 3.602920e-02 0.929136995 1.267462e-02
2222 CCND2 0.9745918 5.587958e-01 0.190178132 2.256179e-01
13875 SRRT 0.9694747 4.078990e-01 0.538226348 2.334934e-02
13022 SGMS1 0.9629557 NA 0.788350028 1.746057e-01
643 ALLC 0.9597117 6.069313e-02 0.857572527 4.144609e-02
15225 TTC39B 0.9506565 3.369447e-02 0.916962043 NA
10136 PLPPR2 0.9502305 6.849915e-01 0.003831763 2.614072e-01
10021 PKN3 0.9472770 4.713977e-01 0.462865779 1.301349e-02
2185 CCDC92 0.9321488 9.225948e-01 0.009554054 NA
33 ABCA8 0.9297873 4.263476e-01 NA 5.034396e-01
5906 HIST1H2BH 0.9278544 NA 0.531826021 3.960284e-01
13964 STAT5B 0.9231265 2.171700e-02 0.889679211 1.173026e-02
9697 PCMTD2 0.9220798 8.346979e-01 0.076295995 1.108595e-02
2233 CCNJ 0.9214843 9.162282e-01 0.005256149 NA
15525 USP1 0.9153238 4.648211e-02 0.868841683 NA
5098 FUT2 0.9127901 NA 0.912790121 NA
3667 DDX56 0.9116258 2.049019e-02 0.879760291 1.137532e-02
4846 FCGRT 0.9009173 8.872476e-01 0.002788675 1.088105e-02
5646 GSK3B 0.8956626 NA 0.232749991 6.629127e-01
14160 SYTL1 0.8947549 3.807541e-02 0.678842676 1.778368e-01
15056 TRIM39 0.8833628 9.165758e-08 0.883362687 3.400296e-08
9498 P3H4 0.8810361 2.692311e-02 0.006004294 8.481087e-01
3871 DNAJC13 0.8779756 2.873330e-02 0.061512565 7.877298e-01
7323 LINC01184 0.8758830 4.987738e-01 0.213466023 1.636432e-01
13682 SORCS2 0.8653909 7.684955e-01 0.096895389 NA
1481 BRI3 0.8637015 4.932722e-02 0.792136481 2.223784e-02
10263 POR 0.8531609 8.433223e-01 0.002459456 7.379191e-03
3935 DOPEY2 0.8486725 4.043902e-02 0.009027480 7.992060e-01
10485 PROCA1 0.8334201 4.930295e-01 0.003121419 3.372692e-01
15556 USP47 0.8248551 1.266296e-01 0.075071800 6.231538e-01
6280 IL1RN 0.8236861 8.236861e-01 NA NA
2753 CLDN23 0.8161379 2.406376e-02 0.710143733 8.193043e-02
13065 SH3TC1 0.8141052 3.709327e-02 0.015414057 7.615979e-01
9747 PDE4C 0.8040124 7.046461e-03 0.001610059 7.953558e-01
Compare with Case 3 (Liver + Adipose) and others
results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case7")
weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db;/project2/compbio/predictdb/mashr_models/mashr_Brain_Cerebellum.db"
weight <- unlist(strsplit(weight, ";"))
#load information for all genes
gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())
for (i in 1:length(weight)){
sqlite <- RSQLite::dbDriver("SQLite")
db = RSQLite::dbConnect(sqlite, weight[i])
query <- function(...) RSQLite::dbGetQuery(db, ...)
gene_info_current <- query("select gene, genename, gene_type from extra")
RSQLite::dbDisconnect(db)
gene_info_current$weight <- weight[i]
gene_info <- rbind(gene_info, gene_info_current)
}
gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})
gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")
#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))
#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")
#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))
sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))
#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size
#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)
#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- ctwas_gene_res$type
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)
ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])
#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z
z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]
#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA
ctwas_res <- rbind(ctwas_gene_res,
ctwas_snp_res[,colnames(ctwas_gene_res)])
#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)
#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])
library(ggplot2)
library(cowplot)
load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))
#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument
#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec
#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]
#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation
#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))
title_size <- 12
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi)) +
ggtitle("Proportion Causal") +
theme_cowplot()
p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(sigma^2)) +
ggtitle("Effect Size") +
theme_cowplot()
p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(h^2[G])) +
ggtitle("PVE") +
theme_cowplot()
p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
ggtitle("Enrichment") +
theme_cowplot()
p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
plot_grid(p_pi, p_sigma2, p_enrich, p_pve)
####################
#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
SNP Liver
0.0001708048 0.0102158606
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
SNP Liver
9.70388 43.32067
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 59.81014
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
SNP Liver
8696600 11003
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
SNP Liver
0.04194840 0.01417103
#total PVE
sum(estimated_group_pve)
[1] 0.05611943
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
SNP Liver
0.7474844 0.2525156
#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
genename group region_tag susie_pip mu2 PVE z
4435 PSRC1 Liver 1_67 1.0000000 1661.18355 4.834348e-03 -41.687336
2454 ST3GAL4 Liver 11_77 1.0000000 173.08249 5.037017e-04 13.376072
12008 HPR Liver 16_38 0.9999998 163.53982 4.759307e-04 -17.962770
3721 INSIG2 Liver 2_69 0.9999958 68.51740 1.993973e-04 -8.982702
5563 ABCG8 Liver 2_27 0.9999660 313.08513 9.111040e-04 -20.293982
5991 FADS1 Liver 11_34 0.9998572 163.89639 4.769004e-04 12.926351
12687 RP4-781K5.7 Liver 1_121 0.9997364 203.40050 5.917766e-04 -15.108415
7410 ABCA1 Liver 9_53 0.9956684 70.26807 2.036072e-04 7.982017
8531 TNKS Liver 8_12 0.9915320 76.52776 2.208239e-04 11.038564
9390 GAS6 Liver 13_62 0.9888998 71.30764 2.052148e-04 -8.923688
1597 PLTP Liver 20_28 0.9885315 61.47131 1.768411e-04 -5.732491
1999 PRKD2 Liver 19_33 0.9867185 30.06125 8.632181e-05 5.072217
7040 INHBB Liver 2_70 0.9832856 73.94275 2.115902e-04 -8.518936
3755 RRBP1 Liver 20_13 0.9819053 32.42030 9.264178e-05 7.008305
5544 CNIH4 Liver 1_114 0.9787252 40.77861 1.161485e-04 6.145535
2092 SP4 Liver 7_19 0.9781344 102.23228 2.910093e-04 10.693191
6093 CSNK1G3 Liver 5_75 0.9759584 84.08145 2.388096e-04 9.116291
8865 FUT2 Liver 19_33 0.9674812 104.74356 2.949105e-04 -11.927107
11790 CYP2A6 Liver 19_28 0.9643766 31.94912 8.966560e-05 5.407028
10657 TRIM39 Liver 6_25 0.9605468 81.54422 2.279460e-04 8.840164
3247 KDSR Liver 18_35 0.9578772 24.64899 6.871148e-05 -4.526287
233 NPC1L1 Liver 7_32 0.9548612 86.93035 2.415639e-04 -10.761931
4704 DDX56 Liver 7_32 0.9494642 59.88499 1.654691e-04 9.641861
6391 TTC39B Liver 9_13 0.9414293 23.21539 6.360392e-05 -4.334495
6220 PELO Liver 5_31 0.9388155 70.72581 1.932317e-04 8.288398
1114 SRRT Liver 7_62 0.9383759 32.70763 8.931947e-05 5.424996
6778 PKN3 Liver 9_66 0.9376624 47.53713 1.297179e-04 -6.620563
3300 C10orf88 Liver 10_77 0.9360721 37.17593 1.012725e-04 -6.787850
8579 STAT5B Liver 17_25 0.9310376 30.58279 8.286375e-05 5.426252
3562 ACVR1C Liver 2_94 0.9285980 25.84902 6.985414e-05 -4.687370
6957 USP1 Liver 1_39 0.8941238 253.51088 6.596515e-04 16.258211
9062 KLHDC7A Liver 1_13 0.8301423 22.36541 5.403183e-05 4.124187
8418 POP7 Liver 7_62 0.8209688 40.45771 9.666033e-05 -5.845258
9072 SPTY2D1 Liver 11_13 0.8208409 33.49049 8.000198e-05 -5.557123
8931 CRACR2B Liver 11_1 0.8079877 21.87357 5.143334e-05 -3.989585
5415 SYTL1 Liver 1_19 0.8069655 22.30889 5.239059e-05 -3.962854
6100 ALLC Liver 2_2 0.8068337 28.07319 6.591680e-05 4.919066
#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")
#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]
#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")
df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names
for (i in 1:nrow(df_gene)){
gene <- df_gene$genename[i]
ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}
df_gene <- cbind(df_gene, df_gene_pips)
#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]
#abbreviate column names
df_gene <- dplyr::rename(df_gene,
Adi_Sub="Adipose_Subcutaneous",
Brain_Cer="Brain_Cerebellum",
Adi_Vis="Adipose_Visceral_Omentum",
Blood="Whole_Blood")
df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]
#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
genename combined_pip Liver
7022 PSRC1 1.0000000 1.0000000
9062 ST3GAL4 1.0000000 1.0000000
3979 HPR 0.9999998 0.9999998
4188 INSIG2 0.9999958 0.9999958
44 ABCG8 0.9999660 0.9999660
2990 FADS1 0.9998572 0.9998572
8053 RP4-781K5.7 0.9997364 0.9997364
17 ABCA1 0.9956684 0.9956684
9738 TNKS 0.9915320 0.9915320
3423 GAS6 0.9888998 0.9888998
6672 PLTP 0.9885315 0.9885315
6913 PRKD2 0.9867185 0.9867185
4174 INHBB 0.9832856 0.9832856
8167 RRBP1 0.9819053 0.9819053
1874 CNIH4 0.9787252 0.9787252
8916 SP4 0.9781344 0.9781344
2086 CSNK1G3 0.9759584 0.9759584
3368 FUT2 0.9674812 0.9674812
2263 CYP2A6 0.9643766 0.9643766
9870 TRIM39 0.9605468 0.9605468
4395 KDSR 0.9578772 0.9578772
5972 NPC1L1 0.9548612 0.9548612
2397 DDX56 0.9494642 0.9494642
9996 TTC39B 0.9414293 0.9414293
6425 PELO 0.9388155 0.9388155
9035 SRRT 0.9383759 0.9383759
6583 PKN3 0.9376624 0.9376624
1009 C10orf88 0.9360721 0.9360721
9091 STAT5B 0.9310376 0.9310376
195 ACVR1C 0.9285980 0.9285980
10193 USP1 0.8941238 0.8941238
4483 KLHDC7A 0.8301423 0.8301423
6761 POP7 0.8209688 0.8209688
9009 SPTY2D1 0.8208409 0.8208409
2032 CRACR2B 0.8079877 0.8079877
9235 SYTL1 0.8069655 0.8069655
405 ALLC 0.8068337 0.8068337
#store gene list
liver_genes <- df_gene$genename[df_gene$combined_pip>0.8]
Compare with Case 3 (Liver + Adipose) and others
results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case8")
weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db;/project2/compbio/predictdb/mashr_models/mashr_Brain_Cerebellum.db"
weight <- unlist(strsplit(weight, ";"))
#load information for all genes
gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())
for (i in 1:length(weight)){
sqlite <- RSQLite::dbDriver("SQLite")
db = RSQLite::dbConnect(sqlite, weight[i])
query <- function(...) RSQLite::dbGetQuery(db, ...)
gene_info_current <- query("select gene, genename, gene_type from extra")
RSQLite::dbDisconnect(db)
gene_info_current$weight <- weight[i]
gene_info <- rbind(gene_info, gene_info_current)
}
gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})
gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")
#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))
#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")
#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))
sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))
#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size
#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)
#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- ctwas_gene_res$type
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)
ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])
#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z
z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]
#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA
ctwas_res <- rbind(ctwas_gene_res,
ctwas_snp_res[,colnames(ctwas_gene_res)])
#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)
#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])
library(ggplot2)
library(cowplot)
load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))
#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument
#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec
#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]
#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation
#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))
title_size <- 12
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi)) +
ggtitle("Proportion Causal") +
theme_cowplot()
p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(sigma^2)) +
ggtitle("Effect Size") +
theme_cowplot()
p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(h^2[G])) +
ggtitle("PVE") +
theme_cowplot()
p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
ggtitle("Enrichment") +
theme_cowplot()
p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
plot_grid(p_pi, p_sigma2, p_enrich, p_pve)
####################
#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
SNP Adipose_Subcutaneous
5.730722e-05 2.606610e-02
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
SNP Adipose_Subcutaneous
42.910260 6.890161
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 454.8485
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
SNP Adipose_Subcutaneous
8696600 13082
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
SNP Adipose_Subcutaneous
0.06223580 0.00683754
#total PVE
sum(estimated_group_pve)
[1] 0.06907334
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
SNP Adipose_Subcutaneous
0.90101044 0.09898956
#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
genename group region_tag susie_pip mu2
19759 ZDHHC7 Adipose_Subcutaneous 16_49 0.9997044 27.97933
13717 TPD52 Adipose_Subcutaneous 8_57 0.9915149 21.54876
17109 POR Adipose_Subcutaneous 7_48 0.9859590 32.16405
18637 ABCA8 Adipose_Subcutaneous 17_39 0.9844949 27.84557
25057 SIPA1 Adipose_Subcutaneous 11_36 0.9788123 22.77446
15027 PLPPR2 Adipose_Subcutaneous 19_10 0.9782851 27.69113
16444 C10orf88 Adipose_Subcutaneous 10_77 0.9769343 30.83296
16388 KDSR Adipose_Subcutaneous 18_35 0.9760929 20.26486
20337 PKN3 Adipose_Subcutaneous 9_66 0.9739895 40.44418
18314 FAM117B Adipose_Subcutaneous 2_120 0.9703680 37.35650
15242 CCNJ Adipose_Subcutaneous 10_61 0.9687017 21.08333
24743 PCMTD2 Adipose_Subcutaneous 20_38 0.9580519 29.17029
22969 KLHDC7A Adipose_Subcutaneous 1_13 0.9563445 18.30601
16361 CCDC92 Adipose_Subcutaneous 12_75 0.9554636 25.94733
14913 FCGRT Adipose_Subcutaneous 19_34 0.9473974 20.07001
16347 CCND2 Adipose_Subcutaneous 12_4 0.9455314 19.20330
13993 SRRT Adipose_Subcutaneous 7_62 0.9433615 24.14309
13346 SPHK2 Adipose_Subcutaneous 19_33 0.9430979 36.78330
18737 NTN5 Adipose_Subcutaneous 19_33 0.9423048 52.10799
18041 IL1RN Adipose_Subcutaneous 2_67 0.9401773 20.48820
27265 AC007950.2 Adipose_Subcutaneous 15_29 0.9399429 30.09055
22986 SPTY2D1 Adipose_Subcutaneous 11_13 0.9336852 27.66149
14240 CWF19L1 Adipose_Subcutaneous 10_64 0.9329765 28.33600
20155 TMED4 Adipose_Subcutaneous 7_33 0.9301009 36.31431
21459 PROCA1 Adipose_Subcutaneous 17_17 0.9272370 25.69084
12875 MPND Adipose_Subcutaneous 19_5 0.9226804 19.81239
16736 ACVR1C Adipose_Subcutaneous 2_94 0.9093778 19.02619
22062 KCNK3 Adipose_Subcutaneous 2_16 0.9093320 20.13920
14273 SCD Adipose_Subcutaneous 10_64 0.9020432 18.43963
21618 PDHB Adipose_Subcutaneous 3_40 0.8906162 22.83036
23583 SORCS2 Adipose_Subcutaneous 4_8 0.8776838 21.67586
24579 C2CD4A Adipose_Subcutaneous 15_28 0.8755964 21.20561
21750 PCSK9 Adipose_Subcutaneous 1_34 0.8711964 103.66507
20426 ALDH16A1 Adipose_Subcutaneous 19_34 0.8646920 19.36271
25972 LINC01184 Adipose_Subcutaneous 5_78 0.8646535 17.88900
21711 VAMP5 Adipose_Subcutaneous 2_54 0.8518568 21.44792
18321 PARP9 Adipose_Subcutaneous 3_76 0.8506879 22.09649
19793 UBASH3B Adipose_Subcutaneous 11_74 0.8500016 23.50538
14808 CTSH Adipose_Subcutaneous 15_37 0.8402426 17.01546
16496 ELF1 Adipose_Subcutaneous 13_16 0.8384040 17.75011
16438 GPAM Adipose_Subcutaneous 10_70 0.8292011 19.07708
19681 PELO Adipose_Subcutaneous 5_30 0.8089402 59.96489
PVE z
19759 8.140091e-05 -5.184802
13717 6.217873e-05 -4.684363
17109 9.228900e-05 6.044322
18637 7.977923e-05 4.800447
25057 6.487358e-05 -5.096493
15027 7.883634e-05 3.965665
16444 8.765989e-05 -6.762952
16388 5.756454e-05 -4.448194
20337 1.146385e-04 -6.780039
18314 1.054928e-04 7.852653
15242 5.943600e-05 -4.639318
24743 8.132987e-05 -5.636854
22969 5.094814e-05 4.124187
16361 7.214846e-05 -5.328046
14913 5.533501e-05 -4.347956
16347 5.284112e-05 -4.190519
13993 6.628135e-05 4.556557
13346 1.009550e-04 -8.721460
18737 1.428947e-04 11.132252
18041 5.605752e-05 4.455379
27265 8.230987e-05 5.555780
22986 7.516167e-05 -5.557123
14240 7.693599e-05 5.837199
20155 9.829425e-05 7.696786
21459 6.932493e-05 5.543282
12875 5.319961e-05 -4.366911
16736 5.035196e-05 -4.185879
22062 5.329482e-05 -4.772296
14273 4.840606e-05 -4.541468
21618 5.917301e-05 3.396693
23583 5.536492e-05 -4.572758
24579 5.403500e-05 4.535165
21750 2.628263e-04 17.210869
20426 4.872455e-05 -4.215260
25972 4.501410e-05 -3.918171
21711 5.317065e-05 4.547456
18321 5.470335e-05 3.744644
19793 5.814432e-05 4.906621
14808 4.160721e-05 3.616369
16496 4.330866e-05 -3.876887
16438 4.603542e-05 4.000718
19681 1.411672e-04 8.522224
#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")
#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]
#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")
df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names
for (i in 1:nrow(df_gene)){
gene <- df_gene$genename[i]
ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}
df_gene <- cbind(df_gene, df_gene_pips)
#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]
#abbreviate column names
df_gene <- dplyr::rename(df_gene,
Adi_Sub="Adipose_Subcutaneous",
Brain_Cer="Brain_Cerebellum",
Adi_Vis="Adipose_Visceral_Omentum",
Blood="Whole_Blood")
df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]
#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
genename combined_pip Adi_Sub
12570 ZDHHC7 0.9997044 0.9997044
11702 TPD52 0.9915149 0.9915149
8047 POR 0.9859590 0.9859590
28 ABCA8 0.9844949 0.9844949
10219 SIPA1 0.9788123 0.9788123
7944 PLPPR2 0.9782851 0.9782851
1234 C10orf88 0.9769343 0.9769343
5232 KDSR 0.9760929 0.9760929
7851 PKN3 0.9739895 0.9739895
3570 FAM117B 0.9703680 0.9703680
1770 CCNJ 0.9687017 0.9687017
7581 PCMTD2 0.9580519 0.9580519
5333 KLHDC7A 0.9563445 0.9563445
1732 CCDC92 0.9554636 0.9554636
3812 FCGRT 0.9473974 0.9473974
1761 CCND2 0.9455314 0.9455314
10816 SRRT 0.9433615 0.9433615
10739 SPHK2 0.9430979 0.9430979
7214 NTN5 0.9423048 0.9423048
4929 IL1RN 0.9401773 0.9401773
113 AC007950.2 0.9399429 0.9399429
10785 SPTY2D1 0.9336852 0.9336852
2671 CWF19L1 0.9329765 0.9329765
11420 TMED4 0.9301009 0.9301009
8227 PROCA1 0.9272370 0.9272370
6528 MPND 0.9226804 0.9226804
248 ACVR1C 0.9093778 0.9093778
5183 KCNK3 0.9093320 0.9093320
9923 SCD 0.9020432 0.9020432
7635 PDHB 0.8906162 0.8906162
10667 SORCS2 0.8776838 0.8776838
1382 C2CD4A 0.8755964 0.8755964
7593 PCSK9 0.8711964 0.8711964
453 ALDH16A1 0.8646920 0.8646920
5720 LINC01184 0.8646535 0.8646535
12231 VAMP5 0.8518568 0.8518568
7499 PARP9 0.8506879 0.8506879
12041 UBASH3B 0.8500016 0.8500016
2646 CTSH 0.8402426 0.8402426
3310 ELF1 0.8384040 0.8384040
4299 GPAM 0.8292011 0.8292011
7675 PELO 0.8089402 0.8089402
#store gene list
adipose_genes <- df_gene$genename[df_gene$combined_pip>0.8]
Compare with Case 6 (Liver + Adipose + Cerebellum)
results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case9")
weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db;/project2/compbio/predictdb/mashr_models/mashr_Brain_Cerebellum.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Visceral_Omentum.db"
weight <- unlist(strsplit(weight, ";"))
#load information for all genes
gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())
for (i in 1:length(weight)){
sqlite <- RSQLite::dbDriver("SQLite")
db = RSQLite::dbConnect(sqlite, weight[i])
query <- function(...) RSQLite::dbGetQuery(db, ...)
gene_info_current <- query("select gene, genename, gene_type from extra")
RSQLite::dbDisconnect(db)
gene_info_current$weight <- weight[i]
gene_info <- rbind(gene_info, gene_info_current)
}
gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})
gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")
#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))
#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")
#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))
sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))
#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size
#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)
#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)
ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])
#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z
z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]
#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA
ctwas_res <- rbind(ctwas_gene_res,
ctwas_snp_res[,colnames(ctwas_gene_res)])
#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)
#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])
library(ggplot2)
library(cowplot)
load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))
#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument
#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec
#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]
#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation
#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))
title_size <- 12
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi)) +
ggtitle("Proportion Causal") +
theme_cowplot()
p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(sigma^2)) +
ggtitle("Effect Size") +
theme_cowplot()
p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(h^2[G])) +
ggtitle("PVE") +
theme_cowplot()
p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
ggtitle("Enrichment") +
theme_cowplot()
p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
plot_grid(p_pi, p_sigma2, p_enrich, p_pve)
####################
#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
SNP Liver Adipose_Subcutaneous
0.0000671549 0.0074368998 0.0102148490
Brain_Cerebellum Adipose_Visceral_Omentum
0.0090081071 0.0071611334
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
SNP Liver Adipose_Subcutaneous
17.917578 51.128138 6.311241
Brain_Cerebellum Adipose_Visceral_Omentum
7.283353 13.501294
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 67.5033
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
SNP Liver Adipose_Subcutaneous
8696600 11003 13082
Brain_Cerebellum Adipose_Visceral_Omentum
12270 12921
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
SNP Liver Adipose_Subcutaneous
0.030452770 0.012175402 0.002454376
Brain_Cerebellum Adipose_Visceral_Omentum
0.002342771 0.003635579
#total PVE
sum(estimated_group_pve)
[1] 0.0510609
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
SNP Liver Adipose_Subcutaneous
0.59640098 0.23844865 0.04806763
Brain_Cerebellum Adipose_Visceral_Omentum
0.04588189 0.07120084
#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
genename group region_tag susie_pip mu2
4435 PSRC1 Liver 1_67 1.0000000 1672.65008
2454 ST3GAL4 Liver 11_77 1.0000000 170.81971
12008 HPR Liver 16_38 0.9999909 156.17533
3721 INSIG2 Liver 2_70 0.9999886 67.80738
19759 ZDHHC7 Adipose_Subcutaneous 16_49 0.9995073 28.90070
5991 FADS1 Liver 11_34 0.9993457 162.88551
12687 RP4-781K5.7 Liver 1_121 0.9944728 202.33237
7410 ABCA1 Liver 9_53 0.9936742 70.79401
5563 ABCG8 Liver 2_27 0.9932246 309.29203
9390 GAS6 Liver 13_62 0.9857220 74.67277
8531 TNKS Liver 8_12 0.9849278 78.63394
47974 GIGYF1 Adipose_Visceral_Omentum 7_62 0.9727463 35.85447
1597 PLTP Liver 20_28 0.9716597 55.11327
7040 INHBB Liver 2_70 0.9698998 74.75276
32453 RAC1 Brain_Cerebellum 7_10 0.9696209 30.53796
3755 RRBP1 Liver 20_13 0.9681886 32.42776
41704 NPC1L1 Adipose_Visceral_Omentum 7_33 0.9609917 93.62015
6093 CSNK1G3 Liver 5_75 0.9545942 84.41491
2092 SP4 Liver 7_19 0.9383471 102.44407
11790 CYP2A6 Liver 19_30 0.9362228 33.29962
8865 FUT2 Liver 19_33 0.9218977 69.39850
47687 ABCG8 Adipose_Visceral_Omentum 2_27 0.9128748 43.65045
6391 TTC39B Liver 9_13 0.9123067 23.73563
5544 CNIH4 Liver 1_114 0.9025894 42.50363
10657 TRIM39 Liver 6_26 0.8992012 82.46535
15242 CCNJ Adipose_Subcutaneous 10_61 0.8898689 21.09530
3562 ACVR1C Liver 2_94 0.8890967 25.11073
44743 FN1 Adipose_Visceral_Omentum 2_127 0.8861711 22.46646
8579 STAT5B Liver 17_25 0.8832937 27.31621
6957 USP1 Liver 1_39 0.8734886 255.83120
49316 ACP6 Adipose_Visceral_Omentum 1_73 0.8731497 21.79158
6100 ALLC Liver 2_2 0.8598451 28.04896
33122 P3H4 Brain_Cerebellum 17_25 0.8440652 22.47742
14913 FCGRT Adipose_Subcutaneous 19_34 0.8291311 21.96658
33158 DOPEY2 Brain_Cerebellum 21_17 0.8284235 23.30226
29649 PDE4C Brain_Cerebellum 19_14 0.8160274 40.15830
32727 DNAJC13 Brain_Cerebellum 3_82 0.8094458 46.19393
PVE z
4435 4.867718e-03 -41.687336
2454 4.971166e-04 13.376072
12008 4.544947e-04 -17.962770
3721 1.973296e-04 -8.982702
19759 8.406488e-05 -5.184802
5991 4.737165e-04 12.926351
12687 5.855697e-04 -15.108415
7410 2.047203e-04 7.982017
5563 8.939979e-04 -20.293982
9390 2.142087e-04 -8.923688
8531 2.253901e-04 11.038564
47974 1.014993e-04 6.632402
1597 1.558442e-04 -5.732491
7040 2.109961e-04 -8.518936
32453 8.617122e-05 5.713928
3755 9.136866e-05 7.008305
41704 2.618239e-04 11.631021
6093 2.345083e-04 9.116291
2092 2.797503e-04 10.693191
11790 9.072747e-05 5.407028
8865 1.861886e-04 -11.927107
47687 1.159632e-04 6.510103
6391 6.301762e-05 -4.334495
5544 1.116443e-04 6.145535
10657 2.157986e-04 8.840164
15242 5.463011e-05 -4.639318
3562 6.497237e-05 -4.687370
44743 5.793919e-05 -4.446065
8579 7.021759e-05 5.426252
6957 6.503259e-04 16.258211
49316 5.537297e-05 4.214320
6100 7.018711e-05 4.919066
33122 5.521319e-05 4.891111
14913 5.300367e-05 -4.347956
33158 5.617857e-05 -4.766662
29649 9.536748e-05 -6.633160
32727 1.088161e-04 -6.853813
#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")
#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]
#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")
df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names
for (i in 1:nrow(df_gene)){
gene <- df_gene$genename[i]
ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}
df_gene <- cbind(df_gene, df_gene_pips)
#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]
#abbreviate column names
df_gene <- dplyr::rename(df_gene,
Adi_Sub="Adipose_Subcutaneous",
Brain_Cer="Brain_Cerebellum",
Adi_Vis="Adipose_Visceral_Omentum",
Blood="Whole_Blood")
df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]
#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
genename combined_pip Adi_Sub Liver Brain_Cer
65 ABCG8 1.9061009 NA 0.993224587 1.491732e-06
9941 PARP9 1.5199597 3.673594e-01 0.006644270 3.625511e-01
349 ACVR1C 1.3608776 2.420229e-01 0.889096730 2.144862e-01
5380 GAS6 1.1926406 2.815637e-02 0.985722025 3.663521e-02
28 ABCA1 1.1778517 NA 0.993674247 NA
2956 CNIH4 1.0986149 3.354771e-02 0.902589378 7.148730e-02
14430 ST3GAL4 1.0705782 2.278493e-02 0.999999994 3.452498e-02
10161 PELO 1.0561341 1.111464e-01 0.371223486 1.404643e-01
16602 ZDHHC7 1.0425984 9.995073e-01 0.024285838 NA
11000 PSRC1 1.0425071 1.921100e-02 1.000000000 1.150583e-02
1603 C10orf88 1.0411604 1.474607e-01 0.515954788 1.913846e-02
3275 CSNK1G3 1.0372897 4.142693e-02 0.954594237 2.596340e-02
14350 SPTY2D1 1.0329040 4.095891e-01 0.556409140 4.984606e-02
6911 KDSR 1.0291338 4.187041e-01 0.494514349 2.286481e-02
15499 TPD52 1.0278942 5.262092e-01 0.005823745 1.886903e-02
13049 RRBP1 1.0264088 1.351937e-02 0.968188634 4.470083e-02
6247 HPR 1.0235999 1.119674e-02 0.999990926 6.520217e-03
4675 FADS1 1.0208940 NA 0.999345723 1.418185e-02
6578 INSIG2 1.0078010 2.610112e-03 0.999988555 2.697495e-03
7049 KLHDC7A 1.0043632 5.531809e-01 0.380276986 5.582087e-02
10835 PRKD2 1.0021744 2.717969e-03 0.217759117 2.618723e-03
5486 GIGYF1 1.0014547 1.352322e-02 0.003911509 1.127370e-02
9441 NPC1L1 0.9999751 1.484434e-02 0.013401014 1.073800e-02
12863 RP4-781K5.7 0.9944728 NA 0.994472818 NA
5842 GSK3B 0.9932634 NA 0.101176816 3.152329e-01
11235 RAC1 0.9916447 1.282684e-02 0.002968970 9.696209e-01
10506 PLTP 0.9897804 5.564772e-03 0.971659736 3.385850e-03
13610 SIPA1 0.9893442 4.041296e-01 0.003803708 5.760658e-01
13505 SGMS1 0.9880852 NA 0.795825679 1.922596e-01
2296 CCND2 0.9864169 3.775869e-01 0.169555913 2.309539e-01
14214 SP4 0.9861696 2.546881e-02 0.938347057 1.382209e-02
15423 TNKS 0.9849278 NA 0.984927835 NA
6557 INHBB 0.9758839 3.929670e-03 0.969899842 NA
3521 CTSH 0.9753641 2.032387e-01 0.160036482 8.245873e-02
660 ALLC 0.9718715 4.426554e-02 0.859845113 4.640541e-02
1528 BRI3 0.9698437 3.343538e-02 0.711570266 2.349788e-02
6110 HIST1H2BH 0.9627432 NA 0.547507725 3.988183e-01
15781 TTC39B 0.9576792 2.600845e-02 0.912306652 NA
302 ACP6 0.9558090 2.558880e-02 0.006626946 5.044356e-02
16125 USP53 0.9489626 2.424768e-02 0.153846127 1.466065e-02
10382 PKN3 0.9403224 3.721910e-01 0.550368578 1.776282e-02
3613 CYP2A6 0.9362417 1.889543e-05 0.936222776 NA
6848 KCNK3 0.9303282 4.926516e-01 NA NA
34 ABCA8 0.9237564 3.102420e-01 NA 6.134858e-01
5274 FUT2 0.9218977 NA 0.921897709 NA
14479 STAT5B 0.9156545 1.374625e-02 0.883293732 1.149433e-02
7597 LINC01184 0.9107463 3.193425e-01 0.179252250 1.577378e-01
16088 USP1 0.9068683 3.337975e-02 0.873488573 NA
2307 CCNJ 0.9043155 8.898689e-01 0.005104675 NA
4006 DNAJC13 0.9010062 2.127420e-02 0.059582249 8.094458e-01
15609 TRIM39 0.8992013 7.523058e-08 0.899201178 4.319550e-08
2256 CCDC92 0.8968715 7.858781e-01 0.011186093 NA
5269 FURIN 0.8906402 2.482145e-02 0.375893815 4.250627e-02
8902 MTSS1L 0.8873198 3.202082e-02 0.007347410 1.293683e-01
5158 FN1 0.8861711 NA NA NA
4839 FAM3D 0.8861558 NA 0.393523642 NA
4073 DOPEY2 0.8857849 3.062539e-02 0.008857757 8.284235e-01
14680 SYTL1 0.8772921 2.788713e-02 0.643217735 1.968976e-01
9841 P3H4 0.8772168 1.643423e-02 0.005113679 8.440652e-01
10856 PROCA1 0.8764166 2.742031e-01 0.002574531 2.880409e-01
13237 SCD 0.8743632 4.847417e-01 NA NA
5009 FCGRT 0.8592828 8.291311e-01 0.002715621 1.267679e-02
152 AC007950.2 0.8588843 3.621080e-01 0.003701136 NA
5630 GOLGA3 0.8577600 2.919996e-01 0.005583545 1.816214e-02
9031 N4BP2L1 0.8559432 1.144121e-02 0.004704934 7.841271e-01
14189 SORCS2 0.8552753 6.751668e-01 0.111231131 NA
10045 PCMTD2 0.8492358 7.182003e-01 0.098915897 1.370625e-02
13550 SH3TC1 0.8489500 2.676115e-02 0.014462946 7.892401e-01
10626 POP7 0.8489427 NA 0.456644954 NA
16120 USP47 0.8407391 8.959569e-02 0.069198629 6.682865e-01
14833 TCEA2 0.8367755 7.453639e-02 0.004625772 1.440040e-02
5666 GPAM 0.8281919 3.512448e-01 NA 2.653485e-04
10440 PLEC 0.8260408 2.201307e-02 0.005521095 9.479757e-03
10096 PDE4C 0.8216644 4.298341e-03 0.001338660 8.160274e-01
3517 CTSB 0.8194053 2.294538e-02 NA 2.855054e-02
2846 CLDN23 0.8166590 1.730850e-02 0.706511417 9.283910e-02
16601 ZDHHC6 0.8123386 3.891946e-01 0.001541926 4.149941e-01
4606 EVI5 0.8083431 2.857285e-02 0.014465622 2.701293e-02
1099 ASAP3 0.8067506 1.718864e-02 0.733717238 1.336300e-02
16949 ZNF575 0.8012029 1.917327e-01 0.003758440 1.803099e-01
Adi_Vis
65 9.128748e-01
9941 7.834050e-01
349 1.527184e-02
5380 1.421270e-01
28 1.841775e-01
2956 9.099053e-02
14430 1.326825e-02
10161 4.333000e-01
16602 1.880523e-02
11000 1.179026e-02
1603 3.586065e-01
3275 1.530517e-02
14350 1.705970e-02
6911 9.305051e-02
15499 4.769922e-01
13049 NA
6247 5.891996e-03
4675 7.366414e-03
6578 2.504801e-03
7049 1.508450e-02
10835 7.790786e-01
5486 9.727463e-01
9441 9.609917e-01
12863 NA
5842 5.768536e-01
11235 6.228072e-03
10506 9.170046e-03
13610 5.345074e-03
13505 NA
2296 2.083202e-01
14214 8.531631e-03
15423 NA
6557 2.054340e-03
3521 5.296302e-01
660 2.135544e-02
1528 2.013402e-01
6110 1.641718e-02
15781 1.936411e-02
302 8.731497e-01
16125 7.562081e-01
10382 NA
3613 NA
6848 4.376766e-01
34 2.861606e-05
5274 NA
14479 7.120199e-03
7597 2.544138e-01
16088 NA
2307 9.341877e-03
4006 1.070393e-02
15609 3.913812e-08
2256 9.980738e-02
5269 4.474187e-01
8902 7.185833e-01
5158 8.861711e-01
4839 4.926321e-01
4073 1.787821e-02
14680 9.289616e-03
9841 1.160370e-02
10856 3.115980e-01
13237 3.896215e-01
5009 1.475931e-02
152 4.930752e-01
5630 5.420147e-01
9031 5.566995e-02
14189 6.887743e-02
10045 1.841336e-02
13550 1.848578e-02
10626 3.922977e-01
16120 1.365825e-02
14833 7.432129e-01
5666 4.766818e-01
10440 7.890269e-01
10096 NA
3517 7.679094e-01
2846 NA
16601 6.608035e-03
4606 7.382917e-01
1099 4.248170e-02
16949 4.254019e-01
Compare with Case 9 (Liver + Adipose x2 + Cerebellum)
results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case10")
weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db;/project2/compbio/predictdb/mashr_models/mashr_Brain_Cerebellum.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Visceral_Omentum.db;/project2/compbio/predictdb/mashr_models/mashr_Whole_Blood.db"
weight <- unlist(strsplit(weight, ";"))
#load information for all genes
gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())
for (i in 1:length(weight)){
sqlite <- RSQLite::dbDriver("SQLite")
db = RSQLite::dbConnect(sqlite, weight[i])
query <- function(...) RSQLite::dbGetQuery(db, ...)
gene_info_current <- query("select gene, genename, gene_type from extra")
RSQLite::dbDisconnect(db)
gene_info_current$weight <- weight[i]
gene_info <- rbind(gene_info, gene_info_current)
}
gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})
gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")
#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))
#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")
#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))
sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))
#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size
#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)
#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)
ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])
#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z
z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]
#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA
ctwas_res <- rbind(ctwas_gene_res,
ctwas_snp_res[,colnames(ctwas_gene_res)])
#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)
#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])
library(ggplot2)
library(cowplot)
load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))
#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument
#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec
#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]
#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation
#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))
title_size <- 12
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi)) +
ggtitle("Proportion Causal") +
theme_cowplot()
p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(sigma^2)) +
ggtitle("Effect Size") +
theme_cowplot()
p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(h^2[G])) +
ggtitle("PVE") +
theme_cowplot()
p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
ggtitle("Enrichment") +
theme_cowplot()
p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
plot_grid(p_pi, p_sigma2, p_enrich, p_pve)
####################
#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
SNP Liver Adipose_Subcutaneous
5.501215e-05 1.690736e-02 8.072674e-03
Brain_Cerebellum Adipose_Visceral_Omentum Whole_Blood
1.103061e-03 1.030172e-02 1.313834e-03
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
SNP Liver Adipose_Subcutaneous
22.663381 10.511923 6.886394
Brain_Cerebellum Adipose_Visceral_Omentum Whole_Blood
137.309362 8.513732 74.698652
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 22.52427
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
SNP Liver Adipose_Subcutaneous
8696600 11003 13082
Brain_Cerebellum Adipose_Visceral_Omentum Whole_Blood
12270 12921 11198
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
SNP Liver Adipose_Subcutaneous
0.031553903 0.005691010 0.002116429
Brain_Cerebellum Adipose_Visceral_Omentum Whole_Blood
0.005408345 0.003297967 0.003198262
#total PVE
sum(estimated_group_pve)
[1] 0.05126592
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
SNP Liver Adipose_Subcutaneous
0.61549477 0.11100963 0.04128336
Brain_Cerebellum Adipose_Visceral_Omentum Whole_Blood
0.10549591 0.06433059 0.06238574
#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
genename group region_tag susie_pip mu2
64244 PCSK9 Whole_Blood 1_34 1.0000000 311.12261
60642 PSRC1 Whole_Blood 1_67 1.0000000 1700.14036
33017 TXNL4B Brain_Cerebellum 16_38 1.0000000 425.95505
2454 ST3GAL4 Liver 11_77 0.9999975 149.71795
60229 LDLR Whole_Blood 19_11 0.9999936 641.01378
3721 INSIG2 Liver 2_70 0.9999635 59.90457
19759 ZDHHC7 Adipose_Subcutaneous 16_49 0.9996663 29.93651
7410 ABCA1 Liver 9_53 0.9870449 62.73592
3755 RRBP1 Liver 20_13 0.9805280 29.59484
8531 TNKS Liver 8_12 0.9710705 67.34376
6391 TTC39B Liver 9_13 0.9672671 21.02116
61917 TIMD4 Whole_Blood 5_92 0.9639997 185.96921
47974 GIGYF1 Adipose_Visceral_Omentum 7_62 0.9631938 32.78921
11790 CYP2A6 Liver 19_30 0.9629681 29.97480
9390 GAS6 Liver 13_62 0.9616146 65.27598
3562 ACVR1C Liver 2_94 0.9598606 22.52879
8579 STAT5B Liver 17_25 0.9439073 27.09623
6100 ALLC Liver 2_2 0.9423566 24.93774
7040 INHBB Liver 2_70 0.9248716 66.65648
1597 PLTP Liver 20_28 0.9181188 36.27175
8865 FUT2 Liver 19_33 0.9156446 62.13345
44743 FN1 Adipose_Visceral_Omentum 2_127 0.9096370 21.02782
11726 CLDN23 Liver 8_11 0.9073017 22.55559
5415 SYTL1 Liver 1_19 0.8934568 19.72085
41704 NPC1L1 Adipose_Visceral_Omentum 7_33 0.8922234 88.29280
63167 USP1 Whole_Blood 1_39 0.8709703 259.24171
1144 ASAP3 Liver 1_16 0.8649959 30.40558
10519 SGMS1 Liver 10_33 0.8637259 21.87209
15242 CCNJ Adipose_Subcutaneous 10_61 0.8611409 21.61776
10495 PRMT6 Liver 1_67 0.8568345 28.86241
6093 CSNK1G3 Liver 5_75 0.8562573 73.22517
7394 TP53INP1 Liver 8_66 0.8305806 19.57806
14913 FCGRT Adipose_Subcutaneous 19_34 0.8303016 22.57781
2042 BCAT2 Liver 19_34 0.8243198 23.85479
39886 HLA-DOB Brain_Cerebellum 6_27 0.8173763 68.07678
32453 RAC1 Brain_Cerebellum 7_10 0.8137966 37.04397
4682 TBC1D4 Liver 13_37 0.8112086 19.82561
9966 LRRK2 Liver 12_25 0.8096716 24.60858
7355 BRI3 Liver 7_60 0.8063315 26.52881
PVE z
64244 9.054238e-04 23.237813
60642 4.947720e-03 -41.793474
33017 1.239607e-03 18.129274
2454 4.357056e-04 13.376072
60229 1.865456e-03 -24.707322
3721 1.743269e-04 -8.982702
19759 8.709164e-05 -5.184802
7410 1.802078e-04 7.982017
3755 8.444936e-05 7.008305
8531 1.903130e-04 11.038564
6391 5.917298e-05 -4.334495
61917 5.217209e-04 13.882363
47974 9.191044e-05 6.632402
11790 8.400178e-05 5.407028
9390 1.826732e-04 -8.923688
3562 6.293126e-05 -4.687370
8579 7.443180e-05 5.426252
6100 6.839000e-05 4.919066
7040 1.794090e-04 -8.518936
1597 9.691427e-05 -5.732491
8865 1.655666e-04 -11.927107
44743 5.566505e-05 -4.446065
11726 5.955609e-05 4.720010
5415 5.127662e-05 -3.962854
41704 2.292552e-04 11.631021
63167 6.570955e-04 16.258211
1144 7.653986e-05 5.283225
10519 5.497769e-05 4.873968
15242 5.417579e-05 -4.639318
10495 7.196973e-05 -5.323721
6093 1.824673e-04 9.116291
7394 4.732294e-05 4.038448
14913 5.455543e-05 -4.347956
2042 5.722576e-05 4.796398
39886 1.619352e-04 -8.430761
32453 8.773111e-05 5.713928
4682 4.680363e-05 3.843964
9966 5.798501e-05 4.792808
7355 6.225177e-05 -5.140136
#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")
#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]
#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")
df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names
for (i in 1:nrow(df_gene)){
gene <- df_gene$genename[i]
ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}
df_gene <- cbind(df_gene, df_gene_pips)
#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]
#abbreviate column names
df_gene <- dplyr::rename(df_gene,
Adi_Sub="Adipose_Subcutaneous",
Brain_Cer="Brain_Cerebellum",
Adi_Vis="Adipose_Visceral_Omentum",
Blood="Whole_Blood")
df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]
#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
genename combined_pip Adi_Sub Liver Brain_Cer
10226 PARP9 1.3324158 4.813814e-01 0.030652000 5.868925e-02
360 ACVR1C 1.3152931 2.859044e-01 0.959860638 1.664766e-02
29 ABCA1 1.2348660 NA 0.987044942 NA
5523 GAS6 1.1932815 3.144969e-02 0.961614606 1.945107e-03
67 ABCG8 1.1409033 NA 0.000747084 4.999848e-01
17052 ZDHHC7 1.1403782 9.996663e-01 0.104001437 NA
312 ACP6 1.0548100 2.853345e-02 0.050654731 2.134487e-03
10453 PELO 1.0541969 1.643865e-01 0.240314076 2.824282e-01
3025 CNIH4 1.0498530 2.457995e-02 0.755803524 2.122643e-03
11315 PSRC1 1.0497817 1.404445e-02 0.016793336 3.862892e-04
1641 C10orf88 1.0496902 1.410526e-01 0.577792876 6.247614e-04
14832 ST3GAL4 1.0459086 1.703228e-02 0.999997549 3.603977e-03
15916 TPD52 1.0320897 4.306518e-01 0.026252925 5.364300e-04
14751 SPTY2D1 1.0302435 2.577036e-01 0.738715731 1.783712e-03
7105 KDSR 1.0293851 2.078597e-01 0.691126402 6.965452e-04
3609 CTSH 1.0270150 1.341835e-01 0.374075249 3.447077e-03
4798 FADS1 1.0230721 NA 0.564858208 3.209156e-01
16206 TTC39B 1.0146168 1.974117e-02 0.967267087 NA
677 ALLC 1.0137337 3.314714e-02 0.942356586 1.686409e-03
6758 INSIG2 1.0090069 2.059694e-03 0.999963525 9.252070e-05
1236 ATP1B2 1.0077960 1.597947e-02 0.521483127 1.150982e-03
16301 TXNL4B 1.0018875 1.702622e-03 NA 1.000000e+00
11147 PRKD2 1.0003287 1.514505e-03 0.349413728 6.198809e-05
10351 PCSK9 1.0000876 4.038979e-07 NA NA
7481 LDLR 0.9999936 NA NA NA
14881 STAT5B 0.9992801 1.067772e-02 0.943907328 3.601696e-04
15447 TIMD4 0.9954486 1.540897e-02 NA NA
6000 GSK3B 0.9954397 NA 0.195862695 1.617626e-01
2349 CCND2 0.9932308 2.873757e-01 0.434477580 1.561283e-02
13424 RRBP1 0.9930750 1.012243e-02 0.980527982 1.882324e-03
5635 GIGYF1 0.9918420 1.000529e-02 0.017710305 3.587551e-04
16564 USP53 0.9827581 1.919691e-02 0.299221485 4.344066e-04
1565 BRI3 0.9738669 2.389125e-02 0.806331540 7.186130e-04
10683 PKN3 0.9734596 1.915571e-01 0.329117124 3.347155e-04
15838 TNKS 0.9710705 NA 0.971070515 NA
7248 KLHDC7A 0.9709886 2.855347e-01 0.668874823 1.199819e-03
10808 PLTP 0.9638364 1.238388e-02 0.918118793 1.328292e-04
3701 CYP2A6 0.9629823 1.423040e-05 0.962968119 NA
4966 FAM3D 0.9555788 NA 0.600502244 NA
7042 KCNK3 0.9522787 4.024430e-01 NA NA
7813 LINC01184 0.9432497 2.117952e-01 0.431682892 7.557711e-03
9708 NPC1L1 0.9424710 1.554473e-02 0.034386390 3.165223e-04
9156 MTSS1L 0.9415012 2.869371e-02 0.037988566 5.617468e-03
15082 SYTL1 0.9395033 1.756537e-02 0.893456831 6.220692e-03
4101 DNAJC13 0.9367189 1.991409e-02 0.182974607 7.098450e-01
3638 CWF19L1 0.9325472 1.827878e-01 0.454861395 3.059221e-02
6737 INHBB 0.9325080 3.446689e-03 0.924871617 NA
14587 SORCS2 0.9306663 5.363100e-01 0.296935707 NA
3352 CSNK1G3 0.9286371 5.054700e-02 0.856257343 6.359871e-04
13999 SIPA1 0.9279858 6.752746e-01 0.006928125 1.184108e-01
5409 FURIN 0.9264568 1.355123e-02 0.576553826 1.101458e-03
16526 USP1 0.9245645 2.489108e-02 0.028703145 NA
2911 CLDN23 0.9232695 1.212816e-02 0.907301712 3.118523e-03
1127 ASAP3 0.9216397 1.274738e-02 0.864995918 4.348818e-04
15905 TP53INP1 0.9197873 2.253316e-02 0.830580629 1.593936e-03
5414 FUT2 0.9156446 NA 0.915644596 NA
2309 CCDC92 0.9152343 7.742023e-01 0.028025996 NA
11168 PROCA1 0.9147101 3.424823e-01 0.005553268 5.880385e-02
5294 FN1 0.9096370 NA NA NA
2402 CD163L1 0.9047461 1.959406e-02 0.035162331 7.328086e-04
2360 CCNJ 0.9011318 8.611409e-01 0.023789855 NA
13620 SCD 0.8966746 3.822587e-01 NA NA
5783 GOLGA3 0.8939512 2.277986e-01 0.025606415 5.356020e-04
10332 PCMTD2 0.8855392 6.180144e-01 0.223554101 4.281626e-04
36 ABCA8 0.8846797 7.555040e-01 NA 1.290674e-01
13892 SGMS1 0.8789073 NA 0.863725889 1.389671e-02
10934 POP7 0.8789073 NA 0.533414346 NA
5428 FXYD7 0.8775436 NA 0.132962491 9.937231e-04
1427 BCAT2 0.8651509 4.098099e-03 0.824319782 1.169638e-04
15196 TBC1D4 0.8622324 2.038108e-02 0.811208641 9.394026e-04
8273 LRRK2 0.8615823 2.753414e-02 0.809671610 8.321358e-04
15238 TCEA2 0.8615755 5.282914e-02 0.021761223 4.443179e-04
5139 FCGRT 0.8600500 8.303016e-01 0.010602312 4.016893e-04
5819 GPAM 0.8571967 2.547350e-01 NA 1.018165e-05
11160 PRMT6 0.8568345 NA 0.856834534 NA
9840 NTN5 0.8566481 7.270804e-01 NA 1.341094e-02
11557 RAC1 0.8552519 1.108381e-02 0.016419401 8.137966e-01
1526 BMPR2 0.8470469 1.229210e-02 0.787185391 NA
545 AGPAT5 0.8377516 1.678332e-02 0.046780863 5.361411e-04
6307 HLA-DOB 0.8343305 1.111642e-03 0.001890051 8.173763e-01
3264 CRACR2B 0.8286668 4.871339e-02 0.657224870 4.912595e-04
16728 WBP1L 0.8259698 4.510818e-02 0.589511022 8.228377e-04
156 AC007950.2 0.8197025 3.090749e-01 0.019941610 NA
4597 EPHA2 0.8101171 NA 0.624840900 NA
3887 DDX56 0.8078343 9.446560e-03 0.784238219 8.070283e-04
13583 SAT2 0.8051579 9.348889e-03 0.474613148 1.745985e-02
1349 B3GNTL1 0.8025151 2.158654e-01 0.067971071 6.752778e-04
Adi_Vis Blood
10226 7.608907e-01 8.025353e-04
360 2.678639e-02 2.609409e-02
29 2.475966e-01 2.244753e-04
5523 1.982721e-01 NA
67 6.401715e-01 NA
17052 3.426133e-02 2.449138e-03
312 6.109232e-01 3.625641e-01
10453 3.657655e-01 1.302661e-03
3025 6.828877e-02 1.990581e-01
11315 1.855758e-02 1.000000e+00
1641 2.804989e-01 4.972113e-02
14832 2.223535e-02 3.039419e-03
15916 5.738659e-01 7.826934e-04
14751 2.908086e-02 2.959613e-03
7105 7.962470e-02 5.007783e-02
3609 5.118495e-01 3.459711e-03
4798 2.837018e-02 1.089281e-01
16206 2.663472e-02 9.738540e-04
677 3.654357e-02 NA
6758 6.694396e-03 1.967410e-04
1236 3.957347e-01 7.344768e-02
16301 NA 1.848591e-04
11147 6.491155e-01 2.229343e-04
10351 8.718695e-05 1.000000e+00
7481 NA 9.999936e-01
14881 1.246038e-02 3.187446e-02
15447 1.603996e-02 9.639997e-01
6000 6.361737e-01 1.640765e-03
2349 2.550588e-01 7.058831e-04
13424 NA 5.422448e-04
5635 9.631938e-01 5.738956e-04
16564 6.639053e-01 NA
1565 1.418192e-01 1.106281e-03
10683 NA 4.524507e-01
15838 NA NA
7248 1.537917e-02 NA
10808 3.295369e-02 2.472028e-04
3701 NA NA
4966 3.544714e-01 6.051230e-04
7042 5.498357e-01 NA
7813 2.836093e-01 8.604533e-03
9708 8.922234e-01 NA
9156 7.900653e-01 7.913620e-02
15082 1.540229e-02 6.858159e-03
4101 2.238685e-02 1.598327e-03
3638 2.280853e-01 3.622045e-02
6737 4.075661e-03 1.140056e-04
14587 9.574856e-02 1.672005e-03
3352 2.028048e-02 9.163172e-04
13999 5.305098e-03 1.220672e-01
5409 3.345380e-01 7.123035e-04
16526 NA 8.709703e-01
2911 NA 7.211311e-04
1127 4.062545e-02 2.836096e-03
15905 2.199261e-02 4.308695e-02
5414 NA NA
2309 1.129175e-01 8.848452e-05
11168 5.078408e-01 2.993583e-05
5294 9.096370e-01 NA
2402 5.000336e-02 7.992535e-01
2360 1.620110e-02 NA
13620 5.144160e-01 NA
5783 6.371643e-01 2.846246e-03
10332 2.807970e-02 1.546280e-02
36 1.081878e-04 NA
13892 NA 1.284725e-03
10934 2.866433e-01 5.884969e-02
5428 7.435873e-01 NA
1427 3.632639e-02 2.896209e-04
15196 2.815557e-02 1.547738e-03
8273 2.244231e-02 1.102111e-03
15238 7.865408e-01 NA
5139 1.839082e-02 3.535513e-04
5819 5.361740e-01 6.627750e-02
11160 NA NA
9840 1.160729e-01 8.383037e-05
11557 1.240857e-02 1.543520e-03
1526 4.679417e-02 7.752442e-04
545 7.726181e-01 1.033148e-03
6307 1.389156e-02 6.097016e-05
3264 1.108977e-01 1.133958e-02
16728 1.891164e-01 1.411428e-03
156 4.906860e-01 NA
4597 1.852762e-01 NA
3887 1.258655e-02 7.559634e-04
13583 2.797099e-01 2.402608e-02
1349 5.167734e-01 1.229971e-03
Compare with Case 3 (should be identical). Checking version update
results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case11")
weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db;/project2/compbio/predictdb/mashr_models/mashr_Brain_Cerebellum.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Visceral_Omentum.db;/project2/compbio/predictdb/mashr_models/mashr_Whole_Blood.db"
weight <- unlist(strsplit(weight, ";"))
#load information for all genes
gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())
for (i in 1:length(weight)){
sqlite <- RSQLite::dbDriver("SQLite")
db = RSQLite::dbConnect(sqlite, weight[i])
query <- function(...) RSQLite::dbGetQuery(db, ...)
gene_info_current <- query("select gene, genename, gene_type from extra")
RSQLite::dbDisconnect(db)
gene_info_current$weight <- weight[i]
gene_info <- rbind(gene_info, gene_info_current)
}
gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})
gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")
#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))
#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")
#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))
sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))
#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size
#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)
#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)
ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])
#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z
z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]
#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA
ctwas_res <- rbind(ctwas_gene_res,
ctwas_snp_res[,colnames(ctwas_gene_res)])
#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)
#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])
library(ggplot2)
library(cowplot)
load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))
#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument
#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec
#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]
#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation
#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))
title_size <- 12
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi)) +
ggtitle("Proportion Causal") +
theme_cowplot()
p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(sigma^2)) +
ggtitle("Effect Size") +
theme_cowplot()
p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(h^2[G])) +
ggtitle("PVE") +
theme_cowplot()
p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
ggtitle("Enrichment") +
theme_cowplot()
p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
plot_grid(p_pi, p_sigma2, p_enrich, p_pve)
####################
#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
SNP Liver Adipose_Subcutaneous
9.014951e-05 7.943610e-03 1.653321e-02
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
SNP Liver Adipose_Subcutaneous
16.908160 48.882570 6.014357
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 126.1567
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
SNP Liver Adipose_Subcutaneous
8696600 11003 13082
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
SNP Liver Adipose_Subcutaneous
0.038577093 0.012433786 0.003785652
#total PVE
sum(estimated_group_pve)
[1] 0.05479653
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
SNP Liver Adipose_Subcutaneous
0.70400612 0.22690827 0.06908562
#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
genename group region_tag susie_pip mu2
4435 PSRC1 Liver 1_67 1.0000000 1669.97618
2454 ST3GAL4 Liver 11_77 1.0000000 170.97506
12008 HPR Liver 16_38 0.9999974 156.57752
3721 INSIG2 Liver 2_69 0.9999934 68.61282
19759 ZDHHC7 Adipose_Subcutaneous 16_49 0.9995748 28.12889
5991 FADS1 Liver 11_34 0.9993538 163.18332
12687 RP4-781K5.7 Liver 1_121 0.9943723 202.13879
5563 ABCG8 Liver 2_27 0.9943320 311.88467
7410 ABCA1 Liver 9_53 0.9925506 70.70277
1999 PRKD2 Liver 19_33 0.9847824 29.89388
8531 TNKS Liver 8_12 0.9825262 77.45465
13717 TPD52 Adipose_Subcutaneous 8_57 0.9814059 21.06905
9390 GAS6 Liver 13_62 0.9803514 73.89025
1597 PLTP Liver 20_28 0.9775544 59.24311
5544 CNIH4 Liver 1_114 0.9743418 41.39386
3755 RRBP1 Liver 20_13 0.9719348 32.36358
11790 CYP2A6 Liver 19_28 0.9693535 29.87336
7040 INHBB Liver 2_70 0.9651557 74.16214
18637 ABCA8 Adipose_Subcutaneous 17_39 0.9558067 27.28313
25057 SIPA1 Adipose_Subcutaneous 11_36 0.9482173 22.29943
6093 CSNK1G3 Liver 5_75 0.9459871 83.36709
8579 STAT5B Liver 17_25 0.9418229 30.95090
2092 SP4 Liver 7_19 0.9299248 102.14354
15027 PLPPR2 Adipose_Subcutaneous 19_10 0.9297349 26.56239
6391 TTC39B Liver 9_13 0.9260892 23.56077
14913 FCGRT Adipose_Subcutaneous 19_34 0.9226574 20.62923
16361 CCDC92 Adipose_Subcutaneous 12_75 0.9184631 25.75399
15242 CCNJ Adipose_Subcutaneous 10_61 0.9161743 20.53548
8865 FUT2 Liver 19_33 0.9059564 70.62362
17109 POR Adipose_Subcutaneous 7_48 0.8988319 31.79763
21459 PROCA1 Adipose_Subcutaneous 17_17 0.8980242 26.41207
3562 ACVR1C Liver 2_94 0.8864262 25.08450
10657 TRIM39 Liver 6_26 0.8823124 82.45322
4704 DDX56 Liver 7_33 0.8805617 57.91924
6957 USP1 Liver 1_39 0.8781087 255.86929
18314 FAM117B Adipose_Subcutaneous 2_120 0.8638575 36.68997
6100 ALLC Liver 2_2 0.8446148 28.32175
13346 SPHK2 Adipose_Subcutaneous 19_33 0.8363777 34.76342
24743 PCMTD2 Adipose_Subcutaneous 20_38 0.8314753 28.89377
14273 SCD Adipose_Subcutaneous 10_64 0.8185682 19.92754
18041 IL1RN Adipose_Subcutaneous 2_67 0.8157131 21.52025
18737 NTN5 Adipose_Subcutaneous 19_33 0.8143027 54.80574
PVE z
4435 4.859936e-03 -41.687336
2454 4.975687e-04 13.376072
12008 4.556680e-04 -17.962770
3721 1.996746e-04 -8.982702
19759 8.182541e-05 -5.184802
5991 4.745865e-04 12.926351
12687 5.849503e-04 -15.108415
5563 9.024969e-04 -20.293982
7410 2.042252e-04 7.982017
1999 8.567278e-05 5.072217
8531 2.214685e-04 11.038564
13717 6.017471e-05 -4.684363
9390 2.108090e-04 -8.923688
1597 1.685385e-04 -5.732491
5544 1.173728e-04 6.145535
3755 9.154065e-05 7.008305
11790 8.427263e-05 5.407028
7040 2.083051e-04 -8.518936
18637 7.589001e-05 4.800447
25057 6.153497e-05 -5.096493
6093 2.295092e-04 9.116291
8579 8.483262e-05 5.426252
2092 2.764261e-04 10.693191
15027 7.186983e-05 3.965665
6391 6.349836e-05 -4.334495
14913 5.539158e-05 -4.347956
16361 6.883773e-05 -5.328046
15242 5.475241e-05 -4.639318
8865 1.861991e-04 -11.927107
17109 8.317515e-05 6.044322
21459 6.902569e-05 5.543282
3562 6.470955e-05 -4.687370
10657 2.117144e-04 8.840164
4704 1.484236e-04 9.641861
6957 6.538630e-04 16.258211
18314 9.223799e-05 7.852653
6100 6.961439e-05 4.919066
13346 8.461459e-05 -8.721460
24743 6.991557e-05 -5.636854
14273 4.747105e-05 -4.541468
18041 5.108638e-05 4.455379
18737 1.298770e-04 11.132252
#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")
#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]
#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")
df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names
for (i in 1:nrow(df_gene)){
gene <- df_gene$genename[i]
ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}
df_gene <- cbind(df_gene, df_gene_pips)
#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]
#abbreviate column names
df_gene <- dplyr::rename(df_gene,
Adi_Sub="Adipose_Subcutaneous",
Brain_Cer="Brain_Cerebellum",
Adi_Vis="Adipose_Visceral_Omentum",
Blood="Whole_Blood")
df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]
#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
genename combined_pip Adi_Sub Liver
290 ACVR1C 1.2821451 3.957190e-01 0.886426187
12546 ST3GAL4 1.0408713 4.087127e-02 0.999999993
9618 PSRC1 1.0329403 3.294030e-02 1.000000000
1409 C10orf88 1.0324476 2.351908e-01 0.797256743
2582 CNIH4 1.0263117 5.196981e-02 0.974341842
4697 GAS6 1.0252303 4.487895e-02 0.980351368
5734 INHBB 1.0242873 5.913165e-02 0.965155688
14503 ZDHHC7 1.0232406 9.995748e-01 0.023665871
5464 HPR 1.0166104 1.661304e-02 0.999997354
2864 CSNK1G3 1.0136737 6.768661e-02 0.945987078
5750 INSIG2 1.0133149 1.332146e-02 0.999993445
6035 KDSR 1.0028532 5.540887e-01 0.448764468
3125 CYP2A6 1.0014824 3.212884e-02 0.969353536
4086 FADS1 0.9993538 NA 0.999353840
8854 PELO 0.9979494 2.326841e-01 0.765265259
11311 RRBP1 0.9945557 2.262086e-02 0.971934797
11157 RP4-781K5.7 0.9943723 NA 0.994372303
61 ABCG8 0.9943320 NA 0.994331953
26 ABCA1 0.9925506 NA 0.992550635
9167 PLTP 0.9900735 1.251913e-02 0.977554352
9473 PRKD2 0.9892652 4.482861e-03 0.984782375
13505 TPD52 0.9891353 9.814059e-01 0.007729362
13436 TNKS 0.9825262 NA 0.982526205
12471 SPTY2D1 0.9768431 4.886837e-01 0.488159443
12506 SRRT 0.9757710 4.263893e-01 0.549381704
12588 STAT5B 0.9748353 3.301246e-02 0.941822874
12353 SP4 0.9734963 4.357146e-02 0.929924805
13763 TTC39B 0.9716737 4.558453e-02 0.926089174
6154 KLHDC7A 0.9693547 6.579458e-01 0.311408861
31 ABCA8 0.9558067 9.558067e-01 NA
11812 SIPA1 0.9541853 9.482173e-01 0.005967923
2014 CCND2 0.9525749 7.250596e-01 0.227515311
9055 PKN3 0.9395463 4.786795e-01 0.460866780
9161 PLPPR2 0.9349051 9.297349e-01 0.005170169
14045 USP1 0.9339745 5.586579e-02 0.878108668
1978 CCDC92 0.9324492 9.184631e-01 0.013986025
4386 FCGRT 0.9253876 9.226574e-01 0.002730238
2025 CCNJ 0.9215620 9.161743e-01 0.005387656
569 ALLC 0.9204296 7.581480e-02 0.844614756
3070 CWF19L1 0.9152537 4.971167e-01 0.418137016
4613 FUT2 0.9059564 NA 0.905956407
9491 PROCA1 0.9052901 8.980242e-01 0.007265855
8749 PCMTD2 0.9050256 8.314753e-01 0.073550293
9282 POR 0.9045059 8.988319e-01 0.005673998
3290 DDX56 0.9043148 2.375309e-02 0.880561708
13607 TRIM39 0.8823125 9.626981e-08 0.882312357
4114 FAM117B 0.8638575 8.638575e-01 NA
12330 SORCS2 0.8570663 7.657866e-01 0.091279663
12414 SPHK2 0.8521159 8.363777e-01 0.015738193
12766 SYTL1 0.8431171 4.316092e-02 0.799956216
6610 LINC01184 0.8410288 6.047286e-01 0.236300159
1346 BRI3 0.8302506 6.192655e-02 0.768324031
11479 SCD 0.8185682 8.185682e-01 NA
5688 IL1RN 0.8157131 8.157131e-01 NA
8341 NTN5 0.8143027 8.143027e-01 NA
Compare with Case 3. Randomly samples 50% of genes to include.
results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case16")
weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db"
weight <- unlist(strsplit(weight, ";"))
#load information for all genes
gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())
for (i in 1:length(weight)){
sqlite <- RSQLite::dbDriver("SQLite")
db = RSQLite::dbConnect(sqlite, weight[i])
query <- function(...) RSQLite::dbGetQuery(db, ...)
gene_info_current <- query("select gene, genename, gene_type from extra")
RSQLite::dbDisconnect(db)
gene_info_current$weight <- weight[i]
gene_info <- rbind(gene_info, gene_info_current)
}
gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})
gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")
#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))
#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")
#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))
sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))
#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size
#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)
#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)
ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])
#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z
z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]
#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA
ctwas_res <- rbind(ctwas_gene_res,
ctwas_snp_res[,colnames(ctwas_gene_res)])
#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)
#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])
library(ggplot2)
library(cowplot)
load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))
#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument
#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec
#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]
#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation
#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))
title_size <- 12
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi)) +
ggtitle("Proportion Causal") +
theme_cowplot()
p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(sigma^2)) +
ggtitle("Effect Size") +
theme_cowplot()
p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(h^2[G])) +
ggtitle("PVE") +
theme_cowplot()
p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
ggtitle("Enrichment") +
theme_cowplot()
p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
plot_grid(p_pi, p_sigma2, p_enrich, p_pve)
Version | Author | Date |
---|---|---|
05269c3 | wesleycrouse | 2023-01-10 |
####################
#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
SNP Liver Adipose_Subcutaneous
0.0001213438 0.0071379166 0.0226800038
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
SNP Liver Adipose_Subcutaneous
15.13968 60.59001 9.68234
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 161.6083
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
SNP Liver Adipose_Subcutaneous
8696600 5545 6497
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
SNP Liver Adipose_Subcutaneous
0.046494778 0.006979019 0.004151993
#total PVE
sum(estimated_group_pve)
[1] 0.05762579
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
SNP Liver Adipose_Subcutaneous
0.80683975 0.12110929 0.07205095
#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
genename group region_tag susie_pip mu2
2454 ST3GAL4 Liver 11_77 1.0000000 176.38090
12008 HPR Liver 16_38 0.9999978 158.76202
19759 ZDHHC7 Adipose_Subcutaneous 16_49 0.9999084 31.61800
12687 RP4-781K5.7 Liver 1_121 0.9964423 204.05543
6957 USP1 Liver 1_39 0.9931048 258.31055
13717 TPD52 Adipose_Subcutaneous 8_57 0.9893922 22.59036
5544 CNIH4 Liver 1_114 0.9821766 41.28484
18314 FAM117B Adipose_Subcutaneous 2_120 0.9801395 40.66820
8531 TNKS Liver 8_12 0.9786638 77.06108
3755 RRBP1 Liver 20_13 0.9690390 32.69412
16444 C10orf88 Adipose_Subcutaneous 10_77 0.9621970 32.65865
14913 FCGRT Adipose_Subcutaneous 19_34 0.9603780 21.21613
17109 POR Adipose_Subcutaneous 7_48 0.9593199 33.73845
7040 INHBB Liver 2_70 0.9580297 74.79794
16361 CCDC92 Adipose_Subcutaneous 12_75 0.9469543 27.36424
15242 CCNJ Adipose_Subcutaneous 10_61 0.9459691 21.65254
3247 KDSR Liver 18_35 0.9329336 24.77843
13993 SRRT Adipose_Subcutaneous 7_62 0.9235605 26.54317
19681 PELO Adipose_Subcutaneous 5_30 0.9185746 62.92380
8579 STAT5B Liver 17_25 0.9149897 31.54926
13020 ZDHHC6 Adipose_Subcutaneous 10_70 0.8991049 20.79800
14273 SCD Adipose_Subcutaneous 10_64 0.8902595 19.30919
16736 ACVR1C Adipose_Subcutaneous 2_94 0.8817067 19.80169
20155 TMED4 Adipose_Subcutaneous 7_32 0.8723188 38.77679
13346 SPHK2 Adipose_Subcutaneous 19_33 0.8705643 39.62417
1597 PLTP Liver 20_28 0.8665201 41.32560
8865 FUT2 Liver 19_33 0.8587473 103.22839
27239 DUSP14 Adipose_Subcutaneous 17_22 0.8502019 18.56556
21711 VAMP5 Adipose_Subcutaneous 2_54 0.8454320 22.48435
16347 CCND2 Adipose_Subcutaneous 12_4 0.8306329 19.93197
22062 KCNK3 Adipose_Subcutaneous 2_16 0.8299863 20.39883
14808 CTSH Adipose_Subcutaneous 15_37 0.8011763 18.48091
PVE z
2454 5.133007e-04 13.376072
12008 4.620255e-04 -17.962770
19759 9.200575e-05 -5.184802
12687 5.917259e-04 -15.108415
6957 7.465476e-04 16.258211
13717 6.504469e-05 -4.684363
5544 1.180050e-04 6.145535
18314 1.160014e-04 7.852653
8531 2.194769e-04 11.038564
3755 9.220006e-05 7.008305
16444 9.144976e-05 -6.762952
14913 5.929645e-05 -4.347956
17109 9.419090e-05 6.044322
7040 2.085398e-04 -8.518936
16361 7.541065e-05 -5.328046
15242 5.960820e-05 -4.639318
3247 6.727362e-05 -4.526287
13993 7.134086e-05 4.556557
19681 1.682092e-04 8.522224
8579 8.400897e-05 5.426252
13020 5.441920e-05 3.908646
14273 5.002660e-05 -4.541468
16736 5.080971e-05 -4.185879
20155 9.843905e-05 7.696786
13346 1.003879e-04 -8.721460
1597 1.042121e-04 -5.732491
8865 2.579793e-04 -11.927107
27239 4.593571e-05 3.886352
21711 5.531964e-05 4.547456
16347 4.818143e-05 -4.190519
22062 4.927157e-05 -4.772296
14808 4.308952e-05 3.616369
#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")
#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]
#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")
df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names
for (i in 1:nrow(df_gene)){
gene <- df_gene$genename[i]
ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}
df_gene <- cbind(df_gene, df_gene_pips)
#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]
#abbreviate column names
df_gene <- dplyr::rename(df_gene,
Adi_Sub="Adipose_Subcutaneous",
Brain_Cer="Brain_Cerebellum",
Adi_Vis="Adipose_Visceral_Omentum",
Blood="Whole_Blood")
df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]
#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
genename combined_pip Adi_Sub Liver
3741 INHBB 1.0319289 0.07389923 0.958029663
9387 ZDHHC7 1.0201173 0.99990845 0.020208878
8115 ST3GAL4 1.0000000 NA 0.999999997
3558 HPR 0.9999978 NA 0.999997774
7206 RP4-781K5.7 0.9964423 NA 0.996442264
7307 RRBP1 0.9945291 0.02549009 0.969039009
9092 USP1 0.9931048 NA 0.993104752
8750 TPD52 0.9893922 0.98939216 NA
1664 CNIH4 0.9821766 NA 0.982176629
2697 FAM117B 0.9801395 0.98013946 NA
8702 TNKS 0.9786638 NA 0.978663795
6015 POR 0.9636536 0.95931991 0.004333661
2869 FCGRT 0.9625221 0.96037797 0.002144161
912 C10orf88 0.9621970 0.96219701 NA
8140 STAT5B 0.9621156 0.04712591 0.914989707
1293 CCND2 0.9584228 0.83063294 0.127789894
1267 CCDC92 0.9536252 0.94695427 0.006670922
1301 CCNJ 0.9459691 0.94596906 NA
3935 KDSR 0.9329336 NA 0.932933551
5940 PLTP 0.9258143 0.05929421 0.866520136
8088 SRRT 0.9235605 0.92356048 NA
5741 PELO 0.9185746 0.91857462 NA
9386 ZDHHC6 0.8991049 0.89910486 NA
7426 SCD 0.8902595 0.89025947 NA
189 ACVR1C 0.8817067 0.88170666 NA
8025 SPHK2 0.8783859 0.87056431 0.007821603
8515 TMED4 0.8723188 0.87231880 NA
3015 FUT2 0.8587473 NA 0.858747314
2361 DUSP14 0.8502019 0.85020193 NA
1815 CRACR2B 0.8474532 0.25635618 0.591097068
9131 VAMP5 0.8454320 0.84543195 NA
3899 KCNK3 0.8299863 0.82998628 NA
354 ALDH16A1 0.8216692 0.70383628 0.117832960
3012 FUCA1 0.8026530 0.79242898 0.010223984
1971 CTSH 0.8011763 0.80117633 NA
Compare with Cases 3 and 16. Randomly samples 50% of genes to include.
results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case17")
weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db"
weight <- unlist(strsplit(weight, ";"))
#load information for all genes
gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())
for (i in 1:length(weight)){
sqlite <- RSQLite::dbDriver("SQLite")
db = RSQLite::dbConnect(sqlite, weight[i])
query <- function(...) RSQLite::dbGetQuery(db, ...)
gene_info_current <- query("select gene, genename, gene_type from extra")
RSQLite::dbDisconnect(db)
gene_info_current$weight <- weight[i]
gene_info <- rbind(gene_info, gene_info_current)
}
gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})
gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")
#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))
#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")
#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))
sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))
#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size
#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)
#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)
ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])
#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z
z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]
#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA
ctwas_res <- rbind(ctwas_gene_res,
ctwas_snp_res[,colnames(ctwas_gene_res)])
#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)
#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])
library(ggplot2)
library(cowplot)
load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))
#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument
#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec
#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]
#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation
#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))
title_size <- 12
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi)) +
ggtitle("Proportion Causal") +
theme_cowplot()
p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(sigma^2)) +
ggtitle("Effect Size") +
theme_cowplot()
p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(h^2[G])) +
ggtitle("PVE") +
theme_cowplot()
p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
ggtitle("Enrichment") +
theme_cowplot()
p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
plot_grid(p_pi, p_sigma2, p_enrich, p_pve)
Version | Author | Date |
---|---|---|
05269c3 | wesleycrouse | 2023-01-10 |
####################
#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
SNP Liver Adipose_Subcutaneous
6.482204e-05 6.009965e-03 3.656378e-02
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
SNP Liver Adipose_Subcutaneous
37.333321 172.303079 7.616652
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 374.5475
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
SNP Liver Adipose_Subcutaneous
8696600 5446 6596
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
SNP Liver Adipose_Subcutaneous
0.061247606 0.016412054 0.005345842
#total PVE
sum(estimated_group_pve)
[1] 0.0830055
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
SNP Liver Adipose_Subcutaneous
0.73787405 0.19772248 0.06440347
#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
genename group region_tag susie_pip mu2
2454 ST3GAL4 Liver 11_77 1.0000000 176.65176
4435 PSRC1 Liver 1_67 1.0000000 1724.56864
3721 INSIG2 Liver 2_69 0.9999900 70.76506
19759 ZDHHC7 Adipose_Subcutaneous 16_49 0.9998596 28.85537
18637 ABCA8 Adipose_Subcutaneous 17_39 0.9900544 28.23760
25057 SIPA1 Adipose_Subcutaneous 11_36 0.9865119 22.84991
5544 CNIH4 Liver 1_114 0.9847141 43.51446
14913 FCGRT Adipose_Subcutaneous 19_34 0.9756560 19.79338
24743 PCMTD2 Adipose_Subcutaneous 20_38 0.9720272 29.27508
12687 RP4-781K5.7 Liver 1_121 0.9679462 207.12864
13020 ZDHHC6 Adipose_Subcutaneous 10_70 0.9640113 21.01357
8531 TNKS Liver 8_12 0.9613470 79.96515
27265 AC007950.2 Adipose_Subcutaneous 15_29 0.9580793 30.04037
13346 SPHK2 Adipose_Subcutaneous 19_33 0.9557932 35.57941
12875 MPND Adipose_Subcutaneous 19_5 0.9550319 19.74430
14240 CWF19L1 Adipose_Subcutaneous 10_64 0.9441068 32.22444
21708 USP39 Adipose_Subcutaneous 2_54 0.9384321 18.13206
16736 ACVR1C Adipose_Subcutaneous 2_94 0.9321572 18.82547
22062 KCNK3 Adipose_Subcutaneous 2_16 0.9287638 20.22888
16496 ELF1 Adipose_Subcutaneous 13_16 0.9258803 17.25157
7040 INHBB Liver 2_70 0.9257392 76.53463
20426 ALDH16A1 Adipose_Subcutaneous 19_34 0.9181006 18.92977
5563 ABCG8 Liver 2_27 0.9175320 321.20444
22969 KLHDC7A Adipose_Subcutaneous 1_13 0.9077054 18.02272
18737 NTN5 Adipose_Subcutaneous 19_33 0.9060975 53.38487
22699 ZNF575 Adipose_Subcutaneous 19_30 0.9043218 24.20160
24579 C2CD4A Adipose_Subcutaneous 15_28 0.9014682 21.33820
3247 KDSR Liver 18_35 0.9001984 25.41426
16438 GPAM Adipose_Subcutaneous 10_70 0.8918491 18.52014
19681 PELO Adipose_Subcutaneous 5_30 0.8900162 60.82831
13615 CYP2W1 Adipose_Subcutaneous 7_2 0.8840618 19.42157
19793 UBASH3B Adipose_Subcutaneous 11_74 0.8835475 23.84138
18321 PARP9 Adipose_Subcutaneous 3_76 0.8816283 23.10575
25930 TMEM199 Adipose_Subcutaneous 17_17 0.8759915 38.41378
6391 TTC39B Liver 9_13 0.8594171 25.04182
13992 ACHE Adipose_Subcutaneous 7_62 0.8583735 26.66383
18578 ARHGAP17 Adipose_Subcutaneous 16_21 0.8450452 17.27680
22992 FUCA1 Adipose_Subcutaneous 1_17 0.8432085 17.74356
24646 PNP Adipose_Subcutaneous 14_1 0.8374570 16.79333
15981 NRBP1 Adipose_Subcutaneous 2_16 0.8322309 105.32873
20402 CCDC116 Adipose_Subcutaneous 22_4 0.8277800 20.10887
8865 FUT2 Liver 19_33 0.8172658 69.50040
24460 FAM49A Adipose_Subcutaneous 2_10 0.8111895 17.08999
27239 DUSP14 Adipose_Subcutaneous 17_22 0.8052276 17.64644
PVE z
2454 5.140889e-04 13.376072
4435 5.018810e-03 -41.687336
3721 2.059372e-04 -8.982702
19759 8.396263e-05 -5.184802
18637 8.135930e-05 4.800447
25057 6.560049e-05 -5.096493
5544 1.246993e-04 6.145535
14913 5.620010e-05 -4.347956
24743 8.281268e-05 -5.636854
12687 5.834608e-04 -15.108415
13020 5.895251e-05 3.908646
8531 2.237181e-04 11.038564
27265 8.375816e-05 5.555780
13346 9.896530e-05 -8.721460
12875 5.487567e-05 -4.366911
14240 8.853741e-05 5.837199
21708 4.951881e-05 4.083599
16736 5.106876e-05 -4.185879
22062 5.467610e-05 -4.772296
16496 4.648404e-05 -3.876887
7040 2.061897e-04 -8.518936
20426 5.057732e-05 -4.215260
5563 8.576756e-04 -20.293982
22969 4.760862e-05 4.124187
18737 1.407711e-04 11.132252
22699 6.369237e-05 -5.711154
24579 5.597943e-05 4.535165
3247 6.657881e-05 -4.526287
16438 4.806799e-05 4.000718
19681 1.575520e-04 8.522224
13615 4.996747e-05 4.311978
19793 6.130298e-05 4.906621
18321 5.928242e-05 3.744644
25930 9.792808e-05 6.615834
6391 6.263111e-05 -4.334495
13992 6.660688e-05 5.880517
18578 4.248772e-05 3.867562
22992 4.354077e-05 -3.851528
24646 4.092793e-05 -3.575272
15981 2.551003e-04 5.571352
20402 4.844209e-05 4.331655
8865 1.652993e-04 -11.927107
24460 4.034451e-05 -3.586393
27239 4.135196e-05 3.886352
#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")
#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]
#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")
df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names
for (i in 1:nrow(df_gene)){
gene <- df_gene$genename[i]
ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}
df_gene <- cbind(df_gene, df_gene_pips)
#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]
#abbreviate column names
df_gene <- dplyr::rename(df_gene,
Adi_Sub="Adipose_Subcutaneous",
Brain_Cer="Brain_Cerebellum",
Adi_Vis="Adipose_Visceral_Omentum",
Blood="Whole_Blood")
df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]
#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
genename combined_pip Adi_Sub Liver
8115 ST3GAL4 1.0957196 0.09571960 0.999999980
3758 INSIG2 1.0495571 0.04956712 0.999990027
9390 ZDHHC7 1.0122296 0.99985959 0.012370042
6238 PSRC1 1.0000000 NA 0.999999957
4015 KLHDC7A 0.9926113 0.90770536 0.084905928
21 ABCA8 0.9900544 0.99005443 NA
7623 SIPA1 0.9892596 0.98651189 0.002747697
1689 CNIH4 0.9847141 NA 0.984714122
2865 FCGRT 0.9756560 0.97565601 NA
5673 PCMTD2 0.9720272 0.97202716 NA
7198 RP4-781K5.7 0.9679462 NA 0.967946158
9389 ZDHHC6 0.9640113 0.96401125 NA
8680 TNKS 0.9613470 NA 0.961346973
86 AC007950.2 0.9580793 0.95807933 NA
8029 SPHK2 0.9557932 0.95579316 NA
4872 MPND 0.9550319 0.95503190 NA
2002 CWF19L1 0.9441068 0.94410682 NA
9097 USP39 0.9384321 0.93843206 NA
187 ACVR1C 0.9321572 0.93215717 NA
3907 KCNK3 0.9287638 0.92876384 NA
2487 ELF1 0.9258803 0.92588032 NA
3747 INHBB 0.9257392 NA 0.925739175
8565 TMEM199 0.9193194 0.87599146 0.043327909
346 ALDH16A1 0.9181006 0.91810056 NA
42 ABCG8 0.9175320 NA 0.917531976
5405 NTN5 0.9060975 0.90609755 NA
1031 C2CD4A 0.9049330 0.90146823 0.003464813
9618 ZNF575 0.9043218 0.90432184 NA
3942 KDSR 0.9001984 NA 0.900198430
3244 GPAM 0.8918491 0.89184909 NA
5738 PELO 0.8900162 0.89001616 NA
2050 CYP2W1 0.8840618 0.88406180 NA
2078 DAGLB 0.8839621 0.79698095 0.086981200
8983 UBASH3B 0.8835475 0.88354754 NA
5609 PARP9 0.8816283 0.88162827 NA
8903 TTC39B 0.8594171 NA 0.859417107
141 ACHE 0.8583735 0.85837347 NA
543 ARHGAP17 0.8482212 0.84504523 0.003175997
3000 FUCA1 0.8432085 0.84320848 NA
5981 PNP 0.8374570 0.83745695 NA
5367 NRBP1 0.8343007 0.83223094 0.002069784
1233 CCDC116 0.8277800 0.82777995 NA
3006 FUT2 0.8172658 NA 0.817265778
2772 FAM49A 0.8111895 0.81118949 NA
2370 DUSP14 0.8083621 0.80522762 0.003134518
Compare with Cases 3, 16, 17. Randomly samples 50% of genes to include.
results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case18")
weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db"
weight <- unlist(strsplit(weight, ";"))
#load information for all genes
gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())
for (i in 1:length(weight)){
sqlite <- RSQLite::dbDriver("SQLite")
db = RSQLite::dbConnect(sqlite, weight[i])
query <- function(...) RSQLite::dbGetQuery(db, ...)
gene_info_current <- query("select gene, genename, gene_type from extra")
RSQLite::dbDisconnect(db)
gene_info_current$weight <- weight[i]
gene_info <- rbind(gene_info, gene_info_current)
}
gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})
gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")
#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))
#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")
#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))
sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))
#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size
#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)
#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)
ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])
#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z
z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]
#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA
ctwas_res <- rbind(ctwas_gene_res,
ctwas_snp_res[,colnames(ctwas_gene_res)])
#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)
#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])
library(ggplot2)
library(cowplot)
load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))
#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument
#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec
#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]
#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation
#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))
title_size <- 12
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi)) +
ggtitle("Proportion Causal") +
theme_cowplot()
p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(sigma^2)) +
ggtitle("Effect Size") +
theme_cowplot()
p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(h^2[G])) +
ggtitle("PVE") +
theme_cowplot()
p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
ggtitle("Enrichment") +
theme_cowplot()
p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
plot_grid(p_pi, p_sigma2, p_enrich, p_pve)
Version | Author | Date |
---|---|---|
05269c3 | wesleycrouse | 2023-01-10 |
####################
#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
SNP Liver Adipose_Subcutaneous
0.0001106724 0.0137896334 0.0173282248
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
SNP Liver Adipose_Subcutaneous
17.760505 26.291691 5.245622
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 112.0352
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
SNP Liver Adipose_Subcutaneous
8696600 5498 6544
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
SNP Liver Adipose_Subcutaneous
0.049746723 0.005800912 0.001731070
#total PVE
sum(estimated_group_pve)
[1] 0.05727871
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
SNP Liver Adipose_Subcutaneous
0.86850292 0.10127520 0.03022188
#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
genename group region_tag susie_pip mu2
2454 ST3GAL4 Liver 11_77 1.0000000 166.72884
3721 INSIG2 Liver 2_69 0.9999936 66.56031
19759 ZDHHC7 Adipose_Subcutaneous 16_49 0.9994884 27.45044
7410 ABCA1 Liver 9_53 0.9927778 68.53207
1999 PRKD2 Liver 19_33 0.9898406 29.09938
3755 RRBP1 Liver 20_13 0.9845204 31.41717
11790 CYP2A6 Liver 19_28 0.9804901 30.45145
12687 RP4-781K5.7 Liver 1_121 0.9802779 196.86171
5544 CNIH4 Liver 1_114 0.9797271 39.98647
3247 KDSR Liver 18_35 0.9725500 23.90255
6391 TTC39B Liver 9_13 0.9688530 22.54707
7040 INHBB Liver 2_70 0.9650038 71.97295
3562 ACVR1C Liver 2_94 0.9600798 25.24584
18637 ABCA8 Adipose_Subcutaneous 17_39 0.9571227 26.34493
25057 SIPA1 Adipose_Subcutaneous 11_36 0.9423376 21.75433
6093 CSNK1G3 Liver 5_75 0.9412921 81.48936
20337 PKN3 Adipose_Subcutaneous 9_66 0.9397147 38.41851
9062 KLHDC7A Liver 1_13 0.9284125 20.70842
21459 PROCA1 Adipose_Subcutaneous 17_17 0.9166599 26.60236
14913 FCGRT Adipose_Subcutaneous 19_34 0.9101678 19.89277
2092 SP4 Liver 7_19 0.9020442 98.66457
16361 CCDC92 Adipose_Subcutaneous 12_75 0.8891349 25.42626
6220 PELO Liver 5_31 0.8856013 68.44503
10657 TRIM39 Liver 6_26 0.8669252 80.37077
3212 CCND2 Liver 12_4 0.8656458 21.48301
11726 CLDN23 Liver 8_11 0.8506412 24.21342
12875 MPND Adipose_Subcutaneous 19_5 0.8450061 20.39906
6957 USP1 Liver 1_39 0.8220847 246.91080
3881 VIL1 Liver 2_129 0.8021120 26.13075
PVE z
2454 4.852114e-04 13.376072
3721 1.937015e-04 -8.982702
19759 7.984493e-05 -5.184802
7410 1.980005e-04 7.982017
1999 8.382416e-05 5.072217
3755 9.001442e-05 7.008305
11790 8.689033e-05 5.407028
12687 5.616047e-04 -15.108415
5544 1.140088e-04 6.145535
3247 6.765135e-05 -4.526287
6391 6.357236e-05 -4.334495
7040 2.021244e-04 -8.518936
3562 7.053706e-05 -4.687370
18637 7.338122e-05 4.800447
25057 5.965854e-05 -5.096493
6093 2.232264e-04 9.116291
20337 1.050647e-04 -6.780039
9062 5.595105e-05 4.124187
21459 7.096573e-05 5.543282
14913 5.269106e-05 -4.347956
2092 2.590057e-04 10.693191
16361 6.579159e-05 -5.328046
6220 1.764008e-04 8.288398
10657 2.027683e-04 8.840164
3212 5.411972e-05 -4.065830
11726 5.994084e-05 4.720010
12875 5.016380e-05 -4.366911
6957 5.907136e-04 16.258211
3881 6.099682e-05 4.725531
#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")
#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]
#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")
df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names
for (i in 1:nrow(df_gene)){
gene <- df_gene$genename[i]
ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}
df_gene <- cbind(df_gene, df_gene_pips)
#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]
#abbreviate column names
df_gene <- dplyr::rename(df_gene,
Adi_Sub="Adipose_Subcutaneous",
Brain_Cer="Brain_Cerebellum",
Adi_Vis="Adipose_Visceral_Omentum",
Blood="Whole_Blood")
df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]
#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
genename combined_pip Adi_Sub Liver
9427 ZDHHC7 1.0484150 9.994884e-01 0.04892660
1677 CNIH4 1.0300841 5.035691e-02 0.97972714
3736 INSIG2 1.0148991 1.490554e-02 0.99999358
923 C10orf88 1.0129152 2.843843e-01 0.72853091
6162 PRKD2 1.0048176 1.497700e-02 0.98984059
8121 ST3GAL4 1.0000000 NA 0.99999999
8077 SPTY2D1 0.9999106 3.372090e-01 0.66270164
18 ABCA1 0.9927778 NA 0.99277780
8096 SRRT 0.9913786 2.465669e-01 0.74481166
7320 RRBP1 0.9845204 NA 0.98452045
2027 CYP2A6 0.9804901 NA 0.98049008
7205 RP4-781K5.7 0.9802779 NA 0.98027791
3921 KDSR 0.9725500 NA 0.97255001
8941 TTC39B 0.9688530 NA 0.96885296
3726 INHBB 0.9650038 NA 0.96500383
196 ACVR1C 0.9600798 NA 0.96007976
7644 SIPA1 0.9588468 9.423376e-01 0.01650911
23 ABCA8 0.9571227 9.571227e-01 NA
1865 CSNK1G3 0.9412921 NA 0.94129212
5888 PKN3 0.9397147 9.397147e-01 NA
8536 TMED4 0.9339946 2.523967e-01 0.68159781
4005 KLHDC7A 0.9284125 NA 0.92841251
1286 CCDC92 0.9237987 8.891349e-01 0.03466381
6173 PROCA1 0.9166599 9.166599e-01 NA
2858 FCGRT 0.9101678 9.101678e-01 NA
7994 SP4 0.9020442 NA 0.90204415
9126 USP1 0.8980912 7.600644e-02 0.82208475
5755 PELO 0.8856013 NA 0.88560133
1987 CWF19L1 0.8791238 3.338231e-01 0.54530070
8832 TRIM39 0.8669253 8.323616e-08 0.86692520
1308 CCND2 0.8656458 NA 0.86564577
8336 TBC1D4 0.8588602 5.951134e-02 0.79934884
1615 CLDN23 0.8506412 NA 0.85064120
8754 TP53INP1 0.8466237 5.579647e-02 0.79082721
4901 MPND 0.8450061 8.450061e-01 NA
7982 SORCS2 0.8384003 6.587743e-01 0.17962599
9183 VIL1 0.8021120 NA 0.80211199
Compare with Case 7 (should be identical). Checking new version.
results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case19")
weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db"
weight <- unlist(strsplit(weight, ";"))
#load information for all genes
gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())
for (i in 1:length(weight)){
sqlite <- RSQLite::dbDriver("SQLite")
db = RSQLite::dbConnect(sqlite, weight[i])
query <- function(...) RSQLite::dbGetQuery(db, ...)
gene_info_current <- query("select gene, genename, gene_type from extra")
RSQLite::dbDisconnect(db)
gene_info_current$weight <- weight[i]
gene_info <- rbind(gene_info, gene_info_current)
}
gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})
gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")
#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))
#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")
#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))
sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))
#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size
#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)
#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- ctwas_gene_res$type
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)
ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])
#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z
z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]
#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA
ctwas_res <- rbind(ctwas_gene_res,
ctwas_snp_res[,colnames(ctwas_gene_res)])
#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)
#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])
library(ggplot2)
library(cowplot)
load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))
#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument
#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec
#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]
#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation
#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))
title_size <- 12
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi)) +
ggtitle("Proportion Causal") +
theme_cowplot()
p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(sigma^2)) +
ggtitle("Effect Size") +
theme_cowplot()
p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(h^2[G])) +
ggtitle("PVE") +
theme_cowplot()
p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
ggtitle("Enrichment") +
theme_cowplot()
p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
plot_grid(p_pi, p_sigma2, p_enrich, p_pve)
Version | Author | Date |
---|---|---|
05269c3 | wesleycrouse | 2023-01-10 |
####################
#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
SNP Liver
0.0001708048 0.0102158606
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
SNP Liver
9.70388 43.32067
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 59.81014
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
SNP Liver
8696600 11003
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
SNP Liver
0.04194840 0.01417103
#total PVE
sum(estimated_group_pve)
[1] 0.05611943
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
SNP Liver
0.7474844 0.2525156
#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
genename group region_tag susie_pip mu2 PVE z
4435 PSRC1 Liver 1_67 1.0000000 1661.18355 4.834348e-03 -41.687336
2454 ST3GAL4 Liver 11_77 1.0000000 173.08249 5.037017e-04 13.376072
12008 HPR Liver 16_38 0.9999998 163.53982 4.759307e-04 -17.962770
3721 INSIG2 Liver 2_69 0.9999958 68.51740 1.993973e-04 -8.982702
5563 ABCG8 Liver 2_27 0.9999660 313.08513 9.111040e-04 -20.293982
5991 FADS1 Liver 11_34 0.9998572 163.89639 4.769004e-04 12.926351
12687 RP4-781K5.7 Liver 1_121 0.9997364 203.40050 5.917766e-04 -15.108415
7410 ABCA1 Liver 9_53 0.9956684 70.26807 2.036072e-04 7.982017
8531 TNKS Liver 8_12 0.9915320 76.52776 2.208239e-04 11.038564
9390 GAS6 Liver 13_62 0.9888998 71.30764 2.052148e-04 -8.923688
1597 PLTP Liver 20_28 0.9885315 61.47131 1.768411e-04 -5.732491
1999 PRKD2 Liver 19_33 0.9867185 30.06125 8.632181e-05 5.072217
7040 INHBB Liver 2_70 0.9832856 73.94275 2.115902e-04 -8.518936
3755 RRBP1 Liver 20_13 0.9819053 32.42030 9.264178e-05 7.008305
5544 CNIH4 Liver 1_114 0.9787252 40.77861 1.161485e-04 6.145535
2092 SP4 Liver 7_19 0.9781344 102.23228 2.910093e-04 10.693191
6093 CSNK1G3 Liver 5_75 0.9759584 84.08145 2.388096e-04 9.116291
8865 FUT2 Liver 19_33 0.9674812 104.74356 2.949105e-04 -11.927107
11790 CYP2A6 Liver 19_28 0.9643766 31.94912 8.966560e-05 5.407028
10657 TRIM39 Liver 6_25 0.9605468 81.54422 2.279460e-04 8.840164
3247 KDSR Liver 18_35 0.9578772 24.64899 6.871148e-05 -4.526287
233 NPC1L1 Liver 7_32 0.9548612 86.93035 2.415639e-04 -10.761931
4704 DDX56 Liver 7_32 0.9494642 59.88499 1.654691e-04 9.641861
6391 TTC39B Liver 9_13 0.9414293 23.21539 6.360392e-05 -4.334495
6220 PELO Liver 5_31 0.9388155 70.72581 1.932317e-04 8.288398
1114 SRRT Liver 7_62 0.9383759 32.70763 8.931947e-05 5.424996
6778 PKN3 Liver 9_66 0.9376624 47.53713 1.297179e-04 -6.620563
3300 C10orf88 Liver 10_77 0.9360721 37.17593 1.012725e-04 -6.787850
8579 STAT5B Liver 17_25 0.9310376 30.58279 8.286375e-05 5.426252
3562 ACVR1C Liver 2_94 0.9285980 25.84902 6.985414e-05 -4.687370
6957 USP1 Liver 1_39 0.8941238 253.51088 6.596515e-04 16.258211
9062 KLHDC7A Liver 1_13 0.8301423 22.36541 5.403183e-05 4.124187
8418 POP7 Liver 7_62 0.8209688 40.45771 9.666033e-05 -5.845258
9072 SPTY2D1 Liver 11_13 0.8208409 33.49049 8.000198e-05 -5.557123
8931 CRACR2B Liver 11_1 0.8079877 21.87357 5.143334e-05 -3.989585
5415 SYTL1 Liver 1_19 0.8069655 22.30889 5.239059e-05 -3.962854
6100 ALLC Liver 2_2 0.8068337 28.07319 6.591680e-05 4.919066
#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")
#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]
#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")
df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names
for (i in 1:nrow(df_gene)){
gene <- df_gene$genename[i]
ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}
df_gene <- cbind(df_gene, df_gene_pips)
#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]
#abbreviate column names
df_gene <- dplyr::rename(df_gene,
Adi_Sub="Adipose_Subcutaneous",
Brain_Cer="Brain_Cerebellum",
Adi_Vis="Adipose_Visceral_Omentum",
Blood="Whole_Blood")
df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]
#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
genename combined_pip Liver
7022 PSRC1 1.0000000 1.0000000
9062 ST3GAL4 1.0000000 1.0000000
3979 HPR 0.9999998 0.9999998
4188 INSIG2 0.9999958 0.9999958
44 ABCG8 0.9999660 0.9999660
2990 FADS1 0.9998572 0.9998572
8053 RP4-781K5.7 0.9997364 0.9997364
17 ABCA1 0.9956684 0.9956684
9738 TNKS 0.9915320 0.9915320
3423 GAS6 0.9888998 0.9888998
6672 PLTP 0.9885315 0.9885315
6913 PRKD2 0.9867185 0.9867185
4174 INHBB 0.9832856 0.9832856
8167 RRBP1 0.9819053 0.9819053
1874 CNIH4 0.9787252 0.9787252
8916 SP4 0.9781344 0.9781344
2086 CSNK1G3 0.9759584 0.9759584
3368 FUT2 0.9674812 0.9674812
2263 CYP2A6 0.9643766 0.9643766
9870 TRIM39 0.9605468 0.9605468
4395 KDSR 0.9578772 0.9578772
5972 NPC1L1 0.9548612 0.9548612
2397 DDX56 0.9494642 0.9494642
9996 TTC39B 0.9414293 0.9414293
6425 PELO 0.9388155 0.9388155
9035 SRRT 0.9383759 0.9383759
6583 PKN3 0.9376624 0.9376624
1009 C10orf88 0.9360721 0.9360721
9091 STAT5B 0.9310376 0.9310376
195 ACVR1C 0.9285980 0.9285980
10193 USP1 0.8941238 0.8941238
4483 KLHDC7A 0.8301423 0.8301423
6761 POP7 0.8209688 0.8209688
9009 SPTY2D1 0.8208409 0.8208409
2032 CRACR2B 0.8079877 0.8079877
9235 SYTL1 0.8069655 0.8069655
405 ALLC 0.8068337 0.8068337
Compare with Case 10 (Liver+Blood with 3 additional tissues).
results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case20")
weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Whole_Blood.db"
weight <- unlist(strsplit(weight, ";"))
#load information for all genes
gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())
for (i in 1:length(weight)){
sqlite <- RSQLite::dbDriver("SQLite")
db = RSQLite::dbConnect(sqlite, weight[i])
query <- function(...) RSQLite::dbGetQuery(db, ...)
gene_info_current <- query("select gene, genename, gene_type from extra")
RSQLite::dbDisconnect(db)
gene_info_current$weight <- weight[i]
gene_info <- rbind(gene_info, gene_info_current)
}
gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})
gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")
#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))
#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")
#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))
sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))
#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size
#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)
#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- ctwas_gene_res$type
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)
ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])
#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z
z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]
#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA
ctwas_res <- rbind(ctwas_gene_res,
ctwas_snp_res[,colnames(ctwas_gene_res)])
#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)
#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])
library(ggplot2)
library(cowplot)
load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))
#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument
#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec
#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]
#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation
#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))
title_size <- 12
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi)) +
ggtitle("Proportion Causal") +
theme_cowplot()
p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(sigma^2)) +
ggtitle("Effect Size") +
theme_cowplot()
p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(h^2[G])) +
ggtitle("PVE") +
theme_cowplot()
p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
ggtitle("Enrichment") +
theme_cowplot()
p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
plot_grid(p_pi, p_sigma2, p_enrich, p_pve)
####################
#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
SNP Liver Whole_Blood
0.0001136417 0.0090445205 0.0122175090
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
SNP Liver Whole_Blood
14.320712 43.766484 6.772068
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 49.06246
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
SNP Liver Whole_Blood
8696600 11003 11198
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
SNP Liver Whole_Blood
0.041188150 0.012675311 0.002696279
#total PVE
sum(estimated_group_pve)
[1] 0.05655974
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
SNP Liver Whole_Blood
0.72822383 0.22410483 0.04767134
#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
genename group region_tag susie_pip mu2 PVE
4435 PSRC1 Liver 1_67 1.0000000 1654.26995 4.814228e-03
2454 ST3GAL4 Liver 11_77 1.0000000 176.24213 5.128968e-04
20880 PCSK9 Whole_Blood 1_34 1.0000000 178.56031 5.196432e-04
12008 HPR Liver 16_38 0.9999982 157.93071 4.596064e-04
3721 INSIG2 Liver 2_69 0.9999908 67.59930 1.967245e-04
5991 FADS1 Liver 11_34 0.9995077 162.87624 4.737663e-04
5563 ABCG8 Liver 2_27 0.9979773 310.92446 9.030169e-04
12687 RP4-781K5.7 Liver 1_122 0.9969687 202.12809 5.864466e-04
7410 ABCA1 Liver 9_53 0.9932795 70.38202 2.034480e-04
1597 PLTP Liver 20_28 0.9852710 62.01770 1.778245e-04
8531 TNKS Liver 8_12 0.9847984 76.45288 2.191096e-04
9390 GAS6 Liver 13_62 0.9810190 71.38951 2.038131e-04
19842 ACP6 Whole_Blood 1_73 0.9774534 21.94703 6.242981e-05
3755 RRBP1 Liver 20_13 0.9774019 32.22010 9.164743e-05
1999 PRKD2 Liver 19_33 0.9753710 29.98132 8.510223e-05
7040 INHBB Liver 2_70 0.9701027 74.11796 2.092481e-04
6093 CSNK1G3 Liver 5_75 0.9546182 84.07065 2.335578e-04
6391 TTC39B Liver 9_13 0.9444939 23.27926 6.398654e-05
2092 SP4 Liver 7_19 0.9443983 101.96080 2.802262e-04
4704 DDX56 Liver 7_32 0.9256831 58.82731 1.584753e-04
21823 CD163L1 Whole_Blood 12_7 0.9110450 21.60008 5.726846e-05
8865 FUT2 Liver 19_33 0.9087167 110.44874 2.920852e-04
10657 TRIM39 Liver 6_26 0.9071329 81.96315 2.163764e-04
23057 ZFP28 Whole_Blood 19_38 0.8968490 25.39396 6.627810e-05
6220 PELO Liver 5_31 0.8957423 70.45702 1.836655e-04
3562 ACVR1C Liver 2_94 0.8952993 24.97910 6.508267e-05
11790 CYP2A6 Liver 19_30 0.8922077 33.52620 8.705036e-05
6957 USP1 Liver 1_39 0.8863041 254.57642 6.566308e-04
17963 GALNT6 Whole_Blood 12_33 0.8707524 22.33978 5.661010e-05
1114 SRRT Liver 7_62 0.8536980 32.62326 8.104980e-05
23737 SIPA1 Whole_Blood 11_36 0.8534699 25.15027 6.246709e-05
16404 CSE1L Whole_Blood 20_30 0.8434335 22.93780 5.630188e-05
16092 GPAM Whole_Blood 10_70 0.8265858 21.62521 5.201977e-05
9072 SPTY2D1 Liver 11_13 0.8263825 33.10832 7.962299e-05
6100 ALLC Liver 2_2 0.8241241 27.96015 6.705827e-05
z
4435 -41.687336
2454 13.376072
20880 23.237813
12008 -17.962770
3721 -8.982702
5991 12.926351
5563 -20.293982
12687 -15.108415
7410 7.982017
1597 -5.732491
8531 11.038564
9390 -8.923688
19842 4.624653
3755 7.008305
1999 5.072217
7040 -8.518936
6093 9.116291
6391 -4.334495
2092 10.693191
4704 9.641861
21823 -4.653700
8865 -11.927107
10657 8.840164
23057 -5.205678
6220 8.288398
3562 -4.687370
11790 5.407028
6957 16.258211
17963 -4.608245
1114 5.424996
23737 -5.439965
16404 -4.739463
16092 4.181558
9072 -5.557123
6100 4.919066
#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")
#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]
#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")
df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names
for (i in 1:nrow(df_gene)){
gene <- df_gene$genename[i]
ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}
df_gene <- cbind(df_gene, df_gene_pips)
#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]
#abbreviate column names
df_gene <- dplyr::rename(df_gene,
Adi_Sub="Adipose_Subcutaneous",
Brain_Cer="Brain_Cerebellum",
Adi_Vis="Adipose_Visceral_Omentum",
Blood="Whole_Blood")
df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]
#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
genename combined_pip Liver Blood
249 ACVR1C 1.2437269 0.895299344 3.484276e-01
11601 ST3GAL4 1.1517444 0.999999998 1.517444e-01
9031 PSRC1 1.0564981 1.000000000 5.649806e-02
211 ACP6 1.0267116 0.049258251 9.774534e-01
3837 FADS1 1.0160517 0.999507736 1.654400e-02
2417 CNIH4 1.0131745 0.533572074 4.796024e-01
5416 INSIG2 1.0081061 0.999990774 8.115310e-03
5138 HPR 1.0073982 0.999998172 7.399991e-03
5673 KDSR 1.0041422 0.450433471 5.537087e-01
8212 PCSK9 1.0000000 NA 1.000000e+00
23 ABCA1 0.9987345 0.993279521 5.454970e-03
55 ABCG8 0.9979773 0.997977322 NA
10312 RP4-781K5.7 0.9969687 0.996968698 NA
10452 RRBP1 0.9935015 0.977401878 1.609958e-02
8603 PLTP 0.9928058 0.985270976 7.534812e-03
8895 PRKD2 0.9895863 0.975370996 1.421529e-02
12439 TNKS 0.9847984 0.984798410 NA
4400 GAS6 0.9810190 0.981018990 NA
5398 INHBB 0.9782941 0.970102709 8.191355e-03
1306 C10orf88 0.9782659 0.656054885 3.222110e-01
2690 CSNK1G3 0.9776905 0.954618180 2.307231e-02
12766 TTC39B 0.9727619 0.944493892 2.826805e-02
8492 PKN3 0.9686191 0.279985274 6.886339e-01
11640 STAT5B 0.9676508 0.702232117 2.654187e-01
11421 SP4 0.9626436 0.944398344 1.824530e-02
973 ATP1B2 0.9608670 0.444900964 5.159661e-01
3085 DDX56 0.9485890 0.925683137 2.290588e-02
5027 HIST1H2BH 0.9329614 0.439424553 4.935369e-01
13500 ZFP28 0.9243451 0.027496082 8.968490e-01
1914 CD163L1 0.9209132 0.009868164 9.110450e-01
13031 USP1 0.9120136 0.886304145 2.570941e-02
8294 PELO 0.9117587 0.895742313 1.601634e-02
4325 FUT2 0.9087167 0.908716731 NA
12610 TRIM39 0.9071329 0.907132864 4.381281e-08
11531 SPTY2D1 0.8929224 0.826382465 6.653990e-02
2930 CYP2A6 0.8922077 0.892207662 NA
12500 TP53INP1 0.8729546 0.370714877 5.022397e-01
4381 GALNT6 0.8707524 NA 8.707524e-01
10923 SIPA1 0.8552324 0.001762444 8.534699e-01
2675 CSE1L 0.8540968 0.010663276 8.434335e-01
11566 SRRT 0.8536980 0.853698022 NA
11808 SYTL1 0.8273307 0.676328349 1.510024e-01
4660 GPAM 0.8265858 NA 8.265858e-01
509 ALLC 0.8241241 0.824124088 NA
2618 CRACR2B 0.8120701 0.539118602 2.729515e-01
12969 UGT2B17 0.8082791 0.032529550 7.757495e-01
3971 FAM3D 0.8078808 0.791335475 1.654529e-02
1863 CCND2 0.8071777 0.785894977 2.128269e-02
7360 N4BP2L2 0.8019320 0.415035736 3.868962e-01
Compare with Case 20 (Liver+Blood with separate variance parameter).
results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case22")
weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Whole_Blood.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db"
weight <- unlist(strsplit(weight, ";"))
#load information for all genes
gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())
for (i in 1:length(weight)){
sqlite <- RSQLite::dbDriver("SQLite")
db = RSQLite::dbConnect(sqlite, weight[i])
query <- function(...) RSQLite::dbGetQuery(db, ...)
gene_info_current <- query("select gene, genename, gene_type from extra")
RSQLite::dbDisconnect(db)
gene_info_current$weight <- weight[i]
gene_info <- rbind(gene_info, gene_info_current)
}
gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})
gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")
#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))
#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")
#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))
sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))
#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size
#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)
#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- ctwas_gene_res$type
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)
ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])
#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z
z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]
#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA
ctwas_res <- rbind(ctwas_gene_res,
ctwas_snp_res[,colnames(ctwas_gene_res)])
#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)
#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])
library(ggplot2)
library(cowplot)
load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))
#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument
#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec
#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]
#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation
#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))
title_size <- 12
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi)) +
ggtitle("Proportion Causal") +
theme_cowplot()
p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(sigma^2)) +
ggtitle("Effect Size") +
theme_cowplot()
p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(h^2[G])) +
ggtitle("PVE") +
theme_cowplot()
p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))
df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
Adipose_Subcutaneous="Adi_Sub",
Brain_Cerebellum="Brain_Cer",
Adipose_Visceral_Omentum="Adi_Vis",
Whole_Blood="Blood")
p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
geom_line(aes(color=group)) +
geom_point(aes(color=group)) +
xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
ggtitle("Enrichment") +
theme_cowplot()
p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) +
expand_limits(y=0) +
guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))
plot_grid(p_pi, p_sigma2, p_enrich, p_pve)
####################
#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
SNP Liver Whole_Blood
0.0000879571 0.0065084856 0.0049335631
Adipose_Subcutaneous
0.0150904720
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
SNP Liver Whole_Blood
15.552010 50.448898 15.755890
Adipose_Subcutaneous
5.596842
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 84.25444
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
SNP Liver Whole_Blood
8696600 11003 11198
Adipose_Subcutaneous
13082
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
SNP Liver Whole_Blood
0.03462001 0.01051388 0.00253317
Adipose_Subcutaneous
0.00321544
#total PVE
sum(estimated_group_pve)
[1] 0.0508825
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
SNP Liver Whole_Blood
0.68039130 0.20663057 0.04978470
Adipose_Subcutaneous
0.06319342
#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
genename group region_tag susie_pip mu2
20880 PCSK9 Whole_Blood 1_34 1.0000000 292.81506
4435 PSRC1 Liver 1_67 1.0000000 1671.58010
2454 ST3GAL4 Liver 11_77 1.0000000 173.29542
12008 HPR Liver 16_38 0.9999978 157.54089
3721 INSIG2 Liver 2_69 0.9999907 68.23259
32382 ZDHHC7 Adipose_Subcutaneous 16_49 0.9994285 27.51062
5991 FADS1 Liver 11_34 0.9986248 163.38377
5563 ABCG8 Liver 2_27 0.9968934 312.70648
12687 RP4-781K5.7 Liver 1_122 0.9958201 202.37447
7410 ABCA1 Liver 9_53 0.9920842 70.76914
16865 LDLR Whole_Blood 19_9 0.9860500 588.43695
8531 TNKS Liver 8_12 0.9819120 77.98310
9390 GAS6 Liver 13_62 0.9793993 73.98876
26340 TPD52 Adipose_Subcutaneous 8_57 0.9779838 20.92862
1597 PLTP Liver 20_28 0.9744734 57.95682
1999 PRKD2 Liver 19_33 0.9734753 29.94248
19842 ACP6 Whole_Blood 1_73 0.9731059 24.71050
3755 RRBP1 Liver 20_13 0.9665073 32.57819
7040 INHBB Liver 2_70 0.9639261 74.28462
31260 ABCA8 Adipose_Subcutaneous 17_39 0.9491281 27.01222
6093 CSNK1G3 Liver 5_75 0.9447775 83.55330
2092 SP4 Liver 7_19 0.9320244 102.37847
6391 TTC39B Liver 9_13 0.9085958 23.78221
27865 CCNJ Adipose_Subcutaneous 10_61 0.9068170 20.47059
27536 FCGRT Adipose_Subcutaneous 19_34 0.9052147 20.64055
8865 FUT2 Liver 19_33 0.9035945 72.26140
23057 ZFP28 Whole_Blood 19_38 0.8933481 28.57149
27650 PLPPR2 Adipose_Subcutaneous 19_11 0.8902840 26.71673
10657 TRIM39 Liver 6_26 0.8841069 82.54485
6957 USP1 Liver 1_39 0.8764910 256.20169
4704 DDX56 Liver 7_33 0.8744278 57.99635
21823 CD163L1 Whole_Blood 12_7 0.8729646 24.16043
28984 CCDC92 Adipose_Subcutaneous 12_75 0.8698630 25.72447
11790 CYP2A6 Liver 19_30 0.8687866 33.93950
34082 PROCA1 Adipose_Subcutaneous 17_18 0.8682486 26.72503
3562 ACVR1C Liver 2_94 0.8599134 25.30942
6100 ALLC Liver 2_2 0.8192854 28.37379
PVE z
20880 8.521454e-04 23.237813
4435 4.864604e-03 -41.687336
2454 5.043214e-04 13.376072
12008 4.584718e-04 -17.962770
3721 1.985675e-04 -8.982702
32382 8.001518e-05 -5.184802
5991 4.748228e-04 12.926351
5563 9.072060e-04 -20.293982
12687 5.864850e-04 -15.108415
7410 2.043209e-04 7.982017
16865 1.688570e-03 -24.707322
8531 2.228401e-04 11.038564
9390 2.108851e-04 -8.923688
26340 5.956519e-05 -4.684363
1597 1.643595e-04 -5.732491
1999 8.482679e-05 5.072217
19842 6.997806e-05 4.624653
3755 9.163311e-05 7.008305
7040 2.083833e-04 -8.518936
31260 7.461145e-05 4.800447
6093 2.297277e-04 9.116291
2092 2.776874e-04 10.693191
6391 6.288445e-05 -4.334495
27865 5.402197e-05 -4.639318
27536 5.437423e-05 -4.347956
8865 1.900204e-04 -11.927107
23057 7.428034e-05 -5.205678
27650 6.922009e-05 3.965665
10657 2.123807e-04 8.840164
6957 6.535062e-04 16.258211
4704 1.475859e-04 9.641861
21823 6.137925e-05 -4.653700
28984 6.512047e-05 -5.328046
11790 8.581020e-05 5.407028
34082 6.752780e-05 5.543282
3562 6.333696e-05 -4.687370
6100 6.765079e-05 4.919066
#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")
#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]
#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")
df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names
for (i in 1:nrow(df_gene)){
gene <- df_gene$genename[i]
ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}
df_gene <- cbind(df_gene, df_gene_pips)
#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]
#abbreviate column names
df_gene <- dplyr::rename(df_gene,
Adi_Sub="Adipose_Subcutaneous",
Brain_Cer="Brain_Cerebellum",
Adi_Vis="Adipose_Visceral_Omentum",
Blood="Whole_Blood")
df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]
#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
genename combined_pip Adi_Sub Liver Blood
3728 DNAJC13 1.8233864 3.751840e-01 0.797081476 6.511209e-01
308 ACVR1C 1.3061499 3.399432e-01 0.859913406 1.062933e-01
13442 ST3GAL4 1.0995558 3.619309e-02 0.999999995 6.336276e-02
263 ACP6 1.0804720 7.446485e-02 0.032901279 9.731059e-01
1499 C10orf88 1.0418891 1.942758e-01 0.659265349 1.883480e-01
2761 CNIH4 1.0399495 4.433321e-02 0.433189780 5.624265e-01
10334 PSRC1 1.0346903 2.945515e-02 1.000000000 5.235121e-03
15528 ZDHHC7 1.0308369 9.994285e-01 0.016659709 1.474871e-02
5042 GAS6 1.0208991 4.149979e-02 0.979399332 NA
6164 INHBB 1.0208851 5.443195e-02 0.963926139 2.526998e-03
5873 HPR 1.0169972 1.496283e-02 0.999997776 2.036583e-03
6184 INSIG2 1.0150037 1.229211e-02 0.999990661 2.720936e-03
6494 KDSR 1.0149336 4.049752e-01 0.303609427 3.063490e-01
3065 CSNK1G3 1.0138495 6.282587e-02 0.944777517 6.246064e-03
4386 FADS1 1.0041538 NA 0.998624768 5.529022e-03
9526 PELO 1.0007792 1.978634e-01 0.794936008 7.979861e-03
9432 PCSK9 1.0000005 4.984591e-07 NA 1.000000e+00
64 ABCG8 0.9968934 NA 0.996893377 NA
11975 RP4-781K5.7 0.9958201 NA 0.995820088 NA
9741 PKN3 0.9942721 1.506616e-01 0.138908742 7.047018e-01
28 ABCA1 0.9937233 NA 0.992084204 1.639123e-03
12136 RRBP1 0.9921614 2.120357e-02 0.966507276 4.450546e-03
14458 TPD52 0.9920075 9.779838e-01 0.005949712 8.073981e-03
9859 PLTP 0.9891232 1.235242e-02 0.974473374 2.297402e-03
13364 SPTY2D1 0.9877175 4.873760e-01 0.477324101 2.301736e-02
6833 LDLR 0.9860500 NA NA 9.860500e-01
10183 PRKD2 0.9833101 3.891769e-03 0.973475264 5.943076e-03
14384 TNKS 0.9819120 NA 0.981912022 NA
13239 SP4 0.9765264 3.908866e-02 0.932024394 5.413332e-03
13486 STAT5B 0.9720516 2.707278e-02 0.693328729 2.516501e-01
15546 ZFP28 0.9592919 4.225783e-02 0.023685899 8.933481e-01
14735 TTC39B 0.9591090 4.316469e-02 0.908595812 7.348478e-03
13401 SRRT 0.9537042 4.063502e-01 0.547353993 NA
34 ABCA8 0.9491281 9.491281e-01 NA NA
2143 CCND2 0.9464626 7.280609e-01 0.212919523 5.482203e-03
15036 USP1 0.9405275 5.117330e-02 0.876490973 1.286323e-02
5743 HIST1H2BH 0.9250086 NA 0.520901943 4.041067e-01
12673 SIPA1 0.9144629 6.702811e-01 0.001238984 2.429428e-01
2194 CD163L1 0.9134944 3.478295e-02 0.005746864 8.729646e-01
2154 CCNJ 0.9110941 9.068170e-01 0.004277143 NA
4703 FCGRT 0.9103701 9.052147e-01 0.002341236 2.814081e-03
6626 KLHDC7A 0.9076986 6.265206e-01 0.281177999 NA
4950 FUT2 0.9035945 NA 0.903594458 NA
3528 DDX56 0.8992934 2.137085e-02 0.874427779 3.494761e-03
9852 PLPPR2 0.8953828 8.902840e-01 0.005078100 2.068699e-05
604 ALLC 0.8885037 6.921824e-02 0.819285433 NA
2106 CCDC92 0.8847019 8.698630e-01 0.012579076 2.259762e-03
14568 TRIM39 0.8841070 9.649856e-08 0.884106857 1.416383e-08
9981 POR 0.8791576 7.542258e-01 0.004268438 1.206634e-01
9414 PCMTD2 0.8717502 7.520134e-01 0.064784069 5.495274e-02
10201 PROCA1 0.8717494 8.682486e-01 0.003317855 1.829653e-04
7505 LRRK2 0.8701613 6.485233e-02 0.794672124 1.063684e-02
3355 CYP2A6 0.8688247 3.808170e-05 0.868786620 NA
3299 CWF19L1 0.8514524 3.546988e-01 0.290951212 2.058024e-01
13215 SORCS2 0.8423335 7.493912e-01 0.084454568 8.487747e-03
7120 LINC01184 0.8361409 5.554749e-01 0.196860294 8.380579e-02
14019 TIMD4 0.8280940 3.212977e-02 NA 7.959643e-01
14447 TP53INP1 0.8262980 3.793612e-02 0.410230811 3.781311e-01
13673 SYTL1 0.8260800 4.360640e-02 0.688624046 9.384954e-02
1431 BRI3 0.8056284 5.803202e-02 0.737416455 1.017996e-02
3050 CSE1L 0.8042657 3.416089e-02 0.005141047 7.649638e-01
Comparison of with Case 3 (Liver + Adipose with separate parameters) with Case 9 (Liver only) and Case 10 (Adipose only)
####################
#union of separate analyses
liver_and_adipose_genes_union <- union(liver_genes, adipose_genes)
#genes found in both liver and adipose analzyed separately
intersect(liver_genes, adipose_genes)
[1] "KDSR" "PELO" "SRRT" "PKN3" "C10orf88" "ACVR1C" "KLHDC7A"
[8] "SPTY2D1"
#genes found only in liver
liver_genes[!(liver_genes %in% adipose_genes)]
[1] "PSRC1" "ST3GAL4" "HPR" "INSIG2" "ABCG8"
[6] "FADS1" "RP4-781K5.7" "ABCA1" "TNKS" "GAS6"
[11] "PLTP" "PRKD2" "INHBB" "RRBP1" "CNIH4"
[16] "SP4" "CSNK1G3" "FUT2" "CYP2A6" "TRIM39"
[21] "NPC1L1" "DDX56" "TTC39B" "STAT5B" "USP1"
[26] "POP7" "CRACR2B" "SYTL1" "ALLC"
#genes found only in adipose
adipose_genes[!(adipose_genes %in% liver_genes)]
[1] "ZDHHC7" "TPD52" "POR" "ABCA8" "SIPA1"
[6] "PLPPR2" "FAM117B" "CCNJ" "PCMTD2" "CCDC92"
[11] "FCGRT" "CCND2" "SPHK2" "NTN5" "IL1RN"
[16] "AC007950.2" "CWF19L1" "TMED4" "PROCA1" "MPND"
[21] "KCNK3" "SCD" "PDHB" "SORCS2" "C2CD4A"
[26] "PCSK9" "ALDH16A1" "LINC01184" "VAMP5" "PARP9"
[31] "UBASH3B" "CTSH" "ELF1" "GPAM"
####################
#genes found in both separate and joint analyses
liver_and_adipose_genes[liver_and_adipose_genes %in% liver_and_adipose_genes_union ]
[1] "ACVR1C" "ST3GAL4" "PSRC1" "C10orf88" "CNIH4"
[6] "GAS6" "INHBB" "ZDHHC7" "HPR" "CSNK1G3"
[11] "INSIG2" "KDSR" "CYP2A6" "FADS1" "PELO"
[16] "RRBP1" "RP4-781K5.7" "ABCG8" "ABCA1" "PLTP"
[21] "PRKD2" "TPD52" "TNKS" "SPTY2D1" "SRRT"
[26] "STAT5B" "SP4" "TTC39B" "KLHDC7A" "ABCA8"
[31] "SIPA1" "CCND2" "PKN3" "PLPPR2" "USP1"
[36] "CCDC92" "FCGRT" "CCNJ" "ALLC" "CWF19L1"
[41] "FUT2" "PROCA1" "PCMTD2" "POR" "DDX56"
[46] "TRIM39" "FAM117B" "SORCS2" "SPHK2" "SYTL1"
[51] "LINC01184" "SCD" "IL1RN" "NTN5"
#genes found only in joint analysis
liver_and_adipose_genes[!(liver_and_adipose_genes %in% liver_and_adipose_genes_union )]
[1] "BRI3"
#genes found only in separate analyses
liver_and_adipose_genes_union[!(liver_and_adipose_genes_union %in% liver_and_adipose_genes)]
[1] "NPC1L1" "POP7" "CRACR2B" "AC007950.2" "TMED4"
[6] "MPND" "KCNK3" "PDHB" "C2CD4A" "PCSK9"
[11] "ALDH16A1" "VAMP5" "PARP9" "UBASH3B" "CTSH"
[16] "ELF1" "GPAM"
####################
save.image("multigroup_image.RData")
load("multigroup_image.RData")
library(VennDiagram)
Loading required package: grid
Loading required package: futile.logger
term_gene_list <- list(liver=liver_genes, combined=liver_and_adipose_genes)
names(term_gene_list) <- c("Liver\nOnly", "Liver and\nAdipose")
v0 <- venn.diagram( rev(term_gene_list), filename=NULL, disable.logging=T,
fill = c("red", "blue"),
alpha = 0.50,
col = "transparent",
cex=1)
INFO [2023-03-21 14:32:41] [[1]]
INFO [2023-03-21 14:32:41] rev(term_gene_list)
INFO [2023-03-21 14:32:41]
INFO [2023-03-21 14:32:41] $filename
INFO [2023-03-21 14:32:41] NULL
INFO [2023-03-21 14:32:41]
INFO [2023-03-21 14:32:41] $disable.logging
INFO [2023-03-21 14:32:41] T
INFO [2023-03-21 14:32:41]
INFO [2023-03-21 14:32:41] $fill
INFO [2023-03-21 14:32:41] c("red", "blue")
INFO [2023-03-21 14:32:41]
INFO [2023-03-21 14:32:41] $alpha
INFO [2023-03-21 14:32:41] [1] 0.5
INFO [2023-03-21 14:32:41]
INFO [2023-03-21 14:32:41] $col
INFO [2023-03-21 14:32:41] [1] "transparent"
INFO [2023-03-21 14:32:41]
INFO [2023-03-21 14:32:41] $cex
INFO [2023-03-21 14:32:41] [1] 1
INFO [2023-03-21 14:32:41]
grid.newpage()
grid.draw(v0)
Version | Author | Date |
---|---|---|
903d30e | wesleycrouse | 2023-02-04 |
pdf(file = "output/SBP_Venn.pdf", width = 3, height = 3)
grid.newpage()
grid.draw(v0)
dev.off()
png
2
library(readxl)
known_annotations <- read_xlsx("data/summary_known_genes_annotations.xlsx", sheet="LDL")
New names:
• `` -> `...4`
• `` -> `...5`
known_annotations <- unique(known_annotations$`Gene Symbol`)
#number of silver standard genes in liver genes
sum(liver_genes %in% known_annotations)
[1] 6
#number of silver standard genes in liver+adipose genes
sum(liver_and_adipose_genes %in% known_annotations)
[1] 5