Last updated: 2023-03-21

Checks: 6 1

Knit directory: ctwas_applied/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20210726) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version fe2dd30. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Untracked files:
    Untracked:  LDL_all_tissue_PIPs.csv
    Untracked:  gwas.RData
    Untracked:  ld_R_info.RData
    Untracked:  z_snp_pos_ebi-a-GCST004131.RData
    Untracked:  z_snp_pos_ebi-a-GCST004132.RData
    Untracked:  z_snp_pos_ebi-a-GCST004133.RData
    Untracked:  z_snp_pos_scz-2018.RData
    Untracked:  z_snp_pos_ukb-a-360.RData
    Untracked:  z_snp_pos_ukb-d-30780_irnt.RData

Unstaged changes:
    Modified:   analysis/multigroup_testing.Rmd
    Modified:   code/automate_Rmd.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/multigroup_testing.Rmd) and HTML (docs/multigroup_testing.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd fe2dd30 wesleycrouse 2023-03-21 all tissue results!
Rmd 903d30e wesleycrouse 2023-02-04 removing duplicated section
html 903d30e wesleycrouse 2023-02-04 removing duplicated section
Rmd c688c02 wesleycrouse 2023-02-04 updating reports
html c688c02 wesleycrouse 2023-02-04 updating reports
Rmd 88f9698 wesleycrouse 2023-02-01 multigroup simulation results
html 88f9698 wesleycrouse 2023-02-01 multigroup simulation results
Rmd 6596a6d wesleycrouse 2023-01-19 adjusting SBP figure and adding Venn
Rmd a1b9876 wesleycrouse 2023-01-18 fixing error with combined pips
html a1b9876 wesleycrouse 2023-01-18 fixing error with combined pips
Rmd c03b0a4 wesleycrouse 2023-01-10 separate vs individual analyses
html c03b0a4 wesleycrouse 2023-01-10 separate vs individual analyses
Rmd 05269c3 wesleycrouse 2023-01-10 gene-level results for all cases
html 05269c3 wesleycrouse 2023-01-10 gene-level results for all cases
Rmd e2d92de wesleycrouse 2023-01-09 adding gene-level results
html 32cc717 wesleycrouse 2023-01-08 fixing case 9+10
Rmd e7e2b73 wesleycrouse 2023-01-06 more results
html e7e2b73 wesleycrouse 2023-01-06 more results
Rmd 9c0d0e7 wesleycrouse 2023-01-06 adding more plots
html 9c0d0e7 wesleycrouse 2023-01-06 adding more plots
Rmd 70f3460 wesleycrouse 2023-01-05 names in legends
html 70f3460 wesleycrouse 2023-01-05 names in legends
Rmd 9869b19 wesleycrouse 2023-01-05 tinkering with figure legend
html 9869b19 wesleycrouse 2023-01-05 tinkering with figure legend
Rmd d57dfc3 wesleycrouse 2023-01-05 trying to fix legends
html d57dfc3 wesleycrouse 2023-01-05 trying to fix legends
html 93168e6 wesleycrouse 2023-01-05 fixing plots?
Rmd 18962d7 wesleycrouse 2023-01-04 updating results
html 18962d7 wesleycrouse 2023-01-04 updating results
Rmd d8772b8 wesleycrouse 2022-12-20 updating multitrait report
html d8772b8 wesleycrouse 2022-12-20 updating multitrait report
Rmd 506b1ae wesleycrouse 2022-12-16 added case 5
html 506b1ae wesleycrouse 2022-12-16 added case 5
Rmd 356cb9b wesleycrouse 2022-12-15 multigroup testing
html 356cb9b wesleycrouse 2022-12-15 multigroup testing
Rmd 9e7d4cd wesleycrouse 2022-12-15 multigroup testing results
html 9e7d4cd wesleycrouse 2022-12-15 multigroup testing results

analysis_id <- "ukb-d-30780_irnt_multigroup"
trait_id <- "ukb-d-30780_irnt"

traits <- read.csv("/project2/mstephens/wcrouse/UKB_analysis_known_anno/ukbb_neale_v3_known_annotations.csv", head=F)
colnames(traits) <- c("trait_name", "ieu_id", "weight")

trait_name <- traits$trait_name[match(trait_id, traits$ieu_id)]

source("/project2/mstephens/wcrouse/ctwas_multigroup_testing/ctwas_config.R")

Case 1 - Liver and Adipose with shared parameters using old version

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case1")


#load information for all genes
sqlite <- RSQLite::dbDriver("SQLite")
db = RSQLite::dbConnect(sqlite, "/project2/mstephens/wcrouse/predictdb/mashr_Liver_Adipose_Subcutaneous.db")
query <- function(...) RSQLite::dbGetQuery(db, ...)
gene_info <- query("select gene, genename, gene_type from extra")
RSQLite::dbDisconnect(db)

#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$id, match, gene_info$gene), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

group_size <- c(nrow(ctwas_gene_res), n_snps)

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
rownames(estimated_group_prior_all) <- c("gene", "snp")
estimated_group_prior_all["snp",] <- estimated_group_prior_all["snp",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec
rownames(estimated_group_prior_var_all) <- c("gene", "snp")

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation
rownames(estimated_group_pve_all) <- c("gene", "snp")

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- estimated_group_prior_all["gene",]/estimated_group_prior_all["snp",]
  
title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), 2),
                 value = c(estimated_group_prior_all["gene",], estimated_group_prior_all["snp",]),
                 group = rep(c("Gene", "SNP"), each = ncol(estimated_group_prior_all)))
df$group <- as.factor(df$group)

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all ), 2),
                 value = c(estimated_group_prior_var_all["gene",], estimated_group_prior_var_all["snp",]),
                 group = rep(c("Gene", "SNP"), each = ncol(estimated_group_prior_var_all)))
df$group <- as.factor(df$group)
p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all ), 2),
                 value = c(estimated_group_pve_all["gene",], estimated_group_pve_all["snp",]),
                 group = rep(c("Gene", "SNP"), each = ncol(estimated_group_pve_all)))
df$group <- as.factor(df$group)
p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = 1:length(estimated_enrichment_all),
                 value = estimated_enrichment_all,
                 group = rep("Gene", length(estimated_enrichment_all)))
df$group <- as.factor(df$group)
p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
9e7d4cd wesleycrouse 2022-12-15
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
        gene          snp 
0.0087113264 0.0001044473 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
    gene      snp 
25.13189 15.23803 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 83.40404
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
[1]   24085 8696600
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
      gene        snp 
0.01534533 0.04028058 
#total PVE
sum(estimated_group_pve)
[1] 0.05562591
#PVE attributable to gene expression
estimated_group_pve["gene"]/sum(estimated_group_pve)
     gene 
0.2758666 

Case 2 - Liver and Adipose with shared parameters using new version

Compare with Case 1 (should be identical).

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case2")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db"
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")


#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation


#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- estimated_group_prior_all[rownames(estimated_group_prior_all)!="SNP",]/estimated_group_prior_all["SNP",]

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), 2),
                 value = c(estimated_group_prior_all["gene",], estimated_group_prior_all["SNP",]),
                 group = rep(c("Gene", "SNP"), each = ncol(estimated_group_prior_all)))
df$group <- as.factor(df$group)

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all ), 2),
                 value = c(estimated_group_prior_var_all["gene",], estimated_group_prior_var_all["SNP",]),
                 group = rep(c("Gene", "SNP"), each = ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all ), 2),
                 value = c(estimated_group_pve_all["gene",], estimated_group_pve_all["SNP",]),
                 group = rep(c("Gene", "SNP"), each = ncol(estimated_group_pve_all)))
df$group <- as.factor(df$group)

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = 1:length(estimated_enrichment_all),
                 value = estimated_enrichment_all,
                 group = rep("Gene", length(estimated_enrichment_all)))
df$group <- as.factor(df$group)
p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
9e7d4cd wesleycrouse 2022-12-15
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
         SNP         gene 
0.0001044473 0.0087113264 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
     SNP     gene 
15.23803 25.13189 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 83.40404
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
    SNP    gene 
8696600   24085 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
       SNP       gene 
0.04028058 0.01534533 
#total PVE
sum(estimated_group_pve)
[1] 0.05562591
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
      SNP      gene 
0.7241334 0.2758666 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
         genename                group region_tag susie_pip        mu2
4435        PSRC1                Liver       1_67 1.0000000 1612.01119
2454      ST3GAL4                Liver      11_77 1.0000000  165.68310
12008         HPR                Liver      16_38 0.9999942  153.90449
3721       INSIG2                Liver       2_69 0.9999909   66.28067
19759      ZDHHC7 Adipose_Subcutaneous      16_49 0.9983868   53.34968
7410        ABCA1                Liver       9_53 0.9908303   68.28114
1999        PRKD2                Liver      19_33 0.9869040   28.94550
12687 RP4-781K5.7                Liver      1_121 0.9841471  196.14753
18314     FAM117B Adipose_Subcutaneous      2_120 0.9839632   44.79118
5991        FADS1                Liver      11_34 0.9822102  157.67595
8531         TNKS                Liver       8_12 0.9786476   73.68656
13717       TPD52 Adipose_Subcutaneous       8_57 0.9779081   24.51858
21750       PCSK9 Adipose_Subcutaneous       1_34 0.9771075  114.45450
3755        RRBP1                Liver      20_13 0.9768543   31.59854
9390         GAS6                Liver      13_62 0.9742838   70.13818
5544        CNIH4                Liver      1_114 0.9707055   39.71047
11790      CYP2A6                Liver      19_28 0.9706249   29.64659
5563        ABCG8                Liver       2_27 0.9704373  299.22046
1597         PLTP                Liver      20_28 0.9675340   57.71148
7040        INHBB                Liver       2_70 0.9572705   71.92939
18637       ABCA8 Adipose_Subcutaneous      17_39 0.9536542   31.81596
6391       TTC39B                Liver       9_13 0.9449779   22.72039
25057       SIPA1 Adipose_Subcutaneous      11_36 0.9428218   24.99558
14001        CETP Adipose_Subcutaneous      16_31 0.9425720  136.68224
8579       STAT5B                Liver      17_25 0.9422519   29.57545
15027      PLPPR2 Adipose_Subcutaneous      19_10 0.9316565   31.65774
16361      CCDC92 Adipose_Subcutaneous      12_75 0.9311410   29.77569
6093      CSNK1G3                Liver       5_75 0.9288269   80.75367
14913       FCGRT Adipose_Subcutaneous      19_34 0.9262251   24.01517
17109         POR Adipose_Subcutaneous       7_48 0.9212913   37.43349
18737        NTN5 Adipose_Subcutaneous      19_33 0.9156691   58.21977
21459      PROCA1 Adipose_Subcutaneous      17_17 0.9126536   30.08013
3562       ACVR1C                Liver       2_94 0.9121658   24.53138
2092          SP4                Liver       7_19 0.8858545   98.11713
4704        DDX56                Liver       7_33 0.8816703   56.95825
15242        CCNJ Adipose_Subcutaneous      10_61 0.8809924   23.73365
5415        SYTL1                Liver       1_19 0.8600926   21.33259
13346       SPHK2 Adipose_Subcutaneous      19_33 0.8540107   38.53180
8865         FUT2                Liver      19_33 0.8537636   62.79914
10657      TRIM39                Liver       6_26 0.8520081   80.34791
6100         ALLC                Liver        2_2 0.8475769   27.29913
24743      PCMTD2 Adipose_Subcutaneous      20_38 0.8454171   33.62083
6957         USP1                Liver       1_39 0.8417188  246.59871
19681        PELO Adipose_Subcutaneous       5_31 0.8319113   70.51117
               PVE          z
4435  4.691248e-03 -41.687336
2454  4.821681e-04  13.376072
12008 4.478876e-04 -17.962770
3721  1.928871e-04  -8.982702
19759 1.550069e-04  -5.184802
7410  1.968885e-04   7.982017
1999  8.313355e-05   5.072217
12687 5.617760e-04 -15.108415
18314 1.282601e-04   7.852653
5991  4.507027e-04  12.926351
8531  2.098625e-04  11.038564
13717 6.977722e-05  -4.684363
21750 3.254584e-04  17.210869
3755  8.982912e-05   7.008305
9390  1.988659e-04  -8.923688
5544  1.121793e-04   6.145535
11790 8.374259e-05   5.407028
5563  8.450435e-04 -20.293982
1597  1.624983e-04  -5.732491
7040  2.003832e-04  -8.518936
18637 8.829910e-05   4.800447
6391  6.248241e-05  -4.334495
25057 6.858246e-05  -5.096493
14001 3.749272e-04  13.814427
8579  8.109959e-05   5.426252
15027 8.583333e-05   3.965665
16361 8.068589e-05  -5.328046
6093  2.182817e-04   9.116291
14913 6.473253e-05  -4.347956
17109 1.003639e-04   6.044322
18737 1.551420e-04  11.132252
21459 7.989250e-05   5.543282
3562  6.512024e-05  -4.687370
2092  2.529458e-04  10.693191
4704  1.461447e-04   9.641861
15242 6.084951e-05  -4.639318
5415  5.339606e-05  -3.962854
13346 9.576414e-05  -8.721460
8865  1.560313e-04 -11.927107
10657 1.992226e-04   8.840164
6100  6.733614e-05   4.919066
24743 8.271796e-05  -5.636854
6957  6.040573e-04  16.258211
19681 1.707085e-04   8.522224

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
         genename combined_pip      Adi_Sub       Liver
290        ACVR1C    1.1790070 2.668412e-01 0.912165756
9618        PSRC1    1.0192558 1.925577e-02 0.999999998
14503      ZDHHC7    1.0165439 9.983868e-01 0.018157138
12546     ST3GAL4    1.0136521 1.365215e-02 0.999999984
1409     C10orf88    1.0080617 4.744040e-01 0.533657630
5464          HPR    1.0048798 4.885572e-03 0.999994231
5750       INSIG2    1.0036327 3.641819e-03 0.999990912
6035         KDSR    0.9941972 4.168373e-01 0.577359933
2582        CNIH4    0.9918587 2.115325e-02 0.970705479
26          ABCA1    0.9908303           NA 0.990830303
13505       TPD52    0.9899198 9.779081e-01 0.012011686
4697         GAS6    0.9890546 1.477084e-02 0.974283754
9473        PRKD2    0.9881499 1.245888e-03 0.986903998
11157 RP4-781K5.7    0.9841471           NA 0.984147075
4114      FAM117B    0.9839632 9.839632e-01          NA
11311       RRBP1    0.9830187 6.164407e-03 0.976854255
4086        FADS1    0.9822102           NA 0.982210163
3125       CYP2A6    0.9798984 9.273536e-03 0.970624882
13436        TNKS    0.9786476           NA 0.978647636
8766        PCSK9    0.9771075 9.771075e-01          NA
12506        SRRT    0.9770000 3.846596e-01 0.592340404
5734        INHBB    0.9765923 1.932175e-02 0.957270545
9167         PLTP    0.9747091 7.175082e-03 0.967534041
61          ABCG8    0.9704373           NA 0.970437255
2316         CETP    0.9677379 9.425720e-01 0.025165929
8854         PELO    0.9664498 8.319113e-01 0.134538453
9055         PKN3    0.9632829 7.259529e-01 0.237330063
12471     SPTY2D1    0.9608711 4.804356e-01 0.480435554
13763      TTC39B    0.9586601 1.368227e-02 0.944977875
12588      STAT5B    0.9563934 1.414152e-02 0.942251926
11812       SIPA1    0.9553612 9.428218e-01 0.012539428
31          ABCA8    0.9536542 9.536542e-01          NA
2864      CSNK1G3    0.9531350 2.430811e-02 0.928826889
6154      KLHDC7A    0.9526815 4.763407e-01 0.476340740
1978       CCDC92    0.9482173 9.311410e-01 0.017076355
2014        CCND2    0.9428045 5.881156e-01 0.354688950
9161       PLPPR2    0.9373686 9.316565e-01 0.005712141
9282          POR    0.9316412 9.212913e-01 0.010349854
4386        FCGRT    0.9305946 9.262251e-01 0.004369478
9491       PROCA1    0.9222839 9.126536e-01 0.009630294
8341         NTN5    0.9156691 9.156691e-01          NA
8749       PCMTD2    0.9144384 8.454171e-01 0.069021260
12353         SP4    0.8982507 1.239613e-02 0.885854545
2025         CCNJ    0.8892103 8.809924e-01 0.008217871
3290        DDX56    0.8887334 7.063122e-03 0.881670323
12766       SYTL1    0.8752351 1.514255e-02 0.860092577
569          ALLC    0.8731669 2.558997e-02 0.847576882
3070      CWF19L1    0.8686406 4.818217e-01 0.386818875
12414       SPHK2    0.8684545 8.540107e-01 0.014443747
14045        USP1    0.8631778 2.145901e-02 0.841718804
4613         FUT2    0.8537636           NA 0.853763637
13607      TRIM39    0.8520081 2.519792e-08 0.852008120
2791      CRACR2B    0.8282271 7.236532e-02 0.755861754
12330      SORCS2    0.8151771 6.825692e-01 0.132607915
1346         BRI3    0.8032291 1.936736e-02 0.783861767

Case 3 - Liver and Adipose with separate parameters

Compare with Case 2 (shared/separate prior parameters).

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case3")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db"
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")


#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
70f3460 wesleycrouse 2023-01-05
9869b19 wesleycrouse 2023-01-05
d57dfc3 wesleycrouse 2023-01-05
93168e6 wesleycrouse 2023-01-05
506b1ae wesleycrouse 2022-12-16
356cb9b wesleycrouse 2022-12-15
9e7d4cd wesleycrouse 2022-12-15
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
                 SNP                Liver Adipose_Subcutaneous 
        9.014951e-05         7.943610e-03         1.653321e-02 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
                 SNP                Liver Adipose_Subcutaneous 
           16.908160            48.882570             6.014357 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 126.1567
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
                 SNP                Liver Adipose_Subcutaneous 
             8696600                11003                13082 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
                 SNP                Liver Adipose_Subcutaneous 
         0.038577093          0.012433786          0.003785652 
#total PVE
sum(estimated_group_pve)
[1] 0.05479653
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
                 SNP                Liver Adipose_Subcutaneous 
          0.70400612           0.22690827           0.06908562 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
         genename                group region_tag susie_pip        mu2
4435        PSRC1                Liver       1_67 1.0000000 1669.97618
2454      ST3GAL4                Liver      11_77 1.0000000  170.97506
12008         HPR                Liver      16_38 0.9999974  156.57752
3721       INSIG2                Liver       2_69 0.9999934   68.61282
19759      ZDHHC7 Adipose_Subcutaneous      16_49 0.9995748   28.12889
5991        FADS1                Liver      11_34 0.9993538  163.18332
12687 RP4-781K5.7                Liver      1_121 0.9943723  202.13879
5563        ABCG8                Liver       2_27 0.9943320  311.88467
7410        ABCA1                Liver       9_53 0.9925506   70.70277
1999        PRKD2                Liver      19_33 0.9847824   29.89388
8531         TNKS                Liver       8_12 0.9825262   77.45465
13717       TPD52 Adipose_Subcutaneous       8_57 0.9814059   21.06905
9390         GAS6                Liver      13_62 0.9803514   73.89025
1597         PLTP                Liver      20_28 0.9775544   59.24311
5544        CNIH4                Liver      1_114 0.9743418   41.39386
3755        RRBP1                Liver      20_13 0.9719348   32.36358
11790      CYP2A6                Liver      19_28 0.9693535   29.87336
7040        INHBB                Liver       2_70 0.9651557   74.16214
18637       ABCA8 Adipose_Subcutaneous      17_39 0.9558067   27.28313
25057       SIPA1 Adipose_Subcutaneous      11_36 0.9482173   22.29943
6093      CSNK1G3                Liver       5_75 0.9459871   83.36709
8579       STAT5B                Liver      17_25 0.9418229   30.95090
2092          SP4                Liver       7_19 0.9299248  102.14354
15027      PLPPR2 Adipose_Subcutaneous      19_10 0.9297349   26.56239
6391       TTC39B                Liver       9_13 0.9260892   23.56077
14913       FCGRT Adipose_Subcutaneous      19_34 0.9226574   20.62923
16361      CCDC92 Adipose_Subcutaneous      12_75 0.9184631   25.75399
15242        CCNJ Adipose_Subcutaneous      10_61 0.9161743   20.53548
8865         FUT2                Liver      19_33 0.9059564   70.62362
17109         POR Adipose_Subcutaneous       7_48 0.8988319   31.79763
21459      PROCA1 Adipose_Subcutaneous      17_17 0.8980242   26.41207
3562       ACVR1C                Liver       2_94 0.8864262   25.08450
10657      TRIM39                Liver       6_26 0.8823124   82.45322
4704        DDX56                Liver       7_33 0.8805617   57.91924
6957         USP1                Liver       1_39 0.8781087  255.86929
18314     FAM117B Adipose_Subcutaneous      2_120 0.8638575   36.68997
6100         ALLC                Liver        2_2 0.8446148   28.32175
13346       SPHK2 Adipose_Subcutaneous      19_33 0.8363777   34.76342
24743      PCMTD2 Adipose_Subcutaneous      20_38 0.8314753   28.89377
14273         SCD Adipose_Subcutaneous      10_64 0.8185682   19.92754
18041       IL1RN Adipose_Subcutaneous       2_67 0.8157131   21.52025
18737        NTN5 Adipose_Subcutaneous      19_33 0.8143027   54.80574
               PVE          z
4435  4.859936e-03 -41.687336
2454  4.975687e-04  13.376072
12008 4.556680e-04 -17.962770
3721  1.996746e-04  -8.982702
19759 8.182541e-05  -5.184802
5991  4.745865e-04  12.926351
12687 5.849503e-04 -15.108415
5563  9.024969e-04 -20.293982
7410  2.042252e-04   7.982017
1999  8.567278e-05   5.072217
8531  2.214685e-04  11.038564
13717 6.017471e-05  -4.684363
9390  2.108090e-04  -8.923688
1597  1.685385e-04  -5.732491
5544  1.173728e-04   6.145535
3755  9.154065e-05   7.008305
11790 8.427263e-05   5.407028
7040  2.083051e-04  -8.518936
18637 7.589001e-05   4.800447
25057 6.153497e-05  -5.096493
6093  2.295092e-04   9.116291
8579  8.483262e-05   5.426252
2092  2.764261e-04  10.693191
15027 7.186983e-05   3.965665
6391  6.349836e-05  -4.334495
14913 5.539158e-05  -4.347956
16361 6.883773e-05  -5.328046
15242 5.475241e-05  -4.639318
8865  1.861991e-04 -11.927107
17109 8.317515e-05   6.044322
21459 6.902569e-05   5.543282
3562  6.470955e-05  -4.687370
10657 2.117144e-04   8.840164
4704  1.484236e-04   9.641861
6957  6.538630e-04  16.258211
18314 9.223799e-05   7.852653
6100  6.961439e-05   4.919066
13346 8.461459e-05  -8.721460
24743 6.991557e-05  -5.636854
14273 4.747105e-05  -4.541468
18041 5.108638e-05   4.455379
18737 1.298770e-04  11.132252

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
         genename combined_pip      Adi_Sub       Liver
290        ACVR1C    1.2821451 3.957190e-01 0.886426187
12546     ST3GAL4    1.0408713 4.087127e-02 0.999999993
9618        PSRC1    1.0329403 3.294030e-02 1.000000000
1409     C10orf88    1.0324476 2.351908e-01 0.797256743
2582        CNIH4    1.0263117 5.196981e-02 0.974341842
4697         GAS6    1.0252303 4.487895e-02 0.980351368
5734        INHBB    1.0242873 5.913165e-02 0.965155688
14503      ZDHHC7    1.0232406 9.995748e-01 0.023665871
5464          HPR    1.0166104 1.661304e-02 0.999997354
2864      CSNK1G3    1.0136737 6.768661e-02 0.945987078
5750       INSIG2    1.0133149 1.332146e-02 0.999993445
6035         KDSR    1.0028532 5.540887e-01 0.448764468
3125       CYP2A6    1.0014824 3.212884e-02 0.969353536
4086        FADS1    0.9993538           NA 0.999353840
8854         PELO    0.9979494 2.326841e-01 0.765265259
11311       RRBP1    0.9945557 2.262086e-02 0.971934797
11157 RP4-781K5.7    0.9943723           NA 0.994372303
61          ABCG8    0.9943320           NA 0.994331953
26          ABCA1    0.9925506           NA 0.992550635
9167         PLTP    0.9900735 1.251913e-02 0.977554352
9473        PRKD2    0.9892652 4.482861e-03 0.984782375
13505       TPD52    0.9891353 9.814059e-01 0.007729362
13436        TNKS    0.9825262           NA 0.982526205
12471     SPTY2D1    0.9768431 4.886837e-01 0.488159443
12506        SRRT    0.9757710 4.263893e-01 0.549381704
12588      STAT5B    0.9748353 3.301246e-02 0.941822874
12353         SP4    0.9734963 4.357146e-02 0.929924805
13763      TTC39B    0.9716737 4.558453e-02 0.926089174
6154      KLHDC7A    0.9693547 6.579458e-01 0.311408861
31          ABCA8    0.9558067 9.558067e-01          NA
11812       SIPA1    0.9541853 9.482173e-01 0.005967923
2014        CCND2    0.9525749 7.250596e-01 0.227515311
9055         PKN3    0.9395463 4.786795e-01 0.460866780
9161       PLPPR2    0.9349051 9.297349e-01 0.005170169
14045        USP1    0.9339745 5.586579e-02 0.878108668
1978       CCDC92    0.9324492 9.184631e-01 0.013986025
4386        FCGRT    0.9253876 9.226574e-01 0.002730238
2025         CCNJ    0.9215620 9.161743e-01 0.005387656
569          ALLC    0.9204296 7.581480e-02 0.844614756
3070      CWF19L1    0.9152537 4.971167e-01 0.418137016
4613         FUT2    0.9059564           NA 0.905956407
9491       PROCA1    0.9052901 8.980242e-01 0.007265855
8749       PCMTD2    0.9050256 8.314753e-01 0.073550293
9282          POR    0.9045059 8.988319e-01 0.005673998
3290        DDX56    0.9043148 2.375309e-02 0.880561708
13607      TRIM39    0.8823125 9.626981e-08 0.882312357
4114      FAM117B    0.8638575 8.638575e-01          NA
12330      SORCS2    0.8570663 7.657866e-01 0.091279663
12414       SPHK2    0.8521159 8.363777e-01 0.015738193
12766       SYTL1    0.8431171 4.316092e-02 0.799956216
6610    LINC01184    0.8410288 6.047286e-01 0.236300159
1346         BRI3    0.8302506 6.192655e-02 0.768324031
11479         SCD    0.8185682 8.185682e-01          NA
5688        IL1RN    0.8157131 8.157131e-01          NA
8341         NTN5    0.8143027 8.143027e-01          NA
#store gene list
liver_and_adipose_genes <- df_gene$genename[df_gene$combined_pip>0.8]

Case 5 - Liver and Cerebellum with separate parameters

Compare with Case 4 (Liver + Adipose/Cerebellum)

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case5")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Brain_Cerebellum.db"
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")


#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
70f3460 wesleycrouse 2023-01-05
d57dfc3 wesleycrouse 2023-01-05
93168e6 wesleycrouse 2023-01-05
506b1ae wesleycrouse 2022-12-16
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
             SNP            Liver Brain_Cerebellum 
    0.0001060227     0.0099041093     0.0120131986 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
             SNP            Liver Brain_Cerebellum 
        15.42194         44.38276          6.97870 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 81.65379
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
             SNP            Liver Brain_Cerebellum 
         8696600            11003            12270 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
             SNP            Liver Brain_Cerebellum 
      0.04138165       0.01407541       0.00299363 
#total PVE
sum(estimated_group_pve)
[1] 0.05845069
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
             SNP            Liver Brain_Cerebellum 
      0.70797534       0.24080833       0.05121632 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
         genename            group region_tag susie_pip        mu2          PVE
4435        PSRC1            Liver       1_67 1.0000000 1663.38029 4.840741e-03
2454      ST3GAL4            Liver      11_77 1.0000000  171.63510 4.994895e-04
12008         HPR            Liver      16_38 0.9999928  157.37765 4.579945e-04
3721       INSIG2            Liver       2_69 0.9999924   67.94668 1.977358e-04
5991        FADS1            Liver      11_34 0.9994062  162.52352 4.726923e-04
5563        ABCG8            Liver       2_27 0.9967677  310.79801 9.015555e-04
12687 RP4-781K5.7            Liver      1_121 0.9962045  202.36899 5.866955e-04
7410        ABCA1            Liver       9_53 0.9936425   70.41461 2.036166e-04
1999        PRKD2            Liver      19_33 0.9856859   30.10524 8.635767e-05
1597         PLTP            Liver      20_28 0.9853706   60.30472 1.729303e-04
8531         TNKS            Liver       8_12 0.9853313   76.52958 2.194481e-04
9390         GAS6            Liver      13_62 0.9847160   72.58384 2.080038e-04
5544        CNIH4            Liver      1_114 0.9829436   42.23904 1.208267e-04
3755        RRBP1            Liver      20_13 0.9780234   32.12821 9.144418e-05
7040        INHBB            Liver       2_70 0.9711195   74.15764 2.095795e-04
17721        RAC1 Brain_Cerebellum        7_9 0.9663013   30.07597 8.457705e-05
11790      CYP2A6            Liver      19_29 0.9644192   31.75991 8.913852e-05
3247         KDSR            Liver      18_35 0.9631438   24.75544 6.938763e-05
6093      CSNK1G3            Liver       5_75 0.9567889   84.68297 2.357939e-04
18334       ABCA8 Brain_Cerebellum      17_39 0.9544748   23.30104 6.472322e-05
24413       SIPA1 Brain_Cerebellum      11_36 0.9524771   24.52172 6.797134e-05
2092          SP4            Liver       7_19 0.9435848  101.94755 2.799484e-04
3300     C10orf88            Liver      10_77 0.9304447   36.98123 1.001365e-04
4704        DDX56            Liver       7_32 0.9292350   58.58285 1.584223e-04
6391       TTC39B            Liver       9_13 0.9229868   23.53639 6.322016e-05
6778         PKN3            Liver       9_66 0.9203992   46.69957 1.250862e-04
8865         FUT2            Liver      19_33 0.9125557   98.15762 2.606776e-04
18390        P3H4 Brain_Cerebellum      17_25 0.9111093   21.08931 5.591819e-05
3562       ACVR1C            Liver       2_94 0.9071780   24.93987 6.584261e-05
10657      TRIM39            Liver       6_25 0.9064821   81.99548 2.163064e-04
8579       STAT5B            Liver      17_25 0.8983052   26.93111 7.040420e-05
6957         USP1            Liver       1_39 0.8836416  253.55963 6.520435e-04
1114         SRRT            Liver       7_62 0.8723166   32.51974 8.255465e-05
9062      KLHDC7A            Liver       1_13 0.8589053   21.72416 5.430111e-05
9072      SPTY2D1            Liver      11_13 0.8422930   32.59647 7.990135e-05
6100         ALLC            Liver        2_2 0.8393685   27.74235 6.776668e-05
8418         POP7            Liver       7_62 0.8310774   41.08047 9.935672e-05
6220         PELO            Liver       5_31 0.8295163   72.78264 1.757005e-04
21013      PROCA1 Brain_Cerebellum      17_17 0.8127077   29.05054 6.870826e-05
               z
4435  -41.687336
2454   13.376072
12008 -17.962770
3721   -8.982702
5991   12.926351
5563  -20.293982
12687 -15.108415
7410    7.982017
1999    5.072217
1597   -5.732491
8531   11.038564
9390   -8.923688
5544    6.145535
3755    7.008305
7040   -8.518936
17721   5.713928
11790   5.407028
3247   -4.526287
6093    9.116291
18334   4.521803
24413  -5.477878
2092   10.693191
3300   -6.787850
4704    9.641861
6391   -4.334495
6778   -6.620563
8865  -11.927107
18390   4.891111
3562   -4.687370
10657   8.840164
8579    5.426252
6957   16.258211
1114    5.424996
9062    4.124187
9072   -5.557123
6100    4.919066
8418   -5.845258
6220    8.288398
21013   5.628065

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
         genename combined_pip       Liver    Brain_Cer
296        ACVR1C    1.2524283 0.907177976 3.452503e-01
2564        CNIH4    1.0774810 0.982943607 9.453734e-02
12369     ST3GAL4    1.0517561 0.999999996 5.175609e-02
9504        PSRC1    1.0401559 1.000000000 4.015593e-02
11169       RRBP1    1.0343872 0.978023352 5.636382e-02
4669         GAS6    1.0304715 0.984716041 4.575543e-02
8955         PKN3    1.0206026 0.920399200 1.002034e-01
4065        FADS1    1.0205579 0.999406168 2.115178e-02
5417          HPR    1.0080726 0.999992791 8.079809e-03
5706       INSIG2    1.0079276 0.999992391 7.935239e-03
11559       SGMS1    0.9978836 0.789690387 2.081932e-01
9363        PRKD2    0.9973720 0.985685902 1.168607e-02
5987         KDSR    0.9972987 0.963143807 3.415493e-02
57          ABCG8    0.9967693 0.996767651 1.614202e-06
11006 RP4-781K5.7    0.9962045 0.996204548           NA
25          ABCA1    0.9936425 0.993642546           NA
2843      CSNK1G3    0.9920704 0.956788949 3.528140e-02
9066         PLTP    0.9894597 0.985370574 4.089157e-03
13228        TNKS    0.9853313 0.985331256           NA
9701         RAC1    0.9715526 0.005251282 9.663013e-01
5687        INHBB    0.9711195 0.971119518           NA
3130       CYP2A6    0.9644192 0.964419208           NA
12179         SP4    0.9644114 0.943584837 2.082654e-02
8758         PELO    0.9620666 0.829516273 1.325503e-01
11647       SIPA1    0.9589754 0.006498239 9.524771e-01
3289        DDX56    0.9579050 0.929235029 2.866996e-02
31          ABCA8    0.9544748          NA 9.544748e-01
1384     C10orf88    0.9502566 0.930444679 1.981197e-02
6104      KLHDC7A    0.9412314 0.858905318 8.232609e-02
1988        CCND2    0.9369342 0.416548920 5.203853e-01
5314    HIST1H2BH    0.9328322 0.451462861 4.813694e-01
13546      TTC39B    0.9229868 0.922986846           NA
12414      STAT5B    0.9210178 0.898305230 2.271252e-02
8477         P3H4    0.9202486 0.009139335 9.111093e-01
12333        SRRT    0.9138895 0.872316636 4.157289e-02
4577         FUT2    0.9125557 0.912555657           NA
5074        GSK3B    0.9109870 0.244388498 6.665985e-01
12597       SYTL1    0.9102862 0.711966328 1.983199e-01
569          ALLC    0.9093132 0.839368481 6.994473e-02
13392      TRIM39    0.9064892 0.906482144 7.101999e-06
12298     SPTY2D1    0.9012311 0.842293022 5.893807e-02
13820        USP1    0.8836416 0.883641569           NA
2464       CLDN23    0.8814676 0.782048359 9.941925e-02
3472      DNAJC13    0.8560228 0.062864965 7.931578e-01
9172         POP7    0.8310774 0.831077434           NA
1320         BRI3    0.8207590 0.792207382 2.855163e-02
9381       PROCA1    0.8204752 0.007767419 8.127077e-01
2357         CHKB    0.8057504 0.011039641 7.947108e-01
3530       DOPEY2    0.8056237 0.011200336 7.944234e-01
13849       USP47    0.8006220 0.082589523 7.180325e-01

Case 6 - Liver, Adipose, and Cerebellum with separate parameters

Compare with Case 3 (Liver + Adipose) and Case 4 (Liver + Cerebellum)

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case6")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db;/project2/compbio/predictdb/mashr_models/mashr_Brain_Cerebellum.db" 
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")


#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
70f3460 wesleycrouse 2023-01-05
d57dfc3 wesleycrouse 2023-01-05
93168e6 wesleycrouse 2023-01-05
d8772b8 wesleycrouse 2022-12-20
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
                 SNP                Liver Adipose_Subcutaneous 
        7.471396e-05         7.790609e-03         1.420501e-02 
    Brain_Cerebellum 
        8.161595e-03 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
                 SNP                Liver Adipose_Subcutaneous 
           18.528817            49.651565             6.443604 
    Brain_Cerebellum 
            7.617439 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 61.28883
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
                 SNP                Liver Adipose_Subcutaneous 
             8696600                11003                13082 
    Brain_Cerebellum 
               12270 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
                 SNP                Liver Adipose_Subcutaneous 
         0.035036382          0.012386133          0.003484695 
    Brain_Cerebellum 
         0.002219979 
#total PVE
sum(estimated_group_pve)
[1] 0.05312719
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
                 SNP                Liver Adipose_Subcutaneous 
          0.65948120           0.23314113           0.06559155 
    Brain_Cerebellum 
          0.04178612 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
         genename                group region_tag susie_pip        mu2
12008         HPR                Liver      16_38 1.0000000  585.42258
4435        PSRC1                Liver       1_67 1.0000000 1671.14146
2454      ST3GAL4                Liver      11_77 1.0000000  170.35065
3721       INSIG2                Liver       2_69 0.9999913   68.12597
19759      ZDHHC7 Adipose_Subcutaneous      16_49 0.9995468   28.36788
5991        FADS1                Liver      11_34 0.9991968  162.69551
7410        ABCA1                Liver       9_53 0.9930025   70.73631
12687 RP4-781K5.7                Liver      1_121 0.9928625  202.03345
5563        ABCG8                Liver       2_27 0.9907498  311.75406
9390         GAS6                Liver      13_62 0.9833506   74.42010
8531         TNKS                Liver       8_12 0.9831763   78.01659
1999        PRKD2                Liver      19_33 0.9814805   29.95863
13717       TPD52 Adipose_Subcutaneous       8_57 0.9784216   21.34420
1597         PLTP                Liver      20_28 0.9783201   57.26119
5544        CNIH4                Liver      1_114 0.9781234   42.39614
3755        RRBP1                Liver      20_13 0.9684850   32.38409
11790      CYP2A6                Liver      19_29 0.9674857   30.49600
7040        INHBB                Liver       2_70 0.9665361   74.28468
32453        RAC1     Brain_Cerebellum       7_10 0.9659120   30.79984
6093      CSNK1G3                Liver       5_75 0.9485627   83.92815
2092          SP4                Liver       7_19 0.9291370  102.33080
16361      CCDC92 Adipose_Subcutaneous      12_75 0.9225948   26.75929
6391       TTC39B                Liver       9_13 0.9169620   23.68758
15242        CCNJ Adipose_Subcutaneous      10_61 0.9162282   20.86439
8865         FUT2                Liver      19_33 0.9127901   69.39636
8579       STAT5B                Liver      17_25 0.8896792   27.48607
14913       FCGRT Adipose_Subcutaneous      19_34 0.8872476   21.01284
3562       ACVR1C                Liver       2_94 0.8863188   25.00159
10657      TRIM39                Liver       6_26 0.8833627   82.49146
4704        DDX56                Liver       7_33 0.8797603   57.71798
6957         USP1                Liver       1_39 0.8688417  255.37139
6100         ALLC                Liver        2_2 0.8575725   28.18804
33122        P3H4     Brain_Cerebellum      17_25 0.8481087   22.42262
17109         POR Adipose_Subcutaneous       7_48 0.8433223   33.65969
24743      PCMTD2 Adipose_Subcutaneous      20_38 0.8346979   29.12070
18041       IL1RN Adipose_Subcutaneous       2_67 0.8236861   21.69476
               PVE          z
12008 1.703687e-03 -17.962770
4435  4.863327e-03 -41.687336
2454  4.957516e-04  13.376072
3721  1.982573e-04  -8.982702
19759 8.251833e-05  -5.184802
5991  4.730934e-04  12.926351
7410  2.044151e-04   7.982017
12687 5.837578e-04 -15.108415
5563  8.988691e-04 -20.293982
9390  2.129703e-04  -8.923688
8531  2.232229e-04  11.038564
1999  8.557048e-05   5.072217
13717 6.077518e-05  -4.684363
1597  1.630278e-04  -5.732491
5544  1.206814e-04   6.145535
3755  9.127355e-05   7.008305
11790 8.586334e-05   5.407028
7040  2.089477e-04  -8.518936
32453 8.657776e-05   5.713928
6093  2.316829e-04   9.116291
2092  2.766983e-04  10.693191
16361 7.184654e-05  -5.328046
6391  6.321096e-05  -4.334495
15242 5.563264e-05  -4.639318
8865  1.843435e-04 -11.927107
8579  7.116498e-05   5.426252
14913 5.425626e-05  -4.347956
3562  6.448784e-05  -4.687370
10657 2.120647e-04   8.840164
4704  1.477732e-04   9.641861
6957  6.457036e-04  16.258211
6100  7.034869e-05   4.919066
33122 5.534242e-05   4.891111
17109 8.260835e-05   6.044322
24743 7.073779e-05  -5.636854
18041 5.200402e-05   4.455379

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
         genename combined_pip      Adi_Sub       Liver    Brain_Cer
340        ACVR1C    1.3654339 2.993964e-01 0.886318770 1.797188e-01
2858        CNIH4    1.0895843 4.699022e-02 0.978123429 6.447065e-02
13917     ST3GAL4    1.0646736 3.245088e-02 0.999999992 3.222270e-02
5200         GAS6    1.0612617 4.218874e-02 0.983350622 3.572238e-02
9807         PELO    1.0383286 1.632128e-01 0.726817365 1.482984e-01
10622       PSRC1    1.0381159 2.756212e-02 1.000000000 1.055375e-02
1552     C10orf88    1.0320231 2.531858e-01 0.761060584 1.777679e-02
3167      CSNK1G3    1.0286374 5.708902e-02 0.948562669 2.298568e-02
12582       RRBP1    1.0264205 1.833534e-02 0.968485003 3.960011e-02
16026      ZDHHC7    1.0240935 9.995468e-01 0.024546612           NA
13837     SPTY2D1    1.0227135 4.843885e-01 0.492229758 4.609525e-02
6671         KDSR    1.0196759 5.307733e-01 0.469094971 1.980762e-02
6352       INSIG2    1.0180317 1.179547e-02 0.999991342 6.244901e-03
6331        INHBB    1.0153465 4.881036e-02 0.966536088           NA
4524        FADS1    1.0120152           NA 0.999196764 1.281842e-02
14949       TPD52    1.0042172 9.784216e-01 0.006575001 1.922055e-02
6800      KLHDC7A    1.0002542 6.270234e-01 0.325175417 4.805538e-02
6031          HPR    1.0000370 2.207853e-05 1.000000000 1.495800e-05
27          ABCA1    0.9930025           NA 0.993002550           NA
12404 RP4-781K5.7    0.9928625           NA 0.992862525           NA
13123       SIPA1    0.9920651 5.245679e-01 0.004317698 4.631795e-01
63          ABCG8    0.9907517           NA 0.990749844 1.825931e-06
10464       PRKD2    0.9892136 3.725731e-03 0.981480482 4.007413e-03
10143        PLTP    0.9888872 7.511620e-03 0.978320061 3.055470e-03
10851        RAC1    0.9869350 1.786359e-02 0.003159394 9.659120e-01
3491       CYP2A6    0.9839514 1.646572e-02 0.967485717           NA
14693     TMEM199    0.9833807 5.612285e-01 0.422152156           NA
14875        TNKS    0.9831763           NA 0.983176328           NA
13707         SP4    0.9778408 3.602920e-02 0.929136995 1.267462e-02
2222        CCND2    0.9745918 5.587958e-01 0.190178132 2.256179e-01
13875        SRRT    0.9694747 4.078990e-01 0.538226348 2.334934e-02
13022       SGMS1    0.9629557           NA 0.788350028 1.746057e-01
643          ALLC    0.9597117 6.069313e-02 0.857572527 4.144609e-02
15225      TTC39B    0.9506565 3.369447e-02 0.916962043           NA
10136      PLPPR2    0.9502305 6.849915e-01 0.003831763 2.614072e-01
10021        PKN3    0.9472770 4.713977e-01 0.462865779 1.301349e-02
2185       CCDC92    0.9321488 9.225948e-01 0.009554054           NA
33          ABCA8    0.9297873 4.263476e-01          NA 5.034396e-01
5906    HIST1H2BH    0.9278544           NA 0.531826021 3.960284e-01
13964      STAT5B    0.9231265 2.171700e-02 0.889679211 1.173026e-02
9697       PCMTD2    0.9220798 8.346979e-01 0.076295995 1.108595e-02
2233         CCNJ    0.9214843 9.162282e-01 0.005256149           NA
15525        USP1    0.9153238 4.648211e-02 0.868841683           NA
5098         FUT2    0.9127901           NA 0.912790121           NA
3667        DDX56    0.9116258 2.049019e-02 0.879760291 1.137532e-02
4846        FCGRT    0.9009173 8.872476e-01 0.002788675 1.088105e-02
5646        GSK3B    0.8956626           NA 0.232749991 6.629127e-01
14160       SYTL1    0.8947549 3.807541e-02 0.678842676 1.778368e-01
15056      TRIM39    0.8833628 9.165758e-08 0.883362687 3.400296e-08
9498         P3H4    0.8810361 2.692311e-02 0.006004294 8.481087e-01
3871      DNAJC13    0.8779756 2.873330e-02 0.061512565 7.877298e-01
7323    LINC01184    0.8758830 4.987738e-01 0.213466023 1.636432e-01
13682      SORCS2    0.8653909 7.684955e-01 0.096895389           NA
1481         BRI3    0.8637015 4.932722e-02 0.792136481 2.223784e-02
10263         POR    0.8531609 8.433223e-01 0.002459456 7.379191e-03
3935       DOPEY2    0.8486725 4.043902e-02 0.009027480 7.992060e-01
10485      PROCA1    0.8334201 4.930295e-01 0.003121419 3.372692e-01
15556       USP47    0.8248551 1.266296e-01 0.075071800 6.231538e-01
6280        IL1RN    0.8236861 8.236861e-01          NA           NA
2753       CLDN23    0.8161379 2.406376e-02 0.710143733 8.193043e-02
13065      SH3TC1    0.8141052 3.709327e-02 0.015414057 7.615979e-01
9747        PDE4C    0.8040124 7.046461e-03 0.001610059 7.953558e-01

Case 7 - Liver only

Compare with Case 3 (Liver + Adipose) and others

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case7")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db;/project2/compbio/predictdb/mashr_models/mashr_Brain_Cerebellum.db" 
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")


#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- ctwas_gene_res$type
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
05269c3 wesleycrouse 2023-01-10
9c0d0e7 wesleycrouse 2023-01-06
70f3460 wesleycrouse 2023-01-05
d57dfc3 wesleycrouse 2023-01-05
93168e6 wesleycrouse 2023-01-05
18962d7 wesleycrouse 2023-01-04
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
         SNP        Liver 
0.0001708048 0.0102158606 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
     SNP    Liver 
 9.70388 43.32067 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 59.81014
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
    SNP   Liver 
8696600   11003 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
       SNP      Liver 
0.04194840 0.01417103 
#total PVE
sum(estimated_group_pve)
[1] 0.05611943
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
      SNP     Liver 
0.7474844 0.2525156 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
         genename group region_tag susie_pip        mu2          PVE          z
4435        PSRC1 Liver       1_67 1.0000000 1661.18355 4.834348e-03 -41.687336
2454      ST3GAL4 Liver      11_77 1.0000000  173.08249 5.037017e-04  13.376072
12008         HPR Liver      16_38 0.9999998  163.53982 4.759307e-04 -17.962770
3721       INSIG2 Liver       2_69 0.9999958   68.51740 1.993973e-04  -8.982702
5563        ABCG8 Liver       2_27 0.9999660  313.08513 9.111040e-04 -20.293982
5991        FADS1 Liver      11_34 0.9998572  163.89639 4.769004e-04  12.926351
12687 RP4-781K5.7 Liver      1_121 0.9997364  203.40050 5.917766e-04 -15.108415
7410        ABCA1 Liver       9_53 0.9956684   70.26807 2.036072e-04   7.982017
8531         TNKS Liver       8_12 0.9915320   76.52776 2.208239e-04  11.038564
9390         GAS6 Liver      13_62 0.9888998   71.30764 2.052148e-04  -8.923688
1597         PLTP Liver      20_28 0.9885315   61.47131 1.768411e-04  -5.732491
1999        PRKD2 Liver      19_33 0.9867185   30.06125 8.632181e-05   5.072217
7040        INHBB Liver       2_70 0.9832856   73.94275 2.115902e-04  -8.518936
3755        RRBP1 Liver      20_13 0.9819053   32.42030 9.264178e-05   7.008305
5544        CNIH4 Liver      1_114 0.9787252   40.77861 1.161485e-04   6.145535
2092          SP4 Liver       7_19 0.9781344  102.23228 2.910093e-04  10.693191
6093      CSNK1G3 Liver       5_75 0.9759584   84.08145 2.388096e-04   9.116291
8865         FUT2 Liver      19_33 0.9674812  104.74356 2.949105e-04 -11.927107
11790      CYP2A6 Liver      19_28 0.9643766   31.94912 8.966560e-05   5.407028
10657      TRIM39 Liver       6_25 0.9605468   81.54422 2.279460e-04   8.840164
3247         KDSR Liver      18_35 0.9578772   24.64899 6.871148e-05  -4.526287
233        NPC1L1 Liver       7_32 0.9548612   86.93035 2.415639e-04 -10.761931
4704        DDX56 Liver       7_32 0.9494642   59.88499 1.654691e-04   9.641861
6391       TTC39B Liver       9_13 0.9414293   23.21539 6.360392e-05  -4.334495
6220         PELO Liver       5_31 0.9388155   70.72581 1.932317e-04   8.288398
1114         SRRT Liver       7_62 0.9383759   32.70763 8.931947e-05   5.424996
6778         PKN3 Liver       9_66 0.9376624   47.53713 1.297179e-04  -6.620563
3300     C10orf88 Liver      10_77 0.9360721   37.17593 1.012725e-04  -6.787850
8579       STAT5B Liver      17_25 0.9310376   30.58279 8.286375e-05   5.426252
3562       ACVR1C Liver       2_94 0.9285980   25.84902 6.985414e-05  -4.687370
6957         USP1 Liver       1_39 0.8941238  253.51088 6.596515e-04  16.258211
9062      KLHDC7A Liver       1_13 0.8301423   22.36541 5.403183e-05   4.124187
8418         POP7 Liver       7_62 0.8209688   40.45771 9.666033e-05  -5.845258
9072      SPTY2D1 Liver      11_13 0.8208409   33.49049 8.000198e-05  -5.557123
8931      CRACR2B Liver       11_1 0.8079877   21.87357 5.143334e-05  -3.989585
5415        SYTL1 Liver       1_19 0.8069655   22.30889 5.239059e-05  -3.962854
6100         ALLC Liver        2_2 0.8068337   28.07319 6.591680e-05   4.919066

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
         genename combined_pip     Liver
7022        PSRC1    1.0000000 1.0000000
9062      ST3GAL4    1.0000000 1.0000000
3979          HPR    0.9999998 0.9999998
4188       INSIG2    0.9999958 0.9999958
44          ABCG8    0.9999660 0.9999660
2990        FADS1    0.9998572 0.9998572
8053  RP4-781K5.7    0.9997364 0.9997364
17          ABCA1    0.9956684 0.9956684
9738         TNKS    0.9915320 0.9915320
3423         GAS6    0.9888998 0.9888998
6672         PLTP    0.9885315 0.9885315
6913        PRKD2    0.9867185 0.9867185
4174        INHBB    0.9832856 0.9832856
8167        RRBP1    0.9819053 0.9819053
1874        CNIH4    0.9787252 0.9787252
8916          SP4    0.9781344 0.9781344
2086      CSNK1G3    0.9759584 0.9759584
3368         FUT2    0.9674812 0.9674812
2263       CYP2A6    0.9643766 0.9643766
9870       TRIM39    0.9605468 0.9605468
4395         KDSR    0.9578772 0.9578772
5972       NPC1L1    0.9548612 0.9548612
2397        DDX56    0.9494642 0.9494642
9996       TTC39B    0.9414293 0.9414293
6425         PELO    0.9388155 0.9388155
9035         SRRT    0.9383759 0.9383759
6583         PKN3    0.9376624 0.9376624
1009     C10orf88    0.9360721 0.9360721
9091       STAT5B    0.9310376 0.9310376
195        ACVR1C    0.9285980 0.9285980
10193        USP1    0.8941238 0.8941238
4483      KLHDC7A    0.8301423 0.8301423
6761         POP7    0.8209688 0.8209688
9009      SPTY2D1    0.8208409 0.8208409
2032      CRACR2B    0.8079877 0.8079877
9235        SYTL1    0.8069655 0.8069655
405          ALLC    0.8068337 0.8068337
#store gene list
liver_genes <- df_gene$genename[df_gene$combined_pip>0.8]

Case 8 - Adipose only

Compare with Case 3 (Liver + Adipose) and others

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case8")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db;/project2/compbio/predictdb/mashr_models/mashr_Brain_Cerebellum.db" 
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")


#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- ctwas_gene_res$type
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
05269c3 wesleycrouse 2023-01-10
32cc717 wesleycrouse 2023-01-08
e7e2b73 wesleycrouse 2023-01-06
9c0d0e7 wesleycrouse 2023-01-06
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
                 SNP Adipose_Subcutaneous 
        5.730722e-05         2.606610e-02 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
                 SNP Adipose_Subcutaneous 
           42.910260             6.890161 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 454.8485
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
                 SNP Adipose_Subcutaneous 
             8696600                13082 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
                 SNP Adipose_Subcutaneous 
          0.06223580           0.00683754 
#total PVE
sum(estimated_group_pve)
[1] 0.06907334
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
                 SNP Adipose_Subcutaneous 
          0.90101044           0.09898956 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
        genename                group region_tag susie_pip       mu2
19759     ZDHHC7 Adipose_Subcutaneous      16_49 0.9997044  27.97933
13717      TPD52 Adipose_Subcutaneous       8_57 0.9915149  21.54876
17109        POR Adipose_Subcutaneous       7_48 0.9859590  32.16405
18637      ABCA8 Adipose_Subcutaneous      17_39 0.9844949  27.84557
25057      SIPA1 Adipose_Subcutaneous      11_36 0.9788123  22.77446
15027     PLPPR2 Adipose_Subcutaneous      19_10 0.9782851  27.69113
16444   C10orf88 Adipose_Subcutaneous      10_77 0.9769343  30.83296
16388       KDSR Adipose_Subcutaneous      18_35 0.9760929  20.26486
20337       PKN3 Adipose_Subcutaneous       9_66 0.9739895  40.44418
18314    FAM117B Adipose_Subcutaneous      2_120 0.9703680  37.35650
15242       CCNJ Adipose_Subcutaneous      10_61 0.9687017  21.08333
24743     PCMTD2 Adipose_Subcutaneous      20_38 0.9580519  29.17029
22969    KLHDC7A Adipose_Subcutaneous       1_13 0.9563445  18.30601
16361     CCDC92 Adipose_Subcutaneous      12_75 0.9554636  25.94733
14913      FCGRT Adipose_Subcutaneous      19_34 0.9473974  20.07001
16347      CCND2 Adipose_Subcutaneous       12_4 0.9455314  19.20330
13993       SRRT Adipose_Subcutaneous       7_62 0.9433615  24.14309
13346      SPHK2 Adipose_Subcutaneous      19_33 0.9430979  36.78330
18737       NTN5 Adipose_Subcutaneous      19_33 0.9423048  52.10799
18041      IL1RN Adipose_Subcutaneous       2_67 0.9401773  20.48820
27265 AC007950.2 Adipose_Subcutaneous      15_29 0.9399429  30.09055
22986    SPTY2D1 Adipose_Subcutaneous      11_13 0.9336852  27.66149
14240    CWF19L1 Adipose_Subcutaneous      10_64 0.9329765  28.33600
20155      TMED4 Adipose_Subcutaneous       7_33 0.9301009  36.31431
21459     PROCA1 Adipose_Subcutaneous      17_17 0.9272370  25.69084
12875       MPND Adipose_Subcutaneous       19_5 0.9226804  19.81239
16736     ACVR1C Adipose_Subcutaneous       2_94 0.9093778  19.02619
22062      KCNK3 Adipose_Subcutaneous       2_16 0.9093320  20.13920
14273        SCD Adipose_Subcutaneous      10_64 0.9020432  18.43963
21618       PDHB Adipose_Subcutaneous       3_40 0.8906162  22.83036
23583     SORCS2 Adipose_Subcutaneous        4_8 0.8776838  21.67586
24579     C2CD4A Adipose_Subcutaneous      15_28 0.8755964  21.20561
21750      PCSK9 Adipose_Subcutaneous       1_34 0.8711964 103.66507
20426   ALDH16A1 Adipose_Subcutaneous      19_34 0.8646920  19.36271
25972  LINC01184 Adipose_Subcutaneous       5_78 0.8646535  17.88900
21711      VAMP5 Adipose_Subcutaneous       2_54 0.8518568  21.44792
18321      PARP9 Adipose_Subcutaneous       3_76 0.8506879  22.09649
19793    UBASH3B Adipose_Subcutaneous      11_74 0.8500016  23.50538
14808       CTSH Adipose_Subcutaneous      15_37 0.8402426  17.01546
16496       ELF1 Adipose_Subcutaneous      13_16 0.8384040  17.75011
16438       GPAM Adipose_Subcutaneous      10_70 0.8292011  19.07708
19681       PELO Adipose_Subcutaneous       5_30 0.8089402  59.96489
               PVE         z
19759 8.140091e-05 -5.184802
13717 6.217873e-05 -4.684363
17109 9.228900e-05  6.044322
18637 7.977923e-05  4.800447
25057 6.487358e-05 -5.096493
15027 7.883634e-05  3.965665
16444 8.765989e-05 -6.762952
16388 5.756454e-05 -4.448194
20337 1.146385e-04 -6.780039
18314 1.054928e-04  7.852653
15242 5.943600e-05 -4.639318
24743 8.132987e-05 -5.636854
22969 5.094814e-05  4.124187
16361 7.214846e-05 -5.328046
14913 5.533501e-05 -4.347956
16347 5.284112e-05 -4.190519
13993 6.628135e-05  4.556557
13346 1.009550e-04 -8.721460
18737 1.428947e-04 11.132252
18041 5.605752e-05  4.455379
27265 8.230987e-05  5.555780
22986 7.516167e-05 -5.557123
14240 7.693599e-05  5.837199
20155 9.829425e-05  7.696786
21459 6.932493e-05  5.543282
12875 5.319961e-05 -4.366911
16736 5.035196e-05 -4.185879
22062 5.329482e-05 -4.772296
14273 4.840606e-05 -4.541468
21618 5.917301e-05  3.396693
23583 5.536492e-05 -4.572758
24579 5.403500e-05  4.535165
21750 2.628263e-04 17.210869
20426 4.872455e-05 -4.215260
25972 4.501410e-05 -3.918171
21711 5.317065e-05  4.547456
18321 5.470335e-05  3.744644
19793 5.814432e-05  4.906621
14808 4.160721e-05  3.616369
16496 4.330866e-05 -3.876887
16438 4.603542e-05  4.000718
19681 1.411672e-04  8.522224

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
        genename combined_pip   Adi_Sub
12570     ZDHHC7    0.9997044 0.9997044
11702      TPD52    0.9915149 0.9915149
8047         POR    0.9859590 0.9859590
28         ABCA8    0.9844949 0.9844949
10219      SIPA1    0.9788123 0.9788123
7944      PLPPR2    0.9782851 0.9782851
1234    C10orf88    0.9769343 0.9769343
5232        KDSR    0.9760929 0.9760929
7851        PKN3    0.9739895 0.9739895
3570     FAM117B    0.9703680 0.9703680
1770        CCNJ    0.9687017 0.9687017
7581      PCMTD2    0.9580519 0.9580519
5333     KLHDC7A    0.9563445 0.9563445
1732      CCDC92    0.9554636 0.9554636
3812       FCGRT    0.9473974 0.9473974
1761       CCND2    0.9455314 0.9455314
10816       SRRT    0.9433615 0.9433615
10739      SPHK2    0.9430979 0.9430979
7214        NTN5    0.9423048 0.9423048
4929       IL1RN    0.9401773 0.9401773
113   AC007950.2    0.9399429 0.9399429
10785    SPTY2D1    0.9336852 0.9336852
2671     CWF19L1    0.9329765 0.9329765
11420      TMED4    0.9301009 0.9301009
8227      PROCA1    0.9272370 0.9272370
6528        MPND    0.9226804 0.9226804
248       ACVR1C    0.9093778 0.9093778
5183       KCNK3    0.9093320 0.9093320
9923         SCD    0.9020432 0.9020432
7635        PDHB    0.8906162 0.8906162
10667     SORCS2    0.8776838 0.8776838
1382      C2CD4A    0.8755964 0.8755964
7593       PCSK9    0.8711964 0.8711964
453     ALDH16A1    0.8646920 0.8646920
5720   LINC01184    0.8646535 0.8646535
12231      VAMP5    0.8518568 0.8518568
7499       PARP9    0.8506879 0.8506879
12041    UBASH3B    0.8500016 0.8500016
2646        CTSH    0.8402426 0.8402426
3310        ELF1    0.8384040 0.8384040
4299        GPAM    0.8292011 0.8292011
7675        PELO    0.8089402 0.8089402
#store gene list
adipose_genes <- df_gene$genename[df_gene$combined_pip>0.8]

Case 9 - Liver, Adipose x2, Cerebellum with separate parameters

Compare with Case 6 (Liver + Adipose + Cerebellum)

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case9")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db;/project2/compbio/predictdb/mashr_models/mashr_Brain_Cerebellum.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Visceral_Omentum.db"
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")

#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
05269c3 wesleycrouse 2023-01-10
e7e2b73 wesleycrouse 2023-01-06
9c0d0e7 wesleycrouse 2023-01-06
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
                     SNP                    Liver     Adipose_Subcutaneous 
            0.0000671549             0.0074368998             0.0102148490 
        Brain_Cerebellum Adipose_Visceral_Omentum 
            0.0090081071             0.0071611334 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
                     SNP                    Liver     Adipose_Subcutaneous 
               17.917578                51.128138                 6.311241 
        Brain_Cerebellum Adipose_Visceral_Omentum 
                7.283353                13.501294 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 67.5033
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
                     SNP                    Liver     Adipose_Subcutaneous 
                 8696600                    11003                    13082 
        Brain_Cerebellum Adipose_Visceral_Omentum 
                   12270                    12921 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
                     SNP                    Liver     Adipose_Subcutaneous 
             0.030452770              0.012175402              0.002454376 
        Brain_Cerebellum Adipose_Visceral_Omentum 
             0.002342771              0.003635579 
#total PVE
sum(estimated_group_pve)
[1] 0.0510609
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
                     SNP                    Liver     Adipose_Subcutaneous 
              0.59640098               0.23844865               0.04806763 
        Brain_Cerebellum Adipose_Visceral_Omentum 
              0.04588189               0.07120084 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
         genename                    group region_tag susie_pip        mu2
4435        PSRC1                    Liver       1_67 1.0000000 1672.65008
2454      ST3GAL4                    Liver      11_77 1.0000000  170.81971
12008         HPR                    Liver      16_38 0.9999909  156.17533
3721       INSIG2                    Liver       2_70 0.9999886   67.80738
19759      ZDHHC7     Adipose_Subcutaneous      16_49 0.9995073   28.90070
5991        FADS1                    Liver      11_34 0.9993457  162.88551
12687 RP4-781K5.7                    Liver      1_121 0.9944728  202.33237
7410        ABCA1                    Liver       9_53 0.9936742   70.79401
5563        ABCG8                    Liver       2_27 0.9932246  309.29203
9390         GAS6                    Liver      13_62 0.9857220   74.67277
8531         TNKS                    Liver       8_12 0.9849278   78.63394
47974      GIGYF1 Adipose_Visceral_Omentum       7_62 0.9727463   35.85447
1597         PLTP                    Liver      20_28 0.9716597   55.11327
7040        INHBB                    Liver       2_70 0.9698998   74.75276
32453        RAC1         Brain_Cerebellum       7_10 0.9696209   30.53796
3755        RRBP1                    Liver      20_13 0.9681886   32.42776
41704      NPC1L1 Adipose_Visceral_Omentum       7_33 0.9609917   93.62015
6093      CSNK1G3                    Liver       5_75 0.9545942   84.41491
2092          SP4                    Liver       7_19 0.9383471  102.44407
11790      CYP2A6                    Liver      19_30 0.9362228   33.29962
8865         FUT2                    Liver      19_33 0.9218977   69.39850
47687       ABCG8 Adipose_Visceral_Omentum       2_27 0.9128748   43.65045
6391       TTC39B                    Liver       9_13 0.9123067   23.73563
5544        CNIH4                    Liver      1_114 0.9025894   42.50363
10657      TRIM39                    Liver       6_26 0.8992012   82.46535
15242        CCNJ     Adipose_Subcutaneous      10_61 0.8898689   21.09530
3562       ACVR1C                    Liver       2_94 0.8890967   25.11073
44743         FN1 Adipose_Visceral_Omentum      2_127 0.8861711   22.46646
8579       STAT5B                    Liver      17_25 0.8832937   27.31621
6957         USP1                    Liver       1_39 0.8734886  255.83120
49316        ACP6 Adipose_Visceral_Omentum       1_73 0.8731497   21.79158
6100         ALLC                    Liver        2_2 0.8598451   28.04896
33122        P3H4         Brain_Cerebellum      17_25 0.8440652   22.47742
14913       FCGRT     Adipose_Subcutaneous      19_34 0.8291311   21.96658
33158      DOPEY2         Brain_Cerebellum      21_17 0.8284235   23.30226
29649       PDE4C         Brain_Cerebellum      19_14 0.8160274   40.15830
32727     DNAJC13         Brain_Cerebellum       3_82 0.8094458   46.19393
               PVE          z
4435  4.867718e-03 -41.687336
2454  4.971166e-04  13.376072
12008 4.544947e-04 -17.962770
3721  1.973296e-04  -8.982702
19759 8.406488e-05  -5.184802
5991  4.737165e-04  12.926351
12687 5.855697e-04 -15.108415
7410  2.047203e-04   7.982017
5563  8.939979e-04 -20.293982
9390  2.142087e-04  -8.923688
8531  2.253901e-04  11.038564
47974 1.014993e-04   6.632402
1597  1.558442e-04  -5.732491
7040  2.109961e-04  -8.518936
32453 8.617122e-05   5.713928
3755  9.136866e-05   7.008305
41704 2.618239e-04  11.631021
6093  2.345083e-04   9.116291
2092  2.797503e-04  10.693191
11790 9.072747e-05   5.407028
8865  1.861886e-04 -11.927107
47687 1.159632e-04   6.510103
6391  6.301762e-05  -4.334495
5544  1.116443e-04   6.145535
10657 2.157986e-04   8.840164
15242 5.463011e-05  -4.639318
3562  6.497237e-05  -4.687370
44743 5.793919e-05  -4.446065
8579  7.021759e-05   5.426252
6957  6.503259e-04  16.258211
49316 5.537297e-05   4.214320
6100  7.018711e-05   4.919066
33122 5.521319e-05   4.891111
14913 5.300367e-05  -4.347956
33158 5.617857e-05  -4.766662
29649 9.536748e-05  -6.633160
32727 1.088161e-04  -6.853813

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
         genename combined_pip      Adi_Sub       Liver    Brain_Cer
65          ABCG8    1.9061009           NA 0.993224587 1.491732e-06
9941        PARP9    1.5199597 3.673594e-01 0.006644270 3.625511e-01
349        ACVR1C    1.3608776 2.420229e-01 0.889096730 2.144862e-01
5380         GAS6    1.1926406 2.815637e-02 0.985722025 3.663521e-02
28          ABCA1    1.1778517           NA 0.993674247           NA
2956        CNIH4    1.0986149 3.354771e-02 0.902589378 7.148730e-02
14430     ST3GAL4    1.0705782 2.278493e-02 0.999999994 3.452498e-02
10161        PELO    1.0561341 1.111464e-01 0.371223486 1.404643e-01
16602      ZDHHC7    1.0425984 9.995073e-01 0.024285838           NA
11000       PSRC1    1.0425071 1.921100e-02 1.000000000 1.150583e-02
1603     C10orf88    1.0411604 1.474607e-01 0.515954788 1.913846e-02
3275      CSNK1G3    1.0372897 4.142693e-02 0.954594237 2.596340e-02
14350     SPTY2D1    1.0329040 4.095891e-01 0.556409140 4.984606e-02
6911         KDSR    1.0291338 4.187041e-01 0.494514349 2.286481e-02
15499       TPD52    1.0278942 5.262092e-01 0.005823745 1.886903e-02
13049       RRBP1    1.0264088 1.351937e-02 0.968188634 4.470083e-02
6247          HPR    1.0235999 1.119674e-02 0.999990926 6.520217e-03
4675        FADS1    1.0208940           NA 0.999345723 1.418185e-02
6578       INSIG2    1.0078010 2.610112e-03 0.999988555 2.697495e-03
7049      KLHDC7A    1.0043632 5.531809e-01 0.380276986 5.582087e-02
10835       PRKD2    1.0021744 2.717969e-03 0.217759117 2.618723e-03
5486       GIGYF1    1.0014547 1.352322e-02 0.003911509 1.127370e-02
9441       NPC1L1    0.9999751 1.484434e-02 0.013401014 1.073800e-02
12863 RP4-781K5.7    0.9944728           NA 0.994472818           NA
5842        GSK3B    0.9932634           NA 0.101176816 3.152329e-01
11235        RAC1    0.9916447 1.282684e-02 0.002968970 9.696209e-01
10506        PLTP    0.9897804 5.564772e-03 0.971659736 3.385850e-03
13610       SIPA1    0.9893442 4.041296e-01 0.003803708 5.760658e-01
13505       SGMS1    0.9880852           NA 0.795825679 1.922596e-01
2296        CCND2    0.9864169 3.775869e-01 0.169555913 2.309539e-01
14214         SP4    0.9861696 2.546881e-02 0.938347057 1.382209e-02
15423        TNKS    0.9849278           NA 0.984927835           NA
6557        INHBB    0.9758839 3.929670e-03 0.969899842           NA
3521         CTSH    0.9753641 2.032387e-01 0.160036482 8.245873e-02
660          ALLC    0.9718715 4.426554e-02 0.859845113 4.640541e-02
1528         BRI3    0.9698437 3.343538e-02 0.711570266 2.349788e-02
6110    HIST1H2BH    0.9627432           NA 0.547507725 3.988183e-01
15781      TTC39B    0.9576792 2.600845e-02 0.912306652           NA
302          ACP6    0.9558090 2.558880e-02 0.006626946 5.044356e-02
16125       USP53    0.9489626 2.424768e-02 0.153846127 1.466065e-02
10382        PKN3    0.9403224 3.721910e-01 0.550368578 1.776282e-02
3613       CYP2A6    0.9362417 1.889543e-05 0.936222776           NA
6848        KCNK3    0.9303282 4.926516e-01          NA           NA
34          ABCA8    0.9237564 3.102420e-01          NA 6.134858e-01
5274         FUT2    0.9218977           NA 0.921897709           NA
14479      STAT5B    0.9156545 1.374625e-02 0.883293732 1.149433e-02
7597    LINC01184    0.9107463 3.193425e-01 0.179252250 1.577378e-01
16088        USP1    0.9068683 3.337975e-02 0.873488573           NA
2307         CCNJ    0.9043155 8.898689e-01 0.005104675           NA
4006      DNAJC13    0.9010062 2.127420e-02 0.059582249 8.094458e-01
15609      TRIM39    0.8992013 7.523058e-08 0.899201178 4.319550e-08
2256       CCDC92    0.8968715 7.858781e-01 0.011186093           NA
5269        FURIN    0.8906402 2.482145e-02 0.375893815 4.250627e-02
8902       MTSS1L    0.8873198 3.202082e-02 0.007347410 1.293683e-01
5158          FN1    0.8861711           NA          NA           NA
4839        FAM3D    0.8861558           NA 0.393523642           NA
4073       DOPEY2    0.8857849 3.062539e-02 0.008857757 8.284235e-01
14680       SYTL1    0.8772921 2.788713e-02 0.643217735 1.968976e-01
9841         P3H4    0.8772168 1.643423e-02 0.005113679 8.440652e-01
10856      PROCA1    0.8764166 2.742031e-01 0.002574531 2.880409e-01
13237         SCD    0.8743632 4.847417e-01          NA           NA
5009        FCGRT    0.8592828 8.291311e-01 0.002715621 1.267679e-02
152    AC007950.2    0.8588843 3.621080e-01 0.003701136           NA
5630       GOLGA3    0.8577600 2.919996e-01 0.005583545 1.816214e-02
9031      N4BP2L1    0.8559432 1.144121e-02 0.004704934 7.841271e-01
14189      SORCS2    0.8552753 6.751668e-01 0.111231131           NA
10045      PCMTD2    0.8492358 7.182003e-01 0.098915897 1.370625e-02
13550      SH3TC1    0.8489500 2.676115e-02 0.014462946 7.892401e-01
10626        POP7    0.8489427           NA 0.456644954           NA
16120       USP47    0.8407391 8.959569e-02 0.069198629 6.682865e-01
14833       TCEA2    0.8367755 7.453639e-02 0.004625772 1.440040e-02
5666         GPAM    0.8281919 3.512448e-01          NA 2.653485e-04
10440        PLEC    0.8260408 2.201307e-02 0.005521095 9.479757e-03
10096       PDE4C    0.8216644 4.298341e-03 0.001338660 8.160274e-01
3517         CTSB    0.8194053 2.294538e-02          NA 2.855054e-02
2846       CLDN23    0.8166590 1.730850e-02 0.706511417 9.283910e-02
16601      ZDHHC6    0.8123386 3.891946e-01 0.001541926 4.149941e-01
4606         EVI5    0.8083431 2.857285e-02 0.014465622 2.701293e-02
1099        ASAP3    0.8067506 1.718864e-02 0.733717238 1.336300e-02
16949      ZNF575    0.8012029 1.917327e-01 0.003758440 1.803099e-01
           Adi_Vis
65    9.128748e-01
9941  7.834050e-01
349   1.527184e-02
5380  1.421270e-01
28    1.841775e-01
2956  9.099053e-02
14430 1.326825e-02
10161 4.333000e-01
16602 1.880523e-02
11000 1.179026e-02
1603  3.586065e-01
3275  1.530517e-02
14350 1.705970e-02
6911  9.305051e-02
15499 4.769922e-01
13049           NA
6247  5.891996e-03
4675  7.366414e-03
6578  2.504801e-03
7049  1.508450e-02
10835 7.790786e-01
5486  9.727463e-01
9441  9.609917e-01
12863           NA
5842  5.768536e-01
11235 6.228072e-03
10506 9.170046e-03
13610 5.345074e-03
13505           NA
2296  2.083202e-01
14214 8.531631e-03
15423           NA
6557  2.054340e-03
3521  5.296302e-01
660   2.135544e-02
1528  2.013402e-01
6110  1.641718e-02
15781 1.936411e-02
302   8.731497e-01
16125 7.562081e-01
10382           NA
3613            NA
6848  4.376766e-01
34    2.861606e-05
5274            NA
14479 7.120199e-03
7597  2.544138e-01
16088           NA
2307  9.341877e-03
4006  1.070393e-02
15609 3.913812e-08
2256  9.980738e-02
5269  4.474187e-01
8902  7.185833e-01
5158  8.861711e-01
4839  4.926321e-01
4073  1.787821e-02
14680 9.289616e-03
9841  1.160370e-02
10856 3.115980e-01
13237 3.896215e-01
5009  1.475931e-02
152   4.930752e-01
5630  5.420147e-01
9031  5.566995e-02
14189 6.887743e-02
10045 1.841336e-02
13550 1.848578e-02
10626 3.922977e-01
16120 1.365825e-02
14833 7.432129e-01
5666  4.766818e-01
10440 7.890269e-01
10096           NA
3517  7.679094e-01
2846            NA
16601 6.608035e-03
4606  7.382917e-01
1099  4.248170e-02
16949 4.254019e-01

Case 10 - Liver, Adipose x2, Cerebellum, and Blood with separate parameters

Compare with Case 9 (Liver + Adipose x2 + Cerebellum)

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case10")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db;/project2/compbio/predictdb/mashr_models/mashr_Brain_Cerebellum.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Visceral_Omentum.db;/project2/compbio/predictdb/mashr_models/mashr_Whole_Blood.db"
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")


#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
05269c3 wesleycrouse 2023-01-10
e7e2b73 wesleycrouse 2023-01-06
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
                     SNP                    Liver     Adipose_Subcutaneous 
            5.501215e-05             1.690736e-02             8.072674e-03 
        Brain_Cerebellum Adipose_Visceral_Omentum              Whole_Blood 
            1.103061e-03             1.030172e-02             1.313834e-03 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
                     SNP                    Liver     Adipose_Subcutaneous 
               22.663381                10.511923                 6.886394 
        Brain_Cerebellum Adipose_Visceral_Omentum              Whole_Blood 
              137.309362                 8.513732                74.698652 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 22.52427
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
                     SNP                    Liver     Adipose_Subcutaneous 
                 8696600                    11003                    13082 
        Brain_Cerebellum Adipose_Visceral_Omentum              Whole_Blood 
                   12270                    12921                    11198 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
                     SNP                    Liver     Adipose_Subcutaneous 
             0.031553903              0.005691010              0.002116429 
        Brain_Cerebellum Adipose_Visceral_Omentum              Whole_Blood 
             0.005408345              0.003297967              0.003198262 
#total PVE
sum(estimated_group_pve)
[1] 0.05126592
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
                     SNP                    Liver     Adipose_Subcutaneous 
              0.61549477               0.11100963               0.04128336 
        Brain_Cerebellum Adipose_Visceral_Omentum              Whole_Blood 
              0.10549591               0.06433059               0.06238574 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
      genename                    group region_tag susie_pip        mu2
64244    PCSK9              Whole_Blood       1_34 1.0000000  311.12261
60642    PSRC1              Whole_Blood       1_67 1.0000000 1700.14036
33017   TXNL4B         Brain_Cerebellum      16_38 1.0000000  425.95505
2454   ST3GAL4                    Liver      11_77 0.9999975  149.71795
60229     LDLR              Whole_Blood      19_11 0.9999936  641.01378
3721    INSIG2                    Liver       2_70 0.9999635   59.90457
19759   ZDHHC7     Adipose_Subcutaneous      16_49 0.9996663   29.93651
7410     ABCA1                    Liver       9_53 0.9870449   62.73592
3755     RRBP1                    Liver      20_13 0.9805280   29.59484
8531      TNKS                    Liver       8_12 0.9710705   67.34376
6391    TTC39B                    Liver       9_13 0.9672671   21.02116
61917    TIMD4              Whole_Blood       5_92 0.9639997  185.96921
47974   GIGYF1 Adipose_Visceral_Omentum       7_62 0.9631938   32.78921
11790   CYP2A6                    Liver      19_30 0.9629681   29.97480
9390      GAS6                    Liver      13_62 0.9616146   65.27598
3562    ACVR1C                    Liver       2_94 0.9598606   22.52879
8579    STAT5B                    Liver      17_25 0.9439073   27.09623
6100      ALLC                    Liver        2_2 0.9423566   24.93774
7040     INHBB                    Liver       2_70 0.9248716   66.65648
1597      PLTP                    Liver      20_28 0.9181188   36.27175
8865      FUT2                    Liver      19_33 0.9156446   62.13345
44743      FN1 Adipose_Visceral_Omentum      2_127 0.9096370   21.02782
11726   CLDN23                    Liver       8_11 0.9073017   22.55559
5415     SYTL1                    Liver       1_19 0.8934568   19.72085
41704   NPC1L1 Adipose_Visceral_Omentum       7_33 0.8922234   88.29280
63167     USP1              Whole_Blood       1_39 0.8709703  259.24171
1144     ASAP3                    Liver       1_16 0.8649959   30.40558
10519    SGMS1                    Liver      10_33 0.8637259   21.87209
15242     CCNJ     Adipose_Subcutaneous      10_61 0.8611409   21.61776
10495    PRMT6                    Liver       1_67 0.8568345   28.86241
6093   CSNK1G3                    Liver       5_75 0.8562573   73.22517
7394  TP53INP1                    Liver       8_66 0.8305806   19.57806
14913    FCGRT     Adipose_Subcutaneous      19_34 0.8303016   22.57781
2042     BCAT2                    Liver      19_34 0.8243198   23.85479
39886  HLA-DOB         Brain_Cerebellum       6_27 0.8173763   68.07678
32453     RAC1         Brain_Cerebellum       7_10 0.8137966   37.04397
4682    TBC1D4                    Liver      13_37 0.8112086   19.82561
9966     LRRK2                    Liver      12_25 0.8096716   24.60858
7355      BRI3                    Liver       7_60 0.8063315   26.52881
               PVE          z
64244 9.054238e-04  23.237813
60642 4.947720e-03 -41.793474
33017 1.239607e-03  18.129274
2454  4.357056e-04  13.376072
60229 1.865456e-03 -24.707322
3721  1.743269e-04  -8.982702
19759 8.709164e-05  -5.184802
7410  1.802078e-04   7.982017
3755  8.444936e-05   7.008305
8531  1.903130e-04  11.038564
6391  5.917298e-05  -4.334495
61917 5.217209e-04  13.882363
47974 9.191044e-05   6.632402
11790 8.400178e-05   5.407028
9390  1.826732e-04  -8.923688
3562  6.293126e-05  -4.687370
8579  7.443180e-05   5.426252
6100  6.839000e-05   4.919066
7040  1.794090e-04  -8.518936
1597  9.691427e-05  -5.732491
8865  1.655666e-04 -11.927107
44743 5.566505e-05  -4.446065
11726 5.955609e-05   4.720010
5415  5.127662e-05  -3.962854
41704 2.292552e-04  11.631021
63167 6.570955e-04  16.258211
1144  7.653986e-05   5.283225
10519 5.497769e-05   4.873968
15242 5.417579e-05  -4.639318
10495 7.196973e-05  -5.323721
6093  1.824673e-04   9.116291
7394  4.732294e-05   4.038448
14913 5.455543e-05  -4.347956
2042  5.722576e-05   4.796398
39886 1.619352e-04  -8.430761
32453 8.773111e-05   5.713928
4682  4.680363e-05   3.843964
9966  5.798501e-05   4.792808
7355  6.225177e-05  -5.140136

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
        genename combined_pip      Adi_Sub       Liver    Brain_Cer
10226      PARP9    1.3324158 4.813814e-01 0.030652000 5.868925e-02
360       ACVR1C    1.3152931 2.859044e-01 0.959860638 1.664766e-02
29         ABCA1    1.2348660           NA 0.987044942           NA
5523        GAS6    1.1932815 3.144969e-02 0.961614606 1.945107e-03
67         ABCG8    1.1409033           NA 0.000747084 4.999848e-01
17052     ZDHHC7    1.1403782 9.996663e-01 0.104001437           NA
312         ACP6    1.0548100 2.853345e-02 0.050654731 2.134487e-03
10453       PELO    1.0541969 1.643865e-01 0.240314076 2.824282e-01
3025       CNIH4    1.0498530 2.457995e-02 0.755803524 2.122643e-03
11315      PSRC1    1.0497817 1.404445e-02 0.016793336 3.862892e-04
1641    C10orf88    1.0496902 1.410526e-01 0.577792876 6.247614e-04
14832    ST3GAL4    1.0459086 1.703228e-02 0.999997549 3.603977e-03
15916      TPD52    1.0320897 4.306518e-01 0.026252925 5.364300e-04
14751    SPTY2D1    1.0302435 2.577036e-01 0.738715731 1.783712e-03
7105        KDSR    1.0293851 2.078597e-01 0.691126402 6.965452e-04
3609        CTSH    1.0270150 1.341835e-01 0.374075249 3.447077e-03
4798       FADS1    1.0230721           NA 0.564858208 3.209156e-01
16206     TTC39B    1.0146168 1.974117e-02 0.967267087           NA
677         ALLC    1.0137337 3.314714e-02 0.942356586 1.686409e-03
6758      INSIG2    1.0090069 2.059694e-03 0.999963525 9.252070e-05
1236      ATP1B2    1.0077960 1.597947e-02 0.521483127 1.150982e-03
16301     TXNL4B    1.0018875 1.702622e-03          NA 1.000000e+00
11147      PRKD2    1.0003287 1.514505e-03 0.349413728 6.198809e-05
10351      PCSK9    1.0000876 4.038979e-07          NA           NA
7481        LDLR    0.9999936           NA          NA           NA
14881     STAT5B    0.9992801 1.067772e-02 0.943907328 3.601696e-04
15447      TIMD4    0.9954486 1.540897e-02          NA           NA
6000       GSK3B    0.9954397           NA 0.195862695 1.617626e-01
2349       CCND2    0.9932308 2.873757e-01 0.434477580 1.561283e-02
13424      RRBP1    0.9930750 1.012243e-02 0.980527982 1.882324e-03
5635      GIGYF1    0.9918420 1.000529e-02 0.017710305 3.587551e-04
16564      USP53    0.9827581 1.919691e-02 0.299221485 4.344066e-04
1565        BRI3    0.9738669 2.389125e-02 0.806331540 7.186130e-04
10683       PKN3    0.9734596 1.915571e-01 0.329117124 3.347155e-04
15838       TNKS    0.9710705           NA 0.971070515           NA
7248     KLHDC7A    0.9709886 2.855347e-01 0.668874823 1.199819e-03
10808       PLTP    0.9638364 1.238388e-02 0.918118793 1.328292e-04
3701      CYP2A6    0.9629823 1.423040e-05 0.962968119           NA
4966       FAM3D    0.9555788           NA 0.600502244           NA
7042       KCNK3    0.9522787 4.024430e-01          NA           NA
7813   LINC01184    0.9432497 2.117952e-01 0.431682892 7.557711e-03
9708      NPC1L1    0.9424710 1.554473e-02 0.034386390 3.165223e-04
9156      MTSS1L    0.9415012 2.869371e-02 0.037988566 5.617468e-03
15082      SYTL1    0.9395033 1.756537e-02 0.893456831 6.220692e-03
4101     DNAJC13    0.9367189 1.991409e-02 0.182974607 7.098450e-01
3638     CWF19L1    0.9325472 1.827878e-01 0.454861395 3.059221e-02
6737       INHBB    0.9325080 3.446689e-03 0.924871617           NA
14587     SORCS2    0.9306663 5.363100e-01 0.296935707           NA
3352     CSNK1G3    0.9286371 5.054700e-02 0.856257343 6.359871e-04
13999      SIPA1    0.9279858 6.752746e-01 0.006928125 1.184108e-01
5409       FURIN    0.9264568 1.355123e-02 0.576553826 1.101458e-03
16526       USP1    0.9245645 2.489108e-02 0.028703145           NA
2911      CLDN23    0.9232695 1.212816e-02 0.907301712 3.118523e-03
1127       ASAP3    0.9216397 1.274738e-02 0.864995918 4.348818e-04
15905   TP53INP1    0.9197873 2.253316e-02 0.830580629 1.593936e-03
5414        FUT2    0.9156446           NA 0.915644596           NA
2309      CCDC92    0.9152343 7.742023e-01 0.028025996           NA
11168     PROCA1    0.9147101 3.424823e-01 0.005553268 5.880385e-02
5294         FN1    0.9096370           NA          NA           NA
2402     CD163L1    0.9047461 1.959406e-02 0.035162331 7.328086e-04
2360        CCNJ    0.9011318 8.611409e-01 0.023789855           NA
13620        SCD    0.8966746 3.822587e-01          NA           NA
5783      GOLGA3    0.8939512 2.277986e-01 0.025606415 5.356020e-04
10332     PCMTD2    0.8855392 6.180144e-01 0.223554101 4.281626e-04
36         ABCA8    0.8846797 7.555040e-01          NA 1.290674e-01
13892      SGMS1    0.8789073           NA 0.863725889 1.389671e-02
10934       POP7    0.8789073           NA 0.533414346           NA
5428       FXYD7    0.8775436           NA 0.132962491 9.937231e-04
1427       BCAT2    0.8651509 4.098099e-03 0.824319782 1.169638e-04
15196     TBC1D4    0.8622324 2.038108e-02 0.811208641 9.394026e-04
8273       LRRK2    0.8615823 2.753414e-02 0.809671610 8.321358e-04
15238      TCEA2    0.8615755 5.282914e-02 0.021761223 4.443179e-04
5139       FCGRT    0.8600500 8.303016e-01 0.010602312 4.016893e-04
5819        GPAM    0.8571967 2.547350e-01          NA 1.018165e-05
11160      PRMT6    0.8568345           NA 0.856834534           NA
9840        NTN5    0.8566481 7.270804e-01          NA 1.341094e-02
11557       RAC1    0.8552519 1.108381e-02 0.016419401 8.137966e-01
1526       BMPR2    0.8470469 1.229210e-02 0.787185391           NA
545       AGPAT5    0.8377516 1.678332e-02 0.046780863 5.361411e-04
6307     HLA-DOB    0.8343305 1.111642e-03 0.001890051 8.173763e-01
3264     CRACR2B    0.8286668 4.871339e-02 0.657224870 4.912595e-04
16728      WBP1L    0.8259698 4.510818e-02 0.589511022 8.228377e-04
156   AC007950.2    0.8197025 3.090749e-01 0.019941610           NA
4597       EPHA2    0.8101171           NA 0.624840900           NA
3887       DDX56    0.8078343 9.446560e-03 0.784238219 8.070283e-04
13583       SAT2    0.8051579 9.348889e-03 0.474613148 1.745985e-02
1349     B3GNTL1    0.8025151 2.158654e-01 0.067971071 6.752778e-04
           Adi_Vis        Blood
10226 7.608907e-01 8.025353e-04
360   2.678639e-02 2.609409e-02
29    2.475966e-01 2.244753e-04
5523  1.982721e-01           NA
67    6.401715e-01           NA
17052 3.426133e-02 2.449138e-03
312   6.109232e-01 3.625641e-01
10453 3.657655e-01 1.302661e-03
3025  6.828877e-02 1.990581e-01
11315 1.855758e-02 1.000000e+00
1641  2.804989e-01 4.972113e-02
14832 2.223535e-02 3.039419e-03
15916 5.738659e-01 7.826934e-04
14751 2.908086e-02 2.959613e-03
7105  7.962470e-02 5.007783e-02
3609  5.118495e-01 3.459711e-03
4798  2.837018e-02 1.089281e-01
16206 2.663472e-02 9.738540e-04
677   3.654357e-02           NA
6758  6.694396e-03 1.967410e-04
1236  3.957347e-01 7.344768e-02
16301           NA 1.848591e-04
11147 6.491155e-01 2.229343e-04
10351 8.718695e-05 1.000000e+00
7481            NA 9.999936e-01
14881 1.246038e-02 3.187446e-02
15447 1.603996e-02 9.639997e-01
6000  6.361737e-01 1.640765e-03
2349  2.550588e-01 7.058831e-04
13424           NA 5.422448e-04
5635  9.631938e-01 5.738956e-04
16564 6.639053e-01           NA
1565  1.418192e-01 1.106281e-03
10683           NA 4.524507e-01
15838           NA           NA
7248  1.537917e-02           NA
10808 3.295369e-02 2.472028e-04
3701            NA           NA
4966  3.544714e-01 6.051230e-04
7042  5.498357e-01           NA
7813  2.836093e-01 8.604533e-03
9708  8.922234e-01           NA
9156  7.900653e-01 7.913620e-02
15082 1.540229e-02 6.858159e-03
4101  2.238685e-02 1.598327e-03
3638  2.280853e-01 3.622045e-02
6737  4.075661e-03 1.140056e-04
14587 9.574856e-02 1.672005e-03
3352  2.028048e-02 9.163172e-04
13999 5.305098e-03 1.220672e-01
5409  3.345380e-01 7.123035e-04
16526           NA 8.709703e-01
2911            NA 7.211311e-04
1127  4.062545e-02 2.836096e-03
15905 2.199261e-02 4.308695e-02
5414            NA           NA
2309  1.129175e-01 8.848452e-05
11168 5.078408e-01 2.993583e-05
5294  9.096370e-01           NA
2402  5.000336e-02 7.992535e-01
2360  1.620110e-02           NA
13620 5.144160e-01           NA
5783  6.371643e-01 2.846246e-03
10332 2.807970e-02 1.546280e-02
36    1.081878e-04           NA
13892           NA 1.284725e-03
10934 2.866433e-01 5.884969e-02
5428  7.435873e-01           NA
1427  3.632639e-02 2.896209e-04
15196 2.815557e-02 1.547738e-03
8273  2.244231e-02 1.102111e-03
15238 7.865408e-01           NA
5139  1.839082e-02 3.535513e-04
5819  5.361740e-01 6.627750e-02
11160           NA           NA
9840  1.160729e-01 8.383037e-05
11557 1.240857e-02 1.543520e-03
1526  4.679417e-02 7.752442e-04
545   7.726181e-01 1.033148e-03
6307  1.389156e-02 6.097016e-05
3264  1.108977e-01 1.133958e-02
16728 1.891164e-01 1.411428e-03
156   4.906860e-01           NA
4597  1.852762e-01           NA
3887  1.258655e-02 7.559634e-04
13583 2.797099e-01 2.402608e-02
1349  5.167734e-01 1.229971e-03

Case 11 - Liver and Adipose with separate parameters

Compare with Case 3 (should be identical). Checking version update

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case11")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db;/project2/compbio/predictdb/mashr_models/mashr_Brain_Cerebellum.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Visceral_Omentum.db;/project2/compbio/predictdb/mashr_models/mashr_Whole_Blood.db"
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")


#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
05269c3 wesleycrouse 2023-01-10
e7e2b73 wesleycrouse 2023-01-06
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
                 SNP                Liver Adipose_Subcutaneous 
        9.014951e-05         7.943610e-03         1.653321e-02 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
                 SNP                Liver Adipose_Subcutaneous 
           16.908160            48.882570             6.014357 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 126.1567
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
                 SNP                Liver Adipose_Subcutaneous 
             8696600                11003                13082 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
                 SNP                Liver Adipose_Subcutaneous 
         0.038577093          0.012433786          0.003785652 
#total PVE
sum(estimated_group_pve)
[1] 0.05479653
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
                 SNP                Liver Adipose_Subcutaneous 
          0.70400612           0.22690827           0.06908562 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
         genename                group region_tag susie_pip        mu2
4435        PSRC1                Liver       1_67 1.0000000 1669.97618
2454      ST3GAL4                Liver      11_77 1.0000000  170.97506
12008         HPR                Liver      16_38 0.9999974  156.57752
3721       INSIG2                Liver       2_69 0.9999934   68.61282
19759      ZDHHC7 Adipose_Subcutaneous      16_49 0.9995748   28.12889
5991        FADS1                Liver      11_34 0.9993538  163.18332
12687 RP4-781K5.7                Liver      1_121 0.9943723  202.13879
5563        ABCG8                Liver       2_27 0.9943320  311.88467
7410        ABCA1                Liver       9_53 0.9925506   70.70277
1999        PRKD2                Liver      19_33 0.9847824   29.89388
8531         TNKS                Liver       8_12 0.9825262   77.45465
13717       TPD52 Adipose_Subcutaneous       8_57 0.9814059   21.06905
9390         GAS6                Liver      13_62 0.9803514   73.89025
1597         PLTP                Liver      20_28 0.9775544   59.24311
5544        CNIH4                Liver      1_114 0.9743418   41.39386
3755        RRBP1                Liver      20_13 0.9719348   32.36358
11790      CYP2A6                Liver      19_28 0.9693535   29.87336
7040        INHBB                Liver       2_70 0.9651557   74.16214
18637       ABCA8 Adipose_Subcutaneous      17_39 0.9558067   27.28313
25057       SIPA1 Adipose_Subcutaneous      11_36 0.9482173   22.29943
6093      CSNK1G3                Liver       5_75 0.9459871   83.36709
8579       STAT5B                Liver      17_25 0.9418229   30.95090
2092          SP4                Liver       7_19 0.9299248  102.14354
15027      PLPPR2 Adipose_Subcutaneous      19_10 0.9297349   26.56239
6391       TTC39B                Liver       9_13 0.9260892   23.56077
14913       FCGRT Adipose_Subcutaneous      19_34 0.9226574   20.62923
16361      CCDC92 Adipose_Subcutaneous      12_75 0.9184631   25.75399
15242        CCNJ Adipose_Subcutaneous      10_61 0.9161743   20.53548
8865         FUT2                Liver      19_33 0.9059564   70.62362
17109         POR Adipose_Subcutaneous       7_48 0.8988319   31.79763
21459      PROCA1 Adipose_Subcutaneous      17_17 0.8980242   26.41207
3562       ACVR1C                Liver       2_94 0.8864262   25.08450
10657      TRIM39                Liver       6_26 0.8823124   82.45322
4704        DDX56                Liver       7_33 0.8805617   57.91924
6957         USP1                Liver       1_39 0.8781087  255.86929
18314     FAM117B Adipose_Subcutaneous      2_120 0.8638575   36.68997
6100         ALLC                Liver        2_2 0.8446148   28.32175
13346       SPHK2 Adipose_Subcutaneous      19_33 0.8363777   34.76342
24743      PCMTD2 Adipose_Subcutaneous      20_38 0.8314753   28.89377
14273         SCD Adipose_Subcutaneous      10_64 0.8185682   19.92754
18041       IL1RN Adipose_Subcutaneous       2_67 0.8157131   21.52025
18737        NTN5 Adipose_Subcutaneous      19_33 0.8143027   54.80574
               PVE          z
4435  4.859936e-03 -41.687336
2454  4.975687e-04  13.376072
12008 4.556680e-04 -17.962770
3721  1.996746e-04  -8.982702
19759 8.182541e-05  -5.184802
5991  4.745865e-04  12.926351
12687 5.849503e-04 -15.108415
5563  9.024969e-04 -20.293982
7410  2.042252e-04   7.982017
1999  8.567278e-05   5.072217
8531  2.214685e-04  11.038564
13717 6.017471e-05  -4.684363
9390  2.108090e-04  -8.923688
1597  1.685385e-04  -5.732491
5544  1.173728e-04   6.145535
3755  9.154065e-05   7.008305
11790 8.427263e-05   5.407028
7040  2.083051e-04  -8.518936
18637 7.589001e-05   4.800447
25057 6.153497e-05  -5.096493
6093  2.295092e-04   9.116291
8579  8.483262e-05   5.426252
2092  2.764261e-04  10.693191
15027 7.186983e-05   3.965665
6391  6.349836e-05  -4.334495
14913 5.539158e-05  -4.347956
16361 6.883773e-05  -5.328046
15242 5.475241e-05  -4.639318
8865  1.861991e-04 -11.927107
17109 8.317515e-05   6.044322
21459 6.902569e-05   5.543282
3562  6.470955e-05  -4.687370
10657 2.117144e-04   8.840164
4704  1.484236e-04   9.641861
6957  6.538630e-04  16.258211
18314 9.223799e-05   7.852653
6100  6.961439e-05   4.919066
13346 8.461459e-05  -8.721460
24743 6.991557e-05  -5.636854
14273 4.747105e-05  -4.541468
18041 5.108638e-05   4.455379
18737 1.298770e-04  11.132252

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
         genename combined_pip      Adi_Sub       Liver
290        ACVR1C    1.2821451 3.957190e-01 0.886426187
12546     ST3GAL4    1.0408713 4.087127e-02 0.999999993
9618        PSRC1    1.0329403 3.294030e-02 1.000000000
1409     C10orf88    1.0324476 2.351908e-01 0.797256743
2582        CNIH4    1.0263117 5.196981e-02 0.974341842
4697         GAS6    1.0252303 4.487895e-02 0.980351368
5734        INHBB    1.0242873 5.913165e-02 0.965155688
14503      ZDHHC7    1.0232406 9.995748e-01 0.023665871
5464          HPR    1.0166104 1.661304e-02 0.999997354
2864      CSNK1G3    1.0136737 6.768661e-02 0.945987078
5750       INSIG2    1.0133149 1.332146e-02 0.999993445
6035         KDSR    1.0028532 5.540887e-01 0.448764468
3125       CYP2A6    1.0014824 3.212884e-02 0.969353536
4086        FADS1    0.9993538           NA 0.999353840
8854         PELO    0.9979494 2.326841e-01 0.765265259
11311       RRBP1    0.9945557 2.262086e-02 0.971934797
11157 RP4-781K5.7    0.9943723           NA 0.994372303
61          ABCG8    0.9943320           NA 0.994331953
26          ABCA1    0.9925506           NA 0.992550635
9167         PLTP    0.9900735 1.251913e-02 0.977554352
9473        PRKD2    0.9892652 4.482861e-03 0.984782375
13505       TPD52    0.9891353 9.814059e-01 0.007729362
13436        TNKS    0.9825262           NA 0.982526205
12471     SPTY2D1    0.9768431 4.886837e-01 0.488159443
12506        SRRT    0.9757710 4.263893e-01 0.549381704
12588      STAT5B    0.9748353 3.301246e-02 0.941822874
12353         SP4    0.9734963 4.357146e-02 0.929924805
13763      TTC39B    0.9716737 4.558453e-02 0.926089174
6154      KLHDC7A    0.9693547 6.579458e-01 0.311408861
31          ABCA8    0.9558067 9.558067e-01          NA
11812       SIPA1    0.9541853 9.482173e-01 0.005967923
2014        CCND2    0.9525749 7.250596e-01 0.227515311
9055         PKN3    0.9395463 4.786795e-01 0.460866780
9161       PLPPR2    0.9349051 9.297349e-01 0.005170169
14045        USP1    0.9339745 5.586579e-02 0.878108668
1978       CCDC92    0.9324492 9.184631e-01 0.013986025
4386        FCGRT    0.9253876 9.226574e-01 0.002730238
2025         CCNJ    0.9215620 9.161743e-01 0.005387656
569          ALLC    0.9204296 7.581480e-02 0.844614756
3070      CWF19L1    0.9152537 4.971167e-01 0.418137016
4613         FUT2    0.9059564           NA 0.905956407
9491       PROCA1    0.9052901 8.980242e-01 0.007265855
8749       PCMTD2    0.9050256 8.314753e-01 0.073550293
9282          POR    0.9045059 8.988319e-01 0.005673998
3290        DDX56    0.9043148 2.375309e-02 0.880561708
13607      TRIM39    0.8823125 9.626981e-08 0.882312357
4114      FAM117B    0.8638575 8.638575e-01          NA
12330      SORCS2    0.8570663 7.657866e-01 0.091279663
12414       SPHK2    0.8521159 8.363777e-01 0.015738193
12766       SYTL1    0.8431171 4.316092e-02 0.799956216
6610    LINC01184    0.8410288 6.047286e-01 0.236300159
1346         BRI3    0.8302506 6.192655e-02 0.768324031
11479         SCD    0.8185682 8.185682e-01          NA
5688        IL1RN    0.8157131 8.157131e-01          NA
8341         NTN5    0.8143027 8.143027e-01          NA

Case 12 - Liver and Adipose with shared variance parameter

Compare with Case 11 (shared/separate variance parameter).

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case12")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db;/project2/compbio/predictdb/mashr_models/mashr_Brain_Cerebellum.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Visceral_Omentum.db;/project2/compbio/predictdb/mashr_models/mashr_Whole_Blood.db"
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")


#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
05269c3 wesleycrouse 2023-01-10
e7e2b73 wesleycrouse 2023-01-06
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
                 SNP                Liver Adipose_Subcutaneous 
        0.0001116982         0.0129542199         0.0026091007 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
                 SNP                Liver Adipose_Subcutaneous 
            14.47132             28.49114             28.49114 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 20.10456
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
                 SNP                Liver Adipose_Subcutaneous 
             8696600                11003                13082 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
                 SNP                Liver Adipose_Subcutaneous 
         0.040909490          0.011818233          0.002830057 
#total PVE
sum(estimated_group_pve)
[1] 0.05555778
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
                 SNP                Liver Adipose_Subcutaneous 
          0.73634134           0.21271968           0.05093898 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
         genename                group region_tag susie_pip        mu2
4435        PSRC1                Liver       1_67 1.0000000 1626.88223
2454      ST3GAL4                Liver      11_77 1.0000000  167.96212
12008         HPR                Liver      16_38 0.9999974  155.23961
3721       INSIG2                Liver       2_69 0.9999946   66.86169
5991        FADS1                Liver      11_34 0.9959247  159.21101
7410        ABCA1                Liver       9_53 0.9943717   68.82968
12687 RP4-781K5.7                Liver      1_121 0.9941559  198.00757
5563        ABCG8                Liver       2_27 0.9925097  302.47821
19759      ZDHHC7 Adipose_Subcutaneous      16_49 0.9919186   55.36199
1999        PRKD2                Liver      19_33 0.9901754   29.13302
8531         TNKS                Liver       8_12 0.9870597   74.29154
1597         PLTP                Liver      20_28 0.9859443   60.28284
3755        RRBP1                Liver      20_13 0.9847381   31.56626
9390         GAS6                Liver      13_62 0.9843943   70.56153
5544        CNIH4                Liver      1_114 0.9825167   39.99824
11790      CYP2A6                Liver      19_28 0.9778817   30.53565
7040        INHBB                Liver       2_70 0.9742317   72.52491
6391       TTC39B                Liver       9_13 0.9637061   22.67453
8579       STAT5B                Liver      17_25 0.9590107   29.97853
6093      CSNK1G3                Liver       5_75 0.9579673   81.82850
3562       ACVR1C                Liver       2_94 0.9461956   24.87046
2092          SP4                Liver       7_19 0.9403605   99.42238
18314     FAM117B Adipose_Subcutaneous      2_120 0.9389185   45.51266
21750       PCSK9 Adipose_Subcutaneous       1_34 0.9361436  115.55426
4704        DDX56                Liver       7_33 0.9311872   57.21137
13717       TPD52 Adipose_Subcutaneous       8_57 0.9221789   25.09527
8865         FUT2                Liver      19_33 0.9220956   63.67196
10657      TRIM39                Liver       6_26 0.9152017   80.31867
1114         SRRT                Liver       7_62 0.9135259   31.83115
5415        SYTL1                Liver       1_19 0.8996933   21.03816
6100         ALLC                Liver        2_2 0.8833498   27.35371
6957         USP1                Liver       1_39 0.8754830  248.62470
14001        CETP Adipose_Subcutaneous      16_31 0.8643969  137.72613
8418         POP7                Liver       7_62 0.8632259   40.20178
8931      CRACR2B                Liver       11_1 0.8567356   20.47676
3247         KDSR                Liver      18_35 0.8539965   24.12790
7355         BRI3                Liver       7_60 0.8390940   28.65836
1144        ASAP3                Liver       1_16 0.8370239   33.31359
11726      CLDN23                Liver       8_11 0.8344469   24.40140
21459      PROCA1 Adipose_Subcutaneous      17_17 0.8230088   31.91334
3300     C10orf88                Liver      10_77 0.8225006   36.12517
16361      CCDC92 Adipose_Subcutaneous      12_75 0.8062972   30.70467
               PVE          z
4435  4.734525e-03 -41.687336
2454  4.888005e-04  13.376072
12008 4.517745e-04 -17.962770
3721  1.945787e-04  -8.982702
5991  4.614449e-04  12.926351
7410  1.991796e-04   7.982017
12687 5.728707e-04 -15.108415
5563  8.736735e-04 -20.293982
19759 1.598115e-04  -5.184802
1999  8.394946e-05   5.072217
8531  2.134043e-04  11.038564
1597  1.729682e-04  -5.732491
3755  9.046158e-05   7.008305
9390  2.021424e-04  -8.923688
5544  1.143671e-04   6.145535
11790 8.689880e-05   5.407028
7040  2.056221e-04  -8.518936
6391  6.359212e-05  -4.334495
8579  8.366697e-05   5.426252
6093  2.281264e-04   9.116291
3562  6.848337e-05  -4.687370
2092  2.720814e-04  10.693191
18314 1.243599e-04   7.852653
21750 3.148101e-04  17.210869
4704  1.550385e-04   9.641861
13717 6.734840e-05  -4.684363
8865  1.708616e-04 -11.927107
10657 2.139211e-04   8.840164
1114  8.462397e-05   5.424996
5415  5.508364e-05  -3.962854
6100  7.031845e-05   4.919066
6957  6.334499e-04  16.258211
14001 3.464574e-04  13.814427
8418  1.009927e-04  -5.845258
8931  5.105384e-05  -3.989585
3247  5.996474e-05  -4.526287
7355  6.998133e-05  -5.140136
1144  8.114833e-05   5.283225
11726 5.925620e-05   4.720010
21459 7.643583e-05   5.543282
3300  8.647020e-05  -6.787850
16361 7.204767e-05  -5.328046

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
         genename combined_pip      Adi_Sub      Liver
290        ACVR1C    1.0919254 1.457298e-01 0.94619558
14503      ZDHHC7    1.0143176 9.919186e-01 0.02239896
9618        PSRC1    1.0049042 4.904162e-03 1.00000000
12546     ST3GAL4    1.0037794 3.779390e-03 0.99999999
5464          HPR    1.0012483 1.250912e-03 0.99999743
5750       INSIG2    1.0009622 9.675573e-04 0.99999461
4086        FADS1    0.9959247           NA 0.99592468
26          ABCA1    0.9943717           NA 0.99437167
11157 RP4-781K5.7    0.9941559           NA 0.99415585
61          ABCG8    0.9925097           NA 0.99250968
9473        PRKD2    0.9905019 3.265447e-04 0.99017536
4697         GAS6    0.9884988 4.104520e-03 0.98439430
2582        CNIH4    0.9876066 5.089986e-03 0.98251665
9167         PLTP    0.9874755 1.531238e-03 0.98594429
13436        TNKS    0.9870597           NA 0.98705970
11311       RRBP1    0.9864557 1.717670e-03 0.98473807
3125       CYP2A6    0.9804714 2.589616e-03 0.97788175
5734        INHBB    0.9798619 5.630194e-03 0.97423173
6035         KDSR    0.9796114 1.256149e-01 0.85399648
8854         PELO    0.9792709 4.558552e-01 0.52341569
13763      TTC39B    0.9674541 3.747977e-03 0.96370610
2864      CSNK1G3    0.9646682 6.700874e-03 0.95796729
1409     C10orf88    0.9634702 1.409695e-01 0.82250065
12588      STAT5B    0.9625309 3.520220e-03 0.95901072
12506        SRRT    0.9560986 4.257270e-02 0.91352585
12471     SPTY2D1    0.9439342 1.604653e-01 0.78346899
12353         SP4    0.9436641 3.303588e-03 0.94036053
4114      FAM117B    0.9389185 9.389185e-01         NA
13505       TPD52    0.9382255 9.221789e-01 0.01604663
8766        PCSK9    0.9361436 9.361436e-01         NA
6154      KLHDC7A    0.9340420 1.589261e-01 0.77511594
3290        DDX56    0.9332113 2.024081e-03 0.93118722
9055         PKN3    0.9285528 3.211265e-01 0.60742625
4613         FUT2    0.9220956           NA 0.92209560
2014        CCND2    0.9195842 2.325045e-01 0.68707970
13607      TRIM39    0.9152017 7.314903e-09 0.91520172
12766       SYTL1    0.9039044 4.211042e-03 0.89969332
569          ALLC    0.8905400 7.190101e-03 0.88334985
2316         CETP    0.8903566 8.643969e-01 0.02595972
14045        USP1    0.8813365 5.853500e-03 0.87548299
2791      CRACR2B    0.8728596 1.612406e-02 0.85673557
1978       CCDC92    0.8702236 8.062972e-01 0.06392641
9279         POP7    0.8632259           NA 0.86322588
9491       PROCA1    0.8485556 8.230088e-01 0.02554682
1346         BRI3    0.8440816 4.987605e-03 0.83909398
960         ASAP3    0.8392828 2.258911e-03 0.83702393
2486       CLDN23    0.8365072 2.060353e-03 0.83444689
3070      CWF19L1    0.8235274 1.749985e-01 0.64852890
9161       PLPPR2    0.8085539 7.833952e-01 0.02515870
13494    TP53INP1    0.8041823 4.315098e-03 0.79986722
3043         CTSH    0.8029275 9.509420e-02 0.70783325

Case 13 - Liver, Adipose, and Cerebellum with shared variance parameter

Compare with Case 6.

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case13")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db;/project2/compbio/predictdb/mashr_models/mashr_Brain_Cerebellum.db" 
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")


#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
05269c3 wesleycrouse 2023-01-10
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
                 SNP                Liver Adipose_Subcutaneous 
        9.333309e-05         1.332199e-02         3.589102e-03 
    Brain_Cerebellum 
        4.108110e-03 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
                 SNP                Liver Adipose_Subcutaneous 
            15.73683             23.65586             23.65586 
    Brain_Cerebellum 
            23.65586 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 32.35335
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
                 SNP                Liver Adipose_Subcutaneous 
             8696600                11003                13082 
    Brain_Cerebellum 
               12270 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
                 SNP                Liver Adipose_Subcutaneous 
         0.037172591          0.010091115          0.003232354 
    Brain_Cerebellum 
         0.003470129 
#total PVE
sum(estimated_group_pve)
[1] 0.05396619
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
                 SNP                Liver Adipose_Subcutaneous 
          0.68881260           0.18698959           0.05989590 
    Brain_Cerebellum 
          0.06430191 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
         genename                group region_tag susie_pip        mu2
4435        PSRC1                Liver       1_67 1.0000000 1602.98634
2454      ST3GAL4                Liver      11_77 0.9999999  164.68658
3721       INSIG2                Liver       2_69 0.9999928   65.78245
12008         HPR                Liver      16_38 0.9999790  261.07796
19759      ZDHHC7 Adipose_Subcutaneous      16_49 0.9998493   36.58168
33017      TXNL4B     Brain_Cerebellum      16_38 0.9979769  242.66624
7410        ABCA1                Liver       9_53 0.9941050   67.97659
1999        PRKD2                Liver      19_33 0.9894183   28.82165
12687 RP4-781K5.7                Liver      1_121 0.9864862  195.29234
8531         TNKS                Liver       8_12 0.9861216   73.65942
1597         PLTP                Liver      20_28 0.9852447   58.34963
3755        RRBP1                Liver      20_13 0.9844824   31.26263
9390         GAS6                Liver      13_62 0.9840411   69.77028
5544        CNIH4                Liver      1_114 0.9830782   39.80171
11790      CYP2A6                Liver      19_29 0.9798128   30.02691
5991        FADS1                Liver      11_34 0.9796735  156.22031
7040        INHBB                Liver       2_70 0.9719451   71.69128
32453        RAC1     Brain_Cerebellum       7_10 0.9610294   34.51760
6391       TTC39B                Liver       9_13 0.9541690   22.58449
6093      CSNK1G3                Liver       5_75 0.9524488   80.75583
18314     FAM117B Adipose_Subcutaneous      2_120 0.9468700   45.01400
21750       PCSK9 Adipose_Subcutaneous       1_34 0.9456269  114.46717
3562       ACVR1C                Liver       2_94 0.9451931   24.26453
13717       TPD52 Adipose_Subcutaneous       8_57 0.9450360   24.65735
8579       STAT5B                Liver      17_25 0.9361063   26.07727
4704        DDX56                Liver       7_33 0.9261551   56.52302
2092          SP4                Liver       7_19 0.9219276   97.94430
8865         FUT2                Liver      19_33 0.9216246   62.28125
6100         ALLC                Liver        2_2 0.9020174   26.98982
10657      TRIM39                Liver       6_26 0.9015216   79.57352
32727     DNAJC13     Brain_Cerebellum       3_82 0.8785432   53.49055
8418         POP7                Liver       7_62 0.8763822   39.50538
1114         SRRT                Liver       7_62 0.8738374   31.28521
16361      CCDC92 Adipose_Subcutaneous      12_75 0.8728620   30.62983
1144        ASAP3                Liver       1_16 0.8636412   33.03044
7355         BRI3                Liver       7_60 0.8625795   28.34593
5415        SYTL1                Liver       1_19 0.8618968   20.82497
29649       PDE4C     Brain_Cerebellum      19_14 0.8437016   45.74631
6957         USP1                Liver       1_39 0.8429200  244.94768
8931      CRACR2B                Liver       11_1 0.8404625   20.43448
11726      CLDN23                Liver       8_11 0.8399300   24.16126
14001        CETP Adipose_Subcutaneous      16_31 0.8227964  135.37120
3247         KDSR                Liver      18_35 0.8224100   23.87317
14913       FCGRT Adipose_Subcutaneous      19_34 0.8208536   24.92849
7588        WBP1L                Liver      10_66 0.8017364   21.65757
               PVE          z
4435  4.664983e-03 -41.687336
2454  4.792681e-04  13.376072
3721  1.914376e-04  -8.982702
12008 7.597687e-04 -17.962770
19759 1.064433e-04  -5.184802
33017 7.047745e-04  18.129274
7410  1.966581e-04   7.982017
1999  8.298874e-05   5.072217
12687 5.606561e-04 -15.108415
8531  2.113874e-04  11.038564
1597  1.673025e-04  -5.732491
3755  8.956820e-05   7.008305
9390  1.998039e-04  -8.923688
5544  1.138702e-04   6.145535
11790 8.561976e-05   5.407028
5991  4.453886e-04  12.926351
7040  2.027815e-04  -8.518936
32453 9.653783e-05   5.713928
6391  6.271277e-05  -4.334495
6093  2.238390e-04   9.116291
18314 1.240390e-04   7.852653
21750 3.150076e-04  17.210869
3562  6.674408e-05  -4.687370
13717 6.781332e-05  -4.684363
8579  7.104076e-05   5.426252
4704  1.523454e-04   9.641861
2092  2.627824e-04  10.693191
8865  1.670443e-04 -11.927107
6100  7.084924e-05   4.919066
10657 2.087685e-04   8.840164
32727 1.367604e-04  -6.853813
8418  1.007558e-04  -5.845258
1114  7.955911e-05   5.424996
16361 7.780554e-05  -5.328046
1144  8.301718e-05   5.283225
7355  7.115578e-05  -5.140136
5415  5.223481e-05  -3.962854
29649 1.123221e-04  -6.633160
6957  6.008693e-04  16.258211
8931  4.998070e-05  -3.989585
11726 5.905859e-05   4.720010
14001 3.241447e-04  13.814427
3247  5.713718e-05  -4.526287
14913 5.955004e-05  -4.347956
7588  5.053144e-05  -4.255756

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
         genename combined_pip      Adi_Sub       Liver    Brain_Cer
340        ACVR1C    1.2123109 1.250360e-01 0.945193080 1.420818e-01
16026      ZDHHC7    1.0564679 9.998493e-01 0.056618571           NA
13917     ST3GAL4    1.0234454 5.882168e-03 0.999999943 1.756326e-02
10622       PSRC1    1.0137972 9.352991e-03 0.999999958 4.444220e-03
2858        CNIH4    1.0094536 7.924097e-03 0.983078156 1.845136e-02
5200         GAS6    1.0039368 7.081596e-03 0.984041137 1.281411e-02
9807         PELO    1.0035573 3.407471e-01 0.273072009 3.897383e-01
6352       INSIG2    1.0034949 1.610152e-03 0.999992806 1.891893e-03
1552     C10orf88    1.0017972 2.004888e-01 0.795673334 5.635067e-03
12582       RRBP1    1.0008865 2.551325e-03 0.984482435 1.385271e-02
4524        FADS1    1.0002394           NA 0.979673454 2.056598e-02
6031          HPR    0.9999921 4.498040e-06 0.999978989 8.646505e-06
15317      TXNL4B    0.9979811 4.171803e-06          NA 9.979769e-01
27          ABCA1    0.9941050           NA 0.994105024           NA
6671         KDSR    0.9911919 1.621890e-01 0.822409979 6.592987e-03
10464       PRKD2    0.9911063 4.924223e-04 0.989418312 1.195612e-03
10143        PLTP    0.9876519 1.553368e-03 0.985244711 8.538129e-04
12404 RP4-781K5.7    0.9864862           NA 0.986486205           NA
14875        TNKS    0.9861216           NA 0.986121637           NA
3491       CYP2A6    0.9828954 3.082616e-03 0.979812756           NA
6331        INHBB    0.9801150 8.169896e-03 0.971945109           NA
3871      DNAJC13    0.9800936 4.651118e-03 0.096899248 8.785432e-01
14693     TMEM199    0.9756318 3.027940e-01 0.672837747           NA
10851        RAC1    0.9727363 3.227328e-03 0.008479539 9.610294e-01
13837     SPTY2D1    0.9716575 2.061362e-01 0.752200412 1.332088e-02
3167      CSNK1G3    0.9698716 1.061934e-02 0.952448797 6.803433e-03
14949       TPD52    0.9687700 9.450360e-01 0.017125854 6.608215e-03
4552      FAM117B    0.9603844 9.468700e-01          NA 1.351446e-02
15225      TTC39B    0.9582627 4.093758e-03 0.954168987           NA
6800      KLHDC7A    0.9573810 2.025757e-01 0.738875664 1.592964e-02
2222        CCND2    0.9479193 2.417321e-01 0.538640933 1.675463e-01
13964      STAT5B    0.9459437 4.587638e-03 0.936106326 5.249786e-03
9715        PCSK9    0.9456269 9.456269e-01          NA           NA
13875        SRRT    0.9424619 5.633359e-02 0.873837374 1.229089e-02
10021        PKN3    0.9397041 3.846594e-01 0.549482702 5.561920e-03
13123       SIPA1    0.9395178 4.562390e-01 0.016969105 4.663096e-01
5646        GSK3B    0.9334508           NA 0.238642189 6.948086e-01
3667        DDX56    0.9327161 3.114461e-03 0.926155068 3.446609e-03
13707         SP4    0.9319115 5.564933e-03 0.921927560 4.419035e-03
643          ALLC    0.9292543 1.091907e-02 0.902017442 1.631780e-02
14160       SYTL1    0.9217650 6.370510e-03 0.861896846 5.349765e-02
5098         FUT2    0.9216246           NA 0.921624629           NA
5906    HIST1H2BH    0.9198214           NA 0.702429389 2.173920e-01
10136      PLPPR2    0.9102660 5.590873e-01 0.012997442 3.381813e-01
2185       CCDC92    0.9085885 8.728620e-01 0.035726493           NA
15056      TRIM39    0.9015216 1.208964e-08 0.901521554 9.158152e-09
33          ABCA8    0.8907048 2.420438e-01          NA 6.486610e-01
1481         BRI3    0.8776694 7.762690e-03 0.862579508 7.327203e-03
10260        POP7    0.8763822           NA 0.876382158           NA
1069        ASAP3    0.8706595 3.370851e-03 0.863641209 3.647419e-03
3085      CRACR2B    0.8705106 2.266165e-02 0.840462541 7.386389e-03
2753       CLDN23    0.8670192 3.108195e-03 0.839930047 2.398100e-02
3400         CTSH    0.8592328 1.196871e-01 0.679337870 6.020783e-02
13022       SGMS1    0.8554455           NA 0.788856552 6.658898e-02
15525        USP1    0.8521388 9.218802e-03 0.842920019           NA
2554         CETP    0.8496455 8.227964e-01 0.026849046           NA
9747        PDE4C    0.8488174 1.209061e-03 0.003906739 8.437016e-01
10485      PROCA1    0.8380053 3.774916e-01 0.008887177 4.516266e-01
4846        FCGRT    0.8359048 8.208536e-01 0.009498018 5.553204e-03
13065      SH3TC1    0.8265420 5.200926e-03 0.034925960 7.864151e-01
15721       WBP1L    0.8206076 1.215773e-02 0.801736360 6.713516e-03
9498         P3H4    0.8149476 6.337859e-03 0.021578192 7.870315e-01
7323    LINC01184    0.8126238 1.664548e-01 0.553482364 9.268660e-02
9697       PCMTD2    0.8125841 6.243396e-01 0.184851755 3.392750e-03
63          ABCG8    0.8094920           NA 0.618648580 1.908434e-01
14938    TP53INP1    0.8081431 6.231757e-03 0.787302612 1.460875e-02

Case 14 - Liver, Adipose x2, Cerebellum with shared variance parameter

Compare with Case 9.

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case14")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db;/project2/compbio/predictdb/mashr_models/mashr_Brain_Cerebellum.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Visceral_Omentum.db" 
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")


#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
05269c3 wesleycrouse 2023-01-10
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
                     SNP                    Liver     Adipose_Subcutaneous 
            8.011991e-05             1.374124e-02             2.333627e-03 
        Brain_Cerebellum Adipose_Visceral_Omentum 
            4.624670e-03             5.671851e-03 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
                     SNP                    Liver     Adipose_Subcutaneous 
                15.53794                 21.98822                 21.98822 
        Brain_Cerebellum Adipose_Visceral_Omentum 
                21.98822                 21.98822 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 20.45514
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
                     SNP                    Liver     Adipose_Subcutaneous 
                 8696600                    11003                    13082 
        Brain_Cerebellum Adipose_Visceral_Omentum 
                   12270                    12921 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
                     SNP                    Liver     Adipose_Subcutaneous 
             0.031506764              0.009674919              0.001953511 
        Brain_Cerebellum Adipose_Visceral_Omentum 
             0.003631079              0.004689551 
#total PVE
sum(estimated_group_pve)
[1] 0.05145582
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
                     SNP                    Liver     Adipose_Subcutaneous 
              0.61230705               0.18802378               0.03796483 
        Brain_Cerebellum Adipose_Visceral_Omentum 
              0.07056691               0.09113743 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
         genename                    group region_tag susie_pip        mu2
2454      ST3GAL4                    Liver      11_77 0.9999999  163.52465
4435        PSRC1                    Liver       1_67 0.9999998 1590.61483
3721       INSIG2                    Liver       2_70 0.9999901   65.06103
12008         HPR                    Liver      16_38 0.9999744  258.95364
19759      ZDHHC7     Adipose_Subcutaneous      16_49 0.9996571   35.53027
33017      TXNL4B         Brain_Cerebellum      16_38 0.9982124  241.06718
7410        ABCA1                    Liver       9_53 0.9948505   67.58742
8531         TNKS                    Liver       8_12 0.9878231   73.93134
9390         GAS6                    Liver      13_62 0.9865505   69.82255
12687 RP4-781K5.7                    Liver      1_121 0.9862844  194.13999
3755        RRBP1                    Liver      20_13 0.9854216   31.13865
1597         PLTP                    Liver      20_28 0.9832546   56.35184
41704      NPC1L1 Adipose_Visceral_Omentum       7_33 0.9827337   98.33436
5991        FADS1                    Liver      11_34 0.9805012  154.79920
47884       TIMD4 Adipose_Visceral_Omentum       5_92 0.9771478  172.63280
7040        INHBB                    Liver       2_70 0.9749223   71.62303
11790      CYP2A6                    Liver      19_30 0.9694575   31.84887
32453        RAC1         Brain_Cerebellum       7_10 0.9685718   34.30625
6391       TTC39B                    Liver       9_13 0.9579879   22.45340
6093      CSNK1G3                    Liver       5_75 0.9577573   80.41232
3562       ACVR1C                    Liver       2_94 0.9513654   24.17632
8579       STAT5B                    Liver      17_25 0.9403266   25.72122
8865         FUT2                    Liver      19_33 0.9402685   62.27858
5544        CNIH4                    Liver      1_114 0.9385873   39.69295
2092          SP4                    Liver       7_19 0.9284592   97.41280
21750       PCSK9     Adipose_Subcutaneous       1_34 0.9254081  114.39696
6100         ALLC                    Liver        2_2 0.9157473   26.76540
10657      TRIM39                    Liver       6_26 0.9146793   79.08981
32727     DNAJC13         Brain_Cerebellum       3_82 0.9018268   53.21179
18314     FAM117B     Adipose_Subcutaneous      2_120 0.8951517   44.97659
29649       PDE4C         Brain_Cerebellum      19_14 0.8738930   45.54257
1144        ASAP3                    Liver       1_16 0.8665556   32.72072
5415        SYTL1                    Liver       1_19 0.8639573   20.77840
49316        ACP6 Adipose_Visceral_Omentum       1_73 0.8564329   22.58123
11726      CLDN23                    Liver       8_11 0.8515286   23.95407
44743         FN1 Adipose_Visceral_Omentum      2_127 0.8398339   23.23263
6957         USP1                    Liver       1_39 0.8348250  243.60423
3247         KDSR                    Liver      18_35 0.8316565   23.81953
10495       PRMT6                    Liver       1_67 0.8244856   31.00826
10519       SGMS1                    Liver      10_33 0.8142843   23.42892
7355         BRI3                    Liver       7_60 0.8082640   28.29703
7394     TP53INP1                    Liver       8_66 0.8069701   20.90114
42717        CETP Adipose_Visceral_Omentum      16_31 0.8057008  128.20385
               PVE          z
2454  4.758866e-04  13.376072
4435  4.628979e-03 -41.687336
3721  1.893376e-04  -8.982702
12008 7.535832e-04 -17.962770
19759 1.033641e-04  -5.184802
33017 7.002955e-04  18.129274
7410  1.956789e-04   7.982017
8531  2.125338e-04  11.038564
9390  2.004635e-04  -8.923688
12687 5.572338e-04 -15.108415
3755  8.929809e-05   7.008305
1597  1.612480e-04  -5.732491
41704 2.812299e-04  11.631021
5991  4.417099e-04  12.926351
47884 4.909122e-04 -13.882363
7040  2.032090e-04  -8.518936
11790 8.985519e-05   5.407028
32453 9.669976e-05   5.713928
6391  6.259827e-05  -4.334495
6093  2.241292e-04   9.116291
3562  6.693570e-05  -4.687370
8579  7.038669e-05   5.426252
8865  1.704162e-04 -11.927107
5544  1.084197e-04   6.145535
2092  2.632080e-04  10.693191
21750 3.080832e-04  17.210869
6100  7.132959e-05   4.919066
10657 2.105279e-04   8.840164
32727 1.396533e-04  -6.853813
18314 1.171665e-04   7.852653
29649 1.158233e-04  -6.633160
1144  8.251628e-05   5.283225
5415  5.224258e-05  -3.962854
49316 5.628092e-05   4.214320
11726 5.936067e-05   4.720010
44743 5.678217e-05  -4.446065
6957  5.918349e-04  16.258211
3247  5.764976e-05  -4.526287
10495 7.440134e-05  -5.323721
10519 5.551990e-05   4.873968
7355  6.656017e-05  -5.140136
7394  4.908487e-05   4.038448
42717 3.006043e-04  13.379190

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
         genename combined_pip      Adi_Sub       Liver    Brain_Cer
9941        PARP9    1.3367898 1.752632e-01 0.020464569 3.464715e-01
349        ACVR1C    1.2197234 8.840689e-02 0.951365375 1.692906e-01
65          ABCG8    1.2027514           NA 0.479634687 1.614870e-01
28          ABCA1    1.1350257           NA 0.994850509           NA
5380         GAS6    1.1244255 4.797976e-03 0.986550469 1.502919e-02
16602      ZDHHC7    1.0882290 9.996571e-01 0.067367853           NA
14430     ST3GAL4    1.0345794 3.790030e-03 0.999999936 2.037016e-02
11000       PSRC1    1.0299293 6.578921e-03 0.999999760 5.036626e-03
10161        PELO    1.0222309 1.534340e-01 0.194908597 3.029556e-01
2956        CNIH4    1.0219682 5.289798e-03 0.938587304 2.218162e-02
1603     C10orf88    1.0138067 1.027695e-01 0.621010246 6.528122e-03
4006      DNAJC13    1.0133912 3.182158e-03 0.101485375 9.018268e-01
4675        FADS1    1.0086358           NA 0.980501172 2.254453e-02
15499       TPD52    1.0044493 3.062206e-01 0.016774142 6.186554e-03
13049       RRBP1    1.0035789 1.737328e-03 0.985421591 1.641995e-02
9441       NPC1L1    1.0034723 2.482109e-03 0.014553205 3.703290e-03
6578       INSIG2    1.0028254 3.201527e-04 0.999990091 8.147079e-04
6247          HPR    0.9999912 2.480870e-06 0.999974364 8.221915e-06
6911         KDSR    0.9990814 1.036676e-01 0.831656485 7.659417e-03
10835       PRKD2    0.9987838 3.297884e-04 0.385161351 8.429085e-04
5842        GSK3B    0.9987683           NA 0.133376494 3.891456e-01
15874      TXNL4B    0.9982147 2.311958e-06          NA 9.982124e-01
1207       ATP1B2    0.9970214 4.280899e-03 0.597292998 9.283089e-03
10506        PLTP    0.9903894 1.041701e-03 0.983254560 9.595032e-04
15423        TNKS    0.9878231           NA 0.987823071           NA
12863 RP4-781K5.7    0.9862844           NA 0.986284389           NA
11235        RAC1    0.9851584 2.216868e-03 0.009293936 9.685718e-01
6110    HIST1H2BH    0.9819819           NA 0.727829997 2.453451e-01
3275      CSNK1G3    0.9819220 7.330774e-03 0.957757289 7.832039e-03
13610       SIPA1    0.9811207 2.201683e-01 0.014544757 7.410119e-01
15038       TIMD4    0.9805106 3.362891e-03          NA           NA
6557        INHBB    0.9767347 5.283353e-04 0.974922323           NA
15781      TTC39B    0.9701548 2.981142e-03 0.957987868           NA
3613       CYP2A6    0.9694603 2.734585e-06 0.969457529           NA
7049      KLHDC7A    0.9661042 1.385995e-01 0.799004743 1.855851e-02
2296        CCND2    0.9634558 1.333439e-01 0.472869841 1.605435e-01
660          ALLC    0.9563182 7.001686e-03 0.915747272 1.818883e-02
14350     SPTY2D1    0.9544993 1.372094e-01 0.792099997 1.494413e-02
14479      STAT5B    0.9534677 2.430210e-03 0.940326577 4.811873e-03
3521         CTSH    0.9465317 5.063007e-02 0.403932102 3.968600e-02
14214         SP4    0.9424401 3.595851e-03 0.928459160 4.871056e-03
5274         FUT2    0.9402685           NA 0.940268467           NA
6848        KCNK3    0.9383147 2.169602e-01          NA           NA
14680       SYTL1    0.9351640 3.878521e-03 0.863957281 6.025065e-02
10382        PKN3    0.9342521 2.669245e-01 0.658515936 8.811628e-03
14388        SRRT    0.9308504 5.107147e-02 0.709487289 1.300360e-02
4839        FAM3D    0.9259339           NA 0.654029085           NA
10063       PCSK9    0.9254159 9.254081e-01          NA           NA
4703      FAM117B    0.9253382 8.951517e-01          NA 2.469870e-02
2640         CETP    0.9222033 1.089913e-01 0.007511238           NA
16125       USP53    0.9190482 3.378685e-03 0.313188188 4.576141e-03
1528         BRI3    0.9163998 4.844171e-03 0.808263964 8.021471e-03
15609      TRIM39    0.9146793 9.545220e-09 0.914679266 1.250456e-08
302          ACP6    0.9058891 3.986275e-03 0.021157340 2.431259e-02
10626        POP7    0.8974311           NA 0.621928921           NA
1099        ASAP3    0.8960256 2.314637e-03 0.866555631 4.431925e-03
13505       SGMS1    0.8949302           NA 0.814284252 8.064599e-02
34          ABCA8    0.8933682 1.467232e-01          NA 7.466184e-01
2846       CLDN23    0.8806895 2.112599e-03 0.851528582 2.704834e-02
10856      PROCA1    0.8784318 1.627774e-01 0.007714765 3.366492e-01
10096       PDE4C    0.8783760 7.299161e-04 0.003753065 8.738930e-01
5269        FURIN    0.8772669 3.286720e-03 0.606232257 1.483153e-02
2256       CCDC92    0.8755605 7.109047e-01 0.036113868           NA
3191      CRACR2B    0.8725415 1.412015e-02 0.771997764 7.613515e-03
7597    LINC01184    0.8647870 8.968323e-02 0.471900673 8.624555e-02
13200        SAT2    0.8593716 2.534473e-03 0.486474190 1.664328e-01
4606         EVI5    0.8435626 5.917553e-03 0.030332569 7.883672e-03
16088        USP1    0.8412454 6.420390e-03 0.834825019           NA
16288       WBP1L    0.8405799 8.619076e-03 0.695297409 9.200620e-03
5158          FN1    0.8398339           NA          NA           NA
15488    TP53INP1    0.8360355 4.173620e-03 0.806970098 1.675254e-02
10848       PRMT6    0.8244856           NA 0.824485637           NA
10440        PLEC    0.8240726 9.456251e-03 0.017667345 4.822109e-03
9841         P3H4    0.8160806 3.271571e-03 0.019300644 7.827384e-01
8902       MTSS1L    0.8085281 5.534016e-03 0.022811592 8.122599e-02
1489        BMPR2    0.8072129 3.649574e-03 0.790727984           NA
11118         PXK    0.8066197 2.073185e-03 0.784400538 1.527962e-02
2725         CHKB    0.8029959 5.617590e-03 0.027583169 7.557156e-01
           Adi_Vis
9941  7.945905e-01
349   1.066055e-02
65    5.616297e-01
28    1.401752e-01
5380  1.180479e-01
16602 2.120407e-02
14430 1.041928e-02
11000 1.831404e-02
10161 3.709326e-01
2956  5.590947e-02
1603  2.834988e-01
4006  6.896829e-03
4675  5.590103e-03
15499 6.752680e-01
13049           NA
9441  9.827337e-01
6578  1.700410e-03
6247  6.114189e-06
6911  5.609789e-02
10835 6.124497e-01
5842  4.762462e-01
15874           NA
1207  3.861644e-01
10506 5.133609e-03
15423           NA
12863           NA
11235 5.075788e-03
6110  8.806773e-03
3275  9.001904e-03
13610 5.395695e-03
15038 9.771478e-01
6557  1.284058e-03
15781 9.185775e-03
3613            NA
7049  9.941534e-03
2296  1.966985e-01
660   1.538040e-02
14350 1.024579e-02
14479 5.899090e-03
3521  4.522836e-01
14214 5.514058e-03
5274            NA
6848  7.213545e-01
14680 7.077569e-03
10382           NA
14388 1.572881e-01
4839  2.719048e-01
10063 7.787106e-06
4703  5.487817e-03
2640  8.057008e-01
16125 5.979051e-01
1528  9.527019e-02
15609 2.346795e-08
302   8.564329e-01
10626 2.755022e-01
1099  2.272339e-02
13505           NA
34    2.652221e-05
2846            NA
10856 3.712904e-01
10096           NA
5269  2.529164e-01
2256  1.285420e-01
3191  7.881006e-02
7597  2.169575e-01
13200 2.039301e-01
4606  7.994288e-01
16088           NA
16288 1.274628e-01
5158  8.398339e-01
15488 8.139272e-03
10848           NA
10440 7.921269e-01
9841  1.076991e-02
8902  6.989565e-01
1489  1.283534e-02
11118 4.866383e-03
2725  1.407956e-02

Case 15 - Liver, Adipose x2, Cerebellum, and Blood with shared variance parameter

Compare with Case 10.

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case15")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db;/project2/compbio/predictdb/mashr_models/mashr_Brain_Cerebellum.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Visceral_Omentum.db;/project2/compbio/predictdb/mashr_models/mashr_Whole_Blood.db" 
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")


#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
05269c3 wesleycrouse 2023-01-10
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
                     SNP                    Liver     Adipose_Subcutaneous 
            5.997328e-05             1.378168e-02             1.627161e-03 
        Brain_Cerebellum Adipose_Visceral_Omentum              Whole_Blood 
            4.654519e-03             6.406225e-03             4.905430e-03 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
                     SNP                    Liver     Adipose_Subcutaneous 
                21.29890                 17.61462                 17.61462 
        Brain_Cerebellum Adipose_Visceral_Omentum              Whole_Blood 
                17.61462                 17.61462                 17.61462 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 36.97439
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
                     SNP                    Liver     Adipose_Subcutaneous 
                 8696600                    11003                    13082 
        Brain_Cerebellum Adipose_Visceral_Omentum              Whole_Blood 
                   12270                    12921                    11198 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
                     SNP                    Liver     Adipose_Subcutaneous 
             0.032328446              0.007773326              0.001091185 
        Brain_Cerebellum Adipose_Visceral_Omentum              Whole_Blood 
             0.002927609              0.004243185              0.002815861 
#total PVE
sum(estimated_group_pve)
[1] 0.05117961
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
                     SNP                    Liver     Adipose_Subcutaneous 
              0.63166650               0.15188326               0.02132070 
        Brain_Cerebellum Adipose_Visceral_Omentum              Whole_Blood 
              0.05720263               0.08290772               0.05501919 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
         genename                    group region_tag susie_pip       mu2
64244       PCSK9              Whole_Blood       1_34 1.0000000 292.76479
2454      ST3GAL4                    Liver      11_77 0.9999999 160.37659
3721       INSIG2                    Liver       2_70 0.9999780  63.04649
12008         HPR                    Liver      16_38 0.9999690 259.30802
19759      ZDHHC7     Adipose_Subcutaneous      16_49 0.9992405  47.80047
33017      TXNL4B         Brain_Cerebellum      16_38 0.9978588 240.67845
7410        ABCA1                    Liver       9_53 0.9930547  66.39710
8531         TNKS                    Liver       8_12 0.9825829  72.56899
3755        RRBP1                    Liver      20_13 0.9813621  30.71945
9390         GAS6                    Liver      13_62 0.9811752  68.43100
1597         PLTP                    Liver      20_28 0.9801259  53.19192
41704      NPC1L1 Adipose_Visceral_Omentum       7_33 0.9735995  96.67612
5991        FADS1                    Liver      11_34 0.9727025 150.38724
6391       TTC39B                    Liver       9_13 0.9686127  22.03305
7040        INHBB                    Liver       2_70 0.9625310  70.38978
47974      GIGYF1 Adipose_Visceral_Omentum       7_62 0.9551856  35.66909
3562       ACVR1C                    Liver       2_94 0.9531294  23.60520
11790      CYP2A6                    Liver      19_30 0.9496021  31.56196
8865         FUT2                    Liver      19_33 0.9464805  61.99164
6093      CSNK1G3                    Liver       5_75 0.9330060  78.96441
6100         ALLC                    Liver        2_2 0.9308567  26.30727
32453        RAC1         Brain_Cerebellum       7_10 0.9144500  34.00134
65187     CD163L1              Whole_Blood       12_7 0.9042107  24.23628
12687 RP4-781K5.7                    Liver      1_122 0.9006950 190.12812
32727     DNAJC13         Brain_Cerebellum       3_82 0.8887667  52.22553
1144        ASAP3                    Liver       1_16 0.8736426  32.36623
11726      CLDN23                    Liver       8_11 0.8728192  23.72253
44743         FN1 Adipose_Visceral_Omentum      2_127 0.8724687  22.80080
29649       PDE4C         Brain_Cerebellum      19_14 0.8716891  44.70195
10582       BMPR2                    Liver      2_120 0.8689699  26.83324
10495       PRMT6                    Liver       1_67 0.8493627  30.45207
10519       SGMS1                    Liver      10_33 0.8446207  22.94364
10657      TRIM39                    Liver       6_26 0.8429601  78.50605
2092          SP4                    Liver       7_19 0.8407578  94.63856
63206        ACP6              Whole_Blood       1_73 0.8383956  25.01972
60229        LDLR              Whole_Blood      19_11 0.8368447 593.57321
9072      SPTY2D1                    Liver      11_13 0.8351811  31.13204
9062      KLHDC7A                    Liver       1_14 0.8328382  20.47970
7355         BRI3                    Liver       7_60 0.8174548  28.01608
47064       PARP9 Adipose_Visceral_Omentum       3_76 0.8158884  44.66119
8579       STAT5B                    Liver      17_25 0.8155340  25.59802
61327      GALNT6              Whole_Blood      12_33 0.8143397  25.56438
5415        SYTL1                    Liver       1_19 0.8120530  20.61173
2042        BCAT2                    Liver      19_34 0.8068614  25.01148
               PVE          z
64244 8.519991e-04  23.237813
2454  4.667252e-04  13.376072
3721  1.834728e-04  -8.982702
12008 7.546104e-04 -17.962770
19759 1.390025e-04  -5.184802
33017 6.989186e-04  18.129274
7410  1.918857e-04   7.982017
8531  2.075108e-04  11.038564
3755  8.773301e-05   7.008305
9390  1.953978e-04  -8.923688
1597  1.517217e-04  -5.732491
41704 2.739175e-04  11.631021
5991  4.257075e-04  12.926351
6391  6.210766e-05  -4.334495
7040  1.971717e-04  -8.518936
47974 9.915167e-05   6.632402
3562  6.547567e-05  -4.687370
11790 8.722198e-05   5.407028
8865  1.707517e-04 -11.927107
6093  2.144056e-04   9.116291
6100  7.126544e-05   4.919066
32453 9.048495e-05   5.713928
65187 6.377580e-05  -4.653700
12687 4.983614e-04 -15.108415
32727 1.350800e-04  -6.853813
1144  8.228985e-05   5.283225
11726 6.025673e-05   4.720010
44743 5.789221e-05  -4.446065
29649 1.133988e-04  -6.633160
10582 6.785753e-05   6.124267
10495 7.527145e-05  -5.323721
10519 5.639548e-05   4.873968
10657 1.925885e-04   8.840164
2092  2.315578e-04  10.693191
63206 6.104524e-05   4.624653
60229 1.445571e-03 -24.707322
9072  7.566735e-05  -5.557123
9062  4.963688e-05   4.124187
7355  6.664865e-05  -5.140136
47064 1.060428e-04  -5.774700
8579  6.075314e-05   5.426252
61327 6.058446e-05  -4.608245
5415  4.871011e-05  -3.962854
2042  5.872981e-05   4.796398

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
         genename combined_pip      Adi_Sub       Liver    Brain_Cer
10226       PARP9    1.4308781 1.519292e-01 0.024129290 4.329102e-01
360        ACVR1C    1.3036139 5.074009e-02 0.953129384 1.399178e-01
67          ABCG8    1.2069050           NA 0.434535100 1.468627e-01
29          ABCA1    1.1614071           NA 0.993054668           NA
5523         GAS6    1.1547396 4.506936e-03 0.981175213 1.985197e-02
312          ACP6    1.1296139 6.596314e-03 0.091205500 3.738112e-02
17052      ZDHHC7    1.0953959 9.992405e-01 0.061577934           NA
14832     ST3GAL4    1.0813854 2.961384e-03 0.999999896 2.576713e-02
3025        CNIH4    1.0599120 4.137111e-03 0.562703161 2.994367e-02
10453        PELO    1.0358787 1.058721e-01 0.207167952 3.009728e-01
1641     C10orf88    1.0328905 6.601899e-02 0.556949060 8.760006e-03
4101      DNAJC13    1.0190738 2.663348e-03 0.110956108 8.887667e-01
7105         KDSR    1.0186299 5.615361e-02 0.644381267 8.297076e-03
4798        FADS1    1.0167687           NA 0.972702545 2.382618e-02
6000        GSK3B    1.0138053           NA 0.130139485 3.681680e-01
1236       ATP1B2    1.0103493 2.541863e-03 0.418564541 1.060222e-02
15916       TPD52    1.0100159 2.143243e-01 0.017530961 6.251955e-03
6758       INSIG2    1.0052442 2.977092e-04 0.999977987 1.022556e-03
13424       RRBP1    1.0051624 1.330835e-03 0.981362074 1.846136e-02
10351       PCSK9    1.0002112 4.394542e-08          NA           NA
6417          HPR    1.0000155 3.772786e-06 0.999968988 1.635417e-05
11147       PRKD2    0.9992412 2.426080e-04 0.361714176 8.774225e-04
9708       NPC1L1    0.9981034 2.114637e-03 0.017812181 4.577061e-03
16301      TXNL4B    0.9979009 2.899377e-06          NA 9.978588e-01
16206      TTC39B    0.9937705 3.005724e-03 0.968612730           NA
10808        PLTP    0.9905956 9.053188e-04 0.980125864 1.177587e-03
10683        PKN3    0.9891016 6.867927e-02 0.229236749 4.846650e-03
6276    HIST1H2BH    0.9884412           NA 0.577415719 1.954543e-01
3609         CTSH    0.9882271 3.564833e-02 0.395012588 4.020559e-02
15838        TNKS    0.9825829           NA 0.982582910           NA
14751     SPTY2D1    0.9823364 1.008440e-01 0.835181051 1.606191e-02
5635       GIGYF1    0.9778993 1.483488e-03 0.012522079 4.239943e-03
14881      STAT5B    0.9772340 1.539039e-03 0.815534028 4.397837e-03
677          ALLC    0.9721077 5.008677e-03 0.930856702 1.868180e-02
2349        CCND2    0.9697584 9.362689e-02 0.481213404 1.640967e-01
6737        INHBB    0.9657958 4.815641e-04 0.962531032           NA
2402      CD163L1    0.9655985 2.695375e-03 0.022649978 7.695039e-03
3352      CSNK1G3    0.9652926 6.363780e-03 0.933006013 8.608097e-03
7248      KLHDC7A    0.9585576 1.002259e-01 0.832838200 1.615643e-02
13892       SGMS1    0.9524003           NA 0.844620687 9.904945e-02
3701       CYP2A6    0.9496038 1.754665e-06 0.949602067           NA
13999       SIPA1    0.9489006 9.121091e-02 0.004135282 5.020743e-01
5414         FUT2    0.9464805           NA 0.946480517           NA
16564       USP53    0.9455196 2.577652e-03 0.302334662 5.031552e-03
1565         BRI3    0.9442122 3.461941e-03 0.817454820 8.240618e-03
11557        RAC1    0.9393129 1.665167e-03 0.011623292 9.144500e-01
9156       MTSS1L    0.9382238 4.742774e-03 0.027808741 4.658898e-02
4966        FAM3D    0.9356779           NA 0.634157602           NA
15905    TP53INP1    0.9343577 2.663519e-03 0.679685974 1.438418e-02
15082       SYTL1    0.9283932 3.221785e-03 0.812052988 5.911484e-02
1127        ASAP3    0.9212051 1.803639e-03 0.873642601 5.142749e-03
13583        SAT2    0.9144152 1.888946e-03 0.425265647 1.459870e-01
2911       CLDN23    0.9086215 1.653754e-03 0.872819244 2.871524e-02
7042        KCNK3    0.9084032 1.943485e-01          NA           NA
1526        BMPR2    0.9066524 1.305668e-03 0.868969860           NA
5819         GPAM    0.9046438 7.095703e-02          NA 5.778909e-05
36          ABCA8    0.9031119 1.202185e-01          NA 7.828581e-01
13235 RP4-781K5.7    0.9006950           NA 0.900695006           NA
7813    LINC01184    0.8939048 5.984191e-02 0.451679896 8.290055e-02
10384       PDE4C    0.8762915 5.405850e-04 0.004061809 8.716891e-01
3264      CRACR2B    0.8737420 9.743371e-03 0.690045002 7.662668e-03
5294          FN1    0.8724687           NA          NA           NA
15447       TIMD4    0.8675721 2.520664e-03          NA           NA
11168      PROCA1    0.8622085 1.140214e-01 0.003839579 3.223608e-01
16728       WBP1L    0.8610061 6.466268e-03 0.692176124 1.016603e-02
14612         SP4    0.8602652 2.853310e-03 0.840757779 5.002334e-03
11160       PRMT6    0.8493627           NA 0.849362712           NA
4173       DOPEY2    0.8431043 3.263231e-03 0.027102774 7.860172e-01
16029      TRIM39    0.8429602 7.495891e-09 0.842960120 1.454488e-08
488         ADRB1    0.8416613           NA 0.021448212 1.233835e-02
1427        BCAT2    0.8394486 8.010068e-04 0.806861371 1.974668e-03
7481         LDLR    0.8368447           NA          NA           NA
5497       GALNT6    0.8323414 1.050330e-03          NA 1.282054e-02
10126        P3H4    0.8281064 2.201859e-03 0.018865258 7.888172e-01
5428        FXYD7    0.8272627           NA 0.117421645 1.539604e-02
2787         CHKB    0.8241340 4.091569e-03 0.029509760 7.627180e-01
16642        VMO1    0.8232267 5.787303e-02 0.046059477 1.570066e-02
15238       TCEA2    0.8227721 1.261868e-02 0.013908664 5.044009e-03
3334        CSE1L    0.8210105 3.477631e-03 0.020007686 9.922384e-03
8273        LRRK2    0.8193944 3.773646e-03 0.787316081 9.458423e-03
10934        POP7    0.8184221           NA 0.433860574           NA
16446     UGT2B17    0.8168090           NA 0.057284449           NA
13620         SCD    0.8164174 1.668603e-01          NA           NA
6168        HDAC4    0.8154751 9.568603e-03          NA 2.352026e-01
16630        VIL1    0.8142470           NA 0.478112411 1.636876e-01
4728         EVI5    0.8141611 3.778304e-03 0.028640427 7.476750e-03
15196      TBC1D4    0.8138947 2.829593e-03 0.773813997 1.088922e-02
5783       GOLGA3    0.8043157 7.476477e-02 0.017190072 6.389660e-03
5409        FURIN    0.8042007 2.031917e-03 0.533152501 1.269168e-02
11438         PXK    0.8021113 1.522146e-03 0.774449773 1.573902e-02
4597        EPHA2    0.8010674           NA 0.659623468           NA
3638      CWF19L1    0.8008662 4.926466e-02 0.344712512 1.368782e-01
           Adi_Vis        Blood
10226 8.158884e-01 6.020929e-03
360   1.281438e-02 1.470122e-01
67    6.255072e-01           NA
29    1.666862e-01 1.666200e-03
5523  1.492055e-01           NA
312   1.560354e-01 8.383956e-01
17052 2.892216e-02 5.655225e-03
14832 1.290917e-02 3.974780e-02
3025  4.090627e-02 4.222217e-01
10453 4.127543e-01 9.111571e-03
1641  2.923077e-01 1.088548e-01
4101  9.471814e-03 7.215847e-03
7105  5.106271e-02 2.587352e-01
4798  6.988141e-03 1.325183e-02
6000  5.047989e-01 1.069893e-02
1236  3.170682e-01 2.615725e-01
15916 7.659263e-01 5.982393e-03
6758  2.577573e-03 1.368353e-03
13424           NA 4.008095e-03
10351 2.111971e-04 1.000000e+00
6417  1.511495e-05 1.125949e-05
11147 6.347876e-01 1.619455e-03
9708  9.735995e-01           NA
16301           NA 3.921409e-05
16206 1.419573e-02 7.956273e-03
10808 6.967531e-03 1.419332e-03
10683           NA 6.863390e-01
6276  9.593350e-03 2.059778e-01
3609  4.903029e-01 2.705769e-02
15838           NA           NA
14751 1.222518e-02 1.802427e-02
5635  9.551856e-01 4.468192e-03
14881 6.049277e-03 1.497138e-01
677   1.756052e-02           NA
2349  2.254363e-01 5.385098e-03
6737  1.895788e-03 8.874120e-04
2402  2.834735e-02 9.042107e-01
3352  1.105921e-02 6.255448e-03
7248  9.337014e-03           NA
13892           NA 8.730209e-03
3701            NA           NA
13999 2.229806e-03 3.492503e-01
5414            NA           NA
16564 6.355757e-01           NA
1565  1.065510e-01 8.503828e-03
11557 6.250281e-03 5.324140e-03
9156  4.862434e-01 3.728399e-01
4966  2.968784e-01 4.641949e-03
15905 1.040139e-02 2.272227e-01
15082 9.042491e-03 4.496106e-02
1127  2.622625e-02 1.438986e-02
13583 2.006007e-01 1.406729e-01
2911            NA 5.433295e-03
7042  7.140548e-01           NA
1526  2.986019e-02 6.516715e-03
5819  4.256887e-01 4.079402e-01
36    3.528660e-05           NA
13235           NA           NA
7813  2.338804e-01 6.560207e-02
10384           NA           NA
3264  8.161475e-02 8.467617e-02
5294  8.724687e-01           NA
15447 4.895922e-01 3.754592e-01
11168 4.217749e-01 2.118025e-04
16728 1.409831e-01 1.121459e-02
14612 6.901923e-03 4.749864e-03
11160           NA           NA
4173  1.540102e-02 1.132009e-02
16029 2.977054e-08 1.509583e-08
488   1.001329e-02 7.978614e-01
1427  2.655481e-02 3.256752e-03
7481            NA 8.368447e-01
5497  4.130780e-03 8.143397e-01
10126 1.147635e-02 6.745750e-03
5428  6.944450e-01           NA
2787  1.659231e-02 1.122239e-02
16642 3.896622e-01 3.139314e-01
15238 7.912008e-01           NA
3334  1.363641e-02 7.739664e-01
8273  1.040062e-02 8.445593e-03
10934 2.176742e-01 1.668873e-01
16446           NA 7.595245e-01
13620 6.495571e-01           NA
6168  3.229111e-01 2.477929e-01
16630           NA 1.724470e-01
4728  5.562280e-01 2.180377e-01
15196 1.455130e-02 1.181063e-02
5783  6.871320e-01 1.883915e-02
5409  2.502172e-01 6.107413e-03
11438 5.898261e-03 4.502080e-03
4597  1.414439e-01           NA
3638  1.527439e-01 1.172669e-01

Case 16 - Liver and Adipose with separate parameters (50% downsample #1)

Compare with Case 3. Randomly samples 50% of genes to include.

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case16")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db" 
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")


#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
05269c3 wesleycrouse 2023-01-10
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
                 SNP                Liver Adipose_Subcutaneous 
        0.0001213438         0.0071379166         0.0226800038 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
                 SNP                Liver Adipose_Subcutaneous 
            15.13968             60.59001              9.68234 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 161.6083
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
                 SNP                Liver Adipose_Subcutaneous 
             8696600                 5545                 6497 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
                 SNP                Liver Adipose_Subcutaneous 
         0.046494778          0.006979019          0.004151993 
#total PVE
sum(estimated_group_pve)
[1] 0.05762579
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
                 SNP                Liver Adipose_Subcutaneous 
          0.80683975           0.12110929           0.07205095 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
         genename                group region_tag susie_pip       mu2
2454      ST3GAL4                Liver      11_77 1.0000000 176.38090
12008         HPR                Liver      16_38 0.9999978 158.76202
19759      ZDHHC7 Adipose_Subcutaneous      16_49 0.9999084  31.61800
12687 RP4-781K5.7                Liver      1_121 0.9964423 204.05543
6957         USP1                Liver       1_39 0.9931048 258.31055
13717       TPD52 Adipose_Subcutaneous       8_57 0.9893922  22.59036
5544        CNIH4                Liver      1_114 0.9821766  41.28484
18314     FAM117B Adipose_Subcutaneous      2_120 0.9801395  40.66820
8531         TNKS                Liver       8_12 0.9786638  77.06108
3755        RRBP1                Liver      20_13 0.9690390  32.69412
16444    C10orf88 Adipose_Subcutaneous      10_77 0.9621970  32.65865
14913       FCGRT Adipose_Subcutaneous      19_34 0.9603780  21.21613
17109         POR Adipose_Subcutaneous       7_48 0.9593199  33.73845
7040        INHBB                Liver       2_70 0.9580297  74.79794
16361      CCDC92 Adipose_Subcutaneous      12_75 0.9469543  27.36424
15242        CCNJ Adipose_Subcutaneous      10_61 0.9459691  21.65254
3247         KDSR                Liver      18_35 0.9329336  24.77843
13993        SRRT Adipose_Subcutaneous       7_62 0.9235605  26.54317
19681        PELO Adipose_Subcutaneous       5_30 0.9185746  62.92380
8579       STAT5B                Liver      17_25 0.9149897  31.54926
13020      ZDHHC6 Adipose_Subcutaneous      10_70 0.8991049  20.79800
14273         SCD Adipose_Subcutaneous      10_64 0.8902595  19.30919
16736      ACVR1C Adipose_Subcutaneous       2_94 0.8817067  19.80169
20155       TMED4 Adipose_Subcutaneous       7_32 0.8723188  38.77679
13346       SPHK2 Adipose_Subcutaneous      19_33 0.8705643  39.62417
1597         PLTP                Liver      20_28 0.8665201  41.32560
8865         FUT2                Liver      19_33 0.8587473 103.22839
27239      DUSP14 Adipose_Subcutaneous      17_22 0.8502019  18.56556
21711       VAMP5 Adipose_Subcutaneous       2_54 0.8454320  22.48435
16347       CCND2 Adipose_Subcutaneous       12_4 0.8306329  19.93197
22062       KCNK3 Adipose_Subcutaneous       2_16 0.8299863  20.39883
14808        CTSH Adipose_Subcutaneous      15_37 0.8011763  18.48091
               PVE          z
2454  5.133007e-04  13.376072
12008 4.620255e-04 -17.962770
19759 9.200575e-05  -5.184802
12687 5.917259e-04 -15.108415
6957  7.465476e-04  16.258211
13717 6.504469e-05  -4.684363
5544  1.180050e-04   6.145535
18314 1.160014e-04   7.852653
8531  2.194769e-04  11.038564
3755  9.220006e-05   7.008305
16444 9.144976e-05  -6.762952
14913 5.929645e-05  -4.347956
17109 9.419090e-05   6.044322
7040  2.085398e-04  -8.518936
16361 7.541065e-05  -5.328046
15242 5.960820e-05  -4.639318
3247  6.727362e-05  -4.526287
13993 7.134086e-05   4.556557
19681 1.682092e-04   8.522224
8579  8.400897e-05   5.426252
13020 5.441920e-05   3.908646
14273 5.002660e-05  -4.541468
16736 5.080971e-05  -4.185879
20155 9.843905e-05   7.696786
13346 1.003879e-04  -8.721460
1597  1.042121e-04  -5.732491
8865  2.579793e-04 -11.927107
27239 4.593571e-05   3.886352
21711 5.531964e-05   4.547456
16347 4.818143e-05  -4.190519
22062 4.927157e-05  -4.772296
14808 4.308952e-05   3.616369

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
        genename combined_pip    Adi_Sub       Liver
3741       INHBB    1.0319289 0.07389923 0.958029663
9387      ZDHHC7    1.0201173 0.99990845 0.020208878
8115     ST3GAL4    1.0000000         NA 0.999999997
3558         HPR    0.9999978         NA 0.999997774
7206 RP4-781K5.7    0.9964423         NA 0.996442264
7307       RRBP1    0.9945291 0.02549009 0.969039009
9092        USP1    0.9931048         NA 0.993104752
8750       TPD52    0.9893922 0.98939216          NA
1664       CNIH4    0.9821766         NA 0.982176629
2697     FAM117B    0.9801395 0.98013946          NA
8702        TNKS    0.9786638         NA 0.978663795
6015         POR    0.9636536 0.95931991 0.004333661
2869       FCGRT    0.9625221 0.96037797 0.002144161
912     C10orf88    0.9621970 0.96219701          NA
8140      STAT5B    0.9621156 0.04712591 0.914989707
1293       CCND2    0.9584228 0.83063294 0.127789894
1267      CCDC92    0.9536252 0.94695427 0.006670922
1301        CCNJ    0.9459691 0.94596906          NA
3935        KDSR    0.9329336         NA 0.932933551
5940        PLTP    0.9258143 0.05929421 0.866520136
8088        SRRT    0.9235605 0.92356048          NA
5741        PELO    0.9185746 0.91857462          NA
9386      ZDHHC6    0.8991049 0.89910486          NA
7426         SCD    0.8902595 0.89025947          NA
189       ACVR1C    0.8817067 0.88170666          NA
8025       SPHK2    0.8783859 0.87056431 0.007821603
8515       TMED4    0.8723188 0.87231880          NA
3015        FUT2    0.8587473         NA 0.858747314
2361      DUSP14    0.8502019 0.85020193          NA
1815     CRACR2B    0.8474532 0.25635618 0.591097068
9131       VAMP5    0.8454320 0.84543195          NA
3899       KCNK3    0.8299863 0.82998628          NA
354     ALDH16A1    0.8216692 0.70383628 0.117832960
3012       FUCA1    0.8026530 0.79242898 0.010223984
1971        CTSH    0.8011763 0.80117633          NA

Case 17 - Liver and Adipose with separate parameters (50% downsample #2)

Compare with Cases 3 and 16. Randomly samples 50% of genes to include.

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case17")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db"
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")


#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
05269c3 wesleycrouse 2023-01-10
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
                 SNP                Liver Adipose_Subcutaneous 
        6.482204e-05         6.009965e-03         3.656378e-02 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
                 SNP                Liver Adipose_Subcutaneous 
           37.333321           172.303079             7.616652 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 374.5475
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
                 SNP                Liver Adipose_Subcutaneous 
             8696600                 5446                 6596 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
                 SNP                Liver Adipose_Subcutaneous 
         0.061247606          0.016412054          0.005345842 
#total PVE
sum(estimated_group_pve)
[1] 0.0830055
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
                 SNP                Liver Adipose_Subcutaneous 
          0.73787405           0.19772248           0.06440347 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
         genename                group region_tag susie_pip        mu2
2454      ST3GAL4                Liver      11_77 1.0000000  176.65176
4435        PSRC1                Liver       1_67 1.0000000 1724.56864
3721       INSIG2                Liver       2_69 0.9999900   70.76506
19759      ZDHHC7 Adipose_Subcutaneous      16_49 0.9998596   28.85537
18637       ABCA8 Adipose_Subcutaneous      17_39 0.9900544   28.23760
25057       SIPA1 Adipose_Subcutaneous      11_36 0.9865119   22.84991
5544        CNIH4                Liver      1_114 0.9847141   43.51446
14913       FCGRT Adipose_Subcutaneous      19_34 0.9756560   19.79338
24743      PCMTD2 Adipose_Subcutaneous      20_38 0.9720272   29.27508
12687 RP4-781K5.7                Liver      1_121 0.9679462  207.12864
13020      ZDHHC6 Adipose_Subcutaneous      10_70 0.9640113   21.01357
8531         TNKS                Liver       8_12 0.9613470   79.96515
27265  AC007950.2 Adipose_Subcutaneous      15_29 0.9580793   30.04037
13346       SPHK2 Adipose_Subcutaneous      19_33 0.9557932   35.57941
12875        MPND Adipose_Subcutaneous       19_5 0.9550319   19.74430
14240     CWF19L1 Adipose_Subcutaneous      10_64 0.9441068   32.22444
21708       USP39 Adipose_Subcutaneous       2_54 0.9384321   18.13206
16736      ACVR1C Adipose_Subcutaneous       2_94 0.9321572   18.82547
22062       KCNK3 Adipose_Subcutaneous       2_16 0.9287638   20.22888
16496        ELF1 Adipose_Subcutaneous      13_16 0.9258803   17.25157
7040        INHBB                Liver       2_70 0.9257392   76.53463
20426    ALDH16A1 Adipose_Subcutaneous      19_34 0.9181006   18.92977
5563        ABCG8                Liver       2_27 0.9175320  321.20444
22969     KLHDC7A Adipose_Subcutaneous       1_13 0.9077054   18.02272
18737        NTN5 Adipose_Subcutaneous      19_33 0.9060975   53.38487
22699      ZNF575 Adipose_Subcutaneous      19_30 0.9043218   24.20160
24579      C2CD4A Adipose_Subcutaneous      15_28 0.9014682   21.33820
3247         KDSR                Liver      18_35 0.9001984   25.41426
16438        GPAM Adipose_Subcutaneous      10_70 0.8918491   18.52014
19681        PELO Adipose_Subcutaneous       5_30 0.8900162   60.82831
13615      CYP2W1 Adipose_Subcutaneous        7_2 0.8840618   19.42157
19793     UBASH3B Adipose_Subcutaneous      11_74 0.8835475   23.84138
18321       PARP9 Adipose_Subcutaneous       3_76 0.8816283   23.10575
25930     TMEM199 Adipose_Subcutaneous      17_17 0.8759915   38.41378
6391       TTC39B                Liver       9_13 0.8594171   25.04182
13992        ACHE Adipose_Subcutaneous       7_62 0.8583735   26.66383
18578    ARHGAP17 Adipose_Subcutaneous      16_21 0.8450452   17.27680
22992       FUCA1 Adipose_Subcutaneous       1_17 0.8432085   17.74356
24646         PNP Adipose_Subcutaneous       14_1 0.8374570   16.79333
15981       NRBP1 Adipose_Subcutaneous       2_16 0.8322309  105.32873
20402     CCDC116 Adipose_Subcutaneous       22_4 0.8277800   20.10887
8865         FUT2                Liver      19_33 0.8172658   69.50040
24460      FAM49A Adipose_Subcutaneous       2_10 0.8111895   17.08999
27239      DUSP14 Adipose_Subcutaneous      17_22 0.8052276   17.64644
               PVE          z
2454  5.140889e-04  13.376072
4435  5.018810e-03 -41.687336
3721  2.059372e-04  -8.982702
19759 8.396263e-05  -5.184802
18637 8.135930e-05   4.800447
25057 6.560049e-05  -5.096493
5544  1.246993e-04   6.145535
14913 5.620010e-05  -4.347956
24743 8.281268e-05  -5.636854
12687 5.834608e-04 -15.108415
13020 5.895251e-05   3.908646
8531  2.237181e-04  11.038564
27265 8.375816e-05   5.555780
13346 9.896530e-05  -8.721460
12875 5.487567e-05  -4.366911
14240 8.853741e-05   5.837199
21708 4.951881e-05   4.083599
16736 5.106876e-05  -4.185879
22062 5.467610e-05  -4.772296
16496 4.648404e-05  -3.876887
7040  2.061897e-04  -8.518936
20426 5.057732e-05  -4.215260
5563  8.576756e-04 -20.293982
22969 4.760862e-05   4.124187
18737 1.407711e-04  11.132252
22699 6.369237e-05  -5.711154
24579 5.597943e-05   4.535165
3247  6.657881e-05  -4.526287
16438 4.806799e-05   4.000718
19681 1.575520e-04   8.522224
13615 4.996747e-05   4.311978
19793 6.130298e-05   4.906621
18321 5.928242e-05   3.744644
25930 9.792808e-05   6.615834
6391  6.263111e-05  -4.334495
13992 6.660688e-05   5.880517
18578 4.248772e-05   3.867562
22992 4.354077e-05  -3.851528
24646 4.092793e-05  -3.575272
15981 2.551003e-04   5.571352
20402 4.844209e-05   4.331655
8865  1.652993e-04 -11.927107
24460 4.034451e-05  -3.586393
27239 4.135196e-05   3.886352

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
        genename combined_pip    Adi_Sub       Liver
8115     ST3GAL4    1.0957196 0.09571960 0.999999980
3758      INSIG2    1.0495571 0.04956712 0.999990027
9390      ZDHHC7    1.0122296 0.99985959 0.012370042
6238       PSRC1    1.0000000         NA 0.999999957
4015     KLHDC7A    0.9926113 0.90770536 0.084905928
21         ABCA8    0.9900544 0.99005443          NA
7623       SIPA1    0.9892596 0.98651189 0.002747697
1689       CNIH4    0.9847141         NA 0.984714122
2865       FCGRT    0.9756560 0.97565601          NA
5673      PCMTD2    0.9720272 0.97202716          NA
7198 RP4-781K5.7    0.9679462         NA 0.967946158
9389      ZDHHC6    0.9640113 0.96401125          NA
8680        TNKS    0.9613470         NA 0.961346973
86    AC007950.2    0.9580793 0.95807933          NA
8029       SPHK2    0.9557932 0.95579316          NA
4872        MPND    0.9550319 0.95503190          NA
2002     CWF19L1    0.9441068 0.94410682          NA
9097       USP39    0.9384321 0.93843206          NA
187       ACVR1C    0.9321572 0.93215717          NA
3907       KCNK3    0.9287638 0.92876384          NA
2487        ELF1    0.9258803 0.92588032          NA
3747       INHBB    0.9257392         NA 0.925739175
8565     TMEM199    0.9193194 0.87599146 0.043327909
346     ALDH16A1    0.9181006 0.91810056          NA
42         ABCG8    0.9175320         NA 0.917531976
5405        NTN5    0.9060975 0.90609755          NA
1031      C2CD4A    0.9049330 0.90146823 0.003464813
9618      ZNF575    0.9043218 0.90432184          NA
3942        KDSR    0.9001984         NA 0.900198430
3244        GPAM    0.8918491 0.89184909          NA
5738        PELO    0.8900162 0.89001616          NA
2050      CYP2W1    0.8840618 0.88406180          NA
2078       DAGLB    0.8839621 0.79698095 0.086981200
8983     UBASH3B    0.8835475 0.88354754          NA
5609       PARP9    0.8816283 0.88162827          NA
8903      TTC39B    0.8594171         NA 0.859417107
141         ACHE    0.8583735 0.85837347          NA
543     ARHGAP17    0.8482212 0.84504523 0.003175997
3000       FUCA1    0.8432085 0.84320848          NA
5981         PNP    0.8374570 0.83745695          NA
5367       NRBP1    0.8343007 0.83223094 0.002069784
1233     CCDC116    0.8277800 0.82777995          NA
3006        FUT2    0.8172658         NA 0.817265778
2772      FAM49A    0.8111895 0.81118949          NA
2370      DUSP14    0.8083621 0.80522762 0.003134518

Case 18 - Liver and Adipose with separate parameters (50% downsample #3)

Compare with Cases 3, 16, 17. Randomly samples 50% of genes to include.

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case18")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db"
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")


#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- sapply(ctwas_gene_res$id, function(x){paste(unlist(strsplit(unlist(strsplit(x, split="[|]"))[2], "_"))[-1], collapse="_")})
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
05269c3 wesleycrouse 2023-01-10
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
                 SNP                Liver Adipose_Subcutaneous 
        0.0001106724         0.0137896334         0.0173282248 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
                 SNP                Liver Adipose_Subcutaneous 
           17.760505            26.291691             5.245622 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 112.0352
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
                 SNP                Liver Adipose_Subcutaneous 
             8696600                 5498                 6544 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
                 SNP                Liver Adipose_Subcutaneous 
         0.049746723          0.005800912          0.001731070 
#total PVE
sum(estimated_group_pve)
[1] 0.05727871
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
                 SNP                Liver Adipose_Subcutaneous 
          0.86850292           0.10127520           0.03022188 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
         genename                group region_tag susie_pip       mu2
2454      ST3GAL4                Liver      11_77 1.0000000 166.72884
3721       INSIG2                Liver       2_69 0.9999936  66.56031
19759      ZDHHC7 Adipose_Subcutaneous      16_49 0.9994884  27.45044
7410        ABCA1                Liver       9_53 0.9927778  68.53207
1999        PRKD2                Liver      19_33 0.9898406  29.09938
3755        RRBP1                Liver      20_13 0.9845204  31.41717
11790      CYP2A6                Liver      19_28 0.9804901  30.45145
12687 RP4-781K5.7                Liver      1_121 0.9802779 196.86171
5544        CNIH4                Liver      1_114 0.9797271  39.98647
3247         KDSR                Liver      18_35 0.9725500  23.90255
6391       TTC39B                Liver       9_13 0.9688530  22.54707
7040        INHBB                Liver       2_70 0.9650038  71.97295
3562       ACVR1C                Liver       2_94 0.9600798  25.24584
18637       ABCA8 Adipose_Subcutaneous      17_39 0.9571227  26.34493
25057       SIPA1 Adipose_Subcutaneous      11_36 0.9423376  21.75433
6093      CSNK1G3                Liver       5_75 0.9412921  81.48936
20337        PKN3 Adipose_Subcutaneous       9_66 0.9397147  38.41851
9062      KLHDC7A                Liver       1_13 0.9284125  20.70842
21459      PROCA1 Adipose_Subcutaneous      17_17 0.9166599  26.60236
14913       FCGRT Adipose_Subcutaneous      19_34 0.9101678  19.89277
2092          SP4                Liver       7_19 0.9020442  98.66457
16361      CCDC92 Adipose_Subcutaneous      12_75 0.8891349  25.42626
6220         PELO                Liver       5_31 0.8856013  68.44503
10657      TRIM39                Liver       6_26 0.8669252  80.37077
3212        CCND2                Liver       12_4 0.8656458  21.48301
11726      CLDN23                Liver       8_11 0.8506412  24.21342
12875        MPND Adipose_Subcutaneous       19_5 0.8450061  20.39906
6957         USP1                Liver       1_39 0.8220847 246.91080
3881         VIL1                Liver      2_129 0.8021120  26.13075
               PVE          z
2454  4.852114e-04  13.376072
3721  1.937015e-04  -8.982702
19759 7.984493e-05  -5.184802
7410  1.980005e-04   7.982017
1999  8.382416e-05   5.072217
3755  9.001442e-05   7.008305
11790 8.689033e-05   5.407028
12687 5.616047e-04 -15.108415
5544  1.140088e-04   6.145535
3247  6.765135e-05  -4.526287
6391  6.357236e-05  -4.334495
7040  2.021244e-04  -8.518936
3562  7.053706e-05  -4.687370
18637 7.338122e-05   4.800447
25057 5.965854e-05  -5.096493
6093  2.232264e-04   9.116291
20337 1.050647e-04  -6.780039
9062  5.595105e-05   4.124187
21459 7.096573e-05   5.543282
14913 5.269106e-05  -4.347956
2092  2.590057e-04  10.693191
16361 6.579159e-05  -5.328046
6220  1.764008e-04   8.288398
10657 2.027683e-04   8.840164
3212  5.411972e-05  -4.065830
11726 5.994084e-05   4.720010
12875 5.016380e-05  -4.366911
6957  5.907136e-04  16.258211
3881  6.099682e-05   4.725531

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
        genename combined_pip      Adi_Sub      Liver
9427      ZDHHC7    1.0484150 9.994884e-01 0.04892660
1677       CNIH4    1.0300841 5.035691e-02 0.97972714
3736      INSIG2    1.0148991 1.490554e-02 0.99999358
923     C10orf88    1.0129152 2.843843e-01 0.72853091
6162       PRKD2    1.0048176 1.497700e-02 0.98984059
8121     ST3GAL4    1.0000000           NA 0.99999999
8077     SPTY2D1    0.9999106 3.372090e-01 0.66270164
18         ABCA1    0.9927778           NA 0.99277780
8096        SRRT    0.9913786 2.465669e-01 0.74481166
7320       RRBP1    0.9845204           NA 0.98452045
2027      CYP2A6    0.9804901           NA 0.98049008
7205 RP4-781K5.7    0.9802779           NA 0.98027791
3921        KDSR    0.9725500           NA 0.97255001
8941      TTC39B    0.9688530           NA 0.96885296
3726       INHBB    0.9650038           NA 0.96500383
196       ACVR1C    0.9600798           NA 0.96007976
7644       SIPA1    0.9588468 9.423376e-01 0.01650911
23         ABCA8    0.9571227 9.571227e-01         NA
1865     CSNK1G3    0.9412921           NA 0.94129212
5888        PKN3    0.9397147 9.397147e-01         NA
8536       TMED4    0.9339946 2.523967e-01 0.68159781
4005     KLHDC7A    0.9284125           NA 0.92841251
1286      CCDC92    0.9237987 8.891349e-01 0.03466381
6173      PROCA1    0.9166599 9.166599e-01         NA
2858       FCGRT    0.9101678 9.101678e-01         NA
7994         SP4    0.9020442           NA 0.90204415
9126        USP1    0.8980912 7.600644e-02 0.82208475
5755        PELO    0.8856013           NA 0.88560133
1987     CWF19L1    0.8791238 3.338231e-01 0.54530070
8832      TRIM39    0.8669253 8.323616e-08 0.86692520
1308       CCND2    0.8656458           NA 0.86564577
8336      TBC1D4    0.8588602 5.951134e-02 0.79934884
1615      CLDN23    0.8506412           NA 0.85064120
8754    TP53INP1    0.8466237 5.579647e-02 0.79082721
4901        MPND    0.8450061 8.450061e-01         NA
7982      SORCS2    0.8384003 6.587743e-01 0.17962599
9183        VIL1    0.8021120           NA 0.80211199

Case 19 - Liver only

Compare with Case 7 (should be identical). Checking new version.

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case19")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db" 
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")


#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- ctwas_gene_res$type
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
05269c3 wesleycrouse 2023-01-10
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
         SNP        Liver 
0.0001708048 0.0102158606 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
     SNP    Liver 
 9.70388 43.32067 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 59.81014
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
    SNP   Liver 
8696600   11003 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
       SNP      Liver 
0.04194840 0.01417103 
#total PVE
sum(estimated_group_pve)
[1] 0.05611943
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
      SNP     Liver 
0.7474844 0.2525156 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
         genename group region_tag susie_pip        mu2          PVE          z
4435        PSRC1 Liver       1_67 1.0000000 1661.18355 4.834348e-03 -41.687336
2454      ST3GAL4 Liver      11_77 1.0000000  173.08249 5.037017e-04  13.376072
12008         HPR Liver      16_38 0.9999998  163.53982 4.759307e-04 -17.962770
3721       INSIG2 Liver       2_69 0.9999958   68.51740 1.993973e-04  -8.982702
5563        ABCG8 Liver       2_27 0.9999660  313.08513 9.111040e-04 -20.293982
5991        FADS1 Liver      11_34 0.9998572  163.89639 4.769004e-04  12.926351
12687 RP4-781K5.7 Liver      1_121 0.9997364  203.40050 5.917766e-04 -15.108415
7410        ABCA1 Liver       9_53 0.9956684   70.26807 2.036072e-04   7.982017
8531         TNKS Liver       8_12 0.9915320   76.52776 2.208239e-04  11.038564
9390         GAS6 Liver      13_62 0.9888998   71.30764 2.052148e-04  -8.923688
1597         PLTP Liver      20_28 0.9885315   61.47131 1.768411e-04  -5.732491
1999        PRKD2 Liver      19_33 0.9867185   30.06125 8.632181e-05   5.072217
7040        INHBB Liver       2_70 0.9832856   73.94275 2.115902e-04  -8.518936
3755        RRBP1 Liver      20_13 0.9819053   32.42030 9.264178e-05   7.008305
5544        CNIH4 Liver      1_114 0.9787252   40.77861 1.161485e-04   6.145535
2092          SP4 Liver       7_19 0.9781344  102.23228 2.910093e-04  10.693191
6093      CSNK1G3 Liver       5_75 0.9759584   84.08145 2.388096e-04   9.116291
8865         FUT2 Liver      19_33 0.9674812  104.74356 2.949105e-04 -11.927107
11790      CYP2A6 Liver      19_28 0.9643766   31.94912 8.966560e-05   5.407028
10657      TRIM39 Liver       6_25 0.9605468   81.54422 2.279460e-04   8.840164
3247         KDSR Liver      18_35 0.9578772   24.64899 6.871148e-05  -4.526287
233        NPC1L1 Liver       7_32 0.9548612   86.93035 2.415639e-04 -10.761931
4704        DDX56 Liver       7_32 0.9494642   59.88499 1.654691e-04   9.641861
6391       TTC39B Liver       9_13 0.9414293   23.21539 6.360392e-05  -4.334495
6220         PELO Liver       5_31 0.9388155   70.72581 1.932317e-04   8.288398
1114         SRRT Liver       7_62 0.9383759   32.70763 8.931947e-05   5.424996
6778         PKN3 Liver       9_66 0.9376624   47.53713 1.297179e-04  -6.620563
3300     C10orf88 Liver      10_77 0.9360721   37.17593 1.012725e-04  -6.787850
8579       STAT5B Liver      17_25 0.9310376   30.58279 8.286375e-05   5.426252
3562       ACVR1C Liver       2_94 0.9285980   25.84902 6.985414e-05  -4.687370
6957         USP1 Liver       1_39 0.8941238  253.51088 6.596515e-04  16.258211
9062      KLHDC7A Liver       1_13 0.8301423   22.36541 5.403183e-05   4.124187
8418         POP7 Liver       7_62 0.8209688   40.45771 9.666033e-05  -5.845258
9072      SPTY2D1 Liver      11_13 0.8208409   33.49049 8.000198e-05  -5.557123
8931      CRACR2B Liver       11_1 0.8079877   21.87357 5.143334e-05  -3.989585
5415        SYTL1 Liver       1_19 0.8069655   22.30889 5.239059e-05  -3.962854
6100         ALLC Liver        2_2 0.8068337   28.07319 6.591680e-05   4.919066

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
         genename combined_pip     Liver
7022        PSRC1    1.0000000 1.0000000
9062      ST3GAL4    1.0000000 1.0000000
3979          HPR    0.9999998 0.9999998
4188       INSIG2    0.9999958 0.9999958
44          ABCG8    0.9999660 0.9999660
2990        FADS1    0.9998572 0.9998572
8053  RP4-781K5.7    0.9997364 0.9997364
17          ABCA1    0.9956684 0.9956684
9738         TNKS    0.9915320 0.9915320
3423         GAS6    0.9888998 0.9888998
6672         PLTP    0.9885315 0.9885315
6913        PRKD2    0.9867185 0.9867185
4174        INHBB    0.9832856 0.9832856
8167        RRBP1    0.9819053 0.9819053
1874        CNIH4    0.9787252 0.9787252
8916          SP4    0.9781344 0.9781344
2086      CSNK1G3    0.9759584 0.9759584
3368         FUT2    0.9674812 0.9674812
2263       CYP2A6    0.9643766 0.9643766
9870       TRIM39    0.9605468 0.9605468
4395         KDSR    0.9578772 0.9578772
5972       NPC1L1    0.9548612 0.9548612
2397        DDX56    0.9494642 0.9494642
9996       TTC39B    0.9414293 0.9414293
6425         PELO    0.9388155 0.9388155
9035         SRRT    0.9383759 0.9383759
6583         PKN3    0.9376624 0.9376624
1009     C10orf88    0.9360721 0.9360721
9091       STAT5B    0.9310376 0.9310376
195        ACVR1C    0.9285980 0.9285980
10193        USP1    0.8941238 0.8941238
4483      KLHDC7A    0.8301423 0.8301423
6761         POP7    0.8209688 0.8209688
9009      SPTY2D1    0.8208409 0.8208409
2032      CRACR2B    0.8079877 0.8079877
9235        SYTL1    0.8069655 0.8069655
405          ALLC    0.8068337 0.8068337

Case 20 - Liver and Blood with separate parameters

Compare with Case 10 (Liver+Blood with 3 additional tissues).

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case20")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Whole_Blood.db"
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")


#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- ctwas_gene_res$type
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
903d30e wesleycrouse 2023-02-04
88f9698 wesleycrouse 2023-02-01
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
         SNP        Liver  Whole_Blood 
0.0001136417 0.0090445205 0.0122175090 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
        SNP       Liver Whole_Blood 
  14.320712   43.766484    6.772068 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 49.06246
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
        SNP       Liver Whole_Blood 
    8696600       11003       11198 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
        SNP       Liver Whole_Blood 
0.041188150 0.012675311 0.002696279 
#total PVE
sum(estimated_group_pve)
[1] 0.05655974
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
        SNP       Liver Whole_Blood 
 0.72822383  0.22410483  0.04767134 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
         genename       group region_tag susie_pip        mu2          PVE
4435        PSRC1       Liver       1_67 1.0000000 1654.26995 4.814228e-03
2454      ST3GAL4       Liver      11_77 1.0000000  176.24213 5.128968e-04
20880       PCSK9 Whole_Blood       1_34 1.0000000  178.56031 5.196432e-04
12008         HPR       Liver      16_38 0.9999982  157.93071 4.596064e-04
3721       INSIG2       Liver       2_69 0.9999908   67.59930 1.967245e-04
5991        FADS1       Liver      11_34 0.9995077  162.87624 4.737663e-04
5563        ABCG8       Liver       2_27 0.9979773  310.92446 9.030169e-04
12687 RP4-781K5.7       Liver      1_122 0.9969687  202.12809 5.864466e-04
7410        ABCA1       Liver       9_53 0.9932795   70.38202 2.034480e-04
1597         PLTP       Liver      20_28 0.9852710   62.01770 1.778245e-04
8531         TNKS       Liver       8_12 0.9847984   76.45288 2.191096e-04
9390         GAS6       Liver      13_62 0.9810190   71.38951 2.038131e-04
19842        ACP6 Whole_Blood       1_73 0.9774534   21.94703 6.242981e-05
3755        RRBP1       Liver      20_13 0.9774019   32.22010 9.164743e-05
1999        PRKD2       Liver      19_33 0.9753710   29.98132 8.510223e-05
7040        INHBB       Liver       2_70 0.9701027   74.11796 2.092481e-04
6093      CSNK1G3       Liver       5_75 0.9546182   84.07065 2.335578e-04
6391       TTC39B       Liver       9_13 0.9444939   23.27926 6.398654e-05
2092          SP4       Liver       7_19 0.9443983  101.96080 2.802262e-04
4704        DDX56       Liver       7_32 0.9256831   58.82731 1.584753e-04
21823     CD163L1 Whole_Blood       12_7 0.9110450   21.60008 5.726846e-05
8865         FUT2       Liver      19_33 0.9087167  110.44874 2.920852e-04
10657      TRIM39       Liver       6_26 0.9071329   81.96315 2.163764e-04
23057       ZFP28 Whole_Blood      19_38 0.8968490   25.39396 6.627810e-05
6220         PELO       Liver       5_31 0.8957423   70.45702 1.836655e-04
3562       ACVR1C       Liver       2_94 0.8952993   24.97910 6.508267e-05
11790      CYP2A6       Liver      19_30 0.8922077   33.52620 8.705036e-05
6957         USP1       Liver       1_39 0.8863041  254.57642 6.566308e-04
17963      GALNT6 Whole_Blood      12_33 0.8707524   22.33978 5.661010e-05
1114         SRRT       Liver       7_62 0.8536980   32.62326 8.104980e-05
23737       SIPA1 Whole_Blood      11_36 0.8534699   25.15027 6.246709e-05
16404       CSE1L Whole_Blood      20_30 0.8434335   22.93780 5.630188e-05
16092        GPAM Whole_Blood      10_70 0.8265858   21.62521 5.201977e-05
9072      SPTY2D1       Liver      11_13 0.8263825   33.10832 7.962299e-05
6100         ALLC       Liver        2_2 0.8241241   27.96015 6.705827e-05
               z
4435  -41.687336
2454   13.376072
20880  23.237813
12008 -17.962770
3721   -8.982702
5991   12.926351
5563  -20.293982
12687 -15.108415
7410    7.982017
1597   -5.732491
8531   11.038564
9390   -8.923688
19842   4.624653
3755    7.008305
1999    5.072217
7040   -8.518936
6093    9.116291
6391   -4.334495
2092   10.693191
4704    9.641861
21823  -4.653700
8865  -11.927107
10657   8.840164
23057  -5.205678
6220    8.288398
3562   -4.687370
11790   5.407028
6957   16.258211
17963  -4.608245
1114    5.424996
23737  -5.439965
16404  -4.739463
16092   4.181558
9072   -5.557123
6100    4.919066

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
         genename combined_pip       Liver        Blood
249        ACVR1C    1.2437269 0.895299344 3.484276e-01
11601     ST3GAL4    1.1517444 0.999999998 1.517444e-01
9031        PSRC1    1.0564981 1.000000000 5.649806e-02
211          ACP6    1.0267116 0.049258251 9.774534e-01
3837        FADS1    1.0160517 0.999507736 1.654400e-02
2417        CNIH4    1.0131745 0.533572074 4.796024e-01
5416       INSIG2    1.0081061 0.999990774 8.115310e-03
5138          HPR    1.0073982 0.999998172 7.399991e-03
5673         KDSR    1.0041422 0.450433471 5.537087e-01
8212        PCSK9    1.0000000          NA 1.000000e+00
23          ABCA1    0.9987345 0.993279521 5.454970e-03
55          ABCG8    0.9979773 0.997977322           NA
10312 RP4-781K5.7    0.9969687 0.996968698           NA
10452       RRBP1    0.9935015 0.977401878 1.609958e-02
8603         PLTP    0.9928058 0.985270976 7.534812e-03
8895        PRKD2    0.9895863 0.975370996 1.421529e-02
12439        TNKS    0.9847984 0.984798410           NA
4400         GAS6    0.9810190 0.981018990           NA
5398        INHBB    0.9782941 0.970102709 8.191355e-03
1306     C10orf88    0.9782659 0.656054885 3.222110e-01
2690      CSNK1G3    0.9776905 0.954618180 2.307231e-02
12766      TTC39B    0.9727619 0.944493892 2.826805e-02
8492         PKN3    0.9686191 0.279985274 6.886339e-01
11640      STAT5B    0.9676508 0.702232117 2.654187e-01
11421         SP4    0.9626436 0.944398344 1.824530e-02
973        ATP1B2    0.9608670 0.444900964 5.159661e-01
3085        DDX56    0.9485890 0.925683137 2.290588e-02
5027    HIST1H2BH    0.9329614 0.439424553 4.935369e-01
13500       ZFP28    0.9243451 0.027496082 8.968490e-01
1914      CD163L1    0.9209132 0.009868164 9.110450e-01
13031        USP1    0.9120136 0.886304145 2.570941e-02
8294         PELO    0.9117587 0.895742313 1.601634e-02
4325         FUT2    0.9087167 0.908716731           NA
12610      TRIM39    0.9071329 0.907132864 4.381281e-08
11531     SPTY2D1    0.8929224 0.826382465 6.653990e-02
2930       CYP2A6    0.8922077 0.892207662           NA
12500    TP53INP1    0.8729546 0.370714877 5.022397e-01
4381       GALNT6    0.8707524          NA 8.707524e-01
10923       SIPA1    0.8552324 0.001762444 8.534699e-01
2675        CSE1L    0.8540968 0.010663276 8.434335e-01
11566        SRRT    0.8536980 0.853698022           NA
11808       SYTL1    0.8273307 0.676328349 1.510024e-01
4660         GPAM    0.8265858          NA 8.265858e-01
509          ALLC    0.8241241 0.824124088           NA
2618      CRACR2B    0.8120701 0.539118602 2.729515e-01
12969     UGT2B17    0.8082791 0.032529550 7.757495e-01
3971        FAM3D    0.8078808 0.791335475 1.654529e-02
1863        CCND2    0.8071777 0.785894977 2.128269e-02
7360      N4BP2L2    0.8019320 0.415035736 3.868962e-01

Case 21 - Liver and Blood with shared variance parameter

Compare with Case 20 (Liver+Blood with separate variance parameter).

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case21")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Whole_Blood.db"
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")


#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- ctwas_gene_res$type
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
903d30e wesleycrouse 2023-02-04
88f9698 wesleycrouse 2023-02-01
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
         SNP        Liver  Whole_Blood 
0.0001212463 0.0124521315 0.0036933593 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
        SNP       Liver Whole_Blood 
   13.29666    31.50648    31.50648 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 28.55783
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
        SNP       Liver Whole_Blood 
    8696600       11003       11198 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
        SNP       Liver Whole_Blood 
0.040801966 0.012562468 0.003792121 
#total PVE
sum(estimated_group_pve)
[1] 0.05715655
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
        SNP       Liver Whole_Blood 
 0.71386329  0.21979050  0.06634621 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
         genename       group region_tag susie_pip        mu2          PVE
20880       PCSK9 Whole_Blood       1_34 1.0000000  308.22307 8.969855e-04
2454      ST3GAL4       Liver      11_77 1.0000000  170.24780 4.954523e-04
16865        LDLR Whole_Blood       19_9 0.9999998  619.61206 1.803184e-03
12008         HPR       Liver      16_38 0.9999984  156.90900 4.566332e-04
3721       INSIG2       Liver       2_69 0.9999943   67.06819 1.951796e-04
5563        ABCG8       Liver       2_27 0.9975572  305.17105 8.859342e-04
12687 RP4-781K5.7       Liver      1_122 0.9969565  199.14345 5.777800e-04
7410        ABCA1       Liver       9_53 0.9947417   69.23077 2.004148e-04
5991        FADS1       Liver      11_34 0.9945898  160.45733 4.644338e-04
18553       TIMD4 Whole_Blood       5_92 0.9910643  177.85449 5.129641e-04
1597         PLTP       Liver      20_28 0.9884000   61.13291 1.758442e-04
8531         TNKS       Liver       8_12 0.9882088   74.85002 2.152588e-04
1999        PRKD2       Liver      19_33 0.9879258   29.53124 8.490365e-05
9390         GAS6       Liver      13_62 0.9854919   70.52236 2.022555e-04
3755        RRBP1       Liver      20_13 0.9849818   31.73757 9.097501e-05
7040        INHBB       Liver       2_70 0.9767248   72.95237 2.073633e-04
6093      CSNK1G3       Liver       5_75 0.9630637   82.49268 2.312015e-04
6391       TTC39B       Liver       9_13 0.9626739   22.78383 6.383021e-05
17278       PSRC1 Whole_Blood       1_67 0.9565507 1636.19476 4.554737e-03
2092          SP4       Liver       7_19 0.9524455  100.21894 2.777859e-04
19842        ACP6 Whole_Blood       1_73 0.9512723   25.54314 7.071303e-05
4704        DDX56       Liver       7_32 0.9450936   58.47310 1.608241e-04
11790      CYP2A6       Liver      19_30 0.9410053   32.66037 8.944034e-05
3562       ACVR1C       Liver       2_94 0.9389761   24.91384 6.807938e-05
1114         SRRT       Liver       7_62 0.9370482   32.18402 8.776524e-05
10657      TRIM39       Liver       6_26 0.9275974   80.65520 2.177270e-04
8865         FUT2       Liver      19_33 0.9240537  105.25167 2.830392e-04
6220         PELO       Liver       5_31 0.9207324   69.40676 1.859754e-04
9062      KLHDC7A       Liver       1_14 0.8699226   21.80182 5.519422e-05
9072      SPTY2D1       Liver      11_13 0.8697099   32.83968 8.311772e-05
6100         ALLC       Liver        2_2 0.8694223   27.50688 6.959729e-05
8579       STAT5B       Liver      17_25 0.8547148   30.27261 7.529937e-05
3212        CCND2       Liver       12_4 0.8471372   21.88898 5.396343e-05
5415        SYTL1       Liver       1_19 0.8397136   21.34041 5.214999e-05
7355         BRI3       Liver       7_60 0.8242918   28.88017 6.927892e-05
1144        ASAP3       Liver       1_16 0.8181004   33.48518 7.972224e-05
11726      CLDN23       Liver       8_11 0.8156769   24.54645 5.826759e-05
21823     CD163L1 Whole_Blood       12_7 0.8024986   25.07237 5.855446e-05
8931      CRACR2B       Liver       11_1 0.8015855   20.85149 4.864153e-05
               z
20880  23.237813
2454   13.376072
16865 -24.707322
12008 -17.962770
3721   -8.982702
5563  -20.293982
12687 -15.108415
7410    7.982017
5991   12.926351
18553  13.882363
1597   -5.732491
8531   11.038564
1999    5.072217
9390   -8.923688
3755    7.008305
7040   -8.518936
6093    9.116291
6391   -4.334495
17278 -41.793474
2092   10.693191
19842   4.624653
4704    9.641861
11790   5.407028
3562   -4.687370
1114    5.424996
10657   8.840164
8865  -11.927107
6220    8.288398
9062    4.124187
9072   -5.557123
6100    4.919066
8579    5.426252
3212   -4.065830
5415   -3.962854
7355   -5.140136
1144    5.283225
11726   4.720010
21823  -4.653700
8931   -3.989585

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
         genename combined_pip      Liver        Blood
249        ACVR1C    1.1096557 0.93897613 1.706796e-01
9031        PSRC1    1.0418415 0.08529082 9.565507e-01
11601     ST3GAL4    1.0337047 1.00000000 3.370467e-02
211          ACP6    1.0270637 0.07579141 9.512723e-01
3837        FADS1    1.0027587 0.99458982 8.168839e-03
5416       INSIG2    1.0012667 0.99999427 1.272455e-03
5138          HPR    1.0010618 0.99999840 1.063380e-03
8212        PCSK9    1.0000000         NA 1.000000e+00
5972         LDLR    0.9999998         NA 9.999998e-01
2417        CNIH4    0.9999172 0.61505000 3.848672e-01
55          ABCG8    0.9975572 0.99755720           NA
10312 RP4-781K5.7    0.9969565 0.99695645           NA
23          ABCA1    0.9955531 0.99474171 8.114287e-04
5673         KDSR    0.9921889 0.74392673 2.482622e-01
12112       TIMD4    0.9910643         NA 9.910643e-01
8895        PRKD2    0.9907476 0.98792578 2.821868e-03
8603         PLTP    0.9895609 0.98839997 1.160923e-03
12439        TNKS    0.9882088 0.98820882           NA
10452       RRBP1    0.9873438 0.98498178 2.362061e-03
4400         GAS6    0.9854919 0.98549194           NA
5398        INHBB    0.9779355 0.97672477 1.210737e-03
8492         PKN3    0.9772193 0.26141497 7.158044e-01
973        ATP1B2    0.9724865 0.66122533 3.112612e-01
12766      TTC39B    0.9670440 0.96267390 4.370102e-03
2690      CSNK1G3    0.9666301 0.96306369 3.566462e-03
11640      STAT5B    0.9665903 0.85471476 1.118755e-01
1306     C10orf88    0.9634842 0.79352318 1.699611e-01
11421         SP4    0.9549673 0.95244550 2.521762e-03
3085        DDX56    0.9484767 0.94509357 3.383157e-03
2930       CYP2A6    0.9410053 0.94100527           NA
11566        SRRT    0.9370482 0.93704825           NA
12610      TRIM39    0.9275974 0.92759741 6.854308e-09
4325         FUT2    0.9240537 0.92405373           NA
8294         PELO    0.9232693 0.92073243 2.536881e-03
5027    HIST1H2BH    0.9204044 0.70934102 2.110633e-01
13031        USP1    0.9063479 0.69728891 2.090590e-01
11531     SPTY2D1    0.8808234 0.86970991 1.111348e-02
11808       SYTL1    0.8741650 0.83971362 3.445136e-02
2618      CRACR2B    0.8703798 0.80158547 6.879430e-02
5789      KLHDC7A    0.8699226 0.86992256           NA
509          ALLC    0.8694223 0.86942227           NA
13500       ZFP28    0.8618056 0.06360145 7.982042e-01
1863        CCND2    0.8503636 0.84713721 3.226400e-03
12500    TP53INP1    0.8332123 0.65017059 1.830417e-01
1245         BRI3    0.8287582 0.82429182 4.466412e-03
877         ASAP3    0.8284269 0.81810036 1.032655e-02
1914      CD163L1    0.8216615 0.01916289 8.024986e-01
2321       CLDN23    0.8188432 0.81567693 3.166293e-03
8714         POP7    0.8163625 0.61931370 1.970488e-01
7360      N4BP2L2    0.8089038 0.61644440 1.924594e-01
11900      TBC1D4    0.8051575 0.79662842 8.529110e-03

Case 22 - Liver, Blood and Adipose with separate variance parameter

Compare with Case 20 (Liver+Blood with separate variance parameter).

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case22")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Whole_Blood.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db"
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")


#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- ctwas_gene_res$type
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
903d30e wesleycrouse 2023-02-04
88f9698 wesleycrouse 2023-02-01
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
                 SNP                Liver          Whole_Blood 
        0.0000879571         0.0065084856         0.0049335631 
Adipose_Subcutaneous 
        0.0150904720 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
                 SNP                Liver          Whole_Blood 
           15.552010            50.448898            15.755890 
Adipose_Subcutaneous 
            5.596842 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 84.25444
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
                 SNP                Liver          Whole_Blood 
             8696600                11003                11198 
Adipose_Subcutaneous 
               13082 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
                 SNP                Liver          Whole_Blood 
          0.03462001           0.01051388           0.00253317 
Adipose_Subcutaneous 
          0.00321544 
#total PVE
sum(estimated_group_pve)
[1] 0.0508825
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
                 SNP                Liver          Whole_Blood 
          0.68039130           0.20663057           0.04978470 
Adipose_Subcutaneous 
          0.06319342 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
         genename                group region_tag susie_pip        mu2
20880       PCSK9          Whole_Blood       1_34 1.0000000  292.81506
4435        PSRC1                Liver       1_67 1.0000000 1671.58010
2454      ST3GAL4                Liver      11_77 1.0000000  173.29542
12008         HPR                Liver      16_38 0.9999978  157.54089
3721       INSIG2                Liver       2_69 0.9999907   68.23259
32382      ZDHHC7 Adipose_Subcutaneous      16_49 0.9994285   27.51062
5991        FADS1                Liver      11_34 0.9986248  163.38377
5563        ABCG8                Liver       2_27 0.9968934  312.70648
12687 RP4-781K5.7                Liver      1_122 0.9958201  202.37447
7410        ABCA1                Liver       9_53 0.9920842   70.76914
16865        LDLR          Whole_Blood       19_9 0.9860500  588.43695
8531         TNKS                Liver       8_12 0.9819120   77.98310
9390         GAS6                Liver      13_62 0.9793993   73.98876
26340       TPD52 Adipose_Subcutaneous       8_57 0.9779838   20.92862
1597         PLTP                Liver      20_28 0.9744734   57.95682
1999        PRKD2                Liver      19_33 0.9734753   29.94248
19842        ACP6          Whole_Blood       1_73 0.9731059   24.71050
3755        RRBP1                Liver      20_13 0.9665073   32.57819
7040        INHBB                Liver       2_70 0.9639261   74.28462
31260       ABCA8 Adipose_Subcutaneous      17_39 0.9491281   27.01222
6093      CSNK1G3                Liver       5_75 0.9447775   83.55330
2092          SP4                Liver       7_19 0.9320244  102.37847
6391       TTC39B                Liver       9_13 0.9085958   23.78221
27865        CCNJ Adipose_Subcutaneous      10_61 0.9068170   20.47059
27536       FCGRT Adipose_Subcutaneous      19_34 0.9052147   20.64055
8865         FUT2                Liver      19_33 0.9035945   72.26140
23057       ZFP28          Whole_Blood      19_38 0.8933481   28.57149
27650      PLPPR2 Adipose_Subcutaneous      19_11 0.8902840   26.71673
10657      TRIM39                Liver       6_26 0.8841069   82.54485
6957         USP1                Liver       1_39 0.8764910  256.20169
4704        DDX56                Liver       7_33 0.8744278   57.99635
21823     CD163L1          Whole_Blood       12_7 0.8729646   24.16043
28984      CCDC92 Adipose_Subcutaneous      12_75 0.8698630   25.72447
11790      CYP2A6                Liver      19_30 0.8687866   33.93950
34082      PROCA1 Adipose_Subcutaneous      17_18 0.8682486   26.72503
3562       ACVR1C                Liver       2_94 0.8599134   25.30942
6100         ALLC                Liver        2_2 0.8192854   28.37379
               PVE          z
20880 8.521454e-04  23.237813
4435  4.864604e-03 -41.687336
2454  5.043214e-04  13.376072
12008 4.584718e-04 -17.962770
3721  1.985675e-04  -8.982702
32382 8.001518e-05  -5.184802
5991  4.748228e-04  12.926351
5563  9.072060e-04 -20.293982
12687 5.864850e-04 -15.108415
7410  2.043209e-04   7.982017
16865 1.688570e-03 -24.707322
8531  2.228401e-04  11.038564
9390  2.108851e-04  -8.923688
26340 5.956519e-05  -4.684363
1597  1.643595e-04  -5.732491
1999  8.482679e-05   5.072217
19842 6.997806e-05   4.624653
3755  9.163311e-05   7.008305
7040  2.083833e-04  -8.518936
31260 7.461145e-05   4.800447
6093  2.297277e-04   9.116291
2092  2.776874e-04  10.693191
6391  6.288445e-05  -4.334495
27865 5.402197e-05  -4.639318
27536 5.437423e-05  -4.347956
8865  1.900204e-04 -11.927107
23057 7.428034e-05  -5.205678
27650 6.922009e-05   3.965665
10657 2.123807e-04   8.840164
6957  6.535062e-04  16.258211
4704  1.475859e-04   9.641861
21823 6.137925e-05  -4.653700
28984 6.512047e-05  -5.328046
11790 8.581020e-05   5.407028
34082 6.752780e-05   5.543282
3562  6.333696e-05  -4.687370
6100  6.765079e-05   4.919066

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
         genename combined_pip      Adi_Sub       Liver        Blood
3728      DNAJC13    1.8233864 3.751840e-01 0.797081476 6.511209e-01
308        ACVR1C    1.3061499 3.399432e-01 0.859913406 1.062933e-01
13442     ST3GAL4    1.0995558 3.619309e-02 0.999999995 6.336276e-02
263          ACP6    1.0804720 7.446485e-02 0.032901279 9.731059e-01
1499     C10orf88    1.0418891 1.942758e-01 0.659265349 1.883480e-01
2761        CNIH4    1.0399495 4.433321e-02 0.433189780 5.624265e-01
10334       PSRC1    1.0346903 2.945515e-02 1.000000000 5.235121e-03
15528      ZDHHC7    1.0308369 9.994285e-01 0.016659709 1.474871e-02
5042         GAS6    1.0208991 4.149979e-02 0.979399332           NA
6164        INHBB    1.0208851 5.443195e-02 0.963926139 2.526998e-03
5873          HPR    1.0169972 1.496283e-02 0.999997776 2.036583e-03
6184       INSIG2    1.0150037 1.229211e-02 0.999990661 2.720936e-03
6494         KDSR    1.0149336 4.049752e-01 0.303609427 3.063490e-01
3065      CSNK1G3    1.0138495 6.282587e-02 0.944777517 6.246064e-03
4386        FADS1    1.0041538           NA 0.998624768 5.529022e-03
9526         PELO    1.0007792 1.978634e-01 0.794936008 7.979861e-03
9432        PCSK9    1.0000005 4.984591e-07          NA 1.000000e+00
64          ABCG8    0.9968934           NA 0.996893377           NA
11975 RP4-781K5.7    0.9958201           NA 0.995820088           NA
9741         PKN3    0.9942721 1.506616e-01 0.138908742 7.047018e-01
28          ABCA1    0.9937233           NA 0.992084204 1.639123e-03
12136       RRBP1    0.9921614 2.120357e-02 0.966507276 4.450546e-03
14458       TPD52    0.9920075 9.779838e-01 0.005949712 8.073981e-03
9859         PLTP    0.9891232 1.235242e-02 0.974473374 2.297402e-03
13364     SPTY2D1    0.9877175 4.873760e-01 0.477324101 2.301736e-02
6833         LDLR    0.9860500           NA          NA 9.860500e-01
10183       PRKD2    0.9833101 3.891769e-03 0.973475264 5.943076e-03
14384        TNKS    0.9819120           NA 0.981912022           NA
13239         SP4    0.9765264 3.908866e-02 0.932024394 5.413332e-03
13486      STAT5B    0.9720516 2.707278e-02 0.693328729 2.516501e-01
15546       ZFP28    0.9592919 4.225783e-02 0.023685899 8.933481e-01
14735      TTC39B    0.9591090 4.316469e-02 0.908595812 7.348478e-03
13401        SRRT    0.9537042 4.063502e-01 0.547353993           NA
34          ABCA8    0.9491281 9.491281e-01          NA           NA
2143        CCND2    0.9464626 7.280609e-01 0.212919523 5.482203e-03
15036        USP1    0.9405275 5.117330e-02 0.876490973 1.286323e-02
5743    HIST1H2BH    0.9250086           NA 0.520901943 4.041067e-01
12673       SIPA1    0.9144629 6.702811e-01 0.001238984 2.429428e-01
2194      CD163L1    0.9134944 3.478295e-02 0.005746864 8.729646e-01
2154         CCNJ    0.9110941 9.068170e-01 0.004277143           NA
4703        FCGRT    0.9103701 9.052147e-01 0.002341236 2.814081e-03
6626      KLHDC7A    0.9076986 6.265206e-01 0.281177999           NA
4950         FUT2    0.9035945           NA 0.903594458           NA
3528        DDX56    0.8992934 2.137085e-02 0.874427779 3.494761e-03
9852       PLPPR2    0.8953828 8.902840e-01 0.005078100 2.068699e-05
604          ALLC    0.8885037 6.921824e-02 0.819285433           NA
2106       CCDC92    0.8847019 8.698630e-01 0.012579076 2.259762e-03
14568      TRIM39    0.8841070 9.649856e-08 0.884106857 1.416383e-08
9981          POR    0.8791576 7.542258e-01 0.004268438 1.206634e-01
9414       PCMTD2    0.8717502 7.520134e-01 0.064784069 5.495274e-02
10201      PROCA1    0.8717494 8.682486e-01 0.003317855 1.829653e-04
7505        LRRK2    0.8701613 6.485233e-02 0.794672124 1.063684e-02
3355       CYP2A6    0.8688247 3.808170e-05 0.868786620           NA
3299      CWF19L1    0.8514524 3.546988e-01 0.290951212 2.058024e-01
13215      SORCS2    0.8423335 7.493912e-01 0.084454568 8.487747e-03
7120    LINC01184    0.8361409 5.554749e-01 0.196860294 8.380579e-02
14019       TIMD4    0.8280940 3.212977e-02          NA 7.959643e-01
14447    TP53INP1    0.8262980 3.793612e-02 0.410230811 3.781311e-01
13673       SYTL1    0.8260800 4.360640e-02 0.688624046 9.384954e-02
1431         BRI3    0.8056284 5.803202e-02 0.737416455 1.017996e-02
3050        CSE1L    0.8042657 3.416089e-02 0.005141047 7.649638e-01

Case 23 - Liver, Blood and Adipose with shared variance parameter

Compare with Case 22 (same with separate variance parameters).

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case23")

weight <- "/project2/compbio/predictdb/mashr_models/mashr_Liver.db;/project2/compbio/predictdb/mashr_models/mashr_Whole_Blood.db;/project2/compbio/predictdb/mashr_models/mashr_Adipose_Subcutaneous.db"
weight <- unlist(strsplit(weight, ";"))

#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")


#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- ctwas_gene_res$type
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res <- cbind(ctwas_gene_res, gene_info[sapply(ctwas_gene_res$alt_name, match, gene_info$gene_id), c("genename", "gene_type")])

#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

df$group <- as.factor(df$group)
df$group <- dplyr::recode_factor(df$group,
                                 Adipose_Subcutaneous="Adi_Sub",
                                 Brain_Cerebellum="Brain_Cer",
                                 Adipose_Visceral_Omentum="Adi_Vis",
                                 Whole_Blood="Blood")

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

Version Author Date
903d30e wesleycrouse 2023-02-04
88f9698 wesleycrouse 2023-02-01
####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
print(estimated_group_prior)
                 SNP                Liver          Whole_Blood 
        0.0001089434         0.0118495369         0.0048369906 
Adipose_Subcutaneous 
        0.0004915995 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
print(estimated_group_prior_var)
                 SNP                Liver          Whole_Blood 
            13.31279             26.85102             26.85102 
Adipose_Subcutaneous 
            26.85102 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[ncol(group_prior_var_rec)]
print(estimated_enrichment)
[1] 7.897512
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
                 SNP                Liver          Whole_Blood 
             8696600                11003                11198 
Adipose_Subcutaneous 
               13082 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
print(estimated_group_pve)
                 SNP                Liver          Whole_Blood 
        0.0367062325         0.0101881073         0.0042324977 
Adipose_Subcutaneous 
        0.0005025353 
#total PVE
sum(estimated_group_pve)
[1] 0.05162937
#attributable PVE
estimated_group_pve/sum(estimated_group_pve)
                 SNP                Liver          Whole_Blood 
         0.710956390          0.197331611          0.081978484 
Adipose_Subcutaneous 
         0.009733515 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
         genename                group region_tag susie_pip        mu2
20880       PCSK9          Whole_Blood       1_34 1.0000000  305.89745
2454      ST3GAL4                Liver      11_77 1.0000000  168.44461
16865        LDLR          Whole_Blood       19_9 0.9999995  613.64697
12008         HPR                Liver      16_38 0.9999979  155.70647
3721       INSIG2                Liver       2_69 0.9999932   66.26177
32382      ZDHHC7 Adipose_Subcutaneous      16_49 0.9980177  282.52034
12687 RP4-781K5.7                Liver      1_122 0.9954593  197.22973
5563        ABCG8                Liver       2_27 0.9952113  301.44964
7410        ABCA1                Liver       9_53 0.9945173   68.54804
5991        FADS1                Liver      11_34 0.9918644  158.51276
18553       TIMD4          Whole_Blood       5_92 0.9907482  175.61944
8531         TNKS                Liver       8_12 0.9876413   74.46467
1597         PLTP                Liver      20_28 0.9875334   59.78733
1999        PRKD2                Liver      19_33 0.9867639   29.23428
9390         GAS6                Liver      13_62 0.9848609   70.31949
3755        RRBP1                Liver      20_13 0.9839621   31.57228
7040        INHBB                Liver       2_70 0.9754582   72.30803
17278       PSRC1          Whole_Blood       1_67 0.9672354 1628.27575
19842        ACP6          Whole_Blood       1_73 0.9665932   25.25567
6391       TTC39B                Liver       9_13 0.9599533   22.64819
6093      CSNK1G3                Liver       5_75 0.9599377   81.55833
2092          SP4                Liver       7_19 0.9447848   99.10516
3562       ACVR1C                Liver       2_94 0.9372327   24.61697
11790      CYP2A6                Liver      19_30 0.9351715   32.46926
4704        DDX56                Liver       7_33 0.9341181   57.01733
10657      TRIM39                Liver       6_26 0.9220454   80.01689
1114         SRRT                Liver       7_62 0.9188870   31.92805
8865         FUT2                Liver      19_33 0.9133067  102.73734
6100         ALLC                Liver        2_2 0.8780360   27.25296
23057       ZFP28          Whole_Blood      19_38 0.8650758   29.33952
21823     CD163L1          Whole_Blood       12_7 0.8581601   24.76869
9062      KLHDC7A                Liver       1_14 0.8467426   21.51199
9072      SPTY2D1                Liver      11_13 0.8459173   32.61239
7355         BRI3                Liver       7_60 0.8345000   28.58034
1144        ASAP3                Liver       1_16 0.8272665   33.25444
5415        SYTL1                Liver       1_19 0.8261822   21.22370
8579       STAT5B                Liver      17_25 0.8233303   30.08998
11726      CLDN23                Liver       8_11 0.8214820   24.40499
17963      GALNT6          Whole_Blood      12_33 0.8126653   25.93689
3212        CCND2                Liver       12_4 0.8005992   21.66860
               PVE          z
20880 8.902176e-04  23.237813
2454  4.902047e-04  13.376072
16865 1.785824e-03 -24.707322
12008 4.531334e-04 -17.962770
3721  1.928326e-04  -8.982702
32382 8.205561e-04  -5.184802
12687 5.713684e-04 -15.108415
5563  8.730726e-04 -20.293982
7410  1.983936e-04   7.982017
5991  4.575482e-04  12.926351
18553 5.063563e-04  13.882363
8531  2.140276e-04  11.038564
1597  1.718230e-04  -5.732491
1999  8.395101e-05   5.072217
9390  2.015445e-04  -8.923688
3755  9.040752e-05   7.008305
7040  2.052653e-04  -8.518936
17278 4.583323e-03 -41.793474
19842 7.104327e-05   4.624653
6391  6.327087e-05  -4.334495
6093  2.278409e-04   9.116291
2092  2.724893e-04  10.693191
3562  6.714326e-05  -4.687370
11790 8.836574e-05   5.407028
4704  1.549990e-04   9.641861
10657 2.147110e-04   8.840164
1114  8.537973e-05   5.424996
8865  2.730645e-04 -11.927107
6100  6.963800e-05   4.919066
23057 7.386309e-05  -5.205678
21823 6.185741e-05  -4.653700
9062  5.300934e-05   4.124187
9072  8.028433e-05  -5.557123
7355  6.940872e-05  -5.140136
1144  8.005995e-05   5.283225
5415  5.102901e-05  -3.962854
8579  7.209686e-05   5.426252
11726 5.834410e-05   4.720010
17963 6.134087e-05  -4.608245
3212  5.048547e-05  -4.065830

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- c("Adipose_Subcutaneous", "Liver", "Brain_Cerebellum", "Adipose_Visceral_Omentum", "Whole_Blood")

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

#abbreviate column names
df_gene <- dplyr::rename(df_gene, 
                         Adi_Sub="Adipose_Subcutaneous",
                         Brain_Cer="Brain_Cerebellum",
                         Adi_Vis="Adipose_Visceral_Omentum",
                         Blood="Whole_Blood")

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene, max(sum(df_gene$combined_pip>0.8), 20))
         genename combined_pip      Adi_Sub       Liver        Blood
308        ACVR1C    1.1504987 2.099099e-02 0.937232746 1.922750e-01
263          ACP6    1.0499307 1.547899e-03 0.081789598 9.665932e-01
13442     ST3GAL4    1.0449164 7.097195e-04 0.999999996 4.420671e-02
10334       PSRC1    1.0206884 1.013373e-03 0.052439705 9.672354e-01
4386        FADS1    1.0034855           NA 0.991864368 1.162113e-02
2761        CNIH4    1.0029934 8.277595e-04 0.536584962 4.655807e-01
1499     C10orf88    1.0028221 2.935708e-02 0.789143959 1.843210e-01
15528      ZDHHC7    1.0021283 9.980177e-01 0.002895176 1.215465e-03
6184       INSIG2    1.0021219 1.856231e-04 0.999993179 1.943092e-03
5873          HPR    1.0016956 2.308724e-04 0.999997890 1.466790e-03
9432        PCSK9    1.0000000 7.522259e-09          NA 1.000000e+00
6833         LDLR    0.9999995           NA          NA 9.999995e-01
28          ABCA1    0.9956856           NA 0.994517339 1.168261e-03
11975 RP4-781K5.7    0.9954593           NA 0.995459297           NA
6494         KDSR    0.9953049 2.023789e-02 0.668354860 3.067121e-01
64          ABCG8    0.9952113           NA 0.995211282           NA
14019       TIMD4    0.9913597 6.115121e-04          NA 9.907482e-01
10183       PRKD2    0.9909145 1.648758e-04 0.986763862 3.985719e-03
9859         PLTP    0.9895767 3.089373e-04 0.987533373 1.734379e-03
14384        TNKS    0.9876413           NA 0.987641316           NA
12136       RRBP1    0.9874780 3.266162e-04 0.983962061 3.189274e-03
5042         GAS6    0.9856701 8.091447e-04 0.984860912           NA
9741         PKN3    0.9827705 2.301343e-02 0.200912863 7.588442e-01
6164        INHBB    0.9783808 1.067837e-03 0.975458190 1.854734e-03
1123       ATP1B2    0.9771596 8.998613e-04 0.588762800 3.874969e-01
13486      STAT5B    0.9730618 6.418358e-04 0.823330284 1.490896e-01
14735      TTC39B    0.9664583 7.105092e-04 0.959953313 5.794506e-03
3065      CSNK1G3    0.9662452 1.311701e-03 0.959937750 4.995744e-03
13239         SP4    0.9493498 6.344909e-04 0.944784831 3.930499e-03
3528        DDX56    0.9370060 3.729137e-04 0.934118092 2.514996e-03
3355       CYP2A6    0.9351723 8.074045e-07 0.935171486           NA
5743    HIST1H2BH    0.9279286           NA 0.658428381 2.695002e-01
13401        SRRT    0.9264733 7.586300e-03 0.918886998           NA
14568      TRIM39    0.9220455 1.547382e-09 0.922045438 1.063697e-08
9526         PELO    0.9204887 1.241554e-01 0.792345863 3.987532e-03
15546       ZFP28    0.9168815 1.174367e-03 0.050631403 8.650758e-01
4950         FUT2    0.9133067           NA 0.913306721           NA
15036        USP1    0.9129431 1.197718e-03 0.645707889 2.660375e-01
13364     SPTY2D1    0.8961792 3.571507e-02 0.845917258 1.454685e-02
6626      KLHDC7A    0.8823173 3.557474e-02 0.846742579           NA
2982      CRACR2B    0.8812080 3.032610e-03 0.781339204 9.683621e-02
604          ALLC    0.8794463 1.410332e-03 0.878036004           NA
2194      CD163L1    0.8781896 8.034140e-04 0.019226089 8.581601e-01
13673       SYTL1    0.8746132 8.648209e-04 0.826182164 4.756625e-02
2143        CCND2    0.8608777 5.579932e-02 0.800599207 4.479138e-03
14447    TP53INP1    0.8508173 6.788455e-04 0.613272639 2.368658e-01
1431         BRI3    0.8429317 9.698682e-04 0.834499998 7.461879e-03
1021        ASAP3    0.8420798 4.460399e-04 0.827266478 1.436728e-02
8469      N4BP2L2    0.8376143 4.032046e-04 0.585827851 2.513833e-01
12673       SIPA1    0.8263667 8.871226e-02 0.003978709 7.336757e-01
2657       CLDN23    0.8263296 4.009696e-04 0.821482009 4.446587e-03
5019       GALNT6    0.8129323 2.669988e-04          NA 8.126653e-01
9978         POP7    0.8084571           NA 0.562732910 2.457242e-01
5449        GRINA    0.8017987 1.008854e-03 0.031215768 7.695741e-01
15136        VIL1    0.8002473           NA 0.566658447 2.335889e-01

Comparison - Liver and Adipose Analyzed Separately and Jointly

Comparison of with Case 3 (Liver + Adipose with separate parameters) with Case 9 (Liver only) and Case 10 (Adipose only)

Venn Diagram

####################
#union of separate analyses
liver_and_adipose_genes_union <- union(liver_genes, adipose_genes)

#genes found in both liver and adipose analzyed separately
intersect(liver_genes, adipose_genes)
[1] "KDSR"     "PELO"     "SRRT"     "PKN3"     "C10orf88" "ACVR1C"   "KLHDC7A" 
[8] "SPTY2D1" 
#genes found only in liver
liver_genes[!(liver_genes %in% adipose_genes)]
 [1] "PSRC1"       "ST3GAL4"     "HPR"         "INSIG2"      "ABCG8"      
 [6] "FADS1"       "RP4-781K5.7" "ABCA1"       "TNKS"        "GAS6"       
[11] "PLTP"        "PRKD2"       "INHBB"       "RRBP1"       "CNIH4"      
[16] "SP4"         "CSNK1G3"     "FUT2"        "CYP2A6"      "TRIM39"     
[21] "NPC1L1"      "DDX56"       "TTC39B"      "STAT5B"      "USP1"       
[26] "POP7"        "CRACR2B"     "SYTL1"       "ALLC"       
#genes found only in adipose
adipose_genes[!(adipose_genes %in% liver_genes)]
 [1] "ZDHHC7"     "TPD52"      "POR"        "ABCA8"      "SIPA1"     
 [6] "PLPPR2"     "FAM117B"    "CCNJ"       "PCMTD2"     "CCDC92"    
[11] "FCGRT"      "CCND2"      "SPHK2"      "NTN5"       "IL1RN"     
[16] "AC007950.2" "CWF19L1"    "TMED4"      "PROCA1"     "MPND"      
[21] "KCNK3"      "SCD"        "PDHB"       "SORCS2"     "C2CD4A"    
[26] "PCSK9"      "ALDH16A1"   "LINC01184"  "VAMP5"      "PARP9"     
[31] "UBASH3B"    "CTSH"       "ELF1"       "GPAM"      
####################
#genes found in both separate and joint analyses
liver_and_adipose_genes[liver_and_adipose_genes %in% liver_and_adipose_genes_union ]
 [1] "ACVR1C"      "ST3GAL4"     "PSRC1"       "C10orf88"    "CNIH4"      
 [6] "GAS6"        "INHBB"       "ZDHHC7"      "HPR"         "CSNK1G3"    
[11] "INSIG2"      "KDSR"        "CYP2A6"      "FADS1"       "PELO"       
[16] "RRBP1"       "RP4-781K5.7" "ABCG8"       "ABCA1"       "PLTP"       
[21] "PRKD2"       "TPD52"       "TNKS"        "SPTY2D1"     "SRRT"       
[26] "STAT5B"      "SP4"         "TTC39B"      "KLHDC7A"     "ABCA8"      
[31] "SIPA1"       "CCND2"       "PKN3"        "PLPPR2"      "USP1"       
[36] "CCDC92"      "FCGRT"       "CCNJ"        "ALLC"        "CWF19L1"    
[41] "FUT2"        "PROCA1"      "PCMTD2"      "POR"         "DDX56"      
[46] "TRIM39"      "FAM117B"     "SORCS2"      "SPHK2"       "SYTL1"      
[51] "LINC01184"   "SCD"         "IL1RN"       "NTN5"       
#genes found only in joint analysis
liver_and_adipose_genes[!(liver_and_adipose_genes %in% liver_and_adipose_genes_union )]
[1] "BRI3"
#genes found only in separate analyses
liver_and_adipose_genes_union[!(liver_and_adipose_genes_union %in% liver_and_adipose_genes)]
 [1] "NPC1L1"     "POP7"       "CRACR2B"    "AC007950.2" "TMED4"     
 [6] "MPND"       "KCNK3"      "PDHB"       "C2CD4A"     "PCSK9"     
[11] "ALDH16A1"   "VAMP5"      "PARP9"      "UBASH3B"    "CTSH"      
[16] "ELF1"       "GPAM"      
####################
save.image("multigroup_image.RData")
load("multigroup_image.RData")

library(VennDiagram)
Loading required package: grid
Loading required package: futile.logger
term_gene_list <- list(liver=liver_genes, combined=liver_and_adipose_genes)

names(term_gene_list) <- c("Liver\nOnly", "Liver and\nAdipose")

v0 <- venn.diagram( rev(term_gene_list), filename=NULL, disable.logging=T,
                    fill = c("red", "blue"),
                    alpha = 0.50,
                    col = "transparent",
                    cex=1)
INFO [2023-03-21 14:32:41] [[1]]
INFO [2023-03-21 14:32:41] rev(term_gene_list)
INFO [2023-03-21 14:32:41] 
INFO [2023-03-21 14:32:41] $filename
INFO [2023-03-21 14:32:41] NULL
INFO [2023-03-21 14:32:41] 
INFO [2023-03-21 14:32:41] $disable.logging
INFO [2023-03-21 14:32:41] T
INFO [2023-03-21 14:32:41] 
INFO [2023-03-21 14:32:41] $fill
INFO [2023-03-21 14:32:41] c("red", "blue")
INFO [2023-03-21 14:32:41] 
INFO [2023-03-21 14:32:41] $alpha
INFO [2023-03-21 14:32:41] [1] 0.5
INFO [2023-03-21 14:32:41] 
INFO [2023-03-21 14:32:41] $col
INFO [2023-03-21 14:32:41] [1] "transparent"
INFO [2023-03-21 14:32:41] 
INFO [2023-03-21 14:32:41] $cex
INFO [2023-03-21 14:32:41] [1] 1
INFO [2023-03-21 14:32:41] 
grid.newpage()
grid.draw(v0)

Version Author Date
903d30e wesleycrouse 2023-02-04
pdf(file = "output/SBP_Venn.pdf", width = 3, height = 3)

grid.newpage()
grid.draw(v0)

dev.off()
png 
  2 

Silver Standard

library(readxl)

known_annotations <- read_xlsx("data/summary_known_genes_annotations.xlsx", sheet="LDL")
New names:
• `` -> `...4`
• `` -> `...5`
known_annotations <- unique(known_annotations$`Gene Symbol`)

#number of silver standard genes in liver genes
sum(liver_genes %in% known_annotations)
[1] 6
#number of silver standard genes in liver+adipose genes
sum(liver_and_adipose_genes %in% known_annotations)
[1] 5

Case 26 - All tissues with shared variance parameter

Load ctwas results

results_dir <- paste0("/project2/mstephens/wcrouse/ctwas_multigroup_testing/", trait_id, "/multigroup_case26")

weight <- "/project2/compbio/predictdb/mashr_models/"
weight <- paste0(weight, list.files(weight))
weight <- weight[-grep(".gz", weight)]


#load information for all genes

gene_info <- data.frame(gene=as.character(), genename=as.character(), gene_type=as.character(), weight=as.character())

for (i in 1:length(weight)){
  sqlite <- RSQLite::dbDriver("SQLite")
  db = RSQLite::dbConnect(sqlite, weight[i])
  query <- function(...) RSQLite::dbGetQuery(db, ...)
  gene_info_current <- query("select gene, genename, gene_type from extra")
  RSQLite::dbDisconnect(db)

  gene_info_current$weight <- weight[i]
  
  gene_info <- rbind(gene_info, gene_info_current)
}

gene_info$group <- sapply(1:nrow(gene_info), function(x){paste0(unlist(strsplit(tools::file_path_sans_ext(rev(unlist(strsplit(gene_info$weight[x], "/")))[1]), "_"))[-1], collapse="_")})

gene_info$gene_id <- paste(gene_info$gene, gene_info$group, sep="|")

#load ctwas results
ctwas_res <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.susieIrss.txt"))

#make unique identifier for regions
ctwas_res$region_tag <- paste(ctwas_res$region_tag1, ctwas_res$region_tag2, sep="_")

#load z scores for SNPs and collect sample size
load(paste0(results_dir, "/", analysis_id, "_expr_z_snp.Rd"))

sample_size <- z_snp$ss
sample_size <- as.numeric(names(which.max(table(sample_size))))

#compute PVE for each gene/SNP
ctwas_res$PVE = ctwas_res$susie_pip*ctwas_res$mu2/sample_size

#separate gene and SNP results
ctwas_gene_res <- ctwas_res[ctwas_res$type != "SNP", ]
ctwas_gene_res <- data.frame(ctwas_gene_res)
ctwas_snp_res <- ctwas_res[ctwas_res$type == "SNP", ]
ctwas_snp_res <- data.frame(ctwas_snp_res)

#add gene information to results
ctwas_gene_res$gene_id <- sapply(ctwas_gene_res$id, function(x){unlist(strsplit(x, split="[|]"))[1]})
ctwas_gene_res$group <- ctwas_gene_res$type
ctwas_gene_res$alt_name <- paste0(ctwas_gene_res$gene_id, "|", ctwas_gene_res$group)

ctwas_gene_res$genename <- NA
ctwas_gene_res$gene_type <- NA

group_list <- unique(ctwas_gene_res$group)

for (j in 1:length(group_list)){
  print(j)
  group <- group_list[j]

  res_group_indx <- which(ctwas_gene_res$group==group)
  gene_info_group <- gene_info[gene_info$group==group,,drop=F]

  ctwas_gene_res[res_group_indx,c("genename", "gene_type")] <- gene_info_group[sapply(ctwas_gene_res$alt_name[res_group_indx], match, gene_info_group$gene_id), c("genename", "gene_type")]
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
[1] 15
[1] 16
[1] 17
[1] 18
[1] 19
[1] 20
[1] 21
[1] 22
[1] 23
[1] 24
[1] 25
[1] 26
[1] 27
[1] 28
[1] 29
[1] 30
[1] 31
[1] 32
[1] 33
[1] 34
[1] 35
[1] 36
[1] 37
[1] 38
[1] 39
[1] 40
[1] 41
[1] 42
[1] 43
[1] 44
[1] 45
[1] 46
[1] 47
[1] 48
[1] 49
#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res$z <- z_gene[ctwas_gene_res$id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$gene_id=NA
ctwas_snp_res$group="SNP"
ctwas_snp_res$alt_name=NA
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA

ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos", "alt_name", "gene_id"))]
first_cols <- c("genename", "group", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

Check convergence of parameters

library(ggplot2)
library(cowplot)

load(paste0(results_dir, "/", analysis_id, "_ctwas.s2.susieIrssres.Rd"))

#estimated group prior (all iterations)
estimated_group_prior_all <- group_prior_rec
estimated_group_prior_all["SNP",] <- estimated_group_prior_all["SNP",]*thin #adjust parameter to account for thin argument

#estimated group prior variance (all iterations)
estimated_group_prior_var_all <- group_prior_var_rec

#set group size
group_size <- c(table(ctwas_gene_res$type), structure(n_snps, names="SNP"))
group_size <- group_size[rownames(estimated_group_prior_all)]

#estimated group PVE (all iterations)
estimated_group_pve_all <- estimated_group_prior_var_all*estimated_group_prior_all*group_size/sample_size #check PVE calculation

#estimated enrichment of genes (all iterations)
estimated_enrichment_all <- t(sapply(rownames(estimated_group_prior_all)[rownames(estimated_group_prior_all)!="SNP"], function(x){estimated_group_prior_all[rownames(estimated_group_prior_all)==x,]/estimated_group_prior_all[rownames(estimated_group_prior_all)=="SNP"]}))

title_size <- 12

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_all), nrow(estimated_group_prior_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_all), function(x){estimated_group_prior_all[x,]})),
                 group = rep(rownames(estimated_group_prior_all), each=ncol(estimated_group_prior_all)))

groupnames_for_plots <- sapply(unique(df$group), function(x){paste(sapply(unlist(strsplit(x, "_")), substr, start=1, stop=3), collapse="_")})

df$group <- groupnames_for_plots[df$group]
df$group <- as.factor(df$group)

p_pi <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi)) +
  ggtitle("Proportion Causal") +
  theme_cowplot()

p_pi <- p_pi + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_prior_var_all), nrow(estimated_group_prior_var_all)),
                 value = unlist(lapply(1:nrow(estimated_group_prior_var_all), function(x){estimated_group_prior_var_all[x,]})),
                 group = rep(rownames(estimated_group_prior_var_all), each=ncol(estimated_group_prior_var_all)))

groupnames_for_plots <- sapply(unique(df$group), function(x){paste(sapply(unlist(strsplit(x, "_")), substr, start=1, stop=3), collapse="_")})

df$group <- groupnames_for_plots[df$group]
df$group <- as.factor(df$group)

p_sigma2 <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(sigma^2)) +
  ggtitle("Effect Size") +
  theme_cowplot()

p_sigma2 <- p_sigma2 + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_group_pve_all), nrow(estimated_group_pve_all)),
                 value = unlist(lapply(1:nrow(estimated_group_pve_all), function(x){estimated_group_pve_all[x,]})),
                 group = rep(rownames(estimated_group_pve_all), each=ncol(estimated_group_pve_all)))

groupnames_for_plots <- sapply(unique(df$group), function(x){paste(sapply(unlist(strsplit(x, "_")), substr, start=1, stop=3), collapse="_")})

df$group <- groupnames_for_plots[df$group]
df$group <- as.factor(df$group)

p_pve <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(h^2[G])) +
  ggtitle("PVE") +
  theme_cowplot()

p_pve <- p_pve + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

df <- data.frame(niter = rep(1:ncol(estimated_enrichment_all), nrow(estimated_enrichment_all)),
                 value = unlist(lapply(1:nrow(estimated_enrichment_all), function(x){estimated_enrichment_all[x,]})),
                 group = rep(rownames(estimated_enrichment_all), each=ncol(estimated_enrichment_all)))

groupnames_for_plots <- sapply(unique(df$group), function(x){paste(sapply(unlist(strsplit(x, "_")), substr, start=1, stop=3), collapse="_")})

df$group <- groupnames_for_plots[df$group]
df$group <- as.factor(df$group)

p_enrich <- ggplot(df, aes(x=niter, y=value, group=group)) +
  geom_line(aes(color=group)) +
  geom_point(aes(color=group)) +
  xlab("Iteration") + ylab(bquote(pi[G]/pi[S])) +
  ggtitle("Enrichment") +
  theme_cowplot()

p_enrich <- p_enrich + theme(plot.title=element_text(size=title_size)) + 
  expand_limits(y=0) + 
  guides(color = guide_legend(title = "Group")) + theme (legend.title = element_text(size=12, face="bold"))

plot_grid(p_pi, p_sigma2, p_enrich, p_pve)

p_pi

p_sigma2

p_enrich

p_pve

####################

#estimated group prior
estimated_group_prior <- estimated_group_prior_all[,ncol(group_prior_rec)]
-sort(-estimated_group_prior)
                                Liver                                Spleen 
                         7.678925e-03                          2.761580e-03 
                 Esophagus_Muscularis                      Esophagus_Mucosa 
                         2.416817e-03                          1.824333e-03 
                     Brain_Cerebellum                      Colon_Transverse 
                         1.624084e-03                          1.512828e-03 
                      Artery_Coronary                                Testis 
                         1.230304e-03                          1.078124e-03 
               Heart_Atrial_Appendage   Esophagus_Gastroesophageal_Junction 
                         9.548189e-04                          9.270242e-04 
             Brain_Frontal_Cortex_BA9           Brain_Cerebellar_Hemisphere 
                         7.388320e-04                          6.720351e-04 
      Skin_Not_Sun_Exposed_Suprapubic                       Muscle_Skeletal 
                         5.493272e-04                          3.616994e-04 
                                 Lung                         Colon_Sigmoid 
                         3.428143e-04                          3.228810e-04 
           Skin_Sun_Exposed_Lower_leg     Cells_EBV-transformed_lymphocytes 
                         2.970045e-04                          2.435308e-04 
                              Stomach                                   SNP 
                         2.020489e-04                          1.465425e-04 
                        Artery_Tibial              Adipose_Visceral_Omentum 
                         5.263171e-05                          4.361397e-05 
                        Kidney_Cortex           Brain_Caudate_basal_ganglia 
                         2.447455e-05                          1.601047e-05 
                        Adrenal_Gland                              Prostate 
                         1.427112e-05                          7.575684e-06 
                              Thyroid                           Whole_Blood 
                         7.165434e-06                          2.644834e-06 
                             Pancreas                          Nerve_Tibial 
                         1.686359e-06                          1.147164e-06 
                       Brain_Amygdala                  Minor_Salivary_Gland 
                         9.591520e-07                          5.132029e-07 
                                Ovary                          Artery_Aorta 
                         4.110385e-07                          4.057736e-07 
               Brain_Substantia_nigra                 Breast_Mammary_Tissue 
                         3.793462e-07                          3.234640e-07 
          Brain_Putamen_basal_ganglia                             Pituitary 
                         2.377746e-07                          2.219286e-07 
       Small_Intestine_Terminal_Ileum                  Adipose_Subcutaneous 
                         1.569164e-07                          1.355395e-07 
                   Brain_Hypothalamus            Cells_Cultured_fibroblasts 
                         1.017948e-07                          9.264274e-08 
Brain_Nucleus_accumbens_basal_ganglia                     Brain_Hippocampus 
                         4.695558e-08                          3.640750e-08 
       Brain_Spinal_cord_cervical_c-1                  Heart_Left_Ventricle 
                         1.458363e-08                          8.809855e-09 
                               Uterus                          Brain_Cortex 
                         6.358487e-09                          4.205326e-09 
 Brain_Anterior_cingulate_cortex_BA24                                Vagina 
                         2.219845e-09                          1.534677e-09 
#estimated group prior variance
estimated_group_prior_var <- estimated_group_prior_var_all[,ncol(group_prior_var_rec)]
-sort(-estimated_group_prior_var)
                 Adipose_Subcutaneous              Adipose_Visceral_Omentum 
                             44.17900                              44.17900 
                        Adrenal_Gland                          Artery_Aorta 
                             44.17900                              44.17900 
                      Artery_Coronary                         Artery_Tibial 
                             44.17900                              44.17900 
                       Brain_Amygdala  Brain_Anterior_cingulate_cortex_BA24 
                             44.17900                              44.17900 
          Brain_Caudate_basal_ganglia           Brain_Cerebellar_Hemisphere 
                             44.17900                              44.17900 
                     Brain_Cerebellum                          Brain_Cortex 
                             44.17900                              44.17900 
             Brain_Frontal_Cortex_BA9                     Brain_Hippocampus 
                             44.17900                              44.17900 
                   Brain_Hypothalamus Brain_Nucleus_accumbens_basal_ganglia 
                             44.17900                              44.17900 
          Brain_Putamen_basal_ganglia        Brain_Spinal_cord_cervical_c-1 
                             44.17900                              44.17900 
               Brain_Substantia_nigra                 Breast_Mammary_Tissue 
                             44.17900                              44.17900 
           Cells_Cultured_fibroblasts     Cells_EBV-transformed_lymphocytes 
                             44.17900                              44.17900 
                        Colon_Sigmoid                      Colon_Transverse 
                             44.17900                              44.17900 
  Esophagus_Gastroesophageal_Junction                      Esophagus_Mucosa 
                             44.17900                              44.17900 
                 Esophagus_Muscularis                Heart_Atrial_Appendage 
                             44.17900                              44.17900 
                 Heart_Left_Ventricle                         Kidney_Cortex 
                             44.17900                              44.17900 
                                Liver                                  Lung 
                             44.17900                              44.17900 
                 Minor_Salivary_Gland                       Muscle_Skeletal 
                             44.17900                              44.17900 
                         Nerve_Tibial                                 Ovary 
                             44.17900                              44.17900 
                             Pancreas                             Pituitary 
                             44.17900                              44.17900 
                             Prostate       Skin_Not_Sun_Exposed_Suprapubic 
                             44.17900                              44.17900 
           Skin_Sun_Exposed_Lower_leg        Small_Intestine_Terminal_Ileum 
                             44.17900                              44.17900 
                               Spleen                               Stomach 
                             44.17900                              44.17900 
                               Testis                               Thyroid 
                             44.17900                              44.17900 
                               Uterus                                Vagina 
                             44.17900                              44.17900 
                          Whole_Blood                                   SNP 
                             44.17900                               6.38226 
#estimated enrichment
estimated_enrichment <- estimated_enrichment_all[,ncol(group_prior_var_rec)]
-sort(-estimated_enrichment)
                                Liver                                Spleen 
                         5.240068e+01                          1.884491e+01 
                 Esophagus_Muscularis                      Esophagus_Mucosa 
                         1.649226e+01                          1.244917e+01 
                     Brain_Cerebellum                      Colon_Transverse 
                         1.108269e+01                          1.032348e+01 
                      Artery_Coronary                                Testis 
                         8.395545e+00                          7.357077e+00 
               Heart_Atrial_Appendage   Esophagus_Gastroesophageal_Junction 
                         6.515646e+00                          6.325976e+00 
             Brain_Frontal_Cortex_BA9           Brain_Cerebellar_Hemisphere 
                         5.041760e+00                          4.585941e+00 
      Skin_Not_Sun_Exposed_Suprapubic                       Muscle_Skeletal 
                         3.748587e+00                          2.468222e+00 
                                 Lung                         Colon_Sigmoid 
                         2.339351e+00                          2.203327e+00 
           Skin_Sun_Exposed_Lower_leg     Cells_EBV-transformed_lymphocytes 
                         2.026747e+00                          1.661844e+00 
                              Stomach                         Artery_Tibial 
                         1.378774e+00                          3.591567e-01 
             Adipose_Visceral_Omentum                         Kidney_Cortex 
                         2.976200e-01                          1.670134e-01 
          Brain_Caudate_basal_ganglia                         Adrenal_Gland 
                         1.092548e-01                          9.738557e-02 
                             Prostate                               Thyroid 
                         5.169616e-02                          4.889664e-02 
                          Whole_Blood                              Pancreas 
                         1.804824e-02                          1.150765e-02 
                         Nerve_Tibial                        Brain_Amygdala 
                         7.828203e-03                          6.545215e-03 
                 Minor_Salivary_Gland                                 Ovary 
                         3.502076e-03                          2.804910e-03 
                         Artery_Aorta                Brain_Substantia_nigra 
                         2.768983e-03                          2.588643e-03 
                Breast_Mammary_Tissue           Brain_Putamen_basal_ganglia 
                         2.207306e-03                          1.622564e-03 
                            Pituitary        Small_Intestine_Terminal_Ileum 
                         1.514432e-03                          1.070791e-03 
                 Adipose_Subcutaneous                    Brain_Hypothalamus 
                         9.249158e-04                          6.946435e-04 
           Cells_Cultured_fibroblasts Brain_Nucleus_accumbens_basal_ganglia 
                         6.321903e-04                          3.204230e-04 
                    Brain_Hippocampus        Brain_Spinal_cord_cervical_c-1 
                         2.484433e-04                          9.951809e-05 
                 Heart_Left_Ventricle                                Uterus 
                         6.011810e-05                          4.339006e-05 
                         Brain_Cortex  Brain_Anterior_cingulate_cortex_BA24 
                         2.869698e-05                          1.514814e-05 
                               Vagina 
                         1.047257e-05 
#report sample size
print(sample_size)
[1] 343621
#report group size
print(group_size)
                                  SNP                  Adipose_Subcutaneous 
                              8696600                                 13003 
             Adipose_Visceral_Omentum                         Adrenal_Gland 
                                12878                                 11837 
                         Artery_Aorta                       Artery_Coronary 
                                12767                                 12088 
                        Artery_Tibial                        Brain_Amygdala 
                                12913                                 10993 
 Brain_Anterior_cingulate_cortex_BA24           Brain_Caudate_basal_ganglia 
                                11632                                 12278 
          Brain_Cerebellar_Hemisphere                      Brain_Cerebellum 
                                11977                                 12220 
                         Brain_Cortex              Brain_Frontal_Cortex_BA9 
                                12463                                 12203 
                    Brain_Hippocampus                    Brain_Hypothalamus 
                                11686                                 11834 
Brain_Nucleus_accumbens_basal_ganglia           Brain_Putamen_basal_ganglia 
                                12230                                 11920 
       Brain_Spinal_cord_cervical_c-1                Brain_Substantia_nigra 
                                11211                                 10753 
                Breast_Mammary_Tissue            Cells_Cultured_fibroblasts 
                                12813                                 12453 
    Cells_EBV-transformed_lymphocytes                         Colon_Sigmoid 
                                10647                                 12553 
                     Colon_Transverse   Esophagus_Gastroesophageal_Junction 
                                12752                                 12493 
                     Esophagus_Mucosa                  Esophagus_Muscularis 
                                12890                                 12949 
               Heart_Atrial_Appendage                  Heart_Left_Ventricle 
                                12297                                 11567 
                        Kidney_Cortex                                 Liver 
                                 9390                                 10985 
                                 Lung                  Minor_Salivary_Gland 
                                13316                                 11899 
                      Muscle_Skeletal                          Nerve_Tibial 
                                11907                                 13727 
                                Ovary                              Pancreas 
                                11867                                 11963 
                            Pituitary                              Prostate 
                                12773                                 12516 
      Skin_Not_Sun_Exposed_Suprapubic            Skin_Sun_Exposed_Lower_leg 
                                13269                                 13505 
       Small_Intestine_Terminal_Ileum                                Spleen 
                                12118                                 12319 
                              Stomach                                Testis 
                                12280                                 15723 
                              Thyroid                                Uterus 
                                13648                                 11328 
                               Vagina                           Whole_Blood 
                                11059                                 11139 
#estimated group PVE
estimated_group_pve <- estimated_group_pve_all[,ncol(group_prior_rec)]
-sort(-estimated_group_pve)
                                  SNP                                 Liver 
                         2.367052e-02                          1.084518e-02 
                               Spleen                  Esophagus_Muscularis 
                         4.373904e-03                          4.023613e-03 
                     Esophagus_Mucosa                      Brain_Cerebellum 
                         3.023383e-03                          2.551620e-03 
                     Colon_Transverse                                Testis 
                         2.480299e-03                          2.179417e-03 
                      Artery_Coronary                Heart_Atrial_Appendage 
                         1.912067e-03                          1.509581e-03 
  Esophagus_Gastroesophageal_Junction              Brain_Frontal_Cortex_BA9 
                         1.488998e-03                          1.159174e-03 
          Brain_Cerebellar_Hemisphere       Skin_Not_Sun_Exposed_Suprapubic 
                         1.034847e-03                          9.371421e-04 
                                 Lung                       Muscle_Skeletal 
                         5.869065e-04                          5.537150e-04 
                        Colon_Sigmoid            Skin_Sun_Exposed_Lower_leg 
                         5.211061e-04                          5.156961e-04 
    Cells_EBV-transformed_lymphocytes                               Stomach 
                         3.333629e-04                          3.190002e-04 
                        Artery_Tibial              Adipose_Visceral_Omentum 
                         8.737974e-05                          7.221214e-05 
                        Kidney_Cortex           Brain_Caudate_basal_ganglia 
                         2.954721e-05                          2.527364e-05 
                        Adrenal_Gland                               Thyroid 
                         2.171881e-05                          1.257325e-05 
                             Prostate                           Whole_Blood 
                         1.219056e-05                          3.787746e-06 
                             Pancreas                          Nerve_Tibial 
                         2.593739e-06                          2.024591e-06 
                       Brain_Amygdala                  Minor_Salivary_Gland 
                         1.355626e-06                          7.851195e-07 
                         Artery_Aorta                                 Ovary 
                         6.660531e-07                          6.271329e-07 
                Breast_Mammary_Tissue                Brain_Substantia_nigra 
                         5.328599e-07                          5.244469e-07 
                            Pituitary           Brain_Putamen_basal_ganglia 
                         3.644537e-07                          3.643996e-07 
       Small_Intestine_Terminal_Ileum                  Adipose_Subcutaneous 
                         2.444755e-07                          2.265925e-07 
                   Brain_Hypothalamus            Cells_Cultured_fibroblasts 
                         1.548793e-07                          1.483274e-07 
Brain_Nucleus_accumbens_basal_ganglia                     Brain_Hippocampus 
                         7.383289e-08                          5.470070e-08 
       Brain_Spinal_cord_cervical_c-1                  Heart_Left_Ventricle 
                         2.102065e-08                          1.310164e-08 
                               Uterus                          Brain_Cortex 
                         9.260686e-09                          6.738425e-09 
 Brain_Anterior_cingulate_cortex_BA24                                Vagina 
                         3.319810e-09                          2.182071e-09 
#total PVE
sum(estimated_group_pve)
[1] 0.06429501
#attributable PVE
-sort(-estimated_group_pve/sum(estimated_group_pve))
                                  SNP                                 Liver 
                         3.681549e-01                          1.686784e-01 
                               Spleen                  Esophagus_Muscularis 
                         6.802868e-02                          6.258049e-02 
                     Esophagus_Mucosa                      Brain_Cerebellum 
                         4.702360e-02                          3.968612e-02 
                     Colon_Transverse                                Testis 
                         3.857685e-02                          3.389715e-02 
                      Artery_Coronary                Heart_Atrial_Appendage 
                         2.973896e-02                          2.347897e-02 
  Esophagus_Gastroesophageal_Junction              Brain_Frontal_Cortex_BA9 
                         2.315883e-02                          1.802898e-02 
          Brain_Cerebellar_Hemisphere       Skin_Not_Sun_Exposed_Suprapubic 
                         1.609529e-02                          1.457566e-02 
                                 Lung                       Muscle_Skeletal 
                         9.128337e-03                          8.612099e-03 
                        Colon_Sigmoid            Skin_Sun_Exposed_Lower_leg 
                         8.104923e-03                          8.020779e-03 
    Cells_EBV-transformed_lymphocytes                               Stomach 
                         5.184896e-03                          4.961508e-03 
                        Artery_Tibial              Adipose_Visceral_Omentum 
                         1.359044e-03                          1.123137e-03 
                        Kidney_Cortex           Brain_Caudate_basal_ganglia 
                         4.595569e-04                          3.930887e-04 
                        Adrenal_Gland                               Thyroid 
                         3.377992e-04                          1.955557e-04 
                             Prostate                           Whole_Blood 
                         1.896035e-04                          5.891197e-05 
                             Pancreas                          Nerve_Tibial 
                         4.034122e-05                          3.148909e-05 
                       Brain_Amygdala                  Minor_Salivary_Gland 
                         2.108446e-05                          1.221120e-05 
                         Artery_Aorta                                 Ovary 
                         1.035933e-05                          9.753990e-06 
                Breast_Mammary_Tissue                Brain_Substantia_nigra 
                         8.287733e-06                          8.156884e-06 
                            Pituitary           Brain_Putamen_basal_ganglia 
                         5.668460e-06                          5.667619e-06 
       Small_Intestine_Terminal_Ileum                  Adipose_Subcutaneous 
                         3.802402e-06                          3.524262e-06 
                   Brain_Hypothalamus            Cells_Cultured_fibroblasts 
                         2.408885e-06                          2.306982e-06 
Brain_Nucleus_accumbens_basal_ganglia                     Brain_Hippocampus 
                         1.148346e-06                          8.507768e-07 
       Brain_Spinal_cord_cervical_c-1                  Heart_Left_Ventricle 
                         3.269406e-07                          2.037738e-07 
                               Uterus                          Brain_Cortex 
                         1.440343e-07                          1.048048e-07 
 Brain_Anterior_cingulate_cortex_BA24                                Vagina 
                         5.163403e-08                          3.393842e-08 

Top gene+tissue pairs by PIP

#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
          genename                               group region_tag susie_pip
547056       PCSK9                         Whole_Blood       1_34 1.0000000
549667        APOB                    Esophagus_Mucosa       2_13 1.0000000
556346         LPA                              Testis      6_104 1.0000000
575793      ZDHHC7                    Esophagus_Mucosa      16_49 0.9999998
565521       USP28                     Muscle_Skeletal      11_67 0.9999998
550896      INSIG2                               Liver       2_69 0.9999989
561421       ABCA1                               Liver       9_53 0.9998566
558457        TNKS                               Liver       8_12 0.9998403
574913         HPR                               Liver      16_38 0.9998230
548544       MARC1                    Colon_Transverse      1_112 0.9998023
569640        GAS6                               Liver      13_62 0.9993799
556911         SP4                               Liver       7_19 0.9992645
554294      TRIM39                               Liver       6_24 0.9990462
552709     CSNK1G3                               Liver       5_75 0.9975428
551821         PXK                     Artery_Coronary       3_40 0.9943961
596636       RRBP1                               Liver      20_13 0.9942847
576935       ASGR1 Esophagus_Gastroesophageal_Junction       17_6 0.9920015
572815       LITAF            Brain_Frontal_Cortex_BA9      16_12 0.9915327
564544       FADS1                               Liver      11_34 0.9878783
552269        MITF                              Spleen       3_47 0.9860400
594860       FCGRT                    Esophagus_Mucosa      19_34 0.9833651
568774        FLT3                                Lung       13_7 0.9833530
574865      TXNL4B              Heart_Atrial_Appendage      16_38 0.9771763
563369       ADRB1                              Spleen      10_71 0.9761541
547844       PSRC1                               Liver       1_67 0.9748017
571482      NYNRIN                               Liver       14_3 0.9739043
549519        ALLC                               Liver        2_2 0.9725759
558572         LPL                              Testis       8_21 0.9692914
553423       OR2H2         Brain_Cerebellar_Hemisphere       6_23 0.9655675
584418       PDE4C                    Brain_Cerebellum      19_14 0.9650869
546898       PCSK9     Skin_Not_Sun_Exposed_Suprapubic       1_34 0.9649245
565384        DRD2                    Brain_Cerebellum      11_67 0.9632179
575482      OSGIN1                               Liver      16_48 0.9624309
549089 RP4-781K5.7                               Liver      1_121 0.9564623
574604      TXNL4B                    Brain_Cerebellum      16_38 0.9525200
588730      CYP2A6                               Liver      19_28 0.9509540
556456        RAC1                    Brain_Cerebellum        7_8 0.9454510
551240      ACVR1C                               Liver       2_94 0.9346819
554772      NT5DC1                     Artery_Coronary       6_77 0.9344266
569060     N4BP2L1 Esophagus_Gastroesophageal_Junction      13_10 0.9264501
567411       LRRK2                               Liver      12_25 0.9185921
562883        GPAM Esophagus_Gastroesophageal_Junction      10_70 0.9143740
546109       ASAP3                               Liver       1_16 0.9120411
556072     SLC22A1          Skin_Sun_Exposed_Lower_leg      6_103 0.9057635
569077     N4BP2L1              Heart_Atrial_Appendage      13_10 0.9014084
566858     ST3GAL4                               Liver      11_77 0.8912349
550837  AC073257.2                Esophagus_Muscularis       2_69 0.8799692
556900         SP4                Esophagus_Muscularis       7_19 0.8798374
576072       FOXF1              Heart_Atrial_Appendage      16_51 0.8775131
559803        PLEC                    Colon_Transverse       8_94 0.8758950
568045       STAC3                Esophagus_Muscularis      12_36 0.8724693
561517       GATA3                     Artery_Coronary       10_8 0.8701767
579031        PGS1                Esophagus_Muscularis      17_44 0.8662931
591107     IRF2BP1                Esophagus_Muscularis      19_32 0.8639131
562257     CYP26C1         Brain_Cerebellar_Hemisphere      10_59 0.8588801
593071        NTN5     Skin_Not_Sun_Exposed_Suprapubic      19_33 0.8552348
597990        PLTP                               Liver      20_28 0.8494840
592692       PRKD2                               Liver      19_33 0.8451069
551384     FAM117B         Brain_Cerebellar_Hemisphere      2_120 0.8397245
598959       BRWD1     Skin_Not_Sun_Exposed_Suprapubic      21_19 0.8298207
558329      CLDN23                               Liver       8_11 0.8257987
561220      TTC39B                               Liver       9_13 0.8254973
587705      ZNF486                     Artery_Coronary      19_16 0.8254483
545593     KLHDC7A                               Liver       1_13 0.8183601
547104       PRMT6                               Liver       1_66 0.8127810
557464       TMED4                Esophagus_Muscularis       7_32 0.8111055
558067        BRI3                               Liver       7_60 0.8074427
573170        CETP                        Artery_Aorta      16_30 0.8048222
550132       KCNK3              Heart_Atrial_Appendage       2_15 0.8034501
              mu2          PVE          z
547056  303.35442 8.828169e-04 -23.237813
549667  803.64279 2.338748e-03  38.857959
556346  369.40561 1.075038e-03 -18.873327
575793   52.06682 1.515239e-04  -6.906720
565521  282.22193 8.213173e-04  -4.621519
550896   69.93727 2.035301e-04   8.982702
561421   71.45438 2.079155e-04  -7.982017
558457   85.76712 2.495582e-04 -11.038564
574913  286.83096 8.345828e-04  17.962770
548544   93.23112 2.712660e-04  -9.668702
569640   79.53000 2.313033e-04   8.923688
556911  106.60401 3.100090e-04 -10.693191
554294   76.32893 2.219193e-04  -8.840164
552709   84.47295 2.452277e-04  -9.116291
551821   66.54059 1.925601e-04   7.955317
596636   33.92595 9.816644e-05  -7.008305
576935   89.86027 2.594182e-04  -9.644806
572815   36.64564 1.057425e-04   5.772374
564544  161.56736 4.644911e-04 -12.926351
552269   34.59269 9.926569e-05  -5.519279
594860   30.83439 8.824101e-05   4.135964
568774   37.29779 1.067365e-04   5.820864
574865  218.28388 6.207474e-04  -2.264888
563369   41.94337 1.191522e-04   6.216518
547844 1663.80182 4.719958e-03  41.687336
571482   48.51887 1.375141e-04  -7.009952
549519   28.05778 7.941401e-05  -4.919066
558572   34.48204 9.726745e-05   5.563062
553423   49.47939 1.390360e-04   7.374266
584418   47.30392 1.328568e-04   6.633160
546898  105.32673 2.957687e-04  15.890031
565384  112.70967 3.159410e-04  -3.250305
575482   50.80867 1.423075e-04  -6.907310
549089  204.21060 5.684162e-04  15.108415
574604  259.55754 7.194955e-04 -18.129274
588730   31.43019 8.698148e-05  -5.407028
556456   36.02546 9.912173e-05  -5.713928
551240   25.16317 6.844624e-05   4.687370
554772   60.67408 1.649942e-04   7.669070
569060   39.73024 1.071183e-04  -6.647944
567411   27.12869 7.252233e-05  -4.792808
562883   57.80639 1.538226e-04  -7.194356
546109   32.75142 8.692903e-05  -5.283225
556072   57.82240 1.524162e-04   7.155007
569077   47.36497 1.242508e-04  -7.250966
566858  172.64250 4.477754e-04 -13.376072
550837   29.24086 7.488210e-05  -5.046154
556900   49.52584 1.268103e-04   7.330657
576072   29.23553 7.465946e-05  -5.012221
559803   48.68217 1.240916e-04   6.767908
568045   39.24675 9.964928e-05   5.874553
561517   29.56814 7.487758e-05  -4.990067
579031   53.79800 1.356286e-04   7.140667
591107   84.94768 2.135708e-04  -8.060102
562257   67.16139 1.678698e-04  -8.237356
593071  115.13788 2.865655e-04 -12.110745
597990   62.84694 1.553673e-04   5.732491
592692   30.07464 7.396605e-05  -5.072217
551384   57.22370 1.398405e-04   8.348197
598959  571.94914 1.381217e-03  -2.982534
558329   25.91277 6.227422e-05  -4.720010
561220   23.70093 5.693790e-05   4.334495
587705   51.56357 1.238663e-04  -6.991792
545593   22.75472 5.419213e-05  -4.124187
547104   32.10979 7.595062e-05   5.323721
557464   63.38956 1.496288e-04  -9.575315
558067   30.20908 7.098548e-05   5.140136
573170  205.65382 4.816782e-04 -14.582596
550132   26.95019 6.301458e-05  -4.634044

Top genes by combined PIP

#aggregate by gene name
df_gene <- aggregate(ctwas_gene_res$susie_pip, by=list(ctwas_gene_res$genename), FUN=sum)
colnames(df_gene) <- c("genename", "combined_pip")

#drop duplicated gene names
df_gene <- df_gene[!(df_gene$genename %in% names(which(table(ctwas_gene_res$genename)>length(weight)))),]

#collect tissue-level results
all_tissue_names <- unique(ctwas_gene_res$group)

df_gene_pips <- matrix(NA, nrow=nrow(df_gene), ncol=length(all_tissue_names))
colnames(df_gene_pips) <- all_tissue_names

for (i in 1:nrow(df_gene)){
  #print(i)
  
  gene <- df_gene$genename[i]
  ctwas_gene_res_subset <- ctwas_gene_res[ctwas_gene_res$genename==gene,]
  df_gene_pips[i,ctwas_gene_res_subset$group] <- ctwas_gene_res_subset$susie_pip
}

df_gene <- cbind(df_gene, df_gene_pips)

#sort by combined PIP
df_gene <- df_gene[order(-df_gene$combined_pip),]

df_gene <- df_gene[,apply(df_gene, 2, function(x){!all(is.na(x))})]

#genes with PIP>0.8 or 20 highest PIPs
head(df_gene[,c("genename", "combined_pip", "Liver")], max(sum(df_gene$combined_pip>0.8), 20))
           genename combined_pip        Liver
11391       N4BP2L1    2.6363568 2.779247e-03
8899           LDLR    2.0000000           NA
18533           SP4    1.9678399 9.992645e-01
12659         PCSK9    1.9653204           NA
20423        TXNL4B    1.9360994           NA
8356          KCNK3    1.8636853           NA
78            ABCG8    1.7463517 5.485313e-01
36            ABCA1    1.7131846 9.998566e-01
12513         PARP9    1.6919785 1.311536e-02
21260        ZDHHC7    1.4476807 4.802575e-03
17961       SLC22A1    1.3371909           NA
593          ACVR1C    1.2481918 9.346819e-01
362      AC073257.2    1.2111168 1.137434e-03
4991        DNAJC13    1.2050427 1.667727e-01
12119        NYNRIN    1.1151374 9.739043e-01
10045         LITAF    1.1136612 1.025066e-01
13867           PXK    1.0880080 4.587298e-03
10495         MARC1    1.0797668 6.429089e-02
539            ACP6    1.0547871 4.519575e-02
6142          FCGRT    1.0407669 1.563035e-02
3648          CNIH4    1.0403337 6.068566e-01
1511          ASGR1    1.0365428           NA
2062       C10orf88    1.0352894 7.367786e-03
7627            HPR    1.0332530 9.998230e-01
19490         TIMD4    1.0332270           NA
7526          HMGCR    1.0327456           NA
6583           GAS6    1.0308308 9.993799e-01
2404       C6orf106    1.0304899 1.152956e-02
3283           CETP    1.0300162 9.107228e-03
13730         PSRC1    1.0284621 9.748017e-01
13159          PLTP    1.0271601 8.494840e-01
4006        CSNK1G3    1.0251799 9.975428e-01
5767          FADS1    1.0246555 9.878783e-01
11330         MYLIP    1.0225010 4.908059e-03
11866        NPC1L1    1.0211863 5.712926e-01
18799       ST3GAL4    1.0210225 8.912349e-01
7494        HLA-DOB    1.0194407 9.106181e-03
13720         PSMG1    1.0164963 8.163365e-02
20690         USP28    1.0160222 2.237867e-03
18700       SPTY2D1    1.0158874 4.935378e-01
17203         RRBP1    1.0152336 9.942847e-01
12771          PELO    1.0144536 3.174076e-02
6293           FLT3    1.0139908 1.176088e-02
7141          GSK3B    1.0137367 6.607167e-02
13021          PKN3    1.0135939 1.959753e-01
1607         ATP1B2    1.0129527 4.299207e-01
8023         INSIG2    1.0118501 9.999989e-01
7393            HFE    1.0116575 5.033274e-03
12861          PHC1    1.0115913 3.814000e-02
20125        TRIM39    1.0109401 9.990462e-01
7544          HNF1A    1.0093247 8.859540e-03
5798        FAM117B    1.0092980           NA
19919          TNKS    1.0050947 9.998403e-01
4539         CYP2A6    1.0045799 9.509540e-01
10866          MITF    1.0039619 9.839026e-03
10113           LPA    1.0008389 3.634569e-04
16947   RP4-781K5.7    1.0005515 9.564623e-01
8426           KDSR    1.0000915 3.953176e-01
1263           APOB    1.0000000 1.578648e-11
13992          RAC1    0.9999827 5.527332e-03
18848        STAT5B    0.9997169 4.538253e-01
46            ABCA8    0.9988278           NA
13294          POP7    0.9988130 4.204977e-01
734           ADRB1    0.9980826 7.291318e-03
12853          PGS1    0.9927453 1.197515e-02
958            ALLC    0.9919668 9.725759e-01
5605         ERGIC3    0.9904117           NA
12000        NT5DC1    0.9890894 1.585599e-02
21471        ZNF329    0.9875014 1.913257e-01
10127           LPL    0.9868029 5.415695e-03
12693         PDE4C    0.9831384 3.175422e-03
17779        SH3TC1    0.9805800 1.217851e-02
18181        SLC4A7    0.9799874 1.266394e-02
12242         OR2H2    0.9794274           NA
13087          PLEC    0.9786905 1.337186e-02
20717         USP53    0.9777320 3.233507e-01
5064         DOPEY2    0.9773328 5.970757e-03
5118           DRD2    0.9769902           NA
13535         PRKD2    0.9764914 8.451069e-01
77            ABCG5    0.9758083           NA
13552         PRMT6    0.9745100 8.127810e-01
12639        PCMTD2    0.9730153 2.071639e-01
12336        OSGIN1    0.9706080 9.624309e-01
17846         SIPA1    0.9675438 8.920474e-03
1486          ASAP3    0.9653063 9.120411e-01
4566        CYP4F12    0.9651260 7.936832e-03
17733         SGMS1    0.9620011 6.978874e-01
8089        IRF2BP1    0.9615833           NA
4534        CYP26C1    0.9614738           NA
20828        VPS37D    0.9546597 3.120892e-02
6929           GPAM    0.9522601           NA
1816          BCAT2    0.9511309 4.932736e-01
19074         SYTL1    0.9472628 4.907381e-01
6358          FOXF1    0.9443884 3.630297e-02
6313            FN1    0.9432315           NA
20318        TTC39B    0.9391861 8.254973e-01
7300          HBS1L    0.9372965 3.337573e-01
5297          EEPD1    0.9349431 4.929160e-01
10243         LRRK2    0.9341073 9.185921e-01
7852          IGF2R    0.9224562 4.050024e-01
2903          CCND2    0.9218828 3.509243e-01
18067      SLC2A4RG    0.9207592 6.892090e-02
224      AC007950.2    0.9189124 4.764393e-03
3902        CRACR2B    0.9163486 5.124684e-01
20673          USP1    0.9128224 7.321420e-01
3986          CSE1L    0.9082353 9.336825e-03
12015          NTN5    0.9038029           NA
7143            GSN    0.9022591 5.765758e-01
3517         CLDN23    0.9018924 8.257987e-01
18824         STAC3    0.8980901           NA
10320          LY96    0.8942083           NA
18749          SRRT    0.8864554 6.965266e-01
6589          GATA3    0.8795033           NA
2078       C11orf58    0.8749333 4.064828e-01
8580        KLHDC7A    0.8729319 8.183601e-01
4439           CTSH    0.8726960 2.717720e-01
1993          BRWD1    0.8698207           NA
6881         GOLGA3    0.8588020 5.439460e-03
19594         TMED4    0.8583463 6.035695e-03
21561        ZNF486    0.8566501 2.531606e-02
2957        CD163L1    0.8533031 2.199325e-02
20897          WBP4    0.8518498           NA
9407      LINC01184    0.8487412 2.824512e-01
16541 RP11-766F14.2    0.8449696           NA
13930         RAB21    0.8424570 7.542498e-03
1977           BRI3    0.8316413 8.074427e-01
18496        SORCS2    0.8314154 4.120853e-01
19454         THOP1    0.8309397 2.491448e-01
11443        NAP1L4    0.8286951 8.186003e-03
1809          BCAR1    0.8276299           NA
11842          NOS3    0.8220421 9.106728e-03
5817        FAM134B    0.8204765           NA
19991      TP53INP1    0.8201000 5.168250e-01
20788          VIL1    0.8156495 4.510462e-01
11657           NF1    0.8153226 9.651858e-03
17489          SDC4    0.8149133 7.844299e-01
3961          CRTC3    0.8100111 7.015122e-01
20136         TRIM5    0.8095141 2.661279e-02
6116         FBXO46    0.8069903           NA
4099   CTB-50L17.10    0.8055965 7.615144e-01
write.csv(df_gene, file="LDL_all_tissue_PIPs.csv")

all_tissue_genes <- df_gene$genename[df_gene$combined_pip>0.8]

Silver Standard

library(readxl)

known_annotations <- read_xlsx("data/summary_known_genes_annotations.xlsx", sheet="LDL")
New names:
• `` -> `...4`
• `` -> `...5`
known_annotations <- unique(known_annotations$`Gene Symbol`)

#number of silver standard genes in all tissue genes
sum(all_tissue_genes %in% known_annotations)
[1] 15
#list silver standard genes detected

all_tissue_genes[all_tissue_genes %in% known_annotations]
 [1] "LDLR"   "PCSK9"  "ABCG8"  "ABCA1"  "HMGCR"  "CETP"   "PLTP"   "FADS1" 
 [9] "MYLIP"  "NPC1L1" "TNKS"   "LPA"    "APOB"   "LPL"    "ABCG5" 

sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /software/R-4.1.0-no-openblas-el7-x86_64/lib64/R/lib/libRblas.so
LAPACK: /software/R-4.1.0-no-openblas-el7-x86_64/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C           
 [4] LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=C       
 [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C        
[10] LC_TELEPHONE=C       LC_MEASUREMENT=C     LC_IDENTIFICATION=C 

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] readxl_1.4.1        VennDiagram_1.7.3   futile.logger_1.4.3
[4] cowplot_1.1.1       ggplot2_3.4.0       workflowr_1.7.0    

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0     xfun_0.35            bslib_0.4.1         
 [4] colorspace_2.0-3     vctrs_0.5.1          generics_0.1.3      
 [7] htmltools_0.5.4      yaml_2.3.6           utf8_1.2.2          
[10] blob_1.2.3           rlang_1.0.6          jquerylib_0.1.4     
[13] later_1.3.0          pillar_1.8.1         withr_2.5.0         
[16] glue_1.6.2           DBI_1.1.3            bit64_4.0.5         
[19] lambda.r_1.2.4       lifecycle_1.0.3      stringr_1.5.0       
[22] cellranger_1.1.0     munsell_0.5.0        gtable_0.3.1        
[25] evaluate_0.18        memoise_2.0.1        labeling_0.4.2      
[28] knitr_1.41           callr_3.7.3          fastmap_1.1.0       
[31] httpuv_1.6.6         ps_1.7.2             fansi_1.0.3         
[34] highr_0.9            Rcpp_1.0.9           formatR_1.12        
[37] promises_1.2.0.1     scales_1.2.1         cachem_1.0.6        
[40] jsonlite_1.8.4       farver_2.1.1         fs_1.5.2            
[43] bit_4.0.5            digest_0.6.31        stringi_1.7.8       
[46] processx_3.8.0       dplyr_1.0.10         getPass_0.2-2       
[49] rprojroot_2.0.3      cli_3.4.1            tools_4.1.0         
[52] magrittr_2.0.3       sass_0.4.4           tibble_3.1.8        
[55] RSQLite_2.2.19       futile.options_1.0.1 whisker_0.4.1       
[58] pkgconfig_2.0.3      data.table_1.14.6    assertthat_0.2.1    
[61] rmarkdown_2.18       httr_1.4.4           rstudioapi_0.14     
[64] R6_2.5.1             git2r_0.30.1         compiler_4.1.0