FAQ
How do I use the R package?
The gpmapr R package can be used to download data from the
API and to perform the colocalization and rare variant analysis.
How do I upload my own data?
You can upload your own GWAS summary statistics to run colocalization and rare variant analysis
against the GPMap database. You can optionally specify one or more existing GWAS upload GUIDs to
compare your upload against those uploads as well as the main database.
-
Use the Upload GWAS for Comparison form on the
homepage.
-
Use the gpmapr R package:
gpmapr::upload_gwas(
  file = 'gwas.tsv.gz',
  name = 'My new GWAS',
  email = 'me@example.com',
  column_names = list(...),
  ...
)
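The `file` argument in the call above points at a gzip-compressed, tab-separated file. As a minimal sketch, assuming your summary statistics are already in an R data frame, base R can write and re-read such a file. The column names used here (`chr`, `bp`, `ea`, `oa`, `beta`, `se`, `p`, `eaf`) are hypothetical; which columns `upload_gwas()` actually requires, and how `column_names` maps them, is not specified here.

```r
# Hypothetical summary statistics; these column names are illustrative only
# and would need to be mapped for the upload via column_names.
gwas <- data.frame(
  chr  = c(1L, 1L, 2L),
  bp   = c(1000123L, 1000456L, 2000789L),
  ea   = c("A", "G", "T"),
  oa   = c("G", "A", "C"),
  beta = c(0.12, -0.05, 0.30),
  se   = c(0.02, 0.03, 0.05),
  p    = c(1e-9, 0.1, 2e-9),
  eaf  = c(0.25, 0.40, 0.10),
  stringsAsFactors = FALSE
)

# gzfile() opens a gzip-compressed connection in text-write mode;
# write.table() writes tab-separated values without quotes or row names.
con <- gzfile("gwas.tsv.gz", open = "wt")
write.table(gwas, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

# Read the file back (read.delim decompresses .gz files transparently)
# to confirm the round trip before uploading.
check <- read.delim("gwas.tsv.gz", stringsAsFactors = FALSE)
str(check)
```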
How does the GWAS upload pipeline work?
The GWAS upload pipeline lets you upload a GWAS and run the colocalization and rare variant
analysis on it. It uses the same data processing pipeline that was used to create this resource,
with some caveats. The data pass through a series of processing steps, some of which remove data,
so the results may look inconsistent.
-
Filtered list of comparisons: Due to server constraints, only studies whose minimum p-value in
a given LD block is 1e-6 or lower will be compared with your GWAS in that block.
-
No GWAS QC step: Due to server constraints, DENTIST, which the main pipeline uses for QC, is
not run on GWAS uploads.
-
Sparsely Populated Studies Not Supported: Only studies with at least 150 samples in a
significant LD block will be processed.
-
Rare Variant Analysis Not Supported: Only variants with a MAF of 0.01 or greater will be
processed.
-
Conditional Imputation: Imputation is only performed if the correlation between original
and imputed SNPs is greater than 0.7.
-
Conditional Finemapping: If fine-mapping finds only a single credible set or does not
converge, the LBF values are instead calculated directly from the original summary statistics.
-
Missing LBF Values: A series of fine-mapping filtering steps detects over-inflated LBF values
and removes them from the analysis, so you may see some missing LBF values in the results.
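The thresholds listed above can be read as simple filters. The sketch below is illustrative only and is not the GPMap pipeline code: the data frames and column names are hypothetical, the 1e-6 cutoff is assumed to be inclusive, and the fallback LBF uses Wakefield's approximate Bayes factor (the single-variant formula used by the coloc package, with an assumed prior standard deviation of 0.15). That is one common way to compute LBFs from summary statistics, not a confirmed detail of this pipeline.

```r
# Thresholds quoted in the FAQ.
MIN_BLOCK_P  <- 1e-6  # a study's minimum p-value in an LD block (assumed inclusive)
MIN_MAF      <- 0.01  # variants need a MAF of 0.01 or greater
MIN_IMPUTE_R <- 0.7   # imputation used only if correlation exceeds this

# Hypothetical per-study, per-LD-block minimum p-values.
study_blocks <- data.frame(
  study = c("s1", "s2", "s3", "s1"),
  block = c("b1", "b1", "b1", "b2"),
  min_p = c(3e-9, 2e-4, 1e-6, 5e-8),
  stringsAsFactors = FALSE
)

# Studies eligible for comparison with an upload in a given LD block.
comparable_studies <- function(study_blocks, block, threshold = MIN_BLOCK_P) {
  rows <- study_blocks$block == block & study_blocks$min_p <= threshold
  unique(study_blocks$study[rows])
}

# Hypothetical variants with minor allele frequencies; keep MAF >= 0.01.
variants <- data.frame(
  snp = c("rs1", "rs2", "rs3"),
  maf = c(0.20, 0.004, 0.01),
  stringsAsFactors = FALSE
)
kept_variants <- variants[variants$maf >= MIN_MAF, ]

# Imputation check: TRUE where original/imputed correlation is high enough.
use_imputation <- function(r) r > MIN_IMPUTE_R

# Wakefield approximate log Bayes factor per SNP from beta and its standard error.
wakefield_lbf <- function(beta, se, sd_prior = 0.15) {
  z <- beta / se
  r <- sd_prior^2 / (sd_prior^2 + se^2)
  0.5 * (log(1 - r) + r * z^2)
}

comparable_studies(study_blocks, "b1")
wakefield_lbf(beta = c(0, 0.5), se = c(1, 0.05))
```

With these inputs, studies `s1` and `s3` are compared in block `b1`, and variant `rs2` is dropped for falling below the MAF cutoff.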
How do I interpret this graph?
Graph options
- Study P-value: The p-value threshold for the traits to be displayed in the legend.
- Include Trans Markers: Whether to include trans markers in the graph.
-
Trait Types: The type of trait to be displayed in the graph. 'Molecular Only' will still
include the phenotype in question on the phenotype view, while 'Phenotype Only' will not include
molecular traits.
-
Trait Categories: The category of trait to be displayed in the graph. If no categories
are selected, all traits will be displayed.
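Taken together, these options act as filters on the set of traits shown. A minimal sketch of that filtering, assuming a hypothetical trait table; the column names, example traits and the 5e-8 default threshold are illustrative assumptions, not GPMap's data model or defaults.

```r
# Hypothetical trait table; columns are assumptions, not the GPMap schema.
traits <- data.frame(
  trait    = c("LDL cholesterol", "PCSK9 eQTL", "Height", "HMGCR pQTL"),
  type     = c("phenotype", "molecular", "phenotype", "molecular"),
  category = c("Lipids", "Lipids", "Anthropometric", "Lipids"),
  p        = c(1e-20, 1e-9, 1e-5, 1e-12),
  trans    = c(FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Apply the graph options; `focal` is the phenotype being viewed, which is
# kept even when only molecular traits are selected.
filter_traits <- function(traits, p_threshold = 5e-8, include_trans = FALSE,
                          trait_types = c("molecular", "phenotype"),
                          categories = character(0), focal = NULL) {
  keep <- traits$p <= p_threshold
  keep <- keep & (traits$type %in% trait_types | traits$trait %in% focal)
  if (!include_trans) keep <- keep & !traits$trans
  # No selected categories means every category is shown.
  if (length(categories) > 0) keep <- keep & traits$category %in% categories
  traits[keep, ]
}

# "Molecular Only" on the LDL cholesterol phenotype view.
filter_traits(traits, trait_types = "molecular", focal = "LDL cholesterol")
```

Here the phenotype itself survives the 'Molecular Only' filter, the trans pQTL is hidden unless `include_trans = TRUE`, and Height is dropped by the p-value threshold.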
Trait view
Displays the colocalised results of the study in question and all studies that colocalise with
it, overlaid on the Manhattan plot of the phenotype. Also displays significant rare and
non-colocalising results. To compare 2 specific traits, please use the 'Filter Results By' dropdown.
-
Colocalised Results: Displays colocalised results of phenotype in question, and shows all
studies which colocalise with it
-
Rare association Results: These are not colocalization groups but single-SNP associations,
where the same SNP shows a significant association with both the phenotype in question and
another trait.
-
Circle size: The size of the circle is proportional to the number of traits in the
colocalisation group. The larger the circle, the more traits are in the colocalisation group.
-
Result Filtering: To compare 2 specific traits, please use the 'Filter Results By'
dropdown; this filters the results to show only the selected traits.
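As a rough illustration of the circle size and the two-trait comparison, the sketch below assumes colocalisation results in a hypothetical long format (one row per trait per group) and interprets comparing 2 traits as keeping the groups that contain both. The trait names are made up, and the real filter may behave differently.

```r
# Hypothetical long-format colocalisation results: one row per trait per group.
coloc <- data.frame(
  group = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  trait = c("LDL", "PCSK9 eQTL", "CAD", "LDL", "HMGCR pQTL",
            "LDL", "CAD", "APOB pQTL", "TG"),
  stringsAsFactors = FALSE
)

# Circle size: the number of traits in each colocalisation group.
group_sizes <- table(coloc$group)

# Groups in which both selected traits appear.
groups_with_both <- function(coloc, trait_a, trait_b) {
  a <- unique(coloc$group[coloc$trait == trait_a])
  b <- unique(coloc$group[coloc$trait == trait_b])
  intersect(a, b)
}

group_sizes
groups_with_both(coloc, "LDL", "CAD")
```

Group 3 has the most traits and so would be drawn as the largest circle; filtering on LDL and CAD keeps groups 1 and 3.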
Variant view
The variant (SNP) view displays the colocalisation results for a single SNP. Each circle represents
a trait in the colocalisation group.
-
How the SNP is chosen: For each SNP, the log Bayes factors (LBF) returned by SuSiE are summed
across every trait in the colocalisation group, and the SNP with the maximum sum is chosen.
-
Node size: Each node is sized by the p-value of the SNP for that specific trait, the
smaller the p-value, the larger the node.
-
Links: Each link represents a colocalization pair result, as returned by
coloc.
-
Link Strength: The strength of the link is displayed as the H4 value, the coloc posterior
probability that both traits share a single causal variant. A strong link (H4 > 0.8) is displayed
in blue and a weak link (0.5 < H4 < 0.8) in orange.
-
Group Connectedness: The connectedness of the colocalisation group is calculated as the
sum of the H4 values of the coloc pairs divided by the total number of coloc pairs in the
group, i.e. the mean pairwise H4. A higher connectedness means that the traits within the
colocalisation group are more strongly connected to each other.
- Trait Type: The type of trait is displayed in the legend.
-
Common vs Rare: The common groups are visualised in the graph; the rare results are not, but
they are included in the results table and forest plot.
-
VEP annotation: The data displayed under VEP annotation relates to the SNP and is provided by the
Ensembl Variant Effect Predictor.
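The quantities described for the variant view are simple to compute once the SuSiE LBFs, p-values and coloc H4 values are available. A minimal sketch with made-up numbers: the input layout is hypothetical, links with H4 of 0.5 or below are left uncoloured because the FAQ does not describe them, node size is assumed to scale with -log10(p), and connectedness assumes one H4 value per coloc pair in the group.

```r
# Hypothetical SuSiE LBF matrix: rows are SNPs, columns are traits in the group.
lbf <- matrix(
  c(2.1, 5.4, 0.3,
    3.0, 4.9, 1.2,
    0.5, 0.7, 0.1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("rs1", "rs2", "rs3"), c("trait_a", "trait_b", "trait_c"))
)

# Lead SNP: the largest LBF summed across all traits in the group.
choose_snp <- function(lbf) names(which.max(rowSums(lbf)))

# Node size grows as the trait's p-value at that SNP shrinks.
node_size <- function(p) -log10(p)

# Link colour from the coloc H4 value; NA where the FAQ gives no colour.
link_colour <- function(h4) {
  ifelse(h4 > 0.8, "blue", ifelse(h4 > 0.5, "orange", NA_character_))
}

# Connectedness: sum of pairwise H4 divided by the number of pairs.
connectedness <- function(h4) sum(h4) / length(h4)

h4 <- c(0.95, 0.62, 0.40)
choose_snp(lbf)
link_colour(h4)
connectedness(h4)
```

In this example `rs2` is chosen because its summed LBF (9.1) is the largest, even though `rs1` has the single highest LBF for `trait_b`.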
Gene and region view
Displays the colocalisation results for a gene or region. Each circle represents a result that has a
study marked with that gene. Results may not align with the exonic region of the gene, as some
studies may have QTLs which are in the regulatory region of the gene.
-
Colocalised Results: Displays colocalised results of phenotype in question, and shows all
studies which colocalise with it
-
Rare association Results: These are not colocalization groups but single-SNP associations,
where the same SNP shows a significant association with both the phenotype in question and
another trait.
-
Circle size: The size of the circle is proportional to the number of traits in the
colocalisation group. The larger the circle, the more traits are in the colocalisation group.
-
Surrounding Genes: Below the graph, the surrounding genes are displayed. These are genes
that are in the same region as the gene in question, and have a study marked with them.
-
Result Filtering: You can filter by another surrounding gene by clicking on that gene. To
compare 2 specific traits, please use the 'Filter Results By' dropdown; this filters the results
to show only the selected traits.
-
Pleiotropy Scores: Two pleiotropy scores are displayed: the first is the number of distinct
trait categories and the second is the number of distinct protein-coding genes that the gene is
associated with in the colocalisation results. Rare variant results are not included in the
calculation.
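A minimal sketch of the two counts, assuming the colocalisation results linked to the gene in view are available as a hypothetical table with a trait category, an associated gene, that gene's biotype and a rare/common flag. All column names and values are illustrative, not the GPMap schema.

```r
# Hypothetical colocalisation results for the gene in view: one row per trait.
results <- data.frame(
  group    = c(1, 1, 1, 2, 2, 3, 3),
  category = c("Lipids", "Cardiovascular", "Lipids", "Lipids",
               "Anthropometric", "Immune", "Immune"),
  gene     = c("PCSK9", NA, "PCSK9", "HMGCR", NA, "IL6R", "IL6R"),
  biotype  = c("protein_coding", NA, "protein_coding", "protein_coding",
               NA, "protein_coding", "protein_coding"),
  rare     = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

pleiotropy_scores <- function(results) {
  # Rare variant results are excluded from both scores.
  common <- results[!results$rare, ]
  # %in% treats NA biotypes as non-matching, so they are not counted.
  coding <- common$biotype %in% "protein_coding"
  c(
    trait_categories     = length(unique(common$category)),
    protein_coding_genes = length(unique(common$gene[coding]))
  )
}

pleiotropy_scores(results)
```

Here the rare IL6R results are ignored, leaving three trait categories and two protein-coding genes (PCSK9 and HMGCR).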