About the Genotype-Phenotype Map (GPMap)

Developed at the MRC Integrative Epidemiology Unit (IEU) at the University of Bristol, the Human Genotype-Phenotype Map (GPMap) is an integrated discovery engine designed to bridge the gap between GWAS discovery and functional follow-up. While standard browsers identify genes in proximity to lead SNPs, the GPMap uses fine-mapping and colocalization to identify evidence of shared causal signals between thousands of complex traits and molecular layers (eQTL, pQTL, sQTL, and methQTL).

For licensing, privacy, and service terms, please see our Terms of Use.

[Figure: Data Processing Diagram]

Core Capabilities

  • Causal Locus Resolution: Transition from "nearest gene" heuristics to empirical evidence. By scanning Colocalization Groups (CGs), you can identify the specific phenotypes and molecular mechanisms sharing a genetic architecture at a single locus.
  • Systemic Pleiotropy & Comorbidity: Visualize "pleiotropic neighbors." The GPMap helps you distinguish whether a variant affects multiple traits independently (horizontal pleiotropy) or acts through a molecular mediator such as a protein (vertical pleiotropy).
  • Precision MR Instruments: Streamline Mendelian Randomization by selecting instruments backed by high colocalization posterior probabilities (H4 ≥ 0.8). This reduces "LD-contamination" and helps ensure your instrumental variables (IVs) are functionally relevant.
  • User-Led Extensibility: Beyond our library of 4,500+ traits, you can upload your own GWAS summary statistics. The platform will automatically run fine-mapping and colocalization against our entire multi-omic database to identify supported mechanisms for your novel hits.
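
As a rough illustration of the instrument-selection idea in the Precision MR Instruments item, the sketch below filters a table of colocalization results to rows with H4 ≥ 0.8. The data frame, its column names (`variant`, `exposure`, `h4`), and the helper `select_instruments` are hypothetical and are not the gpmapr output format or API; only the 0.8 cutoff comes from this page.

```r
# Toy colocalization results. The column names and values are assumptions
# made for illustration; they are not the schema returned by gpmapr.
coloc_results <- data.frame(
  variant  = c("rs1", "rs2", "rs3", "rs4"),
  exposure = c("GENE_A eQTL", "GENE_A eQTL", "GENE_B pQTL", "GENE_C eQTL"),
  h4       = c(0.95, 0.42, 0.80, 0.79),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only rows with H4 at or above the threshold.
select_instruments <- function(results, h4_threshold = 0.8) {
  results[results$h4 >= h4_threshold, , drop = FALSE]
}

instruments <- select_instruments(coloc_results)
print(instruments$variant)
```

Filtering on H4 alone is only one possible screening step; instrument strength and other MR assumptions still need to be checked separately.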

Accessing the Map

The GPMap is an open-access resource available via our web interface or the gpmapr R package for programmatic analysis.

Quick Start with gpmapr

# Install the package
devtools::install_github('MRCIEU/gpmapr')

# Search for traits, genes, or variants
gpmapr::search_gpmapr('Haemoglobin')

# Retrieve high-resolution data for a specific target
gpmapr::gene('TREM2')

# Project your own results against the map
gpmapr::upload_gwas(file = 'my_gwas.tsv.gz', name = 'Discovery Study', ...)

Project Components

The Genotype-Phenotype Map comprises three distinct components:

  • The data processing pipeline (which includes GWAS Upload)
  • An R package
  • The website and API

Nomenclature

We have defined these terms as follows:

  • Traits: A trait denotes the outcome variable assessed in any of the GWAS from which we have taken summary statistics for the GPMap. Trait names remain as defined in the original study. Other commonly used names for a trait are 'phenotype' and 'study'.
    • Complex Trait: represents polygenic organismal phenotypes and clinical disease states. All complex traits have genome-wide association data.
    • Molecular Trait: represents discrete cellular processes such as mRNA expression or protein abundance. Molecular traits do not typically have genome-wide data associated with them, but rather have significant cis (and sometimes trans) signals extracted in specific genomic regions, and are associated with a gene and tissue.
    • Trait Category: Traits were categorized into 23 categories. Categories were assigned by prompting the OpenAI model gpt-5-nano, for each complex trait analyzed (excluding molecular phenotypes), to return the best-matching category for the trait name along with a confidence score for the match. The trait-to-category mapping was then manually inspected. A number of traits had a low category match confidence score but were retained for completeness. Some traits were too broad and were therefore manually set to undefined.
    • A third category of trait, which falls between complex and molecular traits, consists of ultra-specific measurements that have genome-wide associations, as opposed to cis and trans windows. These are denoted 'Cell Traits' (e.g. 'IgD- CD27- B cell %B cell') and 'Targeted Protein Measure' (e.g. 'VDBP plasma levels'), and are considered neither a complex trait nor a molecular trait.
  • Pleiotropy score: Two pleiotropy scores are calculated at both the variant and gene level. The first is the number of distinct trait categories, and the second is the number of distinct protein-coding genes, that fall in colocalization groups tagged by the variant (variant level) or containing gene-specific QTLs (gene level). Rare variant results are not included in the calculation.
  • Coverage: dense vs. sparse. Summary statistics that only report SNPs reaching a specific p-value threshold are considered sparsely populated; all others are considered densely populated. For sparsely populated summary statistics, missing values were 0-padded and both the imputation and fine-mapping steps were skipped.
  • Cell Type: Some QTLs were derived from 'single cell' expression assays that derive gene expression measurements from specific cell subsets following single-cell transcriptional profiling. A list of the cell types currently in the map is given in Supplementary Table 11.
  • Gene Annotation: There are two types of gene annotation for QTLs, 'gene' and 'situated gene'. 'Gene' refers to the gene that was assayed and is annotated by the QTL resource itself (e.g. GTEx gene expression of TREML2 in blood is annotated as 'TREML2'). 'Situated gene' is currently only applicable to rare variant studies conducted using whole-exome sequencing data and denotes the gene in which the variant is physically situated. Common variant QTL studies have a 'Gene', most rare variant phenotypic studies have a 'Situated Gene', and rare variant QTL studies have both a 'Gene' and a 'Situated Gene', which may differ. Genes have been assigned to methQTLs based on the proximity of the assayed CpG site to a gene. These assignments were taken from the Illumina Methylation EPICv2 manifest(1), which tags CpGs by their proximity to gene bodies and promoter regions.
  • Colocalization definitions:
    • Colocalizing Pair: A colocalizing pair is two traits tested against each other in a single colocalization test run using coloc(2). The two traits are considered to be a pair if H4 ≥ 0.8.
    • Colocalization Group: A set of traits that have been grouped together by graph-based clustering and pruning methods applied to the results of pairwise colocalization analysis. A detailed explanation of the clustering and pruning methods can be found in Supplementary Notes 2-3.
    • Group Connectedness Percentage: The connectedness of a colocalization group is the number of colocalization pairs (edges) in the group with H4 ≥ 0.8, divided by the total number of possible edges in a group with n nodes, n(n-1)/2, expressed as a percentage. A higher connectedness means that the colocalization group is more strongly connected.
    • Candidate Variant: Each colocalization group is assigned a SNP. The SNP is chosen as the variant with the highest cumulative sum of the log Bayes factor (LBF), calculated by SuSiE(3), across every trait in the colocalization group. In some situations, the candidate variant may not be the variant with the highest LBF for a given study in the colocalization group, or, if the variant was not genotyped or imputed for that study, may be missing. This variant should not be interpreted as the causal variant for traits in the group; it instead tags a shared colocalizing signal in the region.
  • P-value thresholds:
    • Genome-wide significance (GWS): We use the standard European genome-wide significance p-value threshold of 5e-8 for most calculations in the results.
    • Suggestive Significance: We have extracted and analyzed all loci reaching a 'suggestive significance' threshold of p < 1.5e-4. This threshold assumes approximately 1 million independent loci across the 3 billion base pairs of the genome (0.05 / 1m = 5e-8), which gives approximately 333 independent regions per 1 Mb, and 0.05 / 333 ≈ 1.5e-4. Both colocalization group and colocalization pair data are available in the resource. Users should apply their own p-value threshold for multiple testing correction.
  • Cis and Trans: For common variant QTLs, the cis window includes any fine-mapped loci within ±1 Mb of the SNP tagging the QTL, and trans is defined as any region outside of that window. For rare variant QTLs, any SNP falling directly within the assayed gene is considered cis (i.e. the 'gene' and the 'situated gene' match); otherwise, the variant is considered a trans signal for the assayed gene measure.
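
The definitions of Group Connectedness Percentage, Candidate Variant, and Suggestive Significance given above can be sketched in base R as follows. This is a minimal illustration on made-up numbers, not the pipeline's implementation: the function names, the layout of the H4 and LBF matrices, and the choice to skip missing LBF values when summing are all assumptions, and the clustering and pruning steps that form colocalization groups are not reproduced.

```r
# Connectedness: percentage of possible trait pairs in a group with H4 >= threshold.
# `h4` is assumed to be a symmetric n x n matrix of pairwise H4 values.
group_connectedness <- function(h4, threshold = 0.8) {
  n <- nrow(h4)
  pair_h4 <- h4[upper.tri(h4)]  # each unordered pair once: n(n-1)/2 values
  100 * sum(pair_h4 >= threshold) / (n * (n - 1) / 2)
}

# Candidate variant: the variant with the highest LBF summed across traits.
# `lbf` is assumed to be a variants x traits matrix. NA marks a variant that is
# missing for a trait; skipping it in the sum is an assumption of this sketch.
candidate_variant <- function(lbf) {
  totals <- rowSums(lbf, na.rm = TRUE)
  names(totals)[which.max(totals)]
}

# Suggestive threshold: alpha divided by the approximate number of
# independent regions in a window (about 333 per 1 Mb with the defaults).
suggestive_threshold <- function(alpha = 0.05, genome_bp = 3e9,
                                 independent_loci = 1e6, window_bp = 1e6) {
  bp_per_locus <- genome_bp / independent_loci
  regions_per_window <- window_bp / bp_per_locus
  alpha / regions_per_window
}

# Toy group of four traits: four of the six possible pairs reach H4 >= 0.8.
h4 <- matrix(c(1.00, 0.90, 0.85, 0.30,
               0.90, 1.00, 0.60, 0.95,
               0.85, 0.60, 1.00, 0.80,
               0.30, 0.95, 0.80, 1.00), nrow = 4, byrow = TRUE)

# Toy LBF values for three variants across three traits.
lbf <- matrix(c(5, 2, 1,
                4, 4, 3,
                6, NA, 2), nrow = 3, byrow = TRUE,
              dimnames = list(c("v1", "v2", "v3"),
                              c("trait_A", "trait_B", "trait_C")))

print(group_connectedness(h4))
print(candidate_variant(lbf))
print(suggestive_threshold())
```

In the toy LBF matrix, `v3` has the highest LBF for `trait_A` but `v2` has the highest cumulative LBF, which mirrors the note that the candidate variant need not be the top variant for every study in the group.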

Acknowledgements

We express our sincere gratitude to the research participants and the investigators of the various studies and consortia whose data contributed to the development of the GPMap. This work would not have been possible without the altruistic contribution of hundreds of thousands of individuals to genetic research.

We specifically acknowledge the following biobanks and consortia for providing the GWAS summary statistics and functional genomic data: UK Biobank, The GTEx Consortium, FinnGen, and the NHGRI-EBI GWAS Catalog. We also thank the IEU OpenGWAS project for providing the computational infrastructure and data standardization that facilitated this large-scale integration.