Versatile Gene-based Association Study - 2 version 2

File input

Upload GWAS results:


Use SNPs from

Select Sub-population from

Use Gene definition from


LD calculation assumption for X-chromosome analysis

The default below is to use all SNPs in gene definition; if required change to <100% to use a subset of SNPs or select "Best-SNP" test:

Top-x% test with top percent

or Best-SNP test

VEGAS2Pathway analysis: please check the box if you want to perform pathway analysis

VEGAS2Pathway analysis

Email and submit

Email (results to be sent):

We had a hardware problem on 9th Nov 2018 and all running and queued vegas2 online jobs were deleted. Apologies for any inconvenience caused. Please resend any jobs as required. Although the online server has 48 cores available for vegas2 jobs, the server is experiencing very high demand currently and the queue to run new jobs can be long. Running time scales with the significance of your results and wait times will be long if your input p-values are very small (e.g. due to biobank scale data).

Where possible please run the standalone version of vegas2 on your own computer – you will likely receive results faster.

  • This is the VEGAS2 web platform. Here, users can perform gene-based and pathway-based analyses on GWAS summary data using the VEGAS and VEGAS2Pathway approaches, respectively. It is publicly available for non-commercial use.
  • The online version here uses 1000 Genomes (1000G) populations to estimate patterns of linkage disequilibrium for each gene. The offline version is available to download below.
  • Descriptions of VEGAS, VEGAS2 and VEGAS2Pathway can be found in the VEGAS, VEGAS2 and VEGAS2Pathway publications, respectively.

Method in brief
  • After reading the GWAS summary file (rs IDs and association p-values) and the selected options, VEGAS2 first assigns variants to genes based on their hg19 genomic location, then calculates gene-based empirical association p-values using a simulation approach (see the VEGAS and VEGAS2 publications). If a gene contains more than 1000 SNPs, VEGAS2 prunes the list of variants before simulation, successively applying r2 thresholds of 0.99, 0.90, 0.70 and 0.50. After each pruning step, VEGAS2 checks the number of SNPs remaining. If fewer than 1000 remain, VEGAS2 uses the pruned SNPs from that step for the gene-based analysis; otherwise, it applies the next, more stringent r2 threshold to all SNPs in the gene. After pruning at r2 = 0.50, it uses all remaining SNPs for the analysis, irrespective of their number.
  • For ~6-8 million variants, gene-based analysis may take ~18-72 hours to complete, subject to the availability of computing nodes. Gene-based analysis of the schizophrenia meta-analysis performed under the PGC, in which many large genes are implicated, takes ~130-150 hours.
  • Once the gene-based analysis is complete, an email is sent to the address provided with a link to download the gene-based result file.
  • If you have chosen to perform the pathway analysis, the gene-based results are then used to run the VEGAS2Pathway analysis, which takes a further ~18-24 hours to complete. Once that analysis is finished, an email is sent with a link to download the pathway-based results.
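The pruning schedule described above can be sketched in Python. This is a minimal illustration, not the VEGAS2 script itself: `greedy_prune`, `select_snps`, the `r2` callback and the `max_snps` parameter are hypothetical names, and the greedy pruner is only a stand-in for whatever LD pruning procedure VEGAS2 actually runs.

```python
from typing import Callable, Hashable, List, Sequence

# Thresholds and SNP limit as stated in the method description.
R2_THRESHOLDS = (0.99, 0.90, 0.70, 0.50)
MAX_SNPS = 1000

R2Func = Callable[[Hashable, Hashable], float]


def greedy_prune(snps: Sequence[Hashable], r2: R2Func, threshold: float) -> List[Hashable]:
    """Hypothetical pruner: keep a SNP only if its r2 with every
    already-kept SNP does not exceed the threshold."""
    kept: List[Hashable] = []
    for snp in snps:
        if all(r2(snp, other) <= threshold for other in kept):
            kept.append(snp)
    return kept


def select_snps(snps: Sequence[Hashable], r2: R2Func, max_snps: int = MAX_SNPS) -> List[Hashable]:
    """Choose the SNP set for a gene following the described schedule."""
    if len(snps) <= max_snps:
        return list(snps)  # small genes are analysed without pruning
    pruned = list(snps)
    for threshold in R2_THRESHOLDS:
        # Each pass is applied to all SNPs in the gene, not to the previous result.
        pruned = greedy_prune(snps, r2, threshold)
        if len(pruned) < max_snps:
            break  # few enough SNPs remain: use this pruned set
    # After the 0.50 pass the pruned set is used whatever its size.
    return pruned
```

Passing a small `max_snps` makes the behaviour easy to check on toy data, for example with an `r2` function that returns 0.8 for SNPs in the same LD block and 0.0 otherwise.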

Input format
  • Upload association results as a text file with two columns (tab or space-delimited, no headers) - SNP rs-number and p-value.
  • See a sample input file.
  • Before uploading, make sure that the input file contains no header row and no NA values.
  • Maximum upload size 200 megabytes.
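A quick local check of the input file before uploading might look like the sketch below. It assumes only the format rules listed above (two whitespace-delimited columns, no header, no NA values); the function name `validate_gwas_lines`, the rs-number pattern and the [0, 1] range check are illustrative assumptions, not checks the server is known to perform.

```python
import math
import re
from typing import Iterable, List, Tuple

# Assumed pattern for an rs-number: "rs" followed by digits.
RS_ID = re.compile(r"rs\d+")


def validate_gwas_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, problem) pairs for lines that break the input rules."""
    problems: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()  # splits on tabs or spaces
        if len(fields) != 2:
            problems.append((number, "expected exactly two columns"))
            continue
        snp, p_text = fields
        try:
            p_value = float(p_text)
        except ValueError:
            # Catches header rows and literal "NA" entries.
            problems.append((number, "p-value is not numeric (header row or NA?)"))
            continue
        if math.isnan(p_value) or not 0.0 <= p_value <= 1.0:
            problems.append((number, "p-value outside [0, 1]"))
        elif not RS_ID.fullmatch(snp):
            problems.append((number, "SNP ID is not an rs-number"))
    return problems
```

Because the function accepts any iterable of lines, an open file handle can be passed directly; an empty result means no problems were found under these assumptions.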

Output file
  • The gene-based output is a plain-text file with the columns: Chromosome, Gene, Number of SNPs, Number of simulations, Start position, Stop position, Gene-based test statistic, P-value, Top-SNP, Top-SNP p-value.
  • The pathway-based output is also a plain-text file with the columns: GO_ID, Description, Pathway length, Pathway length after first cut, Pathway length after second cut, Final pathway length, Combined p-value, Empirical p-value, Genelist.
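As a rough sketch of post-processing, the gene-based output could be loaded and ranked by p-value as follows. The column order comes from the list above, but several details are assumptions to verify against a real result file: that fields are whitespace-delimited, that the first line is a header, and that gene names may be quoted. The field names in `GENE_COLUMNS` and both function names are hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical field names, in the column order listed for the gene-based output.
GENE_COLUMNS = (
    "chromosome", "gene", "n_snps", "n_sims", "start", "stop",
    "test_stat", "p_value", "top_snp", "top_snp_p",
)


def parse_gene_results(lines: Iterable[str], has_header: bool = True) -> List[Dict[str, object]]:
    """Parse gene-based result lines into dictionaries (assumed layout)."""
    rows: List[Dict[str, object]] = []
    for index, line in enumerate(lines):
        if has_header and index == 0:
            continue  # skip the assumed header row
        fields = line.split()
        if len(fields) != len(GENE_COLUMNS):
            continue  # ignore blank or malformed lines
        row: Dict[str, object] = dict(zip(GENE_COLUMNS, fields))
        row["gene"] = fields[1].strip('"')
        for key in ("n_snps", "n_sims", "start", "stop"):
            # int(float(...)) also accepts counts written like 1e+06
            row[key] = int(float(str(row[key])))
        for key in ("test_stat", "p_value", "top_snp_p"):
            row[key] = float(str(row[key]))
        rows.append(row)
    return rows


def top_genes(rows: List[Dict[str, object]], n: int = 10) -> List[Dict[str, object]]:
    """Return the n genes with the smallest gene-based p-values."""
    return sorted(rows, key=lambda row: row["p_value"])[:n]
```

The chromosome is kept as a string so that labels such as X are preserved.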

Offline version

Frequently Asked Questions
  • What are the major differences between version 2 and version 1 of the software VEGAS2?
  • In version 2 of the VEGAS2 software we incorporated a gene-set enrichment approach; see the VEGAS2Pathway paper below.

    In addition, VEGAS2 version 2 fixes a bug in the VEGAS and VEGAS2 version 1 scripts reported by Dr. Julian Hecker and others from the University of Bonn, Germany. The bug affects results from the top-percentage test (e.g. top 10% of SNPs; it is only present if the percentage is <100) but not the default all-SNPs test. For script-level details, refer to the subroutine "mvsimstoptenr" in the VEGAS2 version 2 script. In the web-based application this bug was corrected on 28/01/2016, so all analyses performed with the VEGAS2 online application after 28/01/2016 should provide valid results. The bug was not corrected in the offline version of VEGAS2 until January 2017, so please download the current version and re-run any offline top-% test analyses done with VEGAS2 before January 2017. The same bug existed in the top-% test in the original VEGAS (VEGAS1), so any top-% test runs using VEGAS1 should be rerun using either the offline version or the web application of VEGAS2.
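To make the difference between the default all-SNPs test and a top-percentage test concrete, here is a minimal sketch of the observed statistic only. It assumes, following our reading of the VEGAS publications, that each SNP p-value is converted to a 1-df chi-square statistic and that the statistics (all of them, or only the largest x%) are summed. The function names and the rounding-up rule for the number of SNPs kept are assumptions, and the simulation step that produces the empirical p-value is not shown.

```python
from statistics import NormalDist
from typing import Sequence


def chi2_from_p(p_value: float) -> float:
    """Convert a two-sided p-value (0 < p <= 1) to a 1-df chi-square statistic."""
    # The lower tail p/2 is used so that very small p-values stay representable.
    z = NormalDist().inv_cdf(p_value / 2.0)
    return z * z


def gene_statistic(p_values: Sequence[float], top_percent: int = 100) -> float:
    """Sum the chi-square statistics of the top `top_percent` % of SNPs in a gene.

    top_percent=100 corresponds to the all-SNPs test. Rounding the number of
    SNPs up, and keeping at least one, is an assumption made for this sketch.
    """
    stats = sorted((chi2_from_p(p) for p in p_values), reverse=True)
    keep = max(1, (len(stats) * top_percent + 99) // 100)  # integer ceiling
    return sum(stats[:keep])
```

With `top_percent=100` every SNP in the gene contributes; with a smaller value only the strongest signals do, which is the setting the bug fix above concerns.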

  • What are the ongoing developments on version 2?
  • We are planning to incorporate a gene-based test that considers only known or predicted functional variants.

  • How long does VEGAS2 version 2 take to run the gene-based and pathway-based tests, respectively?
  • The time required for a gene-based run depends on gene size (the number of SNPs each gene contains) and the number of simulations to perform. Typically it takes around 24 hours to run the gene-based test on ~2000 genes. The web application, which runs individual chromosomes on individual CPUs or servers, takes around 24-48 hours to run a gene-based analysis across the whole genome.

    Typically the pathway analysis of ~9500 gene-sets takes ~24 hours.

    The time required also depends on how busy the job queue is. At times of heavy use your job will be queued and may take several days to reach the front of the queue.

  • Can I run VEGAS2 gene-based tests on mixed ancestry samples?
  • No. The gene-based test is sensitive to LD structure, so we recommend that users choose the appropriate population for computing pairwise LD between variants.

  • What happens to SNPs that are not in reference panel?
  • Those SNPs are ignored in the gene-based analysis. Users should make sure that the SNP IDs in the provided GWAS summary file match those in the PLINK-formatted genotype files.

  • Can I specify my own gene-pathway annotations to perform pathway analysis?

    Yes, you can use the standalone package to run pathway analysis using your own gene-pathway annotations.

  • Our paper describing the VEGAS2Pathway approach is:
  • Mishra, A., Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO), the Colorectal Cancer Family Registry (CCFR), & MacGregor, S. (2017) A Novel Approach for Pathway Analysis of GWAS Data Highlights Role of BMP Signaling and Muscle Cell Differentiation in Colorectal Cancer Susceptibility, Twin Research and Human Genetics, 20(1), 1-9. [doi]
  • Our paper describing VEGAS2 is:
  • Mishra A., MacGregor S. (2015). VEGAS2: Software for More Flexible Gene-Based Testing. Twin Research and Human Genetics, 86-91. [doi]
    Please cite this paper if you have used VEGAS2 in your research. We would like to hear from you!
  • Our paper describing VEGAS is:
  • Liu JZ, McRae AF, Nyholt DR, Medland SE, Wray NR, Brown KM, AMFS Investigators, Hayward NK, Montgomery GW, Visscher PM, Martin NG, MacGregor S. (2010). A Versatile Gene-Based Test for Genome-wide Association Studies. American Journal of Human Genetics, 87. [doi] [pdf]

Aniket Mishra (aniket dot mishra at qimrberghofer dot edu dot au) and Stuart MacGregor (stuart dot macgregor at qimrberghofer dot edu dot au), Queensland Statistical Genetics, QIMR Berghofer Medical Research Institute.