
Evaluating Genome Assembly Quality with QUAST: A Comprehensive Guide


QUAST, short for QUality ASsessment Tool, is a widely used tool for evaluating the quality of genome assemblies. Many assembly algorithms have been developed, and choosing the most suitable one for a given scenario is crucial for getting good results. Assemblers rarely produce a single, fully contiguous genome; more often they produce contigs that cover segments of the underlying genome. This is why assembly quality assessment matters: it guides researchers in selecting the right assembler for their specific needs.

In this article, we will explore how to determine the quality of genome assemblies using QUAST, one of the leading assessment tools in the field.

What is QUAST?

QUAST allows users to evaluate assemblies with or without reference genomes and provides comprehensive reports, tables, and visualizations illustrating various assembly characteristics.

Downloading QUAST

To obtain QUAST, visit its official site and click the DOWNLOAD button. You will be redirected to a SourceForge page where you can download the latest version (as of this writing, quast-5.0.2). The archive can be extracted and QUAST run directly from the extracted directory; according to the QUAST manual, bundled components are compiled automatically the first time they are needed.

    tar -xf quast-5.0.2.tar.gz
    cd quast-5.0.2
    ./quast.py

After executing quast.py, you will see the following output:

QUAST: Quality Assessment Tool for Genome Assemblies
Version: 5.0.2

Usage: python quast.py [options] <files_with_contigs>

Options include:
- -o, --output-dir <dirname>: Directory for saving result files [default: quast_results/results_<datetime>]
- -r <filename>: Reference genome file
- -g, --features [type:]<filename>: File with genomic feature coordinates (GFF, BED, NCBI or TXT)
- -m, --min-contig <int>: Minimum contig length threshold [default: 500]
- -t, --threads <int>: Maximum number of threads [default: 25% of CPUs]

These are basic options; for a complete list, use --help. The online QUAST manual is available at http://quast.sf.net/manual.

Once QUAST is confirmed to be functioning correctly, we can proceed to assess some assemblies.

Obtaining an Example Assembly

We will utilize a sample dataset from the Flye assembler, which includes reads from an E. coli genome (Escherichia coli str. K-12 substr. MG1655, NCBI accession number CP009685). This dataset contains PacBio reads.

To download the dataset, use the following command:

    wget https://zenodo.org/record/1172816/files/E.coli_PacBio_40x.fasta

Next, we will assemble this dataset using the Flye assembler:

    flye --pacbio-raw E.coli_PacBio_40x.fasta --out-dir my_assembly --threads 8

The final assembly's contigs are written to assembly.fasta inside the output directory (my_assembly/assembly.fasta). Let's check the quality of this assembly using QUAST.
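Before handing the assembly to QUAST, a quick look at the contigs themselves can be a useful sanity check. The sketch below is only an illustration and is not part of QUAST or Flye: `contig_lengths` is a hypothetical helper name, and it assumes a plain, uncompressed FASTA file. The demo input is a toy example, not real assembly output.

```python
from typing import Dict, Iterable


def contig_lengths(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Return {contig_name: sequence_length} from the lines of a FASTA file."""
    lengths: Dict[str, int] = {}
    name = None
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Use the first word after '>' as the contig name.
            parts = line[1:].split()
            name = parts[0] if parts else f"unnamed_{len(lengths) + 1}"
            lengths[name] = 0
        elif name is not None:
            # Sequence lines may be wrapped, so keep adding to the current contig.
            lengths[name] += len(line)
    return lengths


# Toy FASTA content (made up for illustration).
toy_fasta = [">contig_1 example", "ACGTAC", "GT", ">contig_2", "GGGCC"]
toy_lengths = contig_lengths(toy_fasta)
print(f"{len(toy_lengths)} contigs, {sum(toy_lengths.values())} bp in total")
```

To apply this to the Flye output, pass an open file handle instead of the toy list, for example `contig_lengths(open("my_assembly/assembly.fasta"))`.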

Using QUAST

To run QUAST, provide the contigs file from the final assembly along with the reference genome (here ref.fasta, a FASTA file containing the reference sequence):

    quast.py my_assembly/assembly.fasta -r ref.fasta -o quastResult

The report can be viewed in the report.html file located in the output folder.

QUAST report for Flye assembly of E. coli dataset
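Besides report.html, QUAST also writes the same table in tab-separated form (report.tsv) to the output folder, which is convenient for scripting. The sketch below shows one way to load such a table. It assumes the layout is a header row starting with "Assembly" followed by one row per metric; `read_quast_report` is a hypothetical helper name, and the example values are made up rather than taken from a real run.

```python
import csv
import io
from typing import Dict, TextIO


def read_quast_report(handle: TextIO) -> Dict[str, Dict[str, str]]:
    """Parse a tab-separated report into {assembly_label: {metric: value}}."""
    rows = list(csv.reader(handle, delimiter="\t"))
    if not rows:
        return {}
    labels = rows[0][1:]  # first cell is the "Assembly" header
    report: Dict[str, Dict[str, str]] = {label: {} for label in labels}
    for row in rows[1:]:
        if not row:
            continue
        metric, values = row[0], row[1:]
        for label, value in zip(labels, values):
            # Values stay as strings because some cells may not be plain numbers.
            report[label][metric] = value
    return report


# Illustrative excerpt with made-up numbers, not real QUAST output.
example = io.StringIO("Assembly\tflye\n# contigs\t2\nN50\t4600000\n")
report = read_quast_report(example)
print(report["flye"]["N50"])
```

For a real run, open the file in the output folder instead, for example `open("quastResult/report.tsv", newline="")`.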

You can also compare multiple assemblies (e.g., assembly1.fasta and assembly2.fasta) and assign labels for each:

    quast.py assembly1.fasta assembly2.fasta -l label1,label2 -r ref.fasta -o quastResult

QUAST report for two assemblies
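When comparing many assemblies, it can help to build the command line programmatically so that labels and files stay in sync. The following is a minimal sketch that uses only the options shown in this article (-l, -r, -t, -o); `build_quast_command` is a hypothetical helper, and it assumes quast.py is on your PATH.

```python
import shlex
from typing import List, Optional, Sequence


def build_quast_command(
    assemblies: Sequence[str],
    output_dir: str,
    reference: Optional[str] = None,
    labels: Optional[Sequence[str]] = None,
    threads: Optional[int] = None,
) -> List[str]:
    """Assemble a quast.py argument list from the options used in this article."""
    if not assemblies:
        raise ValueError("at least one assembly file is required")
    if labels is not None and len(labels) != len(assemblies):
        raise ValueError("need exactly one label per assembly")
    command = ["quast.py", *assemblies]
    if labels:
        command += ["-l", ",".join(labels)]  # labels are comma-separated
    if reference:
        command += ["-r", reference]
    if threads:
        command += ["-t", str(threads)]
    command += ["-o", output_dir]
    return command


command = build_quast_command(
    ["assembly1.fasta", "assembly2.fasta"],
    output_dir="quastResult",
    reference="ref.fasta",
    labels=["label1", "label2"],
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to launch QUAST; keeping it as a list avoids shell-quoting problems with unusual file names.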

Common evaluation metrics include:
- Genome fraction
- Largest alignment
- NGA50
- LGA50
- Number of misassemblies
- Number of contigs

QUAST provides explanations for these metrics: in the HTML report, hovering over a metric name displays a popup with a detailed description.
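To make the reference-based metrics more concrete, here is a small sketch of the idea behind NG50 and NGA50. It is a simplified illustration, not QUAST's implementation: `ngx` is a hypothetical helper, and the aligned-block lengths are invented for the example (in practice they come from aligning contigs to the reference and breaking them at misassemblies). NG50 uses contig lengths, while NGA50 applies the same calculation to aligned-block lengths.

```python
from typing import Iterable, Optional


def ngx(lengths: Iterable[int], reference_length: int, x: float = 50.0) -> Optional[int]:
    """Largest length L such that pieces of length >= L cover at least x% of the reference."""
    target = reference_length * x / 100.0
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None  # the pieces never add up to x% of the reference


reference_length = 1200             # toy reference size in bp
contigs = [400, 300, 200, 100]      # toy contig lengths
# Hypothetical aligned blocks: the 300 bp contig is split into 200 + 100 at a misassembly.
aligned_blocks = [400, 200, 100, 200, 100]

print("NG50:", ngx(contigs, reference_length))
print("NGA50:", ngx(aligned_blocks, reference_length))
```

Because misassembled contigs are broken into shorter aligned blocks, NGA50 can never exceed NG50 for the same assembly, which is why it is a stricter measure of contiguity.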

You can also evaluate your assembly without a reference genome:

    quast.py my_assembly/assembly.fasta -o quastResult

The results will provide statistics such as:
- Number of contigs
- Largest contig
- Total length
- N50
- L50

QUAST report for Flye assembly of E. coli dataset without reference
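N50 and L50 are easy to compute by hand, which helps when interpreting the report. The sketch below uses the usual definitions: N50 is the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length, and L50 is the number of contigs needed to get there. `n50_l50` is a hypothetical helper and the lengths are a toy example. Note that QUAST itself only considers contigs above the minimum length threshold (500 bp by default, per the -m option), which this sketch does not apply.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    # Not reachable: the running total always reaches the full sum.
    raise AssertionError("running total never reached half of the assembly")


toy_contigs = [80, 70, 50, 40, 30, 20]  # total length 290, half is 145
n50, l50 = n50_l50(toy_contigs)
print(f"N50 = {n50}, L50 = {l50}")
```

In the toy example the two longest contigs (80 + 70 = 150) already cover half of the 290 bp total, so N50 is 70 and L50 is 2.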

Icarus Contig Browser

Icarus is a visualization tool integrated within QUAST for analyzing assemblies.

Icarus contig browser

This tool allows you to see how closely your assembly aligns with the reference genome.

MetaQUAST: QUAST for Metagenomic Assemblies

QUAST also offers MetaQUAST, which is designed for assessing metagenomic assemblies. Users can compare multiple assemblies simultaneously and include several reference genomes.

To execute MetaQUAST, use the following command:

    metaquast.py meta.contigs1.fasta meta.contigs2.fasta -l label1,label2 -R References/ -t 8 -o metaquastResult

Similar to QUAST, labels can be assigned to each assembly for clarity in the final report. You can also specify a folder containing all reference genomes.

MetaQUAST report for three assemblies with multiple references

Final Thoughts

I hope this article serves as a valuable resource for understanding how to utilize quality assessment tools for genome assemblies. Feel free to incorporate these tools into your research and projects, as they are freely accessible.

Take care and stay safe!

For further reading, check out my previous articles on bioinformatics and DNA analysis.

- Bioinformatics and Computational Biology - What? Why? How? A gentle introduction to bioinformatics and computational biology (medium.com)
- A Dummies’ Intro to Bioinformatics. Bioinformatics is gaining traction in today’s scientific landscape. (towardsdatascience.com)
- DNA Sequence Data Analysis. An introductory guide to DNA sequence data analysis. (medium.com)
- Genome Assembly - The Holy Grail of Genome Analysis. Exploring the assembly of the 2019 novel coronavirus genome. (towardsdatascience.com)
- A Simple Introduction to Read Simulators. An overview of read simulation tools and their applications. (medium.com)
