Bioinformatics: UCSC Genome Browser, NCBI, Ensembl, String, Clustal omega

UCSC GENOME BROWSER

Introduction:

The University of California Santa Cruz (UCSC) Genome Browser (genome.ucsc.edu) is a popular Web-based tool for quickly displaying a requested portion of a genome at any scale, accompanied by a series of aligned annotation “tracks”.

• Uses

The annotations—generated by the UCSC Genome Bioinformatics Group and external collaborators—display gene predictions, mRNA and expressed sequence tag alignments, simple nucleotide polymorphisms, expression and regulatory data, phenotype and variation data, and pairwise and multiple-species comparative genomics data. All information relevant to a region is presented in one window, facilitating biological analysis and interpretation.

Tools of the UCSC Genome Browser include:

• Genome Browser—graphical view of genes, gene structure, and annotation tracks.

• BLAT—aligning DNA sequence with a reference genomic assembly.

• Custom Tracks—displaying your data in conjunction with existing browser data.

• Table Browser—bulk data manipulation and downloads, intersections and joins between data sets.

• Session—sharing your data with others.

• PCR—getting DNA bracketed by a pair of primers.
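The PCR tool in the list above can be pictured as a string search: find where each primer lands on the genome and return the DNA between them. The following Python sketch illustrates that idea only. The function names are hypothetical, it searches a single strand with exact matching, and it is not how the UCSC tool is implemented.

```python
# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse):
    """Return the template region bracketed by a primer pair, or None.

    Both primers are written 5'->3'. The forward primer must occur on the
    given strand; the reverse primer binds the opposite strand, so its
    reverse complement is searched for downstream of the forward primer.
    """
    start = template.find(forward)
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


print(in_silico_pcr("GGGATGCGTACCGGATTT", "ATGCG", "TCCGG"))
```

With the toy template above, the returned product starts at the forward primer and ends at the reverse primer's binding site.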

NCBI:

• NCBI is the abbreviation of National Center for Biotechnology Information.

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information.

1. Entrez

Entrez is the search and retrieval system for all of NCBI. The name is French for "enter". Entrez allows you to search across the NCBI databases, including PubMed, Nucleotide, Protein, and Structure.
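Entrez searches can also be run programmatically through NCBI's E-utilities web service. The Python sketch below only builds an ESearch request URL and does not contact NCBI. The helper name `esearch_url` is hypothetical, and the parameters shown (`db`, `term`, `retmax`) are a small subset of what the service accepts.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL for one Entrez database (e.g. 'pubmed', 'protein')."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


print(esearch_url("pubmed", "BRCA1 AND human", retmax=5))
```

Changing `db` points the same query at a different Entrez database, which is the "search everything from one place" idea described above.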

2. NCBI Gene

NCBI Gene places the gene (locus) at the center of the NCBI databases: each gene record links to every key NCBI resource for that gene.

3. GenBank (NCBI Data Model)

GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.

4. BLAST

BLAST (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases, regardless of whether the query is protein or DNA.

• BLAST Types

1. blastn - for nucleotide - nucleotide comparisons
2. blastp - for protein - protein comparisons
3. blastx - compares a nucleotide query, translated into hypothetical proteins in all six reading frames, against a protein database (such as nr)
4. tblastn - compares a protein query against a nucleotide database translated into hypothetical proteins in all six reading frames
5. tblastx - compares a nucleotide query translated in all six reading frames against a nucleotide database translated in all six reading frames
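The choice among the five programs depends only on the type of the query, the type of the database, and whether nucleotide sequences are compared as translated proteins. A minimal Python sketch of that decision follows. The function name and the "nucleotide"/"protein" labels are illustrative and are not part of any NCBI tool.

```python
def choose_blast_program(query, database, translate_both=False):
    """Pick a BLAST program from the query and database sequence types.

    query, database: "nucleotide" or "protein".
    translate_both: for nucleotide vs nucleotide searches only, compare
    six-frame translations of both sides (tblastx) instead of raw DNA.
    """
    if query == "protein" and database == "protein":
        return "blastp"
    if query == "nucleotide" and database == "protein":
        return "blastx"   # the query is translated
    if query == "protein" and database == "nucleotide":
        return "tblastn"  # the database is translated
    if query == "nucleotide" and database == "nucleotide":
        return "tblastx" if translate_both else "blastn"
    raise ValueError(f"unknown sequence types: {query!r}, {database!r}")


print(choose_blast_program("nucleotide", "protein"))
```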

BLAST:

BLAST stands for Basic Local Alignment Search Tool. It searches for similarity between a query sequence and the sequences deposited in the National Center for Biotechnology Information (NCBI) databases. Putative genes in the query sequence can be detected from their homology to the deposited sequences. BLAST is popular as a bioinformatics tool because it quickly identifies regions of local similarity between two sequences. For each match, BLAST calculates an expectation value (E-value), which estimates how many matches with a score at least as high would be expected by chance in a database of that size; the lower the E-value, the more significant the match. BLAST works by local alignment of sequences.
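The expectation value is commonly described by the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where S is the raw alignment score, m the query length and n the database length. The Python sketch below applies that formula. The default values of lambda and K are illustrative assumptions (they depend on the scoring matrix and gap penalties), and real BLAST also corrects the sequence lengths before computing E.

```python
import math


def bit_score(raw_score, lam=0.267, k=0.041):
    """Convert a raw alignment score to a normalized bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score, query_len, db_len, lam=0.267, k=0.041):
    """Karlin-Altschul expectation: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Same query and database, two different raw scores.
for score in (50, 100):
    print(score, bit_score(score), expect_value(score, 250, 1_000_000))
```

A higher score gives a smaller E-value, and the same E-value can be obtained from the bit score as m * n * 2^(-bits).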

BLAT:

The BLAST-Like Alignment Tool (BLAT) is used to find genomic sequences that match a protein or DNA sequence submitted by the user. BLAT is typically used for searching similar sequences within the same or closely related species.

Uses:

BLAT’s speed is one of its main advantages. It is useful for quickly finding the genome location of a genomic, mRNA or protein sequence.

What are the differences between BLAT and BLAST?

BLAT is an alignment tool like BLAST, but it is structured differently. On DNA, BLAT works by keeping an index of an entire genome in memory. Thus, the target database of BLAT is not a set of GenBank sequences, but instead an index derived from the assembly of the entire genome. By default, the index consists of all non-overlapping 11-mers except for those heavily involved in repeats, and it uses less than a gigabyte of RAM. This smaller size means that BLAT is far more easily mirrored than BLAST. BLAT on DNA is designed to quickly find sequences of 95% or greater similarity that are 40 bases or longer. It may miss more divergent or shorter sequence alignments.
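The indexing idea can be sketched in a few lines of Python: store the position of every non-overlapping k-mer of the genome, then look up every overlapping k-mer of the query. This is a simplified illustration under stated assumptions, not BLAT's actual code. The function names are hypothetical, repeat filtering is omitted, and the demo uses k=4 instead of 11 so the example stays readable.

```python
from collections import defaultdict


def build_index(genome, k=11):
    """Map each non-overlapping k-mer of the genome to its start positions."""
    index = defaultdict(list)
    for start in range(0, len(genome) - k + 1, k):
        index[genome[start:start + k]].append(start)
    return index


def find_seeds(query, index, k=11):
    """Look up every overlapping k-mer of the query in the genome index.

    Returns (query_offset, genome_offset) pairs; each pair is a seed that a
    real aligner would then extend into a full alignment.
    """
    hits = []
    for q in range(len(query) - k + 1):
        for g in index.get(query[q:q + k], ()):
            hits.append((q, g))
    return hits


genome = "AAAACCCCGGGGTTTTACGT"
index = build_index(genome, k=4)
print(find_seeds("CCGGGGTT", index, k=4))
```

Because only the genome side is indexed sparsely, the index stays small, while scanning every k-mer of the query still lets a sufficiently long, near-identical match hit at least one indexed word.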

From a practical standpoint, BLAT has several advantages over BLAST:

• Speed (no queues, response in seconds) at the price of lesser homology depth

• The ability to submit a long list of simultaneous queries in FASTA format

• Five convenient output sort options

• A direct link into the UCSC browser

• Alignment block details in natural genomic order

• An option to launch the alignment later as part of a custom track.

FASTA

• What is FASTA

FASTA is another sequence alignment tool, used to search for similarities between DNA or protein sequences. The query sequence is broken down into short sequence patterns or words known as k-tuples, and the target sequences are searched for these k-tuples in order to find the similarities between the two. FASTA is a fine tool for similarity searches. When looking for sequence similarities, a good approach is to run a BLAST search first and then follow up with FASTA. The FASTA file format is widely used as the input format for other sequence alignment tools such as BLAST.
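The k-tuple step can be sketched as follows: record where each k-tuple occurs in the query, scan the target for the same words, and count the hits on each diagonal (target position minus query position). The diagonal with the most hits marks the most promising region. This Python sketch covers only that first stage, with hypothetical function names; the real program goes on to rescore and join these regions.

```python
from collections import Counter, defaultdict


def ktuple_positions(seq, k):
    """Map every k-tuple (word of length k) in seq to its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def best_diagonal(query, target, k=2):
    """Return (diagonal, hit_count) for the diagonal sharing the most k-tuples.

    A diagonal is target_position - query_position; many hits on the same
    diagonal indicate a run of matching words with no gaps between them.
    """
    lookup = ktuple_positions(query, k)
    diagonals = Counter()
    for j in range(len(target) - k + 1):
        for i in lookup.get(target[j:j + k], ()):
            diagonals[j - i] += 1
    if not diagonals:
        return None
    return diagonals.most_common(1)[0]


print(best_diagonal("GATTACA", "CCGATTACAGG", k=2))
```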

Main Difference – BLAST vs FASTA

BLAST and FASTA are two similarity searching programs that identify homologous DNA and protein sequences based on excess sequence similarity. Excess similarity between two DNA or amino acid sequences arises from common ancestry (homology). Similarity searching is most effective when it compares the amino acid sequences of proteins rather than DNA sequences. Both BLAST and FASTA use a scoring strategy to compare two sequences and provide highly accurate statistical estimates of the similarity between them. The main difference between BLAST and FASTA is that BLAST mostly finds ungapped, locally optimal sequence alignments, whereas FASTA is better suited to finding similarities between less similar sequences.

Ensembl Genome Browser:

INTRODUCTION

The Ensembl project creates evidence-based annotation of genome sequences and integrates these data with other biological information.

Ensembl was established in 1999, towards the end of the Human Genome Project, in response to a recognition that understanding the genetic code of organisms is as important as reading it.

The project provides an expanding wealth of information for a diverse list of species, including:

• Intron and exon structure for protein-coding and non-coding genes

• Genomic variations and somatic mutations and their consequences on genes and genotypes in populations and individuals

• Cross-species gene trees and whole genome alignments

• Functional genomic data - including regulatory region annotation.

STRING

A database of known and predicted protein-protein interactions. The database contains information from numerous sources, including experimental repositories, computational prediction methods and public text collections. STRING is regularly updated and gives a comprehensive view on protein-protein interactions currently available.

Uses:

STRING allows for the searching of one or multiple proteins at a time with the ability to additionally limit the search to the desired species.
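The same kind of query (several proteins, restricted to one species) can be expressed as a request to STRING's web API. The Python sketch below only builds the request URL and does not contact the server. The helper name is hypothetical, and the URL pattern and parameter names (`identifiers`, `species`) are assumptions based on the STRING API documentation; check the current documentation before relying on them.

```python
from urllib.parse import urlencode


def string_network_url(proteins, species=None, output_format="tsv"):
    """Build a STRING 'network' request for one or more proteins.

    proteins: list of protein names or identifiers.
    species: optional NCBI taxonomy ID used to limit the search
             (for example 9606 for human).
    """
    # Assumption: multiple identifiers are separated by a carriage return.
    params = {"identifiers": "\r".join(proteins)}
    if species is not None:
        params["species"] = species
    return f"https://string-db.org/api/{output_format}/network?{urlencode(params)}"


print(string_network_url(["TP53", "MDM2"], species=9606))
```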

Clustal Omega

Clustal Omega is a multiple sequence alignment program for aligning three or more sequences together in a computationally efficient and accurate manner. It produces biologically meaningful multiple sequence alignments of divergent sequences. Evolutionary relationships among the aligned sequences can then be viewed as cladograms or phylograms.

Uses:

Clustal Omega is a widely used package for carrying out multiple sequence alignment.

Clustal W

Introduction:

ClustalW is a tool for aligning multiple protein or nucleotide sequences. The alignment is achieved via three steps: pairwise alignment, guide-tree generation and progressive alignment. ClustalW-MPI is a distributed and parallel implementation of ClustalW.

Uses:

• It is a general purpose multiple alignment program for DNA or proteins.

• Calculate all possible pairwise alignments.

• Record the score for each pair.

• Calculate a guide tree based on the pairwise distances
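The first two steps (score every pair, then build a guide tree) can be sketched in Python as below. This is an illustration under simplifying assumptions, not ClustalW's implementation: the scoring values are arbitrary, the tree is built by simple average-linkage clustering rather than the method the real program uses, and the final progressive alignment step is omitted. All function names are hypothetical.

```python
from itertools import combinations


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score, keeping one row at a time."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def guide_tree(seqs):
    """Join the two most similar clusters repeatedly; return nested tuples."""
    names = list(seqs)
    scores = {}
    for x, y in combinations(names, 2):
        s = global_alignment_score(seqs[x], seqs[y])
        scores[(x, y)] = scores[(y, x)] = s  # record the score for each pair

    def similarity(c1, c2):
        # Average pairwise score between the members of two clusters.
        total = sum(scores[(x, y)] for x in c1[1] for y in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    # Each cluster is (subtree, member names).
    clusters = [(name, [name]) for name in names]
    while len(clusters) > 1:
        left, right = max(combinations(clusters, 2),
                          key=lambda pair: similarity(*pair))
        clusters = [c for c in clusters if c is not left and c is not right]
        clusters.append(((left[0], right[0]), left[1] + right[1]))
    return clusters[0][0]


seqs = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TTTTGGGG"}
print(guide_tree(seqs))
```

In the toy input, A and B differ at a single position, so they are joined first and C is attached last. A progressive aligner would follow the same order, aligning the closest sequences first.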

SWISS-MODEL

Introduction:

SWISS-MODEL is a server for automated comparative modeling of three-dimensional (3D) protein structures. It pioneered the field of automated modeling starting in 1993 and is the most widely used free web-based automated modeling facility today.

Uses

SWISS-MODEL is a web-based integrated service dedicated to protein structure homology modelling. It guides the user in building protein homology models at different levels of complexity.

Polymerase chain reaction (PCR)

Introduction:

Polymerase chain reaction (PCR) is a very versatile gene amplification method that has brought tremendous progress in molecular biology and genetics. It is an in vitro method of amplifying a desired DNA sequence of any origin hundreds of millions of times within hours. Typically, the goal of PCR is to make enough of the target DNA region that it can be analyzed or used in some other way. For instance, DNA amplified by PCR may be sent for sequencing, visualized by gel electrophoresis, or cloned into a plasmid for further experiments. PCR is used in many areas of biology and medicine, including molecular biology research, medical diagnostics, and even some branches of ecology.

1. Components of PCR:

The PCR reaction requires the following components:

1. DNA Template : The double stranded DNA (dsDNA) of interest, separated from the sample.

2. DNA Polymerase : Usually a thermostable Taq polymerase that does not rapidly denature at high temperatures (98°C), and can function at a temperature optimum of about 70°C.

3. Oligonucleotide primers : Short pieces of single stranded DNA (often 20-30 nucleotides) which are complementary to the 3’ ends of the sense and anti-sense strands of the target sequence.

4. Deoxynucleotide triphosphates : Single units of the bases A, T, G, and C (dATP, dTTP, dGTP, dCTP) provide the energy for polymerization and the building blocks for DNA synthesis.

5. Buffer system : Includes magnesium and potassium to provide the optimal conditions for DNA denaturation and renaturation; also important for polymerase activity, stability and fidelity.
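The relationship between the primers and the two template strands can be shown with a short Python sketch. The forward primer has the same sequence as the start of the sense strand (so it pairs with the 3’ end of the anti-sense strand), and the reverse primer is the reverse complement of the end of the sense strand (so it pairs with its 3’ end). The function names are hypothetical, and the sketch ignores real design criteria such as melting temperature and GC content.

```python
# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(target, length=20):
    """Return (forward, reverse) primers, both written 5'->3'.

    target: the sense strand of the region to amplify, written 5'->3'.
    """
    if len(target) < 2 * length:
        raise ValueError("target is too short for two non-overlapping primers")
    forward = target[:length]
    reverse = reverse_complement(target[-length:])
    return forward, reverse


# Toy example with 5-base primers so the result is easy to check by eye.
print(design_primers("ATGCGTACCGGA", length=5))
```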

2. PCR procedure

All the PCR components are mixed together and taken through a series of three major cyclic reactions conducted in an automated, self-contained thermocycler machine.

1. Denaturation :

This step involves heating the reaction mixture to 94°C for 15-30 seconds. During this, the double stranded DNA is denatured to single strands due to breakage in weak hydrogen bonds.

2. Annealing :

The reaction temperature is rapidly lowered to 54-60°C for 20-40 seconds. This allows the primers to bind (anneal) to their complementary sequence in the template DNA.

3. Elongation :

Also known as extension, this step usually occurs at 72-80°C (most commonly 72°C). In this step, the polymerase enzyme sequentially adds bases to the 3′ end of each primer, extending the DNA sequence in the 5′ to 3′ direction. Under optimal conditions, DNA polymerase will add about 1,000 bp/minute.
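Two pieces of arithmetic follow from the cycle described above: the number of copies after a given number of cycles, and the extension time needed for a given product length. The Python sketch below assumes ideal doubling in every cycle and uses the 1,000 bp/minute rate quoted above. The `efficiency` parameter is an added assumption for illustration, since real reactions fall short of perfect doubling.

```python
def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Copy number after a number of cycles.

    efficiency=1.0 means every template is copied in every cycle, so the
    amount doubles each time (start_copies * 2 ** cycles).
    """
    return start_copies * (1 + efficiency) ** cycles


def extension_seconds(amplicon_bp, rate_bp_per_min=1000):
    """Extension time needed to copy a product of the given length."""
    return 60 * amplicon_bp / rate_bp_per_min


print(copies_after(30))         # copies from a single template after 30 cycles
print(extension_seconds(1500))  # seconds needed for a 1.5 kb product
```

Under ideal doubling, about 28 cycles are enough to pass the "hundreds of millions" mark from a single starting molecule.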

mHñ Asif,. About Author

Medical Laboratory Technology
