BLAST comparison of distantly related Haemophilus ducreyi and E. coli genomes

In the BLAST comparisons presented here, each protein in the Haemophilus ducreyi proteome is used as a query against the entire E. coli proteome, and vice versa. For each protein, the best match to a protein in the other proteome is recorded. "Best match" depends on the E-value, which is the number of matches with an equal or better score that would be expected by chance alone in a search of that size. Thus, an E-value approaching 0 means it is extremely unlikely that the protein's goodness-of-match to another protein can be attributed to chance. Examples of matches with excellent and marginal E-values are here.
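The best-match step described above can be sketched as follows. This is a minimal illustration, not the pipeline used for this page: it assumes the BLAST results are available in the standard 12-column tabular format (query ID in column 1, subject ID in column 2, E-value in column 11), and the protein identifiers (hd_0001, ec_0100, and so on) are made up for the example.

```python
def best_hits(tabular_lines):
    """Return {query_id: (subject_id, evalue)} keeping the lowest E-value per query.

    Assumes BLAST 12-column tabular output: fields[0] is the query ID,
    fields[1] the subject ID, and fields[10] the E-value.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        # Keep this hit only if it is the first or has a lower E-value.
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical hits for two H. ducreyi queries against E. coli proteins.
example_lines = [
    "hd_0001\tec_0100\t62.5\t200\t75\t0\t1\t200\t1\t200\t1e-80\t250",
    "hd_0001\tec_0200\t30.0\t90\t63\t2\t5\t94\t10\t99\t0.003\t35",
    "hd_0002\tec_0300\t25.0\t60\t45\t1\t1\t60\t1\t60\t2.5\t20",
]

hits = best_hits(example_lines)
```

Here hd_0001 keeps its excellent match to ec_0100, while the only hit for hd_0002 is a marginal one.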

Unique genes

Unique genes have no significant similarity to genes in the compared genome, as determined by the E-value. If the E-value of the best hit is greater than 0.0001, the query protein is considered unique. Comparisons depicting unique genes are significant because they reveal the genes that are likely to be responsible for the biology, virulence, and pathogenicity unique to the bacterium (Kalman et al., Nature Genetics 21:385-389, 1999). In addition, this analysis suggests how closely the two genomes are phylogenetically related; a low proportion of unique genes suggests a close phylogenetic relationship. In rare instances, however, true homologs can produce BLAST E-values greater than 0.0001 and so be counted as unique.
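The uniqueness rule above amounts to a simple threshold test, sketched below. The 0.0001 cutoff comes from the text; treating a query with no reported hit at all as unique is an assumption of this sketch, and the function name and protein identifiers are hypothetical.

```python
UNIQUE_EVALUE_CUTOFF = 1e-4  # cutoff stated in the text (0.0001)


def unique_proteins(all_queries, best_evalue):
    """Return the queries whose best hit has an E-value above the cutoff.

    all_queries: every protein ID in the query proteome.
    best_evalue: {query_id: E-value of its best hit}; queries with no hit
                 are absent and are treated as unique (an assumption here).
    """
    unique = []
    for query in all_queries:
        evalue = best_evalue.get(query)
        if evalue is None or evalue > UNIQUE_EVALUE_CUTOFF:
            unique.append(query)
    return unique


# Hypothetical proteome of three proteins; hd_0003 has no hit at all.
queries = ["hd_0001", "hd_0002", "hd_0003"]
best = {"hd_0001": 1e-80, "hd_0002": 0.5}

unique = unique_proteins(queries, best)
fraction_unique = len(unique) / len(queries)
```

The fraction of unique proteins computed this way is the quantity the text relates to phylogenetic distance.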

We have tabulated the unique proteins in each of the two proteomes:

Orthologous gene maps

W-H Li in his book Molecular Evolution (Sinauer Associates, Inc. Sunderland, Massachusetts) gives a succinct definition of orthologous and paralogous genes: "Two genes are said to be paralogous if they are derived from a duplication event, but orthologous if they are derived from a speciation event." Further details here.

Determining orthology is also significant in assessing the relationship between two genomes. Revealing which orthologous regions are conserved throughout evolution suggests the significance of those regions to the survival of the bacterium (Siefert et al., J Mol Evol 45:467-472, 1997). This type of analysis helps to resolve what sort of changes have occurred in one genome relative to the other throughout evolution, and also suggests the phylogenetic relationship between the bacteria. For example, a high proportion of orthologs suggests a close phylogenetic relationship (Watanabe et al., J Mol Evol 44(Suppl 1):S57-S64, 1996).
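This page does not state how orthologs were assigned. One widely used heuristic, shown here only as an assumed illustration, is the reciprocal best hit: two proteins are paired when each is the other's best match in the opposite proteome. The function name, the reuse of the 0.0001 cutoff for both directions, and the protein identifiers are all assumptions of this sketch.

```python
def reciprocal_best_hits(a_to_b, b_to_a, max_evalue=1e-4):
    """Return sorted (a_id, b_id) pairs that are each other's best hit.

    a_to_b, b_to_a: {query_id: (best_subject_id, evalue)} for each
    search direction. Pairs are kept only if both E-values are at or
    below max_evalue (an assumed cutoff, not taken from the source).
    """
    pairs = []
    for a_id, (b_id, evalue_ab) in a_to_b.items():
        back = b_to_a.get(b_id)
        if back is None:
            continue  # b_id had no hit in the reverse search
        a_back, evalue_ba = back
        if a_back == a_id and evalue_ab <= max_evalue and evalue_ba <= max_evalue:
            pairs.append((a_id, b_id))
    return sorted(pairs)


# Hypothetical best hits in each direction.
hd_to_ec = {
    "hd_0001": ("ec_0100", 1e-80),
    "hd_0002": ("ec_0100", 1e-10),  # best hit is not reciprocated
}
ec_to_hd = {
    "ec_0100": ("hd_0001", 1e-78),
}

pairs = reciprocal_best_hits(hd_to_ec, ec_to_hd)
```

Only hd_0001 and ec_0100 are paired here, because ec_0100 points back to hd_0001 rather than hd_0002. Reciprocal best hits can miss orthologs or pair paralogs after gene loss, so the result is a candidate list rather than a proof of orthology.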

We have tabulated the proteins common to the two proteomes:

Distribution of unique and orthologous (common) genes based on the COG functional categories