Pattern search approach

ABC Transporters Analysis

General approach

Nucleotide-binding domains (NBDs), which display the highest degree of sequence conservation, were identified first. We then searched for genes that code for proteins with membrane-spanning domains (MSDs) and for solute-binding proteins (SBPs). Throughout the analysis, the position of each gene relative to the others was monitored and used in the decision process.

Nucleotide-binding domains

ATP-binding domains display a high degree of sequence similarity, so conserved motifs can be taken from the PROSITE web site (http://www.expasy.ch/prosite/). The first motif is the so-called Walker A motif, also referred to as the ATP/GTP-binding site motif A (PROSITE: PDOC00017). The second motif overlaps the "Signature" motif, or C motif (PROSITE: PDOC00185), and the Walker B motif. These signatures were used for the initial characterization of the NBDs. We used the first two motifs to scan all proteins in our databases with the pattern-search program. The mkdom and XDOM programs were used to summarize the domain organization of all NBD proteins.
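The scan described above can be illustrated with a minimal sketch. It assumes the Walker A pattern as given in the PROSITE entry for the ATP/GTP-binding site motif A, [AG]-x(4)-G-K-[ST]. The function names and the two example sequences are hypothetical, and the code is not the pattern-search program used in this work. It only shows how a PROSITE-style pattern can be translated into a regular expression and matched against protein sequences.

```python
import re

# Walker A (P-loop) pattern in PROSITE syntax.
WALKER_A = "[AG]-x(4)-G-K-[ST]"


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression.

    Handles x (any residue), [..] (allowed residues), {..} (forbidden
    residues), repetition counts such as (4) or (2,4), and the < and >
    anchors when they appear outside brackets.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.group(1), match.group(2), match.group(3)
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(prefix + core + suffix)
    return "".join(parts)


def scan(proteins, pattern):
    """Return {protein name: [1-based start positions]} for every match."""
    # A lookahead is used so that overlapping matches are also reported.
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    hits = {}
    for name, sequence in proteins.items():
        positions = [m.start() + 1 for m in regex.finditer(sequence)]
        if positions:
            hits[name] = positions
    return hits


# Hypothetical sequences, for illustration only.
example_proteins = {
    "nbd_like": "MKLLEVKNLSGPSGSGKSTLLR",
    "no_motif": "MSTNPLLAVWQ",
}

if __name__ == "__main__":
    print(prosite_to_regex(WALKER_A))
    print(scan(example_proteins, WALKER_A))
```

A short pattern such as Walker A matches many unrelated nucleotide-binding proteins, which is why a hit is only treated as an initial characterization and is combined with the second motif.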

Membrane-spanning domains

After the NBDs were identified, we searched for genes that could code for MSDs. Proteins with NBDs, or gene products from NBD-adjacent genes, were considered candidate MSD components if they contained four or more TMSs (transmembrane-spanning sequences). Some MSD-containing proteins also contain a conserved EAA motif located on a cytoplasmic loop (PROSITE: PDOC00364). All gene translations were searched for this conserved motif.
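The selection rule above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually implemented: the Gene record, the field names and the definition of "adjacent" (at most one position away in gene order) are all hypothetical, and strand and replicon are ignored.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    position: int    # rank of the gene along the chromosome (hypothetical field)
    has_nbd: bool    # True if an NBD was detected in the gene product
    tms_count: int   # number of predicted transmembrane segments


def msd_candidates(genes, min_tms=4, max_distance=1):
    """Return names of genes whose products are candidate MSD components.

    A gene qualifies if its product has at least min_tms predicted TMSs and
    the gene either carries an NBD itself (distance 0) or lies within
    max_distance positions of a gene that does.
    """
    nbd_positions = [g.position for g in genes if g.has_nbd]
    candidates = []
    for gene in genes:
        if gene.tms_count < min_tms:
            continue
        if any(abs(gene.position - p) <= max_distance for p in nbd_positions):
            candidates.append(gene.name)
    return candidates


# Hypothetical gene list, for illustration only.
example_genes = [
    Gene("orf1", 1, False, 6),   # next to an NBD gene, 6 TMSs
    Gene("orf2", 2, True, 0),    # NBD only
    Gene("orf3", 3, False, 2),   # too few TMSs
    Gene("orf4", 7, True, 6),    # NBD fused to an MSD
    Gene("orf5", 10, False, 8),  # membrane protein far from any NBD gene
]

if __name__ == "__main__":
    print(msd_candidates(example_genes))
```

Both thresholds are left as parameters because the cut-off of four TMSs comes from the text, whereas the adjacency window is an assumption of this sketch.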

Solute-binding proteins

The following PROSITE patterns were used with a local pattern-search tool to search for SBPs: (1) for Gram-negative signatures, PDOC00796, PDOC00798 and PDOC00799 (family 1, 3 and 5 signatures); (2) for Gram-positive signatures, PDOC00013 (prokaryotic membrane lipoprotein lipid attachment site). To facilitate comparisons with systems from other genomes, we followed the classification proposed by Linton and Higgins for ABC transporters in E. coli (Mol Microbiol. 28(1):5-13, 1998).

Methods

The STDGEN and ORALGEN databases were used as the primary source of protein and DNA sequences. Similarity searches were performed with a local BLAST tool as well as PSI-BLAST from NCBI. The modular arrangement of protein domains was analyzed with the XDOM tool (Gouzy et al., 1997). The number and location of transmembrane regions were predicted with the PHDhtm program, and signal peptides were predicted with the pattern-matching program SIGNALP. The ClustalW (Thompson et al., 1994) and MultAlin (Corpet, 1988) programs were used for multiple alignments.