General features and properties of insertion sequence elements



IS Identification

The families in ISfinder are defined using an initial manual BLAST analysis, often followed by iterative BLAST analyses in which the primary sequence of the transposase (Tpase) of representative elements is used as a query in a BLASTP search (Altschul, et al., 1990) of microbial genomes. Potential full-length Tpases are retained, and the one with the lowest score is then used as the query in a second BLASTP search. This is repeated until no new potential candidates are detected. The candidates are then aligned with the ClustalW multiple alignment algorithm (Thompson, et al., 1994), and the results are displayed in the Jalview alignment editor (Clamp, et al., 2004) for assessment. The corresponding DNA, together with 1000 base pairs upstream and downstream, is then extracted and examined manually for the terminal inverted repeats (IRs) or other typical features such as secondary structures and flanking direct repeats (DRs). This, together with comparison of the DNA extremities of various elements, allows both ends of the collected elements to be identified. Where more than one copy of an IS is identified, BLASTN can be used to define the IS ends. Where only a single copy is found, the ends can often be defined by identifying an empty site (the equivalent locus lacking the IS) and comparing the two.
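The search-and-inspect procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not the ISfinder implementation: `search` is a hypothetical stand-in for a BLASTP search that returns (hit identifier, score, sequence) tuples, and `candidate_irs` is a naive scan for exact k-mer inverted repeats between the two flanks of the extracted region. The 1000 bp flank follows the text; the k-mer length of 15 is an arbitrary illustrative choice. Real IRs are frequently imperfect, so an exact-match scan like this can only propose candidate ends for manual examination.

```python
from typing import Callable, Dict, List, Tuple

# A hit is (hit_id, score, hit_sequence). `search` is a hypothetical stand-in
# for a BLASTP search of microbial genomes; it is not a real library call.
Hit = Tuple[str, float, str]
SearchFn = Callable[[str], List[Hit]]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def iterative_search(seed: str, search: SearchFn,
                     max_rounds: int = 50) -> Dict[str, str]:
    """Collect candidate Tpases, re-querying with the lowest-scoring new hit
    until a round yields no new candidates."""
    found: Dict[str, str] = {}
    query = seed
    for _ in range(max_rounds):
        new_hits = [h for h in search(query) if h[0] not in found]
        if not new_hits:
            break  # converged: no new potential candidates
        for hit_id, _score, seq in new_hits:
            found[hit_id] = seq
        # The weakest new hit is the most divergent so far, which makes it
        # the best probe for more distant members of the family.
        query = min(new_hits, key=lambda h: h[1])[2]
    return found


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(_COMPLEMENT)[::-1]


def candidate_irs(region: str, flank: int = 1000,
                  k: int = 15) -> List[Tuple[int, int]]:
    """Find exact k-mer inverted repeats between the two flanks of `region`.

    `region` is the Tpase gene plus `flank` bp on each side. Returns
    (left_start, right_end) pairs, right_end exclusive, i.e. candidate
    element boundaries.
    """
    region = region.upper()
    down_start = len(region) - flank
    # Index every k-mer of the downstream flank by its sequence.
    downstream: Dict[str, List[int]] = {}
    for j in range(down_start, len(region) - k + 1):
        downstream.setdefault(region[j:j + k], []).append(j)
    pairs = []
    # An IR pair: an upstream k-mer whose reverse complement lies downstream.
    for i in range(flank - k + 1):
        rc = reverse_complement(region[i:i + k])
        for j in downstream.get(rc, []):
            pairs.append((i, j + k))
    return pairs
```

Each returned pair marks a stretch `region[left_start:right_end]` that begins and ends with inverted copies of the same k-mer, which is what one would then compare against other copies or an empty site.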

In a second step, we use the Markov Cluster Algorithm (MCL) (http://micans.org/mcl/) (Van Dongen, 2000, Enright, et al., 2002) to weigh the relationships between clusters of ISs and to validate the prior ISfinder classification of ISs into families and subgroups (Siguier, et al., 2009). This approach is explained in detail in Siguier, et al. (2009); the classification rests on the parameters used in the MCL (Fig 1.5.1) together with characteristics such as the specificity of target site duplications, the detailed sequence of the ends and the genetic organisation. The distinction between families and subgroups may therefore evolve as the number of ISs in the database increases.
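To make the clustering step more concrete, the core MCL iteration can be sketched in pure Python. This is a minimal sketch of the generic algorithm, not the parameters or input used by ISfinder (those are given in Siguier, et al., 2009): the input is assumed to be a symmetric, non-negative similarity matrix, for example one derived from pairwise BLAST scores between Tpases, and the inflation value of 2.0 is simply a commonly used setting. MCL alternates expansion (squaring the column-stochastic matrix) with inflation (raising entries to a power and renormalising) until the matrix stops changing.

```python
from typing import List, Set

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity: Matrix, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9) -> List[Set[int]]:
    """Cluster nodes of a symmetric, non-negative similarity matrix with MCL."""
    n = len(similarity)
    # Add self-loops (standard MCL practice), then normalise columns.
    m = normalize_columns([[similarity[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: random-walk flow spreads out
        # inflation: raise entries to a power and renormalise, which
        # strengthens strong flows and weakens weak ones
        inflated = normalize_columns([[x ** inflation for x in row]
                                      for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # At convergence, each non-empty row belongs to an "attractor"; the
    # columns where it is non-zero are the members of that cluster.
    clusters: List[Set[int]] = []
    for i in range(n):
        members = {j for j in range(n) if m[i][j] > 1e-6}
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

Raising the inflation value generally yields finer clusters, which is one of the parameters that has to be weighed against biological features such as the ends and target site duplications.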

Several semi-automatic IS annotation pipelines are now available. The interested reader is directed to three of these: ISsaga (Varani, et al., 2011), which is now integrated into the ISfinder platform (Siguier, et al., 2006), IScan (Wagner, et al., 2007) and OASIS (Robinson, et al., 2012). At present, de novo prediction of ISs is not efficient, and all of these pipelines rely on the ISfinder database. Although all three pipelines can identify IS fragments as well as full-length ISs, some manual assessment remains essential.
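The distinction between full-length copies and fragments can be illustrated with a simple rule applied to a hit against a reference element. This is a hypothetical sketch, not the logic of ISsaga, IScan or OASIS: it assumes a hit is described by the reference length and the aligned reference coordinates, and the end tolerance and minimum fragment coverage are arbitrary illustrative thresholds. Borderline cases, such as a copy truncated by a few base pairs or interrupted by another insertion, are exactly where manual assessment is needed.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    ref_name: str    # hypothetical name of the reference IS that was hit
    ref_length: int  # length of that reference IS, in bp
    ref_start: int   # 1-based start of the alignment on the reference
    ref_end: int     # 1-based end (inclusive) of the alignment on the reference


def classify_hit(hit: Hit, end_tolerance: int = 10,
                 min_fragment_cov: float = 0.1) -> str:
    """Label a hit 'full-length', 'fragment' or 'ignore'.

    Assumption: a copy counts as full-length only if the alignment reaches
    both ends of the reference (within end_tolerance bp). Both thresholds
    are illustrative, not taken from any published pipeline.
    """
    coverage = (hit.ref_end - hit.ref_start + 1) / hit.ref_length
    reaches_left = hit.ref_start <= 1 + end_tolerance
    reaches_right = hit.ref_end >= hit.ref_length - end_tolerance
    if reaches_left and reaches_right:
        return "full-length"
    if coverage >= min_fragment_cov:
        return "fragment"
    return "ignore"
```

For a 1300 bp reference, an alignment covering positions 1 to 600 would be reported as a fragment, whereas one covering positions 5 to 1295 would be accepted as full-length under these assumed thresholds.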

References:
  • Altschul SF, Gish W, Miller W, Myers EW & Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403-410.
  • Clamp M, Cuff J, Searle SM & Barton GJ (2004) The Jalview Java alignment editor. Bioinformatics 20: 426-427.
  • Enright AJ, Van Dongen S & Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575-1584.
  • Robinson DG, Lee MC & Marx CJ (2012) OASIS: an automated program for global investigation of bacterial and archaeal insertion sequences. Nucleic Acids Res 40: e174.
  • Siguier P, Gagnevin L & Chandler M (2009) The new IS1595 family, its relation to IS1 and the frontier between insertion sequences and transposons. Res Microbiol 160: 232-241.
  • Siguier P, Perochon J, Lestrade L, Mahillon J & Chandler M (2006) ISfinder: the reference centre for bacterial insertion sequences. Nucleic Acids Res 34: D32-36.
  • Thompson JD, Higgins DG & Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673-4680.
  • Van Dongen S (2000) A cluster algorithm for graphs. Technical Report INS-R0010, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam.
  • Varani A, Siguier P, Gourbeyre E, Charneau V & Chandler M (2011) ISsaga is an ensemble of web-based methods for high throughput identification and semi-automatic annotation of insertion sequences in prokaryotic genomes. Genome Biol 12: R30.
  • Wagner A, Lewis C & Bichsel M (2007) A survey of bacterial insertion sequences using IScan. Nucleic Acids Res 35: 5284-5293.