Double Cut and Join Model
The double cut and join (DCJ) model is a model for genome rearrangement used to define an edit distance between genomes based on gene order and orientation, rather than nucleotide sequence. It takes the fundamental units of a genome to be synteny blocks, maximal sections of DNA conserved between genomes. It focuses on changes due to genome rearrangement operations such as inversions, translocations as well as the creation and absorption of circular intermediates.[1] [2]
A genome is described as a directed, edge labeled graph with each vertex having degree 1 or 2. Edges are labeled as synteny blocks, vertices of degree 1 represent telomeres, and vertices of degree 2 representing adjacencies between blocks. This requires that the genome consist of cycles and paths. Each component is called a chromosome. The beginning of each edge is called the tail, the end of each edge is called the head; together heads and tails are known as extremities. Vertices are described by their roles as heads and tails of blocks, for instance, in the figure, the adjacency which forms the head of marker 1 and the tail of marker 2 is labelled (h1, t2), the telomere formed by the head of 2 is (h2). A double cut and join (DCJ) operation consists of one of the following four transformations:
- (i) breaking two adjacencies (a, b) and (c, d) to create two more adjacencies, (a, c) and (b,d)
- (ii) taking an adjacency (a, b) and a telomere (c) to create a new adjacency and telomere, either as (a, c), (b) or (b,c), (a).
- (iii) taking two telomeres (a) and (b) and creating a new adjacency (a, b)
- (iv) breaking an adjacency (a, b) to create the two telomeres (a) and (b).
An edit distance, the double cut and join distance, is defined between genomes with the same number of edges and , as the minimum number of DCJ operations needed to transform into .
Mathematical Results
[edit]The DCJ distance defines a metric space. To verify this, we first note that , since no operations are needed to change G into itself, and if , , since at least one operation is needed to transform into any other genome. (A proof that the is always defined whenever and are genomes with the same edges will follow.) Note that each operation has an inverse: (i) and (ii) are their own inverses, and (iii) is the inverse of (iv). Thus . The triangle inequality holds because a series of DCJ operations transforming to followed by a series of transformations from to will transform to , so the minimal number of operations needed to transform to must be no longer than this.
To compute the DCJ distance between two genomes and with the same set of synteny blocks, we construct a bipartite multigraph known as the adjacency graph of the genomes. The vertex set of the adjacency graph is , where is the vertex set of and is the vertex set of . For and , we have if and are an extremity of the same synteny block. If and share two extremities, we add two edges between and to .
If , we see that the adjacency graph is composed entirely of paths of length 1, connecting two telomeres, and cycles of length 2, connecting two adjacencies. We can use this fact to calculate . Let be the number of synteny blocks in genomes and , be the number of cycles in their adjacency graph, and be the number of paths in their adjacency graph. Then The proof shows that each DCJ operation can decrease by no more than 1, and that if , there exists an operation decreasing . This proves that is always defined, and gives a method for its calculation. Since it is easy to count cycles and paths, can be found in linear time.[3]
References
[edit]- ^ Richard M. Friedberg; A. E. Darling; S. Yancopoulos (2008). "Genome rearrangement by the double cut and join operation. Each of these individual operations involves 2 cuts and 2 joins of the genomic DNA". Methods in Molecular Biology. 452: 385–416. doi:10.1007/978-1-60327-159-2_18. PMID 18566774.
- ^ Yancopoulos S, Attie O, Friedberg R (2005). "Efficient sorting of genomic permutations by translocation, inversion and block interchange". Bioinformatics. 21 (16): 3340–3346. doi:10.1093/bioinformatics/bti535. PMID 15951307.
- ^ YAF 2005