[Solved] 3. [7.5 Marks Total] Jukes-Cantor Model of DNA Sequence Evolution (Q37189878)
![3. [7.5 marks total] The Jukes-Cantor model of DNA sequence evolution is simple: each site mutates at rate μ and when a mutat](https://media.cheggcdn.com/media%2F033%2F0332dd3f-003f-4b5d-ba72-d0b27e25954e%2FphpfJR0f8.png)

3. [7.5 marks total] The Jukes-Cantor model of DNA sequence evolution is simple: each site mutates at rate μ and, when a mutation occurs, a new base is chosen uniformly at random from the four possible bases, {A, C, G, T}. If we ignore mutations from base X to base X, the mutation rate is (3/4)μ. All sites mutate independently of each other. Thus we observe mutations at a site after an exponentially distributed waiting time with rate (3/4)μ. At a mutation, choose from the 3 possible bases to mutate to with equal probability. A sequence that has evolved over time according to the Jukes-Cantor model has each base equally likely to occur at each site. Your programs should write sequences consisting of {A, C, G, T}, though it may be easier internally to translate the bases to integers, {1, 2, 3, 4} for example.

Sequence D is the most recent common ancestor of sequences E and F. The time since E split from F is t time units.

(a) [4.5 marks] Write a method that simulates pairs of sequences that have diverged from a recent common ancestor t time units ago. Assume that evolution has occurred according to the Jukes-Cantor model. The distribution for the sequence of the most recent common ancestor is uniform over the four possible bases at each site. The method should take sequence length, time t and mutation rate μ as inputs. It should return the ancestral sequence (D in the figure) and the descendant sequences (E and F in the figure). You may use the methods choice, exponential and poisson from the numpy.random library. Simulate a pair of sequences of length 50 with μ = 0.01 and t = 10. Print the resulting sequences along with the ancestral sequence. Report the number of sites at which each sequence differs from the ancestral sequence and from its sibling sequence (i.e., the number of sites that differ between D and E, D and F, and E and F).

(b) [3 marks] Explain why you would expect the number of mutations that occur on a tree to be Poisson distributed with parameter 2tL(3/4)μ, where L is the sequence length. Simulate 1000 pairs of sibling sequences of length 1000 with μ = 0.01 and t = 25. For each simulated pair, count the number of sites at which they differ from each other. Report the mean and variance of the number of differing sites. Is this number Poisson distributed with parameter 2tL(3/4)μ? Explain why or why not.
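The simulation described in the question could be sketched roughly as follows. This is a minimal illustration, not a model answer: the function names (`evolve`, `simulate_pair`, `n_diff`, `sibling_differences`), the 0..3 integer coding of the bases, and the fixed seed are all assumptions made for the example. It takes the rate of base-changing mutations to be 3μ/4, since a quarter of uniform draws from four bases leave the site unchanged, and it assumes μ > 0.

```python
import numpy as np

BASES = np.array(list("ACGT"))  # assumed coding: 0=A, 1=C, 2=G, 3=T


def evolve(seq, t, mu, rng):
    """Evolve an integer-coded sequence for t time units under Jukes-Cantor."""
    rate = 0.75 * mu  # rate of mutations that actually change the base (mu > 0)
    out = seq.copy()
    for i in range(len(out)):
        # Exponential waiting times between base-changing mutations at site i.
        elapsed = rng.exponential(1.0 / rate)
        while elapsed < t:
            # Shift by 1, 2 or 3 (mod 4): one of the 3 other bases, equally likely.
            out[i] = (out[i] + rng.integers(1, 4)) % 4
            elapsed += rng.exponential(1.0 / rate)
    return out


def simulate_pair(length, t, mu, rng):
    """Return (ancestor D, descendant E, descendant F) as integer arrays."""
    ancestor = rng.choice(4, size=length)  # uniform over the four bases
    e = evolve(ancestor, t, mu, rng)       # the two lineages evolve independently
    f = evolve(ancestor, t, mu, rng)
    return ancestor, e, f


def to_string(seq):
    """Translate an integer-coded sequence back to letters."""
    return "".join(BASES[seq])


def n_diff(a, b):
    """Number of sites at which two sequences differ."""
    return int(np.sum(a != b))


def sibling_differences(n_pairs, length, t, mu, rng):
    """Sample mean and variance of the E-versus-F difference count."""
    counts = np.array([n_diff(*simulate_pair(length, t, mu, rng)[1:])
                       for _ in range(n_pairs)])
    return counts.mean(), counts.var(ddof=1)


rng = np.random.default_rng(1)  # arbitrary seed, only for repeatability
d, e, f = simulate_pair(50, 10, 0.01, rng)
print("D:", to_string(d))
print("E:", to_string(e))
print("F:", to_string(f))
print("D vs E:", n_diff(d, e), " D vs F:", n_diff(d, f), " E vs F:", n_diff(e, f))
```

For part (b), a call such as `sibling_differences(1000, 1000, 25, 0.01, rng)` would return the requested mean and variance, though the per-site Python loop makes it take a while. As a point of comparison, the probability that two siblings differ at a given site under this model is (3/4)(1 − e^(−2μt)), about 0.295 for μ = 0.01 and t = 25. The mean number of differing sites should therefore sit near 295 rather than near 2tL(3/4)μ = 375, because repeated mutations at the same site are not all visible as differences.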
Expert Answer
Answer to 3. [7.5 marks total] The Jukes-Cantor model of DNA sequence evolution is simple: each site mutates at rate μ and when a… . . .

