Approaching Rosalind Problems
I’ve been working through the Rosalind problems over the last week. They are an interesting set of problems exploring the space between Biology and Computation. I decided to take notes while doing one problem. This blog post is a basic outline of my process for problem solving.
Step 1 - Read The Problem And Background
Always read the problem. I’ve gotten a number of problems wrong in programming challenges because I didn’t read the instructions thoroughly. Rosalind provides some background info which I’ve found helpful as well. It is especially good for learning the relevant biology vocabulary. For this blog post I did the problem Mendel’s First Law.
Step 2 - Verify That You Can Read The Data
All of the Rosalind problems supply a data file, and I’ve fallen into a pattern of starting out by reading in the file to ensure I know what’s going on. For this problem the provided file contains 2 2 2. Since all the entries are the same, it would be hard to verify I was reading and interpreting the file correctly. Because of this I used 1 2 3 as my example file. Then I wrote the code to put each of these values in a well-named variable.
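A minimal sketch of this reading step is below. The helper name `parse_counts` and the variable names `dominant`, `hetero`, and `recessive` are hypothetical, chosen here for illustration. The asymmetric input 1 2 3 makes it easy to see that each number lands in the right variable.

```ruby
# Hypothetical helper: split a line such as "1 2 3" into three named counts.
# The order (dominant, heterozygous, recessive) follows the problem statement.
def parse_counts(text)
  dominant, hetero, recessive = text.split.map(&:to_i)
  { dominant: dominant, hetero: hetero, recessive: recessive }
end

counts = parse_counts("1 2 3\n")
puts "dominant=#{counts[:dominant]} hetero=#{counts[:hetero]} recessive=#{counts[:recessive]}"
```

For a real data file, the same helper could be given the file's contents, for example the string returned by `File.read`.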
Step 3 - Ponder And Devise A Strategy
My first strategy was to calculate how many ways there were to have at least one dominant allele. Here are some basic thoughts on that: from the background given, I know that if one parent has the dominant allele the resulting organism must have the dominant allele. The probability of one parent having the dominant allele is
Then I remembered that since there are fewer ways to get two recessive alleles, I should calculate that instead and then subtract the result from one (a basic law of probability).
Math:
Probability of two recessive parents mating:
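One way to write this down, assuming the symbols k, m, and n for the counts of homozygous dominant, heterozygous, and homozygous recessive organisms, and t = k + m + n for the total (the symbols are a labeling choice made here): the first parent is recessive with probability n/t, and the second is drawn from the remaining t - 1 organisms.

```latex
P(\text{rec} \times \text{rec}) = \frac{n}{t} \cdot \frac{n-1}{t-1}
```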
Probability of two heterozygous parents mating:
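Using the same kind of reasoning, and assuming m is the number of heterozygous organisms and t the total population (symbols chosen here for illustration), both parents are picked without replacement:

```latex
P(\text{het} \times \text{het}) = \frac{m}{t} \cdot \frac{m-1}{t-1}
```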
Punnett square for heterozygous mating
Only 1/4 of those will be recessive, so we take the probability of two heterozygous organisms mating and multiply it by 1/4.
Finally the probability of a heterozygous and a recessive organism mating:
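A sketch of this term, assuming m heterozygous organisms, n recessive organisms, and t organisms in total (symbols chosen here for illustration). The pair can be drawn in either order, heterozygous first or recessive first, so the two orderings are added together:

```latex
P(\text{het} \times \text{rec}) = \frac{m}{t} \cdot \frac{n}{t-1} + \frac{n}{t} \cdot \frac{m}{t-1} = \frac{2mn}{t(t-1)}
```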
Punnett square for heterozygous & recessive mating
In this case half the offspring have two recessive alleles so multiply the probability by 1/2.
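The 1/4 and 1/2 fractions can be double-checked by enumerating each Punnett square in code. This is only an illustrative sketch: the method name `recessive_fraction` and the genotype strings "Aa" and "aa" (with "A" as the dominant allele) are assumptions made here.

```ruby
# Enumerate a Punnett square: each parent passes on one of its two alleles,
# so a cross has four equally likely allele pairings.
def recessive_fraction(parent1, parent2)
  offspring = parent1.chars.product(parent2.chars)
  recessive = offspring.count { |pair| pair == ["a", "a"] }
  recessive.to_f / offspring.size
end

puts recessive_fraction("Aa", "Aa")  # heterozygous x heterozygous, prints 0.25
puts recessive_fraction("Aa", "aa")  # heterozygous x recessive, prints 0.5
```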
Step 4 - Write The Code
First thing I realized was that we need floats so I had to change the import code slightly. I changed the to_i to a to_f.
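The reason floats matter is that Ruby truncates division between integers, so a probability built from integer counts would collapse to zero. A small sketch of the difference, with a hypothetical `parse_counts` helper that uses `to_f`:

```ruby
# Integer division truncates, float division does not.
puts 2 / 6      # prints 0
puts 2.0 / 6    # prints a value close to 0.333

# Hypothetical parsing helper: same idea as before, but converting to floats.
def parse_counts(text)
  text.split.map(&:to_f)
end

dominant, hetero, recessive = parse_counts("2 2 2")
puts [dominant, hetero, recessive].inspect
```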
All my calculations require the total so I do that once:
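A minimal sketch of that step, assuming the three counts are already floats. The method and variable names are hypothetical, and the sample values 2 2 2 are hard-coded for illustration.

```ruby
# Total population size: every probability below divides by this
# (or by one less than this, for the second parent).
def population_total(dominant, hetero, recessive)
  dominant + hetero + recessive
end

total = population_total(2.0, 2.0, 2.0)
puts total  # prints 6.0
```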
Now calculate the probability of two recessive organisms mating:
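One possible version of this calculation, with a hypothetical method name and float arguments assumed. The second factor uses `recessive - 1` and `total - 1` because the first parent has already been removed from the pool.

```ruby
# Chance that both randomly chosen parents are homozygous recessive,
# drawing without replacement.
def prob_two_recessive(recessive, total)
  (recessive / total) * ((recessive - 1) / (total - 1))
end

puts prob_two_recessive(2.0, 6.0)  # (2/6) * (1/5)
```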
Now heterozygous organisms mating:
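A sketch along the same lines for two heterozygous parents. The method name is hypothetical and float arguments are assumed.

```ruby
# Chance that both randomly chosen parents are heterozygous,
# drawing without replacement.
def prob_two_hetero(hetero, total)
  (hetero / total) * ((hetero - 1) / (total - 1))
end

puts prob_two_hetero(2.0, 6.0)  # (2/6) * (1/5)
```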
Now the hetero + recessive matings:
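For the mixed pairing, the two parents can be drawn in either order, so a sketch of this step adds both orderings. The method name is hypothetical and float arguments are assumed.

```ruby
# Chance that one parent is heterozygous and the other homozygous recessive.
# Both draw orders count: heterozygous first, or recessive first.
def prob_hetero_recessive(hetero, recessive, total)
  hetero_first    = (hetero / total) * (recessive / (total - 1))
  recessive_first = (recessive / total) * (hetero / (total - 1))
  hetero_first + recessive_first
end

puts prob_hetero_recessive(2.0, 2.0, 6.0)  # 2 * (2/6) * (2/5)
```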
Now I incorporate the fractions from the Punnett squares:
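A sketch of the weighting, assuming the three mating probabilities have already been computed. The method and argument names are hypothetical. Two recessive parents always produce a recessive offspring, so that term keeps a weight of 1.

```ruby
# Weight each mating probability by the fraction of its offspring that
# carry two recessive alleles: 1 for rec x rec, 1/4 for het x het,
# 1/2 for het x rec.
def recessive_offspring_probability(two_recessive, two_hetero, hetero_recessive)
  two_recessive * 1.0 + two_hetero * 0.25 + hetero_recessive * 0.5
end

# Mating probabilities for the sample input 2 2 2.
recessive_total = recessive_offspring_probability(2.0 / 30, 2.0 / 30, 8.0 / 30)
puts recessive_total.round(5)  # prints 0.21667
```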
This is the probability of a recessive organism. The problem asked for the probability of a dominant organism, so I take 1 - recessive_total.
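As a small sketch of the complement step: the method name is hypothetical, and the input value is the recessive probability for the sample input 2 2 2.

```ruby
# An offspring either shows the dominant allele or is fully recessive,
# so the two probabilities sum to 1.
def dominant_from_recessive(recessive_total)
  1 - recessive_total
end

puts dominant_from_recessive(6.5 / 30).round(5)  # prints 0.78333
```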
All together:
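The pieces can be assembled as in the sketch below. This is one possible arrangement, not necessarily the original script: the method name `dominant_probability` and all variable names other than `recessive_total` are hypothetical, and the sample input is hard-coded rather than read from a file.

```ruby
# Probability that the offspring of two randomly chosen parents shows
# the dominant allele. Counts are expected as floats.
def dominant_probability(dominant, hetero, recessive)
  total = dominant + hetero + recessive

  # Probability of each mating that can produce a recessive offspring.
  two_recessive    = (recessive / total) * ((recessive - 1) / (total - 1))
  two_hetero       = (hetero / total) * ((hetero - 1) / (total - 1))
  hetero_recessive = 2 * (hetero / total) * (recessive / (total - 1))

  # Weight by the recessive fraction of each Punnett square.
  recessive_total = two_recessive + two_hetero * 0.25 + hetero_recessive * 0.5
  1 - recessive_total
end

counts = "2 2 2".split.map(&:to_f)
puts dominant_probability(*counts).round(5)  # prints 0.78333
```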
When all this code runs on the sample input 2 2 2, it gives 0.78333, which is the expected result.
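As a hand check of that number, assuming the sample input 2 2 2 (two organisms of each type, six in total), the three recessive-producing matings work out as follows:

```latex
\begin{aligned}
P(\text{recessive}) &= \frac{2}{6}\cdot\frac{1}{5}
  + \frac{1}{4}\cdot\frac{2}{6}\cdot\frac{1}{5}
  + \frac{1}{2}\cdot 2\cdot\frac{2}{6}\cdot\frac{2}{5}
  = \frac{2 + 0.5 + 4}{30} = \frac{6.5}{30} \\
P(\text{dominant}) &= 1 - \frac{6.5}{30} \approx 0.78333
\end{aligned}
```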
Step 5 - Download The Real Data Set
Now I downloaded the real dataset, ran it through my code, and pasted the result into the text box.
Step 6 - Celebrate
I recommend celebrating with cookies.
Post-script
This problem can also be solved by simulating all possible matings and then calculating the percentage that have the dominant allele.
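A sketch of that brute-force approach is below. It enumerates every ordered pair of distinct parents and every allele pairing rather than sampling at random. The method name and the genotype strings "AA", "Aa", and "aa" are assumptions made here, and the counts must be integers because they are used to build the population array.

```ruby
# Build the population, pair up every two distinct organisms (in both
# orders), expand each pair into its four Punnett-square outcomes, and
# count the outcomes that contain at least one dominant allele "A".
def dominant_fraction_by_enumeration(dominant, hetero, recessive)
  population = ["AA"] * dominant + ["Aa"] * hetero + ["aa"] * recessive
  offspring = population.permutation(2).flat_map do |parent1, parent2|
    parent1.chars.product(parent2.chars)
  end
  offspring.count { |alleles| alleles.include?("A") }.to_f / offspring.size
end

puts dominant_fraction_by_enumeration(2, 2, 2).round(5)  # prints 0.78333
```

This agrees with the formula-based answer for the sample input, at the cost of work that grows with the square of the population size.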