In this chapter, you will need the following:
- In Exercise 5:
You need a long English text. We provide here some .txt files of Mark Twain's novels, such as hfinn10.txt.
You will also find here a .txt file containing some merged MATLAB files that are useful for reformatting the texts (removing punctuation, etc.) and for analyzing the n-tuples.
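For readers not using the MATLAB files, the two steps mentioned above (stripping punctuation and counting n-tuples) can be sketched in Python. This is only a minimal illustration, not the method of the provided files. The function names `clean_text` and `ntuple_counts` are hypothetical, the sample sentence is made up, and the choice to keep only the letters a-z plus single spaces is an assumption about what "reformat" should mean here.

```python
import re
from collections import Counter


def clean_text(raw):
    """Lowercase the text and replace every run of non-letters by one space.

    Assumption: only the 26 letters a-z and single spaces are kept, so
    punctuation, digits, and line breaks all collapse into spaces.
    """
    letters_only = re.sub(r"[^a-z]+", " ", raw.lower())
    return letters_only.strip()


def ntuple_counts(text, n):
    """Count every overlapping run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Made-up sample; for the exercise, read a long text file instead, e.g.
#   with open("hfinn10.txt", encoding="latin-1") as f: raw = f.read()
# (the encoding is an assumption about the file).
sample = "The cat sat. The cat ran!"
cleaned = clean_text(sample)
pair_counts = ntuple_counts(cleaned, 2)

# Show the five most frequent character pairs in the cleaned sample.
for pair, count in pair_counts.most_common(5):
    print(repr(pair), count)
```

Dividing each count by the total number of n-tuples turns the counts into the empirical probabilities needed for the analysis.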
- In Exercise 6:
You need statistics of DNA sequences. The translation code reads the DNA three bases at a time, and each of the 64 possible triplets (codons) codes either for one of the 20 amino acids or for the end of the protein, called the stop. The codon table is available at http://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=9606
A vast library of DNA sequences is available online from either GenBank: http://www.ncbi.nlm.nih.gov/Genbank/ or EMBL: http://www.ebi.ac.uk/embl/
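The triplet reading described above can be sketched in Python as follows. This is a rough illustration under stated assumptions: the reading frame is assumed to start at the first base, the sequence is assumed to use the DNA alphabet A, C, G, T, and the names `split_codons` and `codon_frequencies` are hypothetical. The three stop triplets listed are those of the standard genetic code. The short sequence is made up; real sequences would come from the databases listed above.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# All 4**3 = 64 possible triplets, in alphabetical order.
ALL_CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
# Stop triplets of the standard genetic code, written in the DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(dna):
    """Cut the sequence at every third base, dropping any incomplete tail.

    Assumption: the reading frame starts at the first base.
    """
    seq = dna.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def codon_frequencies(dna):
    """Return the fraction of each of the 64 codons among the valid triplets.

    Triplets with characters other than A, C, G, T are ignored.
    """
    valid = set(ALL_CODONS)
    counts = Counter(c for c in split_codons(dna) if c in valid)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: counts[codon] / total for codon in ALL_CODONS}


# Made-up example sequence (not taken from any database).
example = "ATGGCCTAAAT"
for codon in split_codons(example):
    label = "stop" if codon in STOP_CODONS else "amino acid"
    print(codon, label)
```

The resulting frequencies can then be compared against the published codon table.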