In this chapter, you will need the following:


- In Exercise 5:
You need a long English text. We provide here some .txt files of Mark Twain's novels: hfinn10.txt, sawy210.txt, sawyr10.txt, yanke11.txt, lmiss10.txt, puddn10.txt, sawy310.txt and tramp11.txt. You will also find here a .txt file containing several merged Matlab files that are useful for reformatting the texts (removing punctuation, etc.) and for analyzing the n-tuples.

- In Exercise 6:
You need statistics of DNA sequences. The genetic code reads the DNA in consecutive, non-overlapping triplets of bases (codons). Each of the 4^3 = 64 possible triplets codes either for one of the 20 amino acids or for the end of the protein, called a stop. The codon table, with codon usage statistics for humans (species 9606), is available at http://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=9606
A vast library of DNA sequences is available online from either GenBank (http://www.ncbi.nlm.nih.gov/Genbank/) or EMBL (http://www.ebi.ac.uk/embl/).
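For Exercise 5, the reformatting and n-tuple analysis can be sketched in Python. This is not the provided Matlab code. It is a minimal sketch that assumes "n-tuple" means n consecutive characters of the cleaned text, where every run of non-letters has been replaced by a single space. The names normalize_text, count_ntuples and ntuple_frequencies are hypothetical.

```python
import re
from collections import Counter


def normalize_text(raw: str) -> str:
    """Lowercase the text, replace every run of non-letters by one space, trim the ends."""
    lowered = raw.lower()
    letters_only = re.sub(r"[^a-z]+", " ", lowered)
    return letters_only.strip()


def count_ntuples(text: str, n: int) -> Counter:
    """Count every (overlapping) run of n consecutive characters in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ntuple_frequencies(text: str, n: int) -> dict:
    """Empirical frequency of each n-tuple; the values sum to 1 when the text has n or more characters."""
    counts = count_ntuples(text, n)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


if __name__ == "__main__":
    clean = normalize_text("Tom! No answer. TOM!")
    print(clean)  # tom no answer tom
    print(count_ntuples(clean, 2)["to"])  # 2
```

For example, assuming hfinn10.txt is in the working directory, `text = normalize_text(open("hfinn10.txt", errors="ignore").read())` followed by `count_ntuples(text, 3).most_common(10)` lists the ten most frequent triples. Keeping a single space as a word separator gives an alphabet of 27 symbols, so word boundaries still show up in the statistics.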
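For Exercise 6, reading DNA three bases at a time can be illustrated with a short Python sketch. It assumes the standard genetic code (NCBI translation table 1), a reading frame that starts at the first base, and a sequence written with the letters A, C, G, T. The names CODON_TABLE, translate and codon_counts are hypothetical, and the table is hard-coded here rather than taken from the site above.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids (one-letter codes, '*' = stop) listed with the
# first codon base varying slowest and the third base varying fastest, in T, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_counts(dna: str) -> Counter:
    """Count the non-overlapping triplets in reading frame 0 (an incomplete last triplet is ignored)."""
    dna = dna.upper()
    return Counter(dna[i:i + 3] for i in range(0, len(dna) - 2, 3))


def translate(dna: str) -> str:
    """Translate reading frame 0 until the first stop codon; unknown triplets become 'X'."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":  # stop codon: end of the protein
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("ATGGCCTAA")` returns `"MA"`: ATG codes for methionine, GCC for alanine, and TAA is a stop. Applying `codon_counts` to coding sequences downloaded from GenBank or EMBL gives the kind of codon statistics tabulated at the Kazusa site.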