Basic text processing

Write code to read, store, and analyze the latest human genome assembly (found at: /common/contrib/classroom/inf503/genomes/human.txt).

At minimum, your code must contain (10pts):
• A character array to store the entire human genome in a single data structure
• A separate function to read the human genome file
• A function to compute the number of A, C, G, or T characters in the human genome
• Comments describing major code blocks and control structures

A. (20pts) Read in and store the human genome. There will be multiple scaffolds (each with a separate header line starting with “>”). Concatenate the entire genome (discarding the headers) into a single character array. Collect the following statistics as you read the file. Hint: you can keep running totals, or store scaffold sizes and names in separate arrays.
• How many scaffolds were there?
• What were the longest and shortest scaffolds? Provide the names and lengths of both.
• What was the average scaffold length?

B. (20pts) Write a function to assess the content of the human genome: count the total number of occurrences of a given character (A, C, G, or T) in the whole genome.
• What is the ‘big O’ notation of your search (linear / quadratic / cubic / etc.)?
• How long does it take (in seconds) to execute this function? Hint: you will need to use system time within your code to get accurate time estimates.
• What was the GC content of the human genome (the percentage of C’s and G’s in the genome)?
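The assignment does not name a programming language, so the following is only a minimal sketch of parts A and B under stated assumptions, not a reference solution. It assumes Python, uses a `bytearray` as the "character array", and assumes the file is FASTA-style text with upper-case bases. The function names (`read_genome`, `scaffold_stats`, `count_base`, `timed_count`, `gc_content`) are hypothetical, and GC content is taken here as a percentage of the total genome length, including any non-ACGT characters.

```python
import time


def read_genome(path):
    """Read the file once; return (genome, scaffolds).

    genome    -- bytearray holding every sequence line, headers discarded
    scaffolds -- list of (name, length) tuples in file order
    """
    genome = bytearray()
    scaffolds = []
    name, length = None, 0
    with open(path, "rb") as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(b">"):
                # A new header closes the previous scaffold, if any.
                if name is not None:
                    scaffolds.append((name, length))
                name, length = line[1:].decode("ascii", "replace"), 0
            elif name is not None:
                genome += line          # concatenate into the single array
                length += len(line)     # running total for this scaffold
    if name is not None:
        scaffolds.append((name, length))  # close the last scaffold
    return genome, scaffolds


def scaffold_stats(scaffolds):
    """Return (count, longest, shortest, average length)."""
    count = len(scaffolds)
    longest = max(scaffolds, key=lambda s: s[1])
    shortest = min(scaffolds, key=lambda s: s[1])
    average = sum(size for _, size in scaffolds) / count
    return count, longest, shortest, average


def count_base(genome, base):
    """Count one character with a single pass over the array: O(n)."""
    target = ord(base)
    total = 0
    for value in genome:
        if value == target:
            total += 1
    return total


def timed_count(genome, base):
    """Return (count, elapsed seconds) for one call to count_base."""
    start = time.perf_counter()
    total = count_base(genome, base)
    return total, time.perf_counter() - start


def gc_content(genome):
    """Percentage of G and C characters; assumes total length as denominator."""
    if not genome:
        return 0.0
    gc = count_base(genome, "G") + count_base(genome, "C")
    return 100.0 * gc / len(genome)
```

With these definitions, `genome, scaffolds = read_genome("/common/contrib/classroom/inf503/genomes/human.txt")` followed by `scaffold_stats(scaffolds)` and `timed_count(genome, "A")` would cover the questions in A and B. The counting loop visits each character exactly once, so its cost grows linearly with genome length. A pure-Python loop over a whole genome is slow, so the built-in `genome.count(b"A")` is a faster way to get the same count.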
