N = number of sites in original sequence
counts[r] = number of sites with r read starts in original sequence
for each site 1...N
    x = random number between 0 and 1 (uniform distribution)
    if x < counts[0] / N
        randomized_counts[site] = 0
    else if x < (counts[0] + counts[1]) / N
        randomized_counts[site] = 1
    else if x < (counts[0] + counts[1] + counts[2]) / N
        randomized_counts[site] = 2
    else
        randomized_counts[site] = 3

This randomization tends to eliminate the clustering of read starts due to copy number variation. Note, however, that we are still preserving the distribution of read start counts. As a result, this approach is expected to be more conservative than just randomly locating read starts across the sequence. The reason for doing things this way is to allow for the fact that factors other than CNVs, such as library amplification, can also cause clustering of read starts at a particular site.
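The following Python sketch shows one way to implement the pseudocode above. The function name and the idea of passing in per-site counts are illustrative choices, not part of the assignment; the only assumption carried over from the pseudocode is that counts are capped at 3.

import random

def randomize_counts(counts_per_site, max_count=3):
    """Sample a new count for each site from the empirical distribution
    of read-start counts, capping counts at max_count (3 here)."""
    n_sites = len(counts_per_site)
    # counts[r] = number of sites with exactly r read starts (r capped at max_count)
    counts = [0] * (max_count + 1)
    for c in counts_per_site:
        counts[min(c, max_count)] += 1

    randomized = []
    for _ in range(n_sites):
        x = random.random()      # uniform in [0, 1)
        r = max_count            # default to the top bin, as in the final "else"
        cumulative = 0.0
        for value in range(max_count):
            cumulative += counts[value] / n_sites
            if x < cumulative:
                r = value
                break
        randomized.append(r)
    return randomized

Equivalently, random.choices(range(max_count + 1), weights=counts, k=n_sites) would draw the same distribution in one call; the explicit loop above is kept only to mirror the pseudocode.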
The last column is the ratio of the average number of segments found with the current row's S to the average number of segments found with the previous row's S. Note that the averaging matters only for the simulation table, since you only need to run the algorithm once on the real data. If there is no value to print (as in the last column of the first row, or in cases where there are no D-segments and thus no minimum or maximum scores), print -1. See the template file for other formatting details. Note that the template was made with only 10,000,000 bp of sequence, so it may not follow Karlin-Altschul theory closely.
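As an illustration of the ratio column and the -1 placeholder, here is a minimal sketch. It assumes only a list of the average segment counts per S threshold, in table order; the function name is hypothetical, and the guard against a zero denominator is an assumption rather than something the assignment specifies.

def segment_count_ratios(avg_segments_per_s):
    """Return the ratio column: each row's average segment count divided by
    the previous row's, with -1 where no value can be printed."""
    ratios = []
    for i, avg in enumerate(avg_segments_per_s):
        if i == 0 or avg_segments_per_s[i - 1] == 0:
            # first row has no previous value; zero previous average is
            # treated as undefined here (an assumption, not in the spec)
            ratios.append(-1)
        else:
            ratios.append(avg / avg_segments_per_s[i - 1])
    return ratios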