    N = total number of sites
    counts[r] = number of sites with r read starts in (modified) original sequence

    for each site 1...N
        x = random number between 0 and 1 (uniform distribution)
        if x < counts[0] / N
            randomized_counts[site] = 0
        else if x < (counts[0] + counts[1]) / N
            randomized_counts[site] = 1
        else if x < (counts[0] + counts[1] + counts[2]) / N
            randomized_counts[site] = 2
        else
            randomized_counts[site] = 3

This randomization tends to eliminate the clustering of read starts due to copy number variation. Note, however, that we are still preserving the distribution of read start counts. As a result, this approach is expected to be more conservative than just randomly locating read starts across the sequence. The reason for doing things in this way is to allow for the fact that factors other than CNVs, such as library amplification, can also cause clustering of read starts at a particular site.
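The pseudocode above could be written in Python roughly as follows. This is only a sketch under stated assumptions: the function name `randomize_counts` and the optional `rng` argument are illustrative and not part of the assignment, and N is taken to be the sum of `counts`, which assumes every site falls into one of the bins 0 through 3.

```python
import random

def randomize_counts(counts, rng=None):
    """Draw a randomized read-start count for every site.

    counts[r] is the number of sites with r read starts (r = 0..3).
    Assumption: N, the total number of sites, equals sum(counts).
    """
    rng = rng or random.Random()
    n_sites = sum(counts)
    randomized = []
    for _ in range(n_sites):
        x = rng.random()             # uniform on [0, 1)
        cumulative = 0
        assigned = len(counts) - 1   # final "else" branch: the last bin
        for r, c in enumerate(counts[:-1]):
            cumulative += c
            if x < cumulative / n_sites:
                assigned = r
                break
        randomized.append(assigned)
    return randomized
```

Each site is drawn independently, so the randomized counts match the original distribution in expectation rather than exactly. Passing a seeded generator, for example `randomize_counts([90, 6, 3, 1], random.Random(0))`, makes a run reproducible.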
If there is a 0 in the denominator of your ratio (for cases where there were no D-segments for S), print -1. See the template file for other formatting details.