N = total number of sites
counts[r] = number of sites with r read starts in (modified) original sequence
for each site 1...N
    x = random number between 0 and 1 (uniform distribution)
    if x < counts[0] / N
        randomized_counts[site] = 0
    else if x < (counts[0] + counts[1]) / N
        randomized_counts[site] = 1
    else if x < (counts[0] + counts[1] + counts[2]) / N
        randomized_counts[site] = 2
    else
        randomized_counts[site] = 3
This randomization tends to eliminate the clustering of read starts
due to copy number variation (CNV). Note, however, that it still
preserves the overall distribution of read start counts per site. As a
result, this approach is expected to be more conservative than simply
placing read starts at random positions across the sequence. The reason
for doing it this way is that factors other than CNVs, such as library
amplification, can also cause read starts to pile up at a particular
site.
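
The randomization can be sketched in Python as follows. This is only an illustrative sketch, not the required implementation: the function name randomize_counts, the rng parameter, and the choice to take N as the sum of counts are assumptions made for the example. It walks the cumulative counts exactly as the if / else if chain does, so it also works if counts has more or fewer than four categories.

```python
import random
from itertools import accumulate


def randomize_counts(counts, rng=None):
    """Draw a read start count for each site from the observed distribution.

    counts[r] is the number of sites with r read starts. N is assumed to be
    sum(counts), i.e. every site falls into exactly one category.
    """
    rng = rng or random.Random()
    n_sites = sum(counts)
    if n_sites == 0:
        raise ValueError("counts must describe at least one site")

    # cumulative[r] = counts[0] + ... + counts[r]
    cumulative = list(accumulate(counts))
    last = len(counts) - 1

    randomized = []
    for _ in range(n_sites):
        x = rng.random()  # uniform in [0, 1)
        value = last      # the final "else" branch of the pseudocode
        for r, c in enumerate(cumulative):
            if x < c / n_sites:
                value = r
                break
        randomized.append(value)
    return randomized


if __name__ == "__main__":
    # Hypothetical counts: 90 sites with 0 read starts, 7 with 1, 2 with 2, 1 with 3.
    result = randomize_counts([90, 7, 2, 1], rng=random.Random(0))
    print([result.count(r) for r in range(4)])
```

Because each site is drawn independently, the randomized counts match the original distribution only on average, and any spatial clustering present in the original sequence is lost.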
If there is a 0 in the denominator of your ratio (for cases where there were no D-segments for S), print -1. See the template file for other formatting details.