Runs of Ones in Binary Strings

Félix Balado and Guénolé Silvestre

February 20th, 2023

1 Introduction

A run of ones in a binary string is an uninterrupted sequence of ones flanked on each side either by a zero or by the start/end of the string (Mood’s criterion [1]). In the following, a run of ones will be simply referred to as a “run”.

What is the total number of runs of length \(i\) over all binary \(n\)-strings, or equivalently, if we draw binary \(n\)-strings uniformly at random, what is the expected number of runs of length \(i\) in a binary \(n\)-string?

This problem was previously solved by Sinha and Sinha [2] —in fact, these authors also solved a harder problem in [3] from which the solution to the problem addressed here can be produced.

Here we wish to show that the solution can be found in shorter and simpler ways than in [3] or [2] (see also [4]). We give three different solutions: two of them use elementary counting arguments (recursive and combinatorial solutions), while the third one is probabilistic.

2 Counting Runs Recursively

Let \(r_n(i)\) be the total number of runs of length \(i\) over all binary \(n\)-strings.

First of all, let us get some visual intuition. In the diagrams below, for \(n=2,3\) and \(4\) we list all binary \(n\)-strings and, right underneath, their runs “spectra” (i.e. the number of runs of lengths \(1\) to \(n\) found in each particular \(n\)-string). On the right we show \(r_n(i)\).

  • • \(n=2\):

    0 0 1 1
    0 1 0 1
    \(i\) \(r_2(i)\)
    1 0 1 1 0 2
    2 0 0 0 1 1
  • • \(n=3\):

    0 0 0 0 1 1 1 1
    0 0 1 1 0 0 1 1
    0 1 0 1 0 1 0 1
    \(i\) \(r_3(i)\)
    1 0 1 1 0 1 2 0 0 5
    2 0 0 0 1 0 0 1 0 2
    3 0 0 0 0 0 0 0 1 1
  • • \(n=4\):

    0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
    0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
    0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
    0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
    \(i\) \(r_4(i)\)
    1 0 1 1 0 1 2 0 0 1 2 2 1 0 1 0 0 12
    2 0 0 0 1 0 0 1 0 0 0 0 1 1 1 0 0 5
    3 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 2
    4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1

Assuming \(n>1\), it is trivial to see that

\begin{align} r_n(n)&=1,\nonumber \\ r_n(n-1)&=2.\label {eq:rnnm1} \end{align} This solves the problem for \(n=2\). Assuming \(n>2\), let us next see how, for \(1\le i<n-1\), \(r_n(i)\) can be recursively obtained from \(r_{n-1}(i)\):

  • • On the one hand, consider the \(n\)-strings that start with a zero: these trivially contribute \(r_{n-1}(i)\) runs to \(r_n(i)\).

  • • On the other hand, consider the \(n\)-strings that start with a one. For \(1\le i \le n-1\), the \(2^{n-i-1}\) \(n\)-strings that start with \(i\) ones followed by at least one zero add \(2^{n-i-1}\) runs to \(r_{n-1}(i)\) if \(i<n-1\), and subtract \(2^{n-i-1}\) runs from \(r_{n-1}(i-1)\) if \(i>1\).

    Thus, the \(n\)-strings that start with a one contribute \(r_{n-1}(i)+2^{n-i-1}-2^{n-i-2}=r_{n-1}(i)+2^{n-i-2}\) runs to \(r_n(i)\).

So, collecting these two contributions we have that

\begin{equation} \label {eq:rec} r_{n}(i)= 2\,r_{n-1}(i)+2^{n-i-2}. \end{equation}

Now, by using the expression above recursively \(k\) times, with \(k< n-i\), we get

\begin{equation} \label {eq:rn_k} r_n(i)=2^k r_{n-k}(i)+ k\, 2^{n-i-2}. \end{equation}

When \(k=n-i-1\), we have that \(r_{n-k}(i)=r_{i+1}(i)=2\) because of (1). Thus, inputting this value of \(k\) in (3) we get the following explicit expression for the number of runs of ones of length \(i\) over all binary \(n\)-strings:

\begin{equation} \label {eq:rn} r_n(i)=(n-i+3)\,2^{n-i-2}, \end{equation}

where \(1\le i<n-1\). Incidentally, (4) is also valid when \(i=n-1\), i.e. it includes (1). Furthermore, we can see from (4) that \(r_{n-1}(i-1)=r_n(i)\) for \(n\ge 2\), and so recurrence (2) can alternatively be expressed as a recurrence on \(i\) for a given \(n\) (rather than as a recurrence on \(n\) for a given \(i\)) as

\begin{equation} r_{n}(i)= 2\,r_{n}(i+1)+2^{n-i-2}.\label {eq:rec_alt} \end{equation}

Also, the total number of runs over all \(n\)-strings is

\begin{equation} \label {eq:total} t(n)=\sum _{i=1}^n r_n(i)=(n+1)\,2^{n-2}. \end{equation}

Recursion (2) was previously given in [2] and [4]. Also (4) and (6) were given in [2].

3 Counting Runs Using Combinatorics

Let us now give a way to obtain (6) directly by using combinatorics, i.e. without relying on (4). Consider the partitioning of an arbitrary \(n\)-string into \(p\) nonempty substrings, where \(1\le p\le n\). As shown in the example below, in which the \(n\)-string is represented by \(n\) asterisks and the \(p\) partitions by \(p+1\) vertical bars, we can put that partition into a bijection with \(p\) runs of ones from binary \(n\)-strings:

\(\mid \) \(*\) \(*\) \(*\) \(\mid \) \(*\) \(*\) \(*\) \(*\) \(\mid \) \(*\) \(*\) \(\mid \) \(*\) \(*\) \(*\) \(*\) \(*\) \(\mid \) \(\cdots \) \(\mid \) \(*\) \(\mid \) \(*\) \(*\) \(\mid \)
1 1 1 0 0 0 0 1 1 0 0 0 0 0 \(\cdots \) 0 1 1
0 0 0 1 1 1 1 0 0 1 1 1 1 1 \(\cdots \) 1 0 0

The number of ways in which we can partition an \(n\)-string into \(p\) nonempty substrings is \(n-1\choose p-1\). As we have \(p\) runs of ones associated to each of the \(n-1\choose p-1\) possibilities, the total number of runs over all binary \(n\)-strings is

\begin{equation} \label {eq:total_alt} t(n)=\sum _{p=1}^{n} p \,{n-1\choose p-1}=(n+1)\,2^{n-2}. \end{equation}

Next, let us exploit the parallelism between expressions (4) and (6) to write \(r_n(i)\) in a way that echoes (7). In order to do so, let us put (4) as the addition of the following two terms:

\begin{equation} \label {eq:rn_expanded} r_n(i)=(n-i+1)\,2^{n-i-2}+2^{n-i-1}. \end{equation}

We can now see that the first term can be expanded using the same summation as in (7). Interestingly, by using \(2^n=\sum _{k=0}^n {n\choose k}\), the second term can also be expanded into a summation with the same limits and over the same binomial coefficients as the first one, but without the \(p\) multiplication factors. Combining these two observations we can write (8) as follows:

\begin{align} r_n(i)&=\sum _{p=1}^{n-i}p\, {n-i-1\choose p-1}+\sum _{p=1}^{n-i}{n-i-1\choose p-1}\nonumber \\ &=\sum _{p=1}^{n-i}(p+1)\, {n-i-1\choose p-1}. \label {eq:rn_expanded_2} \end{align} Expression (9) literally tells us that there is a one-to-one correspondence between a partitioning of an arbitrary \((n-i)\)-string intro \(p\) nonempty substrings and \(p+1\) runs of length \(i\) from binary \(n\)-strings. This provides a simple combinatorial approach to obtaining (4).

In order to see how the bijection implied by (9) works we just need to place a run of ones of length \(i\) at each of the \(p+1\) possible positions in the partitioning (i.e. the \(p-1\) divisions plus the start and the end of the string) and then complete a binary \(n\)-string with alternating runs of ones and zeros having lengths determined by the partitioning.

For example, let us take \(n=4\) and \(i=1\) and explicitly illustrate the correspondence between each of the possible partitionings of an arbitrary \(3\)-string “\(***\)” into \(p\) nonempty substrings and \(p+1\) runs of length \(1\) in binary \(4\)-strings:

  • • \(p=1\) \(\to \) \({2\choose 0}=1\) partitioning associated with \(2\) runs of length \(1\)

    \(\mid \) \(*\) \(*\) \(*\) \(\mid \)
    \(\bbar \) \(*\) \(*\) \(*\) \(\mid \) \(\mathbf {1}000\)
    \(\mid \) \(*\) \(*\) \(*\) \(\bbar \) \(000\mathbf {1}\)
  • • \(p=2\) \(\to \) \({2\choose 1}=2\) partitionings, each associated with \(3\) runs of length \(1\):

    \(\mid \) \(*\) \(\mid \) \(*\) \(*\) \(\mid \)
    \(\bbar \) \(*\) \(\mid \) \(*\) \(*\) \(\mid \) \(\mathbf {1}011\)
    \(\mid \) \(*\) \(\bbar \) \(*\) \(*\) \(\mid \) \(0\mathbf {1}00\)
    \(\mid \) \(*\) \(\mid \) \(*\) \(*\) \(\bbar \) \(100\mathbf {1}\)
    \(\mid \) \(*\) \(*\) \(\mid \) \(*\) \(\mid \)
    \(\bbar \) \(*\) \(*\) \(\mid \) \(*\) \(\mid \) \(\mathbf {1}001\)
    \(\mid \) \(*\) \(*\) \(\bbar \) \(*\) \(\mid \) \(00\mathbf {1}0\)
    \(\mid \) \(*\) \(*\) \(\mid \) \(*\) \(\bbar \) \(110\mathbf {1}\)
  • • \(p=3\) \(\to \) \({2\choose 2}=1\) partitioning, associated with \(4\) runs of length \(1\):

    \(\mid \) \(*\) \(\mid \) \(*\) \(\mid \) \(*\) \(\mid \)
    \(\bbar \) \(*\) \(\mid \) \(*\) \(\mid \) \(*\) \(\mid \) \(\mathbf {1}010\)
    \(\mid \) \(*\) \(\bbar \) \(*\) \(\mid \) \(*\) \(\mid \) \(0\mathbf {1}01\)
    \(\mid \) \(*\) \(\mid \) \(*\) \(\bbar \) \(*\) \(\mid \) \(10\mathbf {1}0\)
    \(\mid \) \(*\) \(\mid \) \(*\) \(\mid \) \(*\) \(\bbar \) \(010\mathbf {1}\)

If \(n=4\) and \(i=2\) we look at the partitionings of “\(* *\)” into \(p\) nonempty substrings:

  • • \(p=1\) \(\to \) \({1\choose 0}=1\) partitioning associated with \(2\) runs of length \(2\)

    \(\mid \) \(*\) \(*\) \(\mid \)
    \(\bbar \) \(*\) \(*\) \(\mid \) \(\mathbf {11}00\)
    \(\mid \) \(*\) \(*\) \(\bbar \) \(00\mathbf {11}\)
  • • \(p=2\) \(\to \) \({1\choose 1}=1\) partitioning associated with \(3\) runs of length \(2\)

    \(\mid \) \(*\) \(\mid \) \(*\) \(\mid \)
    \(\bbar \) \(*\) \(\mid \) \(*\) \(\mid \) \(\mathbf {11}01\)
    \(\mid \) \(*\) \(\bbar \) \(*\) \(\mid \) \(0\mathbf {11}0\)
    \(\mid \) \(*\) \(\mid \) \(*\) \(\bbar \) \(10\mathbf {11}\)

Finally, if we let the sum in (9) start at \(p=0\) rather than \(p=1\), then (9) also includes the case1 \(i=n\), whereas (4) is only valid for \(i\le n-1\).

1 Using \({-1\choose -1}=1\) [5] and \({t\choose -1}=0\) for all nonnegative integers \(t\).

4 Counting Runs Probabilistically

Assume that the binary \(n\)-strings are generated uniformly at random, and define \(n\) indicator random variables \(S_1,\ldots ,S_n\) such that \(S_k=1\) if a run of ones starts at the \(k\)-th position of the string and \(0\) otherwise. There are two cases for the probability of \(S_k\):

\begin{equation} \label {eq:pr_sk} \Pr (S_k=1)=\left \{ \begin{array}{l l} \frac {1}{2}&\text { if } k=1\\ \frac {1}{4}&\text { if } 2\le k\le n \end {array}\right ., \end{equation}

as if a run starts at position \(k>1\) then the previous bit must be zero. The total number of runs is thus given by r.v. \(T=\sum _{k=1}^n S_k\), whose expectation \(\Expc (T)=\sum _{k=1}^n \Expc (S_k)=\sum _{k=1}^n \Pr (S_k=1)\) is

\begin{align} \label {eq:exp_t} \Expc (T)&=(n+1)\,2^{-2}. \end{align} The rationale above was given by Alex Proscurin in Expected value of a run of a random bitstring. As we can see, (11) is the normalisation of (6), i.e. \(\Expc (T)=t(n)/2^n\).

Let us extend the procedure above and define indicator r.v.’s \(R_k^{(i)}\) such that \(\Pr (R_k^{(i)}=1)\) is the probability that a run of length \(i\) starts at position \(k\). We have that

\begin{equation} \Pr (R_k^{(i)}=1)=\Pr (R_k^{(i)}=1|S_k=1)\Pr (S_k=1),\label {eq:prk} \end{equation}

because \(\Pr (R_k^{(i)}=1|S_k=0)=0\). As for the conditional probabilities in (12), there are three cases:

\begin{equation} \label {eq:cond_pr} \Pr (R_k^{(i)}=1|S_k=1)=\left \{ \begin{array}{l l} 2^{-i}& \text { if }1\le k\le n-i\\ 2^{-(i-1)}& \text { if }k= n-i+1\\ 0 & \text { if } n-i+1<k\le n \end {array}\right .. \end{equation}

The first two cases are just the application of the geometric distribution with parameter \(1/2\), noting that in the second case the run does not need to be finished by a zero. Now, the number of runs of length \(i\) is given by r.v. \(T^{(i)}=\sum _{k=1}^n R_k^{(i)}\), whose expectation \(\Expc (T^{(i)})=\sum _{k=1}^n \Expc (R_k^{(i)})=\sum _{k=1}^n \Pr (R_k^{(i)}=1)\) is

\begin{equation} \label {eq:exp_ti} \Expc (T^{(i)})=(n-i+3)\,2^{-i-2}. \end{equation}

Again, we have that (14) is the normalisation of (4), i.e. \(\Expc (T^{(i)})=r_n(i)/2^n\).

The total fraction of runs of length \(i\) is

\begin{equation} \label {eq:fni} f_n(i)=\frac {r_n(i)}{t(n)}=\frac {n-i+3}{n+1}\,2^{-i} \end{equation}

for \(1\le i\le n-1\), whereas \(f_n(n)=1/t(n)\). Interestingly, \(f_n(2)=1/4\) for any \(n>2\). For large \(n\), (15) should approximate the fraction of runs of length \(i\) in one single binary \(n\)-string drawn uniformly at random, in which case \(f_n(i)\approx 2^{-i}\).

5 Connections with Compositions, and Relationship to OEIS Sequences

Sequence A045623 from OEIS [6] (number of \(1\)’s in all compositions of \(j+1\)) is defined by \(a(j)=(j+3)\,2^{j-2}\) for \(j\ge 1\) and \(a(0)=1\). Observing (4), \(r_n(i)=a(n-i)\).

Also, sequence A001792 (number of parts in all compositions of \(j + 1\)) is defined by \(b(j) = (j+2)\, 2^{j-1}\). From (6), \(t(n)=b(n-1)\).

Similarly, the equivalent of expression (7) and expression (9) are given in A045623 and A001792, respectively.

These connections are due to the bijection between runs in binary \(n\)-strings and parts of the compositions of \(n\), which is easy to observe when the compositions are put as the different ways in which the sum \(1+1+\cdots +1=n\) can be partitioned into nonzero parts (see diagram at start of Section 3). Therefore, the runs in all binary \(n\)-strings are in a one-to-one correspondence with the parts in all compositions of \(n\), which is \(t(n)\) in both cases. Similarly, the runs of length \(1\) in all binary \(n\)-strings are in a one-to-one correspondence with the \(1\)’s in all compositions of \(n\), which is \(r_n(1)\) in both cases.

After noticing the connection between runs in binary strings and compositions, we found that the gist of the counting argument that we used in Section 2 was previously given by Mike Earnest in Number of parts equal to \(1\) in all compositions of \(n\). Also, the argument to get (7) must be well known in the compositions problem.

Acknowledgements

Thanks to Kevin Ryde for pointing out the connection with A001729.

References

  • [1]  A. M. Mood. The distribution theory of runs. Ann. Math. Statist., 11(4):367–392, December 1940.

  • [2]  K. Sinha and B.P. Sinha. Energy-efficient communication: Understanding the distribution of runs in binary strings. In 1st International Conference on Recent Advances in Information Technology (RAIT), pages 177–181, Dhanbad, India, January 2012.

  • [3]  K. Sinha and B.P. Sinha. On the distribution of runs of ones in binary strings. Computers & Mathematics with Applications, 58(9):1816–1829, 2009.

  • [4]  F. Makri, Z. Psillakis, and N. Kollas. Counting runs of ones and ones in runs of ones in binary strings. Open Journal of Applied Sciences, 2:44–47, 2012.

  • [5]  M.J. Kronenburg. The binomial coefficient for negative arguments, 2011.

  • [6]  N.J.A. Sloane and The OEIS Foundation Inc. The on-line encyclopedia of integer sequences, 2020.