Given an array of n integers, each between 1 and n-1 inclusive, can you detect duplicates in O(1) space and O(n) time?

Assume that the input is available as an array (with O(1) lookup time). Note that there must be at least one duplicate due to the pigeonhole principle.

The trick is to treat each integer as an *index* of the input array, such that it points to another integer in the input i.e. treat it as a “pointer”.

There are a few things to note that will help us solve this:

- Since each integer is a pointer, duplicate integers will result in more than one pointer going to the same position.
- The n
^{th}(last) integer cannot have any pointers going to it, since its position is one greater than the maximum value stated in the problem. - Starting from any integer, if we iteratively follow the pointers, we will eventually reach a position that we’ve seen before i.e. a repeated subsequence, or cycle.
- This could occur immediately i.e. if an integer points to its own position, in which case we have only detected a single pointer going to its own location.
- However, since the last integer cannot have any pointers going to it, if we start from this integer, we have to reach at least one other integer before we get stuck in a cycle.
Such a cycle
*must*include an integer with more than one pointer going to it. This is because every integer except for the start value will initially have one incoming pointer. A cycle will add another pointer to one of these integers (the start value cannot have any incoming pointers, as already noted).

Hence, if we create a sequence by following pointers starting with the last integer, the duplicate integers will be those that point to the beginning of a cycle.

Fortunately, cycle detection is a well-known problem in Computer Science, and there are a few algorithms that can solve this optimally in O(n) time and O(1) space: Floyd or Brent’s algorithms.

The visualisation above shows duplicate detection for a randomised array of 10 integers, using Floyd’s algorithm. The position of the first repeated subsequence is marked in red, and duplicates are marked in orange.

It’s worth noting that cycles can be detected more quickly at the expense of memory.

© Jason Davies 2012.