Welcome to another post related to competitive programming! This post is a continuation of my earlier post “Competitive Programming and Time Complexity”, which you can read here. Today I will be discussing the concept of binary search, an efficient technique for finding the position of an element in a sorted array. So let’s get started!
The naive approach
Normally, to find the index of a given element in an array, we compare the search-item with each element of the array in turn. We continue this process until we either find the search-item or reach the end of the array. The implementation of this approach in Java is given below:
static int find(int[] a, int key) {
    for (int i = 0; i < a.length; i++) {
        if (a[i] == key)
            return i;
    }
    // element not found
    return -1;
}
The time complexity of this approach is O(n) (n being the length of the array), because in the worst case the algorithm examines all n elements. The question is, can we do better? For an arbitrary array the answer is no, since any correct algorithm may have to look at every element. But if the array is known in advance to be sorted, we can do much better, and this is where binary search comes into play.
Searching a sorted array
Assume that the array is sorted in ascending order. The binary search algorithm is often compared to searching for a given word in a dictionary. Here, we maintain an active region in the array where the element can possibly be found, initially consisting of the whole array. (This region is demarcated by the indices of the starting and ending elements, which we store in variables.)
We compare the search-item with the element in the middle of the active region. If the search-item is greater than the middle element, the search continues in the part of the active region to the right of the middle element (the larger values); if it is smaller, the search continues in the part to the left (the smaller values). Note that the size of the active region is at least halved at each such step. The process ends when we either find the search-item or the active region shrinks to nothing, in which case the search-item is not present.
Here is the implementation of this approach in Java:
static int find(int[] a, int key) {
    int low = 0, high = a.length - 1, mid;
    while (low <= high) {
        // written this way (rather than (low + high)/2) to avoid integer overflow
        mid = low + (high - low) / 2;
        if (a[mid] == key)
            return mid;
        else if (a[mid] < key)
            low = mid + 1;
        else
            high = mid - 1;
    }
    // element not found
    return -1;
}
Here, the variables ‘low’ and ‘high’ maintain the lower and the upper indices of the active region of the array, respectively. As the algorithm roughly halves the size of the active region at every iteration (until it becomes zero), the loop makes roughly log2(n) iterations for large ‘n’ (n = size of array). Therefore, the time complexity is O(log n), a big improvement from our naive solution.
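For a quick sanity check, here is a small driver (assuming it sits in the same class as the find method above; the array and keys are arbitrary sample values I made up):
public static void main(String[] args) {
    // a sorted array of ten sample values
    int[] a = {2, 5, 8, 12, 16, 23, 38, 56, 72, 91};
    System.out.println(find(a, 23)); // prints 5 (23 sits at index 5)
    System.out.println(find(a, 7));  // prints -1 (7 is not present)
}
For n = 10, the loop needs only about log2(10) ≈ 4 iterations, whereas the naive approach may need to examine all 10 elements.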
An alternative approach
There is another approach to binary search, which gives the algorithm a different perspective. Here, starting from the first element, we make ‘jumps’ through the array and reduce the jump size as we get closer to the target element. (A ‘jump’ refers to incrementing the index of the current element by a certain amount.) The initial jump size is taken as half the size of the array, and is halved at every stage, until the jump size becomes zero. The Java code and the explanation for it will make things clearer:
static int find(int[] a, int key) {
    int k = 0, n = a.length;
    for (int b = n / 2; b > 0; b /= 2) {
        // jump forward in steps of size b as long as we do not overshoot the key
        while (k + b < n && a[k + b] <= key)
            k += b;
    }
    // the n > 0 check guards against an out-of-bounds access on an empty array
    if (n > 0 && a[k] == key)
        return k;
    else
        return -1;
}
The variable ‘k’ stores the index of the current element, and ‘b’ stores the jump size, which is halved after every iteration of the ‘for’ loop. In the inner ‘while’ loop, we repeatedly check the value of the (k+b)th element in the array, and increase ‘k’ by ‘b’ as long as this element does not exceed the search-item. Once this is done, ‘b’ is halved and the same procedure is repeated for the new jump size.
If the element is present, then at the end of the loop ‘k’ will be the index of the search-item; if not, ‘k’ will be the largest index whose value is less than the search-item, provided such an index exists (if every element exceeds the search-item, ‘k’ simply stays 0). This property can prove useful in some problems, as the sketch below illustrates.
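As an illustration, here is a hypothetical variant (the name floorIndex and the convention of returning -1 when every element is larger are my own choices, not part of the code above) that returns ‘k’ itself, i.e. the position of the largest element not exceeding the search-item:
// returns the index of the largest element <= key,
// or -1 if the array is empty or every element exceeds key
static int floorIndex(int[] a, int key) {
    int k = 0, n = a.length;
    if (n == 0 || a[0] > key)
        return -1;
    for (int b = n / 2; b > 0; b /= 2) {
        while (k + b < n && a[k + b] <= key)
            k += b;
    }
    return k;
}
On the sample array used earlier, floorIndex(a, 30) would return 5 (the position of 23) even though 30 itself is absent.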
Time complexity analysis
At first glance, this approach appears to have a worse time complexity than the previous one because of the two nested loops. Let’s have a closer look. Notice that the inner ‘while’ loop makes at most two jumps during any given iteration of the outer loop.
(Intuition: suppose the jump size is 5 and we have reached a stage where we cannot make another jump of that size, i.e. the element 5 positions ahead already exceeds the search-item or lies outside the array. When ‘b’ gets halved, it becomes 2 (because of integer division). At the very worst we can now make 2 jumps of size 2; a third jump would take us 6 positions ahead of where the blocked jump started, past an element we already know is too large, so it can never happen.)
With this in mind, the inner loop does only a constant amount of work per iteration of the outer loop. The outer loop makes roughly log2(n) iterations for large ‘n’ (because we start with b = n/2 and keep halving it until it becomes zero). Hence, the overall time complexity is (perhaps surprisingly) O(log n), the same as our first approach.
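If you want to convince yourself of this bound empirically, here is a throwaway sketch (the instrumentation is mine, not part of the algorithm) that counts how many jumps are actually made; by the argument above the count should never exceed roughly 2·log2(n):
// counts the total number of jumps (k += b) performed while searching for key
static int countJumps(int[] a, int key) {
    int k = 0, n = a.length, jumps = 0;
    for (int b = n / 2; b > 0; b /= 2) {
        while (k + b < n && a[k + b] <= key) {
            k += b;
            jumps++;
        }
    }
    return jumps;
}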
With this we come to the end of this post. Please do stay tuned for more competitive programming posts in the future!
(Reference: Competitive Programmer’s Handbook by Antti Laaksonen)