Java 12 - Mergesort and Performance Analysis

Analysis of Mergesort

  1. It seems a lot like binary search: it divides the problem in two at every step.
    1. So it has about log2(N) levels of recursive Mergesort calls
    2. After the log2(N)'th level, each piece left to sort is an array of size 1
    3. But it does this for every portion of the array; it doesn't "throw away" any portion
  2. But, in each step, it doesn't just do one compare. It merges.
  3. The Merge call at each step scans through all of the elements that belong to that step.
  4. Across all steps at a given level, the merges scan a total of about N elements
    1. E.g., after two divisions, the original array is split into fourths, and each fourth is merged, so across all four fourths, all N elements are merged. This is true at every level.
  5. So (N scans) * (log2(N) levels) means that Mergesort takes about N*log2(N) operations to sort N elements
    1. For sorting, this is actually really good. It's much better than N^2
    2. Unlike searching, sorting has to visit every element at least once. So it can never be less than linear (searching can get away with looking at far fewer elements).
    3. E.g., for a million elements, BubbleSort would take about 1.0e12 operations, or a trillion, while Mergesort would take about 1.99e7, or roughly 20 million.
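
The arithmetic above can be checked with a short Java sketch. The class and method names here (SortCostEstimate, levels, mergesortEstimate, quadraticEstimate) are made up for illustration and are not part of the course code. The estimates count "operations" only in the rough sense used in these notes.

```java
class SortCostEstimate {
    // Number of times an array of size n can be split in half before
    // every piece has size 1. Uses the larger half so odd sizes work.
    static int levels(long n) {
        int count = 0;
        long size = n;
        while (size > 1) {
            size = (size + 1) / 2;
            count++;
        }
        return count;
    }

    // Rough Mergesort cost: N * log2(N).
    static double mergesortEstimate(long n) {
        return n * (Math.log(n) / Math.log(2));
    }

    // Rough cost of a quadratic sort such as BubbleSort: N^2.
    static double quadraticEstimate(long n) {
        return (double) n * n;
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println("levels of splitting: " + levels(n));
        System.out.printf("N*log2(N) estimate:  %.3e%n", mergesortEstimate(n));
        System.out.printf("N^2 estimate:        %.3e%n", quadraticEstimate(n));
    }
}
```

For a million elements this gives 20 levels of splitting, since log2(1,000,000) is a little under 20.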

Performance Analysis

  1. In class, we analyzed how well different algorithms do, like linear search and binary search, and insertion sort and merge sort
  2. But how do we know our programs really do this? And if our program isn't one of these simple algorithms, how do we know "how well" it is performing?
  3. Performance analysis is the act of determining how well the program runs on different sizes of inputs -- how fast it runs, how much memory it uses, etc.
  4. Thus, programming is more than just writing a correct program -- it is
    1. Selecting efficient algorithms for the problem,
    2. Programming them correctly,
    3. Analyzing the performance of the program,
    4. And using that analysis to improve the program (if it needs it)
  5. Sometimes, performance analysis means you have to add statements to your program that just gather statistics, but do not contribute to solving the problem
    1. We call this instrumenting your code
    2. It is like the electric meter on your house/apartment: it measures usage without doing any of the work itself
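
As a minimal sketch of what instrumenting can look like, the two searches mentioned above can each be given a counter. The class name InstrumentedSearch and the static field comparisons are assumptions made for this example. The counter plays no part in finding the answer; it only gathers statistics.

```java
class InstrumentedSearch {
    // Instrumentation only: counts how many array elements were compared
    // against the target. Removing it would not change the search results.
    static long comparisons = 0;

    static int linearSearch(int[] data, int target) {
        for (int i = 0; i < data.length; i++) {
            comparisons++;
            if (data[i] == target) {
                return i;
            }
        }
        return -1;
    }

    // Requires the array to be sorted in ascending order.
    static int binarySearch(int[] sorted, int target) {
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            comparisons++; // one element examined per loop iteration
            if (sorted[mid] == target) {
                return mid;
            } else if (sorted[mid] < target) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = new int[1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }

        comparisons = 0;
        linearSearch(data, 1023);
        System.out.println("linear search comparisons: " + comparisons);

        comparisons = 0;
        binarySearch(data, 1023);
        System.out.println("binary search comparisons: " + comparisons);
    }
}
```

Resetting the counter before each run keeps the two measurements separate, the same way you would read a meter before and after.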

Analyzing Sorting Algorithms

  1. In class, we just looked at how many times each element was accessed
  2. In our program, we can count different types of operations
    1. Comparison operations
    2. Assignment operations
  3. In your lab, you need to instrument your mergesort program to count the number of assignments that take place.
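
One possible shape for such instrumentation is sketched below. It is not the lab's required solution. It assumes a top-down Mergesort with a scratch array, and it assumes "assignment" means writing an element into either array. Your own Mergesort and your lab's definition may differ. The names CountingMergesort, comparisons, and assignments are made up for this example.

```java
class CountingMergesort {
    // Instrumentation counters, reset at the start of every sort.
    static long comparisons = 0;
    static long assignments = 0;

    static void sort(int[] a) {
        comparisons = 0;
        assignments = 0;
        int[] scratch = new int[a.length];
        sort(a, scratch, 0, a.length);
    }

    // Sorts a[lo..hi), where hi is exclusive.
    private static void sort(int[] a, int[] scratch, int lo, int hi) {
        if (hi - lo < 2) {
            return; // zero or one element is already sorted
        }
        int mid = lo + (hi - lo) / 2;
        sort(a, scratch, lo, mid);
        sort(a, scratch, mid, hi);
        merge(a, scratch, lo, mid, hi);
    }

    // Merges the sorted halves a[lo..mid) and a[mid..hi).
    private static void merge(int[] a, int[] scratch, int lo, int mid, int hi) {
        int i = lo;
        int j = mid;
        int k = lo;
        while (i < mid && j < hi) {
            comparisons++; // one element-to-element comparison
            if (a[i] <= a[j]) {
                scratch[k++] = a[i++];
            } else {
                scratch[k++] = a[j++];
            }
            assignments++; // one element written to scratch
        }
        while (i < mid) {
            scratch[k++] = a[i++];
            assignments++;
        }
        while (j < hi) {
            scratch[k++] = a[j++];
            assignments++;
        }
        for (k = lo; k < hi; k++) {
            a[k] = scratch[k]; // copy the merged run back
            assignments++;
        }
    }

    public static void main(String[] args) {
        int[] data = {8, 7, 6, 5, 4, 3, 2, 1};
        sort(data);
        System.out.println(java.util.Arrays.toString(data));
        System.out.println("comparisons: " + comparisons);
        System.out.println("assignments: " + assignments);
    }
}
```

In this version every merge writes each of its elements twice (once into the scratch array, once back), so when N is a power of two the assignment count comes out to exactly 2*N*log2(N). For N = 8 that is 48, which lines up with the N*log2(N) analysis up to a constant factor.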