EECS 311: Algorithmic Analysis Notes |
Home Course Info Links Grades |
Lectures Newsgroup Homework Exams |
Algorithmic analysis in this course is relatively informal. There is no cookie-cutter "plug in the numbers" way to do this. Serious investigation of the techniques involved is done in our courses EECS 310 and EECS 336. Below are some key things to DO and NOT DO when analyzing your algorithm and implementation in this class.
Review chapter 2 of the textbook. It has examples of analyzing algorithms in big-Oh form, and shows the standard big-Oh formulas that pop up. Your real question is which of those formulas applies to your algorithm.
Assume the worst case. What you're trying to get is the lowest upper bound you can prove.
Recurrence relations are a handy tool when analyzing recursive algorithms. Solving recurrence relations in general is beyond this course, but there's only a couple of answer forms that apply in what we'll be doing, so all you're really need to see is which one you have. See chapter 2 and a nice introduction to recurrence relations at http://www.cs.duke.edu/~ola/ap/recurrence.html.
Consider all the independent variables. Independent means the things that can grow in size for different problems. This includes aspects of the input, such as the number of data items and the maximum possible size of a single data item, as well as aspects of the output, e.g., how many answers there might be.
In general, there are two key dependent variables for any program: the number of operations performed, and the amount of space used. The first determines when your code will be too slow to solve a big problem, the second when it will run out of memory.
As shown in the book, you boil down many lines of code to a O(1) "unit."
It doesn't matter how big that unit is as long as it doesn't depend on the
independent variable you're investigating. Code depends on a variable
when there's a loop involving that variable, e.g., for (int i = 0; i < n; ++i)
depends on n
.
Watch out hidden loops. When you copy or call find on a container, there's a hidden loop that depends on the size of the container. Remember that passing data by value does a copy.
Note: Some C++ compilers only copy a pass-by-value string if the string gets modified. If you want to assume that, say so in your analysis, but remember that this only makes a difference if your code doesn't modify the string.
With space allocation, one issue is how many copies get made. Even more important is how much space is needed at the same time in the worst case.
Critically evaluate how the empirical data does and does not match the analytical claims. Wishful matching will lose points. Wishful matching means things like "my analysis says N2, and the data has a curve going up, so it fits N2." If you have easy access to some simple curve fitting tools, like those in MATLAB or an online site, like ZunZun.com, use them to see if your data really fits, or if the fit gets worse and worse and numbers get bigger.
Do NOT kill yourself, unless you love doing this kind of thing. Typical good answers are 3 to 5 pages. Longer ones are often filled out with tons of graphs or code printout or other things that add little if any value. It's hard in less than 3 pages to actually make both an analytical and empirical case.
Do NOT include the entire code in the analysis. Only parts of the code that are invoked when the core operations are called matter. Initialization, output printing and so on, are not relevant.
Do NOT do just data analysis, i.e., curve fitting. Do NOT do just algorithmic analysis. Both are critical. The empirical side checks the correctness of the analytical side.
Comments? Send mail to c-riesbeck@northwestern.edu.