PS6, p5: I put my question/answer in the second post so as to hide a possible spoiler for anyone working on this problem.
I think that pick_best_word is O(n^2), or quadratic, because the program iterates through the entire word list AND through each word in that list once. And I think that pick_best_word_faster is O(2^n), or exponential, because the number of possible subsets grows exponentially with n (n being the length of the hand). So for a certain hand length pick_best_word_faster is best, but it can quickly slow down as the hand grows. Any thoughts?
You need to be careful about what you're calling n. The size of the word list is one thing; the size of the hand is another. So call them different letters (like n and m). As for analyzing the algorithms, here are brief descriptions of them:
1. pick_best_word: for each word in word_list, see if word is in hand
2. pick_best_word_faster: for each subset in hand, see if alphagram(subset) is in rearrange_dict
Figure out the complexity for each line, then multiply to find the overall complexity for the whole process.
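In case it helps to see those two shapes as code, here is a rough sketch of how I picture them (not the actual ps6 code): the hand is just a string of letters, len stands in for the real scoring function, can_spell is a made-up helper, and rearrange_dict maps a sorted letter string to a single word (the real one may map to a list).

```python
from itertools import combinations

def can_spell(word, hand):
    # True if the letters in `hand` (a string) can spell `word`.
    letters = list(hand)
    for ch in word:
        if ch not in letters:
            return False
        letters.remove(ch)
    return True

def pick_best_word(hand, word_list, score):
    # Approach 1: scan the whole word list, testing each word against the hand.
    best = ''
    for word in word_list:              # one pass over the word list
        if can_spell(word, hand):       # roughly one step per letter of word
            if score(word) > score(best):
                best = word
    return best

def pick_best_word_faster(hand, rearrange_dict, score):
    # Approach 2: enumerate every subset of the hand, sort each subset's
    # letters ("alphagram"), and look that key up in a precomputed dict.
    best = ''
    for size in range(1, len(hand) + 1):
        for subset in combinations(hand, size):   # 2^m subsets in total
            key = ''.join(sorted(subset))         # sort the subset's letters
            word = rearrange_dict.get(key)        # constant-time dict lookup
            if word and score(word) > score(best):
                best = word
    return best

# Tiny illustration with len as the "score":
words = ['tax', 'taxi', 'exit']
rdict = {''.join(sorted(w)): w for w in words}
print(pick_best_word('axiette', words, len))          # taxi
print(pick_best_word_faster('axiette', rdict, len))   # taxi
```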
So since the word_list does not change, would 1 actually be constant, since regardless of how big the hand is it will at worst have to iterate through the word_list one whole time? As for 2, the number of subsets grows exponentially with the length of the hand, so the overall complexity would be exponential, right? I think I should watch the lecture on analysis again.
Let n = size of word list, m = size of hand, w = average word length in the word list.
1. pick_best_word: O(nw)
   - for each word in word_list = O(n) (one O(1) step per word)
   - see if word is in hand = O(w) (do one O(1) thing for each letter in word)
2. pick_best_word_faster: O(m·2^m)
   - for each subset in hand = O(2^m) (2^m subsets of m elements)
   - see if alphagram(subset) is in rearrange_dict = O(m) (O(m/2) = O(m) to sort, O(1) to look up)
Let me know if you want more detail on these numbers, but try to figure out where I got them first.
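To make those counts concrete, here's a throwaway snippet plugging some made-up sizes into the nw and m·2^m formulas above (the numbers are purely illustrative, and constant factors are ignored):

```python
n = 80_000          # hypothetical number of words in the word list
w = 5               # hypothetical average word length
m = 7               # hypothetical hand size

print(n * w)        # pick_best_word        ~ O(n*w)     -> 400000
print(m * 2 ** m)   # pick_best_word_faster ~ O(m * 2^m) ->    896
```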
Ok, I think I understand how you got those. My question then is: is 1 described as linear because the complexity grows with n and/or w? And would 2 simply be described as exponential, even though there is a linear step in it, because the exponential part matters more than the linear part of the function, especially as m increases? I think I am beginning to comprehend how algorithms are analyzed, but I'm having trouble with the vocabulary/representations, if that makes any sense.
This is kind of a weird problem to analyze because there are so many different kinds of inputs (word list size, average word size, hand size). Normally questions about big-O aren't this complex.

#1 is O(nw), where n is the number of words in the word list and w is the average word length in that list. If you think about it, nw is just the total number of LETTERS in the word list. If you call that number c, then the algorithm is O(c), so it's linear with respect to the total number of letters in the word list. If you suppose the average word length stays pretty much constant as you add or remove words (which it probably will), then you could also just say it's O(n). If you want to say it's linear, you have to specify what variable that's in reference to: so you could say it's linear with respect to the number of words in the word list.

For #2 I said it's O(m·2^m), but sorting m things is O(m log m), so it's really O(m log m · 2^m). You can't really ignore any of those terms, because they're all there for a reason, and comparing the algorithm's behavior at different hand sizes (m) depends on them. True, even at small m's 2^m dominates m, but the m still contributes: at m = 10, 2^m is 1,024 while m·2^m is 10,240. If you ignored that m factor you'd be off by a factor of 10, and that's big. If it were O(m + 2^m), then we could drop the m term.

And if you assume the hand size will be constant, then #2 becomes O(1), which I think is the result they're trying to get at. But if not, then it'd be interesting to look at how the big-O functions behave when n and m change, and for what values you might want to use one algorithm or the other. In the problem set n is about 80k and m is about 7. 80k dominates 7·log 7·2^7, but when won't it? What about m = 8, or 15, or 30? What if n were 40k or 178k; what values of m would make #2 more efficient?

But, like I said, this is probably more complicated than they were going for. If you just look at the size of the word list, n, then: 1. Linear 2. Constant
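Purely as a back-of-the-envelope follow-up to that "when won't it?" question: if you treat n·w and m·log2(m)·2^m as if they were directly comparable operation counts (which ignores constant factors entirely), a throwaway loop like this finds the rough crossover for a few hypothetical word-list sizes:

```python
import math

def cost_1(n, w=5):
    # pick_best_word ~ O(n * w), with w assumed to be about 5
    return n * w

def cost_2(m):
    # pick_best_word_faster ~ O(m * log m * 2^m)
    return m * math.log2(m) * 2 ** m

for n in (40_000, 80_000, 178_000):
    # find the largest hand size for which approach 2 is still cheaper
    m = 2
    while cost_2(m) < cost_1(n):
        m += 1
    print(f"n={n}: approach 2 looks cheaper up to hands of about {m - 1} letters")
```

With these (made-up) constants the crossover lands somewhere around hands of 12 to 14 letters, which is well above the 7-letter hands in the problem set.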
Thanks dmancine! Now to try my hand at ps 7...
I'm so baffled by part 3 of this problem, it's driving me insane, someone please help!! I'm stuck on the first part of the problem: generating all possible anagrams of a hand. I haven't yet been able to solve it even for anagrams that are the same length as the hand. This is how I am currently approaching the problem:
- The problem can be broken down into a recursive one, using the base case that when a hand is one char long, an empty string is returned. If a hand is two chars long, add it to the list of possible permutations, then reverse the two-char string and add that to the list as well.
- If the hand is 3 chars long, start on the first char of the string, remove that char from the string, and then use the above function on the remaining two chars. Add the previously removed char back onto each result of the function, to get back two 3-char strings.
- Next, move on to the second char of the string...
The problem I'm having is that my implementation of the above fails once the input grows past a certain length. I'll post the code I've written so far, but it might be confusing as hell. I would really appreciate it if someone could suggest how to implement this properly, or whether it would be better to use a simpler non-iterative approach. Thanks! http://codepad.org/yVjIvdv7
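Not your code, but here is one minimal recursive sketch of the "pull out each letter, permute the rest, glue the letter back on" idea you describe, just to show the shape (ps6 part 3 may want something more tailored, and this version produces duplicates if the hand has repeated letters):

```python
def get_perms(hand):
    # All orderings of the letters in `hand` (a string).
    # Base case: an empty or single-letter hand has exactly one ordering, itself.
    if len(hand) <= 1:
        return [hand]
    perms = []
    for i, letter in enumerate(hand):
        rest = hand[:i] + hand[i + 1:]     # remove the chosen letter
        for sub in get_perms(rest):        # recursively permute what's left
            perms.append(letter + sub)     # glue the letter back on the front
    return perms

print(get_perms('abc'))  # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
```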