PS6, p3 [Computer Player]:
What I would like my pick_best_word() function to do is find all legal words from letters in hand and then, of those pick the word with the highest score. I am having trouble figuring out how to get the computer to produce words from a set of letters. Any suggestions?

Question

anonymous · Answer

If I could remember what I've done correctly:

1) Loop through the entire wordlist
2) For every word, check whether the letters matched those in the hand.
3) If so, store its score. 
4) Rinse and repeat. If another word had higher score than the previous one, store that instead.

anonymous · Answer

I was lying in bed this morning and though up that plan. It seems like it would take longer than getting possible sub strings from the word and testing those...although i guess both have to iterate through the wordlist...so maybe they would be the same speed?

anonymous · Answer

When you say "possible sub strings from the word", do you mean the hand? If that is the case, that would be a huge headache as you would have a list that contains all possible combinations (Which is not just 7!, but rather 7! + 6! + 5!..., etc). Although the speed to find the word would probably the same, but to create the that list is going to take some time. 

Then again, I'm not much of a fan of brute force as I could never wrap my heads around it.

anonymous · Answer

I did mean hand (still waking up). Also, think I almost got the computer player working.

anonymous · Answer

our sample dictionary is 80k words? other than checking every word in the dictionary- you need to check every permutation of the hand, in varying lengths - if the hand is 5 characters you need to check every permutation taken 2 at a time, 3 at a time 4 at a time and 5 at a time. like this: http://codepad.org/bdUxoWyf there are a lot of algorithms out there, some very cool, find one you can deal with and try to use it.

anonymous · Answer

a five letter hand has 325 of these permutations
a seven letter hand has 13699

anonymous · Answer

@bwCA, yes, except there are no 1-letter words.  For eight letters there are 109,592.  If word lookup is O(1) then for 7-letter hands or less it would be more efficient to generate the subwords and check each of them, for 8-letter hands or longer it's cheaper to test all words against the hand.  (But this assumes subword generation is O(1) as well, which is way off.  It's easier to just always check every word against the hand.)

The code to generate the subwords is kinda hairy, so it's prone to bugs.  But it's a nice little algorithm to try to tackle.  Start with just finding all permutations of n elements, and don't worry about finding subsets of the elements.

Here's the formula bwCA used$$perms = \sum_{k=0}^{n}\left(\begin{matrix}n \ k\end{matrix}\right)n!$$where n is the number of letters in the hand, k is each subword length, and $$\left(\begin{matrix}n \ k\end{matrix}\right)=\frac{n!}{(n-k)!k!}$$and is read "n choose k" (it's the number of ways to choose k elements from a set of n unique elements).

But there aren't any 0- or 1-length words, so k should really start at 2.

anonymous · Answer

Whoops.  That should be k! in the summation, not n!.

anonymous · Answer

Is problem 4 asking to generate all the subwords and test them against the rearranged_dict?

"To find some word that can be made out of the letters in HAND:
 For each subset S of the letters of HAND:
 Let w = (string containing the letters of S in sorted order)
If w in d: return d[w] "

anonymous · Answer

So as I understand this:
HAND = {r, o, t}
All substrings (that arent 0 or 1 letters) would be:
[ro, rt, rot, rto, or, ot, ort, otr, to, tr, tro, tor]

The function then sorts these? making:
[or, ot, rt, ort]

And finally these are tested against rearrange_dict, so that 'or', 'ot', and 'ort' would return as words? (or = or, ot = to, ort = rot)

anonymous · Answer

So, I think this is an efficient way of generating the permutations of a list: http://codepad.org/hpK1JoYf I think I may be over thinking this problem...blah

anonymous · Answer

The number of permutations of n unique elements is n!.  So, for 'abcd' (4 elements) there will be 24 different permutation strings.

What the problem is asking you to do is find the subSETs of the letters in the hand.  A set doesn't care about the order of its elements.  In general, the set of all subsets of a set of elements (called the "power set" of the set) will have 2^n elements (every elements will individually appear or not appear in a given set).  But in our case we don't care about 0- and 1-length words, so we subtract n+1 from that number (1 string of length 0, n strings of length 1).

As an example, use ORT.  Here are the 6 permutations (which we don't care about for this problem):
ORT
OTR
ROT
RTO
TOR
TRO

Here are the 8 subsets (we only care about the ones with 2 or more elements):
'' (empty)
O
R
T
OR
OT
RT
ORT

Anagramming a set of letters to find a word is complicated.  We could generate every permutation (n! of them) and check each one against a word list to see if it's a real word.  A more efficient way is to do what they're suggesting in the problem set.  For each word in the word list, create a canonical representation of the letters in the word (just the letters of the word, in alphabetical order.  I'm a Scrabble player, and we call this the "alphagram" of the word.).  Then add the word to a dictionary with the alphagram as a key.  Since more than one word can have the same alphagram, you have to store a list of words.

Here's what one entry in your dictionary will look like (for your example ORT):
{ 'ORT': ['ORT', 'ROT', 'TOR'] }

Now if my hand is TRO, then I just have to find the alphagram, ORT, then look that up in the dictionary to find its anagrams:
anagrams = wordsByAlphagram['ORT']

Actually, the problem doesn't even go this far.  As long as the computer can find ANY word to play, that's sufficient.  So, just store any valid anagram, instead of the list of all possible anagrams:
{ 'ORT': 'ROT' }

But you can play shorter words than the entire hand size.  That's why they want you to find subSETs of the letters in the hand.  Once you generate each unique subset, find its alphagram, and see if it makes a word (has an entry in wordsByAlphagram).  Finding the subsets is the tricky part.

This is a much easier algorithm to anagram words than finding all permutations and looking each one up.  You find ONE canonical permuation for each word (by the very simple process of sorting its letters), find the same representation for your given set of letters, and see if any of them match.  It takes a lot of work up front, and a lot of memory to store each word and its alphagram, but it makes the task of anagramming quick and straightforward.

The same strategy can be used to check if two strings are equal, ignoring case.  Instead of finding all possible combination of upper/lower case letters in the target string and seeing if the given string matches it, just convert both strings to a canonical representation: all upper case or all lower case.

anonymous · Answer

If I could give you more medals, I totally would! Here is what I have: http://codepad.org/wd5hTWlH The function powerset I found and am still not 100% how it works. If you could take a look at it and maybe explain it a bit I would greatly appreciate it. Thanks!

anonymous · Answer

Looking at line 3, you can see this is a recursive function. It calls itself on a smaller problem (the seq without the last element). The base case is on lines 2 and 6 (if the seq is empty, return a set containing only the empty set). So, if we trust the recursive call to do what it's supposed to do, after line 3 head will have the powerset of the elements in the seq excluding the last element. (If you called seq("abcd") then head will have the powerset of "abc".) The powerset of "abcd" has 16 elements. The powerset of "abc" has 8 elements. All the elements of powerset("abc") will be in powerset("abcd"). So where do those other 8 elements come from? They're just the elements of powerset("abc") with a "d" tacked on to the end. w/o d w/ d ----- ---- 0 d a ad b bd c cd ab abd ac acd bc bcd abc abcd So, line 4 is probably doing that, but how? First, you have to understand what a list comprehension is. Go read http://docs.python.org/tutorial/datastructures.html#list-comprehensions then come back. I'll wait. Okay! So let's break it down. return head + [item + [seq[-1]] for item in head] "head + [??]": Since the list comprehension is creating a list, this statement is creating a new list which is the elements of head followed by the elements from the comprehension. And remember that head is the powerset of "abc" in this example. "[seq[-1]]": This is a list containing just the final element we lopped off for the recursive call, "d" in our example. "for item in head": This is just like how it sounds. Iterate through the elements in head (the subsets of "abc") and call each one "item." But what do we do with them...? "item + [seq[-1]]": ...Extend them each with our lopped off element, "d". (We have to put the element in a list because the "+" operator for lists requires two lists.) So, line 4 is creating a list of the elements in the powerset of "abc", followed by the same elements with "d" tacked on to them. Just like we wanted. It drills down into the recursive calls first so it can get to the first element of the sequence before building up the final list. That way it appends the later elements to the backs of the existing subsets, so the elements in the subsets come out in the same order you put them in.

anonymous · Answer

in problem three we had to generate all the permutations because the order mattered: for the letters a and d, 'ad' as well as 'da' had to be checked. in problem four we made an effort to sort everything first so we only have to generate all the combinations (only 120 for seven letters): for the letters a and d, only 'ad' has to be checked. a little bit of work up front made the search much easier. powerset is pretty cool. it generates all the combinations. If you sort the hand once before you pass it to powerset then all the subsets will in sorted order - line 23 would be unecessary. http://dpaste.com/635186/ The itertools module in the Python Standard Library has a permutation and a combination iterator (Python v.2.6+) that make those tasks pretty easy. http://www.python.org/doc//current/library/itertools.html#itertools.permutations http://www.python.org/doc//current/library/itertools.html#itertools.combinations http://dpaste.com/635174/

anonymous · Answer

dmancine: Thanks for the awesome explanation. I am still having some trouble wrapping my head around recursive functions I think (I can implement them, but then explaining how they work sometimes escapes me). 

bwCA: itertools looks cool. I am on an older version of Python (2.5), but I will download a newer version and play around with that.

anonymous · Answer

I'm so baffled by part 3 of this problem, its driving me insane, some1 please help!! I'm stuck on the first part of the problem of generating all possible anagrams of a hand. I haven't yet been able to solve the problem with anagrams that are the same length of the hand. This is how I am currently approaching the problem: -The problem can be broken down into a recursive one, by using the base case of when a hand is one char long, an empty is returned. If a hand is two chars long add this to possible list of permutations, and reverse this two char string and add to list of permutations. -If hand is 3 chars long, start on first char of string, remove this char from the string, and then use the above function on the remaining two chars. add previously discarded char onto result of function, to return two 3 string chars. -Next move onto second char of string.. The problem I'm having with implementing the above is that my function cannot deal with sequences with a length greater than 3. I'll post the code i;ve written so far, but it might be confusing as hell. I would really appreciate if someone could suggest how to implement this properly or whether it would be better to use a simpler non-iterative approach. Thanks! http://codepad.org/cM1jEPlK

anonymous · Answer

@sachin91, Treating each hand size as a separate case is a recipe for a nightmare.  (Was that a mixed metaphor?)  It also limits the hand size to the number of cases you can handle.

For this problem I'd come at it from the other side.  Try calling get_frequency_dict(word) and see if you can do something with that.

anonymous · Answer

I don't get how calling get_freq_dictword will help - doesn't that just generate a dictionary from a sequence?  Or are you implying that I should use the dictionary form of the hand in order to generate the permutations?  ...

anonymous · Answer

cheers

anonymous · Answer

I'm going to cheat temporarily by using itertools, although if you can help me to find the proper way of doing it, that'll be grand

anonymous · Answer

Like I said, call get_frequency_dict on the WORD.  That gives you a count of all the letters in the word.  You already have a count of all the letters in the hand.  How can you use those two sets of counts to see if the word could be made from the hand?

anonymous · Answer

i get what you are saying, I needed you to state the obvious! thanks