Went back to the drawing board on ps1. I found a prime generator in the Python Tutorial & built upon it. It works, it's simple, I wrote it in no time. The script has far less lines than ones that eliminate evens & only divide by primes. I'd like to get your thoughts on reasons why my approach is better or worse than the recommended approach for ps1.

http://pastebin.com/12x29yH2

Question

Went back to the drawing board on ps1. I found a prime generator in the Python Tutorial & built upon it.  It works, it's simple, I wrote it in no time. The script has far less lines than ones that eliminate evens & only divide by primes.  I'd like to get your thoughts on reasons why my approach is better or worse than the recommended approach for ps1.

http://pastebin.com/12x29yH2

anonymous · Answer

Your program is good since it works, but it will be slower than the recommended approach that you are referring to. This is because your program checks every number between the candidate and 2. The other program checked fewer numbers. Number of lines isn't a good indicator of efficiency.

anonymous · Answer

On small data sets it doesn't matter that much, it's really the difference of seconds.  But when you are trying for sets of greater quantities (see my note at the bottom for test sets of over 100,000).  Part of computer science is knowing your data set and understanding the algorithm being used on it and its inherent costs.
  
Using a pentium4 1.7GHz laptop from three years ago with kubuntu I did some testing for the 1,000 and 10,000 prime respectively:

All integers: 0m2.12s, 4m10.58s
All integers up to half: 0m2.59s, 5m25.07s
Only odd: 0m0.83s, 1m38.61s, 4hr19m*
Only odd up to half: 0m0.92s, 1m46.64s, 4hr9m*
Only prime: 0m0.31s, 0m20.64s
Only prime up to half: 0m0.36s, 0m23.45s

Conclusively, in small data sets cutting out the upper half isn't worth the check we do (although this may be brought more efficiently if we didn't check every integer but instead kept a list with integers up to half of the current).

The fastest algorithm for data sets less than 10,000 seems to be only dividing by primes, even if they are more than half of the current integer due to the time it takes to calculate that on each iteration.  As we can see from the test results it is 7 fold on a set of 1,000 vs 12 fold on a set of 10,000.  Imagine using a set of 10^12 and how much more efficient it would be.

*I haven't provided numbers for the 100,000 on each set because in my earlier tests it took more than 4 hours each.  But we can see from those two tests that at 1000,000 using the half of current restriction then becomes more effecient.

anonymous · Answer

Wow, ohyonghao! That is a very thorough answer :)  Interesting how readability might be inversely related to efficiency. So for a simple program like calculating the 1000th prime, it's fine and good.  But if I wanted to use this inside of a larger program, and possibly make it recursive, then my inefficient script could really bring the larger program to its knees.

anonymous · Answer

For those who are curious my code is 17lines not including the Problem Set header and comments (33 lines with).
I test your code on my system with the following results:
All integers: 0m2.8s, 3m44s

Testing out the 10,000 I found a limitation of your script where it assumes that the answer is within the first part that you give it and you have to do sort of a guess and check for the upper bounds, this would probably be one of the reasons why it would take nealy 50% more time for the 1,000th prime.  Since I used 105,000 for your candidate on the script to test( 10,000 prime is 104,729 ) it was faster than mine using all integers.  Again it has to do with understanding data sets.  My function can calculate (given limitations of using integer on a 64bit system, and time, a resource that is getting sparser every minute) infinite numbers of prime integers.

If you get into doing millions of hashs per second I've seen just changing a bitwise operation (one of the fastest operations you can do in computing) add 3% efficiency.

anonymous · Answer

it looks to me that the program you posted follows the hints in the problemset to a T. I am not sure which recommended methods you are referring to. it is simple and readable and doesn't have unnecessary optimizations (which can make code harder to read). careful when you talk about a generator (in python) - i was expecting to see something specific. here is mine using a generator function (primes): http://pastebin.com/RTXAc9zB http://docs.python.org/tutorial/classes.html#generators

anonymous · Answer

bwCA- There is a lot going on in your code, I'll have to spend some time learning some terminology to follow it. I haven't read about generators yet- I was using it in the sense of the general English definition, not as it relates to Python specifically. My main obstacle right now is syntax (and definitions) I haven't learned yet for stuff I want to do!         

The recommended methods I was referring to was only testing odds & minimizing your divisors by all primes less than half the square root of the candidate.  I ignored that & divided by everything less than the candidate.  My program also tested and confirmed the number 2, which if you followed the directions you would have had to hard code 2 in the list of confirmed primes because it eliminated all even numbers.

anonymous · Answer

yep, i saw that (the 2 thing) and i didn't know that the else clause could be used with a for statement so i learned something - looks like it could be useful http://wiki.python.org/moin/PythonSpeed/PerformanceTips

anonymous · Answer

http://codepad.org/83aHBurX

anonymous · Answer

Thanks for that PerformanceTips link. I changed a lot of things to my piece of code.

If I used a sieve it would shave off much more seconds, but I need to figure out how to create a sieve to find the 'nth prime number instead of a sieve that lasts until the number n (whether it's prime or not)

If you love speed use a modern sieve (like sieve of atkins or better) and a fast language like C, and  you might even find the 13 billionth prime number in a second

anonymous · Answer

agdgdgdgwngo, that's an easy-to-read, concise piece of code.  So maybe readability is only inversely related to efficiency if you don't know what you're doing XD. Now I need to learn what "global" does. I don't know how anyone could move past ps1. I keep tinkering with it. As I learn new terminology, I go back & rework it. I'm sure once I get "global," I'm gonna want to stick that in there, too. Help, I'M stuck in a loop!

anonymous · Answer

Just watch the next series of lectures and do the rest of the problem sets. Later on (in problem set 6 and onwards) the professors will ask you to go crazy on optimization etc, especially in problem set 8.

Also, I didn't like global being in my answer. It's very easy to change that and turn my script into something people can import as a module, albeit a slow one, to simply find the nth prime number.

anonymous · Answer

http://codepad.org/zzBDUz03 You will see that the sieve algorithm is much more efficient

tonybaldwin · Answer

Yeah, it looks like your dividing by EVERY number between 2 and your query, which is less efficient than only using the primes, and eliminating even numbers, frankly. I did find one way to do this without doing the divisions, etc., but NOT in python. In fact, I also used this same method (division/remainder, etc.) in tcl, as it seemed the best. But in bash, I simply factored the number, stuffed it's factors in an array, and tested for a third element in the array. (it gives like 12: 2 2 3, or 7: 7, so, when a prime, as in 7, there are only two values in the array. No third element, the number is prime). I am not aware of a simple method to factor the number in python that would make this a viable solution for this assignment. I demonstrate all three, python, tcl, bash, here: http://baldwinsoftware.com/wiki/pmwiki.php?n=MIT.Ps1a Later I may try to port my solutions to perl, ruby, lisp, as well, like I did with my first assignment (which was, admittedly, much easier).