Can anyone help me find a way to write this program more efficiently? The algorithm I have works, but it doesn't work for the integer bound of 2^31-1. (I reach the CPU limit) It consumes a positive integer x and produces the minimal number of squares required to write x as a sum of squares (i.e., it produces a number between 1 and 4). To access the (up to) four numbers whose squares sum to x, after calling sumofquares(x) you use getsquare which consumes an integer i and returns the ith number whose square is summed in the sum of squares.
Since it's only a positive int, you could make it unsigned to extend the range to 2^32. If that's not enough, long long will give you 64 bits of room. If you need more than that, you'll have to either write some code to work with large numbers or use an existing library for arbitrary precision math such as the GMP ( http://en.wikipedia.org/wiki/GNU_Multi-Precision_Library).
// Implementation of sumofsquares.h #include <stdio.h> #include <limits.h> #include "sumofsquares.h" // Function isqrt(n) consumes a positive int n and produces // and int x such that x*x<=n and (x+1)*(x+1)>n // Implementation uses a simple Newton iteration int isqrt (int n){ int x=n+1,y; if (n==1) return 1; else if (n==0) return 0; else if (n<0) return -1; else if (n==INT_MAX) return 46340; for (y=1; y*y<=n && (y<= (1<<15)); y=2*y); while (y<x) { x = y; y = (y+n/y)/2; } return x; } /* sumofsquares(int x) consumes x and produces the minimum number m such that x can be written as as sum of squares of m integers. */ int a1 = 0; int a2 = 0; int a3 = 0; int a4 = 0; int sumofsquares(int x) { int sqrtx = isqrt(x); for (a1 = 0; a1 <= sqrtx; a1++) { for (a2 = 0; a2 <= sqrtx; a2++) { for (a3 = 0; a3 <= sqrtx; a3++) { a4=isqrt(x-a1*a1-a2*a2-a3*a3); if (a1*a1+a2*a2+a3*a3+a4*a4 == x) { if(a3 == 0) return 1; if(a2 == 0) return 2; if(a1 == 0) return 3; else return 4; } } } } return 0; } /* Precondition: Assume sumofsquares(x) has been called and returned m so that x = a1^2+...+am^2 The function getsquare consumes i and produces ai in the above list */ int getsquare(int i) { if(i == 1) return a1; if(i == 2) return a2; if(i == 3) return a3; else return a4; } int main(void){ int number; printf("Enter a positve integer:\n"); while (1 == scanf("%d",&number)) { printf("Sum of squares for %d has %d numbers\n", number, sumofsquares(number)); printf("which are %d %d %d %d\n", getsquare(1), getsquare(2), getsquare(3), getsquare(4)); } return 0; }
the other code had some problems with it when I was trying to up effieciency. I need this to work for ints 2^31 -1 without long or longlong or unsigning it. The real problem is that at large ints the program reaches the availible CPU limit. This is because the algorthm is O(n^3) right now I think.
I included the main function which I use to test it
Oh, you're looking for optimization? Let me take a peek. There are lots of small optimizations you could perform, but you're right to look at algorithmic complexity first.
yeah, no matter what I do I can't figure out how to do it without the nested loops
and I haven't done a whole lot with C yet so I don't know how to use arrays. I just started learning about pointers and structs. I only know Scheme before I started learning C
There used to be another loop for a4, but I got rid of it with the isqrt which I think makes it faster.
Yeah, so right now you're running at the maximum sqrt(2^31-1)^3, so is roughly 9.9*10^13 iterations, which is pretty massive. Removing one of the loops is definitely the right thing to do in this case. The call to isqrt does n iterations with a division in them, but in this case the n is dependent on the loop variables which are linearly increasing, so the isqrt runtime in that loop will have a runtime of something like O(log n) but that's just a guess. The first question would be, can you remove another loop in the same fashion?
I don't see how though, really.
I tried doing that too. I had something like a1=isqrt(x); a2=isqrt(x-a1*a1); a3=isqrt(x-a1*a1-a2*a2); a4=isqrt(x-a1*a1-a2*a2-a3*a3); then checked to see if it added up to the right number. The problem is it doesn't always produce the minimum amount of squares. for example it would say that 310 is 17 4 2 1 while a smaller one is 2 9 15
this way basically goes from the biggest square that fits in the number. But that's not always the one you need.
Yeah, you don't have any information a priori about which numbers will fit the condition. There may be a specific algorithm for precisely this problem, but I'm not familiar with one off the top of my head. If you can't reduce algorithmic complexity any further, you could reduce the computational load in the inner loop. For example, you could replace a1*a1 with a variable a1Sqr you calculate only inside of the first loop. Do the same for a2*a2, and you'll have saved several trillion multiplications already.
yeah there was a complicated algorithm with prime decomposition I found online, but I couldn't make sense of it. removing the multiplications seems like a good idea
That's a relatively small optimization though considering there's not much computation going on in the loop. Reducing the number of iterations would be a much safer bet. The overhead of the loops (each of those is a branching instruction) is probably just as large as the actual computation in the loop. You could catch the return 0 case earlier by early exiting: after incrementing a1, check if a1*a1 is already >x. If that's the case, the result of a1*a1+a2*a2+a3*a3+a4*a4 will always be >x, and you'll always return 0 (because the result of the calculation will only get larger with more iterations). The same can be done for the other variables. That only helps you exit early in the case where you won't be able to find a sum of squares, though.
well there is apparently some proof that all numbers can be written as the sum of 4 squares so it should always find one and I alread have the condition that a1 <= sqrt(x) so a1*a1 will never be larger
Ah, yes. Didn't catch that - that, I think, is the also the key. Because that also means, that for a2 and a3 (and a4, if that loop was still there), you'll never have to search through the entire space. If a1==5, for example, a2*a2+a3*a3+a4*a4 must be <= x-25. So you could reduce the maximum for each nested loop based on the current iterator value of the parent loop.
I think I had that in before, but then it went slower because I was adding additional calculations on each iteration
but I guess I could do it in the parent loop too
Yeah, you can do that for a2 and a3. I'm not sure how much it will help, because it's quite possible that you'll reach the exit condition in the inner loop before a2 or a3 come even close to being sqrtx.
Also make sure you build this with compiler optimizations turned on. There's potentially quite a bit to be gained just from that ;)
I wish I could, but I don't think I'm allowed. They are really pushing the efficiency thing. The due date has actually already passed for this though. It's hust really frusterating that I couldn't figure it out.
Don't feel too bad - it's a tricky problem. I'll stew on it a little longer, there's probably a simple solution we're just not seeing.
That's what I think too, I've asked a few of my classmates and they couldn't figure it out either. My prof said that it should be pretty easy. They provided the sqrt function so I'm thinking that is an important factor. This is what I've got so far: int sumofsquares(int x) { int sqrtx = isqrt(x); for (a1 = 0; a1 <= sqrtx; a1++) { int a1sqr = a1*a1; int limit1 = isqrt(x-a1sqr); for (a2 = 0; a2 <= limit1; a2++) { int a2sqr = a2*a2; int limit2 = isqrt(x-a1sqr-a1sqr); for (a3 = 0; a3 <= limit2; a3++) { int a3sqr = a3*a3; a4=isqrt(x-a1sqr-a2sqr-a3sqr); if (a1sqr+a2sqr+a3sqr+a4*a4 == x) { if(a3 == 0) return 1; if(a2 == 0) return 2; if(a1 == 0) return 3; else return 4; } } } } return 0; }
Thank you very much for the help by the way
Not a problem!
Join our real-time social learning platform and learn together with your friends!