Can anyone help me find a way to write this program more efficiently? The algorithm I have works, but it doesn't work for the integer bound of 2^31-1. (I reach the CPU limit)

It consumes a positive integer x and produces the minimal number of squares required to write x as a sum of squares (i.e., it produces a number between 1 and 4).

To access the (up to) four numbers whose squares sum to x, after calling sumofquares(x)
you use getsquare which consumes an integer i and returns the ith number whose square is summed in the sum of squares.

Question

Can anyone help me find a way to write this program more efficiently? The algorithm I have works, but it doesn't work for the integer bound of 2^31-1. (I reach the CPU limit)

It consumes a positive integer x and produces the minimal number of squares required to write x as a sum of squares (i.e., it produces a number between 1 and 4).

To access the (up to) four numbers whose squares sum to x, after calling sumofquares(x) 
you use getsquare which consumes an integer i and returns the ith number whose square is summed in the sum of squares.

anonymous · Answer

Since it's only a positive int, you could make it unsigned to extend the range to 2^32. If that's not enough, long long will give you 64 bits of room. If you need more than that, you'll have to either write some code to work with large numbers or use an existing library for arbitrary precision math such as the GMP ( http://en.wikipedia.org/wiki/GNU_Multi-Precision_Library).

anonymous · Answer

// Implementation of sumofsquares.h #include #include #include "sumofsquares.h" // Function isqrt(n) consumes a positive int n and produces // and int x such that x*x<=n and (x+1)*(x+1)>n // Implementation uses a simple Newton iteration int isqrt (int n){ int x=n+1,y; if (n==1) return 1; else if (n==0) return 0; else if (n<0) return -1; else if (n==INT_MAX) return 46340; for (y=1; y*y<=n && (y<= (1<<15)); y=2*y); while (y

anonymous · Answer

the other code had some problems with it when I was trying to up effieciency. I need this to work for ints 2^31 -1 without long or longlong or unsigning it. The real problem is that at large ints the program reaches the availible CPU limit. This is because the algorthm is O(n^3) right now I think.

anonymous · Answer

I included the main function which I use to test it

anonymous · Answer

Oh, you're looking for optimization?
Let me take a peek. There are lots of small optimizations you could perform, but you're right to look at algorithmic complexity first.

anonymous · Answer

yeah, no matter what I do I can't figure out how to do it without the nested loops

anonymous · Answer

and I haven't done a whole lot with C yet so I don't know how to use arrays. I just started learning about pointers and structs. I only know Scheme before I started learning C

anonymous · Answer

There used to be another loop for a4, but I got rid of it with the isqrt which I think makes it faster.

anonymous · Answer

Yeah, so right now you're running at the maximum sqrt(2^31-1)^3, so is roughly 9.9*10^13 iterations, which is pretty massive. Removing one of the loops is definitely the right thing to do in this case. The call to isqrt does n iterations with a division in them, but in this case the n is dependent on the loop variables which are linearly increasing, so the isqrt runtime in that loop will have a runtime of something like O(log n) but that's just a guess.
The first question would be, can you remove another loop in the same fashion?

anonymous · Answer

I don't see how though, really.

anonymous · Answer

I tried doing that too. I had something like
 a1=isqrt(x);
 a2=isqrt(x-a1*a1);
 a3=isqrt(x-a1*a1-a2*a2);
 a4=isqrt(x-a1*a1-a2*a2-a3*a3);

then checked to see if it added up to the right number. The problem is it doesn't always produce the minimum amount of squares. for example it would say that 310 is 17 4 2 1 while a smaller one is 2 9 15

anonymous · Answer

this way basically goes from the biggest square that fits in the number. But that's not always the one you need.

anonymous · Answer

Yeah, you don't have any information a priori about which numbers will fit the condition.  There may be a specific algorithm for precisely this problem, but I'm not familiar with one off the top of my head.

If you can't reduce algorithmic complexity any further, you could reduce the computational load in the inner loop. For example, you could replace a1*a1 with a variable a1Sqr you calculate only inside of the first loop. Do the same for a2*a2, and you'll have saved several trillion multiplications already.

anonymous · Answer

yeah there was a complicated algorithm with prime decomposition I found online, but I couldn't make sense of it. removing the multiplications seems like a good idea

anonymous · Answer

That's a relatively small optimization though considering there's not much computation going on in the loop. Reducing the number of iterations would be a much safer bet.  The overhead of the loops (each of those is a branching instruction) is probably just as large as the actual computation in the loop.
You could catch the return 0 case earlier by early exiting: after incrementing a1, check if a1*a1 is already >x. If that's the case, the result of a1*a1+a2*a2+a3*a3+a4*a4 will always be >x, and you'll always return 0 (because the result of the calculation will only get larger with more iterations). The same can be done for the other variables.
That only helps you exit early in the case where you won't be able to find a sum of squares, though.

anonymous · Answer

well there is apparently some proof that all numbers can be written as the sum of 4 squares so it should always find one and I alread have the condition that a1 <= sqrt(x) so a1*a1 will never be larger

anonymous · Answer

Ah, yes. Didn't catch that - that, I think, is the also the key.
Because that also means, that for a2 and a3 (and a4, if that loop was still there), you'll never have to search through the entire space. If a1==5, for example, a2*a2+a3*a3+a4*a4 must be <= x-25. So you could reduce the maximum for each nested loop based on the current iterator value of the parent loop.

anonymous · Answer

I think I had that in before, but then it went slower because I was adding additional calculations on each iteration

anonymous · Answer

but I guess I could do it in the parent loop too

anonymous · Answer

Yeah, you can do that for a2 and a3. I'm not sure how much it will help, because it's quite possible that you'll reach the exit condition in the inner loop before a2 or a3 come even close to being sqrtx.

anonymous · Answer

Also make sure you build this with compiler optimizations turned on. There's potentially quite a bit to be gained just from that ;)

anonymous · Answer

I wish I could, but I don't think I'm allowed. They are really pushing the efficiency thing. The due date has actually already passed for this though. It's hust really frusterating that I couldn't figure it out.

anonymous · Answer

Don't feel too bad - it's a tricky problem. I'll stew on it a little longer, there's probably a simple solution we're just not seeing.

anonymous · Answer

That's what I think too, I've asked a few of my classmates and they couldn't figure it out either. My prof said that it should be pretty easy. They provided the sqrt function so I'm thinking that is an important factor.

This is what I've got so far:
int sumofsquares(int x) {
   int sqrtx = isqrt(x);
   for (a1 = 0; a1 <= sqrtx; a1++) {
     int a1sqr = a1*a1;
     int limit1 = isqrt(x-a1sqr);
     
     for (a2 = 0; a2 <= limit1; a2++) {
        int a2sqr = a2*a2;
        int limit2 = isqrt(x-a1sqr-a1sqr);
        
        for (a3 = 0; a3 <= limit2; a3++) {
           int a3sqr = a3*a3;
           a4=isqrt(x-a1sqr-a2sqr-a3sqr);
           
             if (a1sqr+a2sqr+a3sqr+a4*a4 == x) {
               if(a3 == 0)
                return 1;
               if(a2 == 0)
                return 2;
               if(a1 == 0)
                return 3;
               else
                return 4;
            }
          }    
        }
      }
  return 0;
}

anonymous · Answer

Thank you very much for the help by the way

anonymous · Answer

Not a problem!