On Fri, 15 May 2009 14:34:51 GMT, Jan Panteltje
The run time in C is 13 seconds here on a 1GHz processor.
Can you specify your 'old HP computer'?
I can win maybe 1 second by writing the code a bit different.
And a 3GHz would do it in 12 / 3 = 4 seconds...
A bigger cache would help a bit perhaps.
A Cray would be even better.
What does your C code look like? Mine is in the other posting.
Else you goofed by a factor of 10.
Seems to me anyways
Here's my PowerBasic code:
===================================================
#COMPILE EXE
' SUM.BAS
' TRY SUMMING A LOT OF INTS INTO AN ARRAY OF LONGS...
' JL MAY 14, 2009 PBCC4
FUNCTION PBMAIN () AS LONG
COLOR 15,9
CLS
DIM A(64000000) AS INTEGER ' INPUT ADC SAMPLES
DIM S(64000000) AS LONG ' SUMMING ARRAY
DIM X AS LONG
DIM Y AS LONG
DIM Z AS LONG
' INIT INPUT ARRAY TO RANDOM-ISH VALUES...
FOR X = 1 TO 64000000 ' THIS IS MUCH FASTER
A(X) = X AND 32767 ' THAN CALLING RND()!
NEXT
T! = TIMER
PRINT "Start... ";
FOR Y = 1 TO 10
FOR X = 1 TO 64000000
S(X) = S(X) + A(X)
NEXT
NEXT
PRINT "Done"
E! = TIMER - T!
PRINT USING$("Time per loop ##.### sec ##.## ns/add", E!/10, 1E9*E!/(10*64E6))
PRINT
' DISPLAY SOME RESULTS TO MAKE SURE IT REALLY WORKED...
FOR X = 1 TO 10
PRINT X, A(X), S(X)
NEXT
PRINT
FOR X = 63999001 TO 63999010
PRINT X, A(X), S(X)
NEXT
INPUT A$
END FUNCTION
===================================================
On my computer, a 1.9 GHz Xeon with 2G ram, winXP, I get this
result...
Start... Done
Time per loop 0.231 sec 3.61 ns/add
1 1 10
2 2 20
3 3 30
4 4 40
5 5 50
6 6 60
7 7 70
8 8 80
9 9 90
10 10 100
63999001 3097 30970
63999002 3098 30980
63999003 3099 30990
63999004 3100 31000
63999005 3101 31010
63999006 3102 31020
63999007 3103 31030
63999008 3104 31040
63999009 3105 31050
63999010 3106 31060
===================================================
One of my guys did a C version (I refuse to program in C) to run on
the Kontron under Linux, a slightly slower CPU, 2G ram. I asked him
for his source code, and he spent about a half hour cleaning it up to
be presentable... which I asked him NOT to do. Anyhow, here it is:
/*
* mathsmash.c - a VERY crude benchmark
*
* time the sum of 64-million 16-bit integers into 64-million 32-bit integer sums.
*
* gcc -O3 mathsmash.c -o mathsmash.o
*
* NOTE: The loop is performed 10 times to make the measurement duration more reasonable.
*
* Timing is done by observation or including the system("date") functions.
*
*
*/
#define SIXTYFOURMILLION (0x100000 * 64) /* 64 * 2^20 = 67,108,864, a bit over 64 million */
#define DATA_ARRAY_SIZE SIXTYFOURMILLION
#include <stdio.h>
#include <stdlib.h> /* malloc */
int main()
{
unsigned short *inbound_data;
unsigned int *sum_data;
int multiply;
unsigned long index = 0;
#if 0
/* Initialize data */
printf ("Zeroing data\n");
#endif
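/* NOTE: malloc does not zero the memory, so the sums start from whatever was there. */
inbound_data = (unsigned short *) malloc (sizeof ( short ) *
DATA_ARRAY_SIZE);
sum_data = (unsigned *) malloc ((sizeof ( int )) *
DATA_ARRAY_SIZE);
/* %p with a void * cast is the portable way to print a pointer */
printf ("inb_ptr = %p, sum_ptr = %p\n", (void *) inbound_data,
(void *) sum_data);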
printf ("\n START sum operation...\n");
// system ("date");
for (multiply = 0; multiply < 10; multiply ++) // 10 x
{
for ( index = 0; index < DATA_ARRAY_SIZE; ++index )
sum_data[index] += inbound_data[index];
}
printf ("\n END sum operation...\n");
// system ("date");
}
===================================================
He commented out the system date things because they're buggy or
something, and timed it with his wristwatch at about 0.25 seconds per
64M add, about the same as the PowerBasic.
Like me, he used subscripts rather than pointers. The inner loop compiles to
five instructions.
My program is prettier.
John