Maker Pro

Larkin, Power BASIC cannot be THAT good:

Jan Panteltje

The run time in C is 13 seconds here on a 1GHz processor.
Can you specify your 'old HP computer' ?

I can win maybe 1 second by writing the code a bit differently.
And a 3GHz would do it in 12 / 3 = 4 seconds...
A bigger cache would help a bit perhaps.

A Cray would be even better.


What does your C code look like? Mine is in the other posting.

Else you goofed by a factor of 10.

Seems to me anyways :)
 
I just tried in Matlab, on a 2GHz core2-duo with 2GB

with 32-bit signed ints: ~2.5 seconds
with 16-bit signed ints: ~1.0 second
with 64-bit floats: ~4.0 seconds

-Lasse
 
Tim Williams

Typing the following into Open Watcom,

-=-=-

#include <stdio.h>
#include <stdlib.h>
#include <windows.h>

#define ARRAY_SIZE 64000000

int main(void) {

short *a; int *s; int i;
int startTime, endTime;

a = malloc(ARRAY_SIZE * sizeof(short));
s = malloc(ARRAY_SIZE * sizeof(int));
if (a == NULL || s == NULL) {
printf("Memory allocation failed.\n");
return -1;
}

printf("Starting...\n");
startTime = GetTickCount();

for (i = 0; i < ARRAY_SIZE; i++) {
s[i] += a[i];
}

endTime = GetTickCount();

printf("Total time taken adding %i array entries: %f seconds.\n",
ARRAY_SIZE, ((float)(endTime - startTime)) / 1000);

free(a); free(s);

return 0;
}

-=-=-

and saving as test.c and compiling, I get the typical output:

-=-=-
E:\WATCOM\Projects>test
Starting...
Total time taken adding 64000000 array entries: 1.453000 seconds.

E:\WATCOM\Projects>test
Starting...
Total time taken adding 64000000 array entries: 1.546000 seconds.
-=-=-

My computer is an Athlon 2500 at 1.66GHz, 1.1GB PC133 RAM (currently
472MB free, so no problems allocating the test), running Windows XP
SP2. Basically state-of-the-art way back in the year 2001. If your
computers are taking more than a couple seconds, either your compiler
really sucks or your computers suck even more. :)

Tim
 
Jan Panteltje

I just tried in Matlab, on a 2GHz core2-duo with 2GB

with 32-bit signed ints: ~2.5 seconds
with 16-bit signed ints: ~1.0 second
with 64-bit floats: ~4.0 seconds

-Lasse

Yes, what I think happens is that those Core 2 Duos execute those instructions
a lot faster than my Celeron or whatever it is, so that would gain another
200%, so Larkin's 'old HP' must be a 3 GHz core?

Maybe I should upgrade to a more recent processor, but luckily I do not need to
add 64M integers :)
 
Jan Panteltje

Tim, you forgot that I was running >2 loops< inside each other, as Larkin's
original post mentions:

for(j = 0; j < 10; j++)
{
for(i = 0; i < BIG_SIZE; i++)
{
mem[i] += b[i];
}
}

So multiply your result by 10, and you get 15 seconds, even slower than me
on the eeePC with 512 MB ram and 900 MHz celeron in Linux with gcc-4.0 :)
Sorry 'bout that ;-)
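
The two nested loops described here can be sketched in portable C roughly as below. This is only an illustration, not anyone's posted code: the helper name `time_sum_passes`, the fixed-width types, and the use of the standard `clock()` call in place of `GetTickCount()` are all assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: adds every 16-bit sample in `in` into the matching
 * 32-bit slot of `sum`, `passes` times over, and returns the time in
 * seconds as reported by the standard clock() call. */
double time_sum_passes(int32_t *sum, const int16_t *in,
                       size_t n, int passes)
{
    assert(sum != NULL && in != NULL);

    clock_t start = clock();
    for (int j = 0; j < passes; j++) {      /* outer loop: 10 in the thread */
        for (size_t i = 0; i < n; i++) {    /* inner loop: 64M in the thread */
            sum[i] += in[i];
        }
    }
    clock_t end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Called as `time_sum_passes(s, a, 64000000, 10)` it does the work of the full 10 x loop; dividing the result by 10 gives a per-pass figure that can be compared with a single-pass timing.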
 
IanM

Just tried Tim's code (with another loop) compiled with Visual C++ 2008
express edition and run on a laptop with a 1.7GHz T2250 and 1GB Ram

2.23s
 
Jan Panteltje

Tim's code does not have the 10 x outside loop, so that makes it 22.3 seconds.
 
Martin Brown

The only way I can get it to run that slow on my 3GHz old P4 HT chip is
with full debugging enabled and the optimiser completely disabled in MS
C++ Win32 console environment. I would hope for nearly an order of
magnitude faster using SSE.

NB You really should initialise the arrays first.

Regards,
Martin Brown
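
The point about initialising can be sketched in C as follows. Memory returned by `malloc` has indeterminate contents, so any sums inspected after the loop mean nothing unless both arrays are written first. Writing them also touches every page before the timer starts, which may matter on systems that map pages lazily. The helper name `init_arrays` and the fill pattern are assumptions made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: gives both arrays defined contents before timing.
 * The input pattern (index AND 32767) is an arbitrary, cheap stand-in for
 * sample data; the sums start at zero. */
void init_arrays(int32_t *sum, int16_t *in, size_t n)
{
    assert(sum != NULL && in != NULL);

    for (size_t i = 0; i < n; i++) {
        in[i] = (int16_t)(i & 32767);  /* always fits in a signed 16-bit value */
        sum[i] = 0;
    }
}
```

Calling this once on both arrays before reading the start time keeps the fill itself out of the measured interval.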
 
IanM

Jan Panteltje said:
On a sunny day (Fri, 15 May 2009 09:04:43 -0700 (PDT)) it happened Tim
<[email protected]>:

The run time in C is 13 seconds here on a 1GHz processor.
Can you specify your 'old HP computer' ?

I can win maybe 1 second by writing the code a bit differently.
And a 3GHz would do it in 12 / 3 = 4 seconds...
A bigger cache would help a bit perhaps.

A Cray would be even better.

What does your C code look like? Mine is in the other posting.

Else you goofed by a factor of 10.

Seems to me anyways :)

Typing the following into Open Watcom,

-=-=-

#include <stdio.h>
#include <stdlib.h>
#include <windows.h>

#define ARRAY_SIZE 64000000

int main(void) {

short *a; int *s; int i;
int startTime, endTime;

a = malloc(ARRAY_SIZE * sizeof(short));
s = malloc(ARRAY_SIZE * sizeof(int));
if (a == NULL || s == NULL) {
printf("Memory allocation failed.\n");
return -1;
}

printf("Starting...\n");
startTime = GetTickCount();

for (i = 0; i < ARRAY_SIZE; i++) {
s[i] += a[i];
}

endTime = GetTickCount();

printf("Total time taken adding %i array entries: %f seconds.\n",
ARRAY_SIZE, ((float)(endTime - startTime)) / 1000);

free(a); free(s);

return 0;
}

-=-=-

and saving as test.c and compiling, I get the typical output:

-=-=-
E:\WATCOM\Projects>test
Starting...
Total time taken adding 64000000 array entries: 1.453000 seconds.

E:\WATCOM\Projects>test
Starting...
Total time taken adding 64000000 array entries: 1.546000 seconds.
-=-=-

My computer is an Athlon 2500 at 1.66GHz, 1.1GB PC133 RAM (currently
472MB free, so no problems allocating the test), running Windows XP
SP2. Basically state-of-the-art way back in the year 2001. If your
computers are taking more than a couple seconds, either your compiler
really sucks or your computers suck even more. :)

Tim

Tim, you forgot that I was running >2 loops< inside each other, as Larkin's
original post mentions:

for(j = 0; j < 10; j++)
{
for(i = 0; i < BIG_SIZE; i++)
{
mem[i] += b[i];
}
}

So multiply your result by 10, and you get 15 seconds, even slower than me
on the eeePC with 512 MB ram and 900 MHz celeron in Linux with gcc-4.0 :)
Sorry 'bout that ;-)

Just tried Tim's code (with another loop) compiled with Visual C++ 2008
express edition and run on a laptop with a 1.7GHz T2250 and 1GB Ram

2.23s


Tim's code does not have the 10 x outside loop, so that makes it 22.3
seconds.


(with another loop) meant I added the 10 * loop, so still 2.23s
 
Jan Panteltje

The only way I can get it to run that slow on my 3GHz old P4 HT chip is
with full debugging enabled and the optimiser completely disabled in MS
C++ Win32 console environment. I would hope for nearly an order of
magnitude faster using SSE.

Cool, I just tried with gcc -O4 and it runs in 0.56 of the time :)


NB You really should initialise the arrays first.

That will not affect timing of the loop.
 
Jan Panteltje

Yea, I am down to 7 seconds now compiling with -O4 on my eeePC.
 
Jan Panteltje

The run time in C is 13 seconds here on a 1GHz processor.
Can you specify your 'old HP computer' ?

I can win maybe 1 second by writing the code a bit differently.
And a 3GHz would do it in 12 / 3 = 4 seconds...
A bigger cache would help a bit perhaps.

A Cray would be even better.


What does your C code look like? Mine is in the other posting.

Else you goofed by a factor of 10.

Seems to me anyways :)


Here's my PowerBasic code:

===================================================
#COMPILE EXE

' SUM.BAS
' TRY SUMMING A LOT OF INTS INTO AN ARRAY OF LONGS...

' JL MAY 14, 2009 PBCC4

FUNCTION PBMAIN () AS LONG

COLOR 15,9
CLS

DIM A(64000000) AS INTEGER ' INPUT ADC SAMPLES
DIM S(64000000) AS LONG ' SUMMING ARRAY
DIM X AS LONG
DIM Y AS LONG
DIM Z AS LONG

' INIT INPUT ARRAY TO RANDOM-ISH VALUES...

FOR X = 1 TO 64000000 ' THIS IS MUCH FASTER
A(X) = X AND 32767 ' THAN CALLING RND()!
NEXT

T! = TIMER

PRINT "Start... ";

FOR Y = 1 TO 10

FOR X = 1 TO 64000000
S(X) = S(X) + A(X)
NEXT

NEXT

PRINT "Done"

E! = TIMER - T!
PRINT USING$("Time per loop ##.### sec ##.## ns/add", E!/10, 1E9*E!/(10*64E6))
PRINT

' DISPLAY SOME RESULTS TO MAKE SURE IT REALLY WORKED...

FOR X = 1 TO 10
PRINT X, A(X), S(X)
NEXT

PRINT

FOR X = 63999001 TO 63999010
PRINT X, A(X), S(X)
NEXT

INPUT A$


END FUNCTION


===================================================

On my computer, a 1.9 GHz Xeon with 2G ram, winXP, I get this
result...

Start... Done
Time per loop 0.231 sec 3.61 ns/add

1 1 10
2 2 20
3 3 30
4 4 40
5 5 50
6 6 60
7 7 70
8 8 80
9 9 90
10 10 100

63999001 3097 30970
63999002 3098 30980
63999003 3099 30990
63999004 3100 31000
63999005 3101 31010
63999006 3102 31020
63999007 3103 31030
63999008 3104 31040
63999009 3105 31050
63999010 3106 31060

===================================================

One of my guys did a C version (I refuse to program in C) to run on
the Kontron under Linux, a slightly slower CPU, 2G ram. I asked him
for his source code, and he spent about a half hour cleaning it up to
be presentable... which I asked him NOT to do. Anyhow, here it is:

/*
* mathsmash.c - a VERY crude benchmark
*
* time the sum of 64-million 16-bit integers into 64-million 32-bit integer sums.
*
* gcc -O3 mathsmash.c -o mathsmash.o
*
* NOTE: The loop is performed 10 times to make the measurement duration more reasonable.
*
* Timing is done by observation or including the system("date") functions.
*
*
*/

#define SIXTYFOURMILLION (0x100000 * 64)
#define DATA_ARRAY_SIZE SIXTYFOURMILLION

#include <stdio.h>
#include <stdlib.h>

int main()
{
unsigned short *inbound_data;

unsigned int *sum_data;


int multiply;

unsigned long index = 0;

#if 0
/* Initialize data */
printf ("Zeroing data\n");
#endif

inbound_data = (unsigned short *) malloc (sizeof ( short ) *
DATA_ARRAY_SIZE);
sum_data = (unsigned *) malloc ((sizeof ( int )) *
DATA_ARRAY_SIZE);

printf ("inb_ptr = %p, sum_ptr= %p\n", (void *) inbound_data,
(void *) sum_data);

printf ("\n START sum operation...\n");


// system ("date");

for (multiply = 0; multiply < 10; multiply ++) // 10 x
{
for ( index = 0; index < DATA_ARRAY_SIZE; ++index )
sum_data[index] += inbound_data[index];
}

printf ("\n END sum operation...\n");

// system ("date");


}

===================================================

He commented out the system date things because they're buggy or
something, and timed it with his wristwatch at about 0.25 seconds per
64M add, about the same as the PowerBasic.

He used subscripts, not pointers, as I did. The inner loop compiles to
five instructions.

My program is prettier.

John

Thank you for that code John, but unfortunately I do not have Power BASIC.
But you did mention you tried it in C.
I wonder if compiling your C code with -O4 in Linux would make it faster than
the Power BASIC version, as it gains 2x speed here.
 
Jan Panteltje

Forget my remark about -O4, runs the same with your C code as -O3, about 7 seconds on my eeePC.
 
Tim Williams

In other languages--

Compiled in FreeBASIC version 0.17b.

-=-=-
'$DYNAMIC
DIM a(64000000) AS SHORT
DIM s(64000000) AS INTEGER
DIM i AS INTEGER

Start! = TIMER
FOR i = 0 TO 63999999
s(i) += a(i)
NEXT

PRINT USING "One pass in ##.### seconds."; TIMER - Start!;
-=-=-

Typical output:

E:\PROGRA~1\FreeBASIC>test
One pass in 1.439 seconds.
E:\PROGRA~1\FreeBASIC>test
One pass in 1.818 seconds.

So it offers fairly similar performance. I would suppose PowerBasic
is also comparable.

Now, I could write the 16 bit assembly version and test that, too, but
the 16 bit part would be kind of tricky given the dataset size. ;-) I
could test a million loop iterations, but the whole thing would still
fit inside processor cache, so it's not a fair comparison.

Tim
 
Jan Panteltje

Ah, a free-basic!
I did a google, and downloaded the Linux version.
But:
test9.bas(1) error 135: Only valid in -lang deprecated or fblite or qb, found 'DYNAMIC' in ''$DYNAMIC'
test9.bas(2) warning 24(2): Array too large for stack, consider making it var-len or SHARED
test9.bas(3) warning 24(2): Array too large for stack, consider making it var-len or SHARED
test9.bas(6) error 137: Suffixes are only valid in -lang deprecated or fblite or qb, found 'Start' in 'Start! =3D TIMER'
test9.bas(7) error 12: Expected 'TO' in 'FOR i =3D 0 TO 63999999'
test9.bas(7) error 3: Expected End-of-Line, found 'TO' in 'FOR i =3D 0 TO 63999999'
test9.bas(8) error 3: Expected End-of-Line, found 'a' in 's(i) +=3D a(i)'
test9.bas(11) error 137: Suffixes are only valid in -lang deprecated or fblite or qb, found 'Start' in 'PRINT USING "One pass in ##.### seconds."; TIMER - Start!;'

Not enough memory (385 MB) on this PC.

Nice BASIC anyways.

DIM i AS INTEGER

FOR i = 0 TO 10
print "HELLO"
NEXT

grml: ~ # fbc test10.bas

grml: ~ # ./test10
HELLO
HELLO
HELLO
HELLO
HELLO
HELLO
HELLO
HELLO
HELLO
HELLO
HELLO

LOL
I once had some BASIC programs for inductors and stuff...
Maybe all gone, CP/M and Sinclair times..



WOW! Have not used BASIC in ages...
 
John Devereux

Jan Panteltje said:
Forget my remark about -O4, runs the same with your C code as -O3, about 7 seconds on my eeePC.

I suspect this is all limited by the system memory bandwidth.

So that as long as the data sizes are the same, everything that is at
all efficient will produce the same results.
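
The bandwidth argument can be put into numbers with a rough sketch like the one below. It assumes a simple model in which each add reads one 16-bit input, reads one 32-bit sum and writes one 32-bit sum, and it ignores caches and write combining. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes moved per add under the simple model: read one 16-bit input,
 * read one 32-bit sum, write one 32-bit sum. Only an estimate. */
double bytes_per_add(void)
{
    return (double)(sizeof(int16_t) + 2 * sizeof(int32_t));
}

/* Hypothetical helper: effective memory traffic in bytes per second
 * for `adds` additions completed in `seconds`. */
double effective_bandwidth(double adds, double seconds)
{
    assert(seconds > 0.0);
    return adds * bytes_per_add() / seconds;
}
```

With the 0.231 seconds per 64M-add pass reported for the PowerBasic run, `effective_bandwidth(64e6, 0.231)` comes out at roughly 2.8 GB/s under this model, which is the kind of figure one would compare against the machine's memory bandwidth.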
 
Jan Panteltje

He used -O3, whatever that means.

From 'man gcc'
-O3 Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops and -fgcse-after-reload options.


-O4 seems to have disappeared from my man gcc, but gcc accepts it nevertheless.
IIRC that was the most severe for speed optimisation known, but I could be wrong.
I used gcc-4.0, but also have 2.95 and 3.3 on this system; some things
may have changed and not the manual... it does exist, people talk about it too,
http://gcc.gnu.org/cgi-bin/search.cgi?q=-O4&cmd=Search&m=all&s=DRP

Is that 7 seconds to add the array once, or for 10 times?

No, your code, or his code rather then, 10 x loop.

So, on a Celeron 900 MHz with 512 MB RAM, but this Celeron is clocked down to 670 or something
like that on the eeePC to save power....
So if you multiply 670 MHz * (7 / 2.2) you get about 2.13 GHz, if the processor worked the same.
I think that is about in the same ballpark as you have.
I must admit, pretty good for Power BASIC.
 
Nobody

The run time in C is 13 seconds here on a 1GHz processor.
Can you specify your 'old HP computer' ?

I can win maybe 1 second by writing the code a bit differently.
And a 3GHz would do it in 12 / 3 = 4 seconds...
A bigger cache would help a bit perhaps.

For adding arrays, memory bandwidth will be the dominant factor. The ALU
will spend most of its time idle, waiting upon memory I/O.

And if you don't have 512MB of RAM (64M * 2 * 4), then you're going to
be swapping, which will totally kill performance.
 
Jan Panteltje

For adding arrays, memory bandwidth will be the dominant factor. The ALU
will spend most of its time idle, waiting upon memory I/O.

And if you don't have 512MB of RAM (64M * 2 * 4), then you're going to
be swapping, which will totally kill performance.

../test2
memory needed=384 MB

This because it is (64M * 4) + (64M * 2).
From the C code:
fprintf(stderr, "memory needed=%d MB\n",
(int)( ( (BIG_SIZE * sizeof(int32_t) ) + (BIG_SIZE * sizeof(int16_t) ) ) / 1000000 ) );

Plus you need some for the OS and loaded modules, but not much in Linux.
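
The 384 MB figure can be checked with a small sketch like the following. The helper name `arrays_bytes` is an assumption of this example, and a 64-bit result type is used so that the product cannot overflow on a 32-bit machine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes occupied by one 32-bit sum array plus
 * one 16-bit input array of n elements each. */
uint64_t arrays_bytes(uint64_t n)
{
    const uint64_t per_element = sizeof(int32_t) + sizeof(int16_t);

    assert(n <= UINT64_MAX / per_element);  /* guard against overflow */
    return n * per_element;
}
```

For 64000000 elements this gives 384000000 bytes, or 384 when divided by 1000000 as in the fprintf. The 512 MB estimate (64M * 2 * 4) would apply if both arrays held 32-bit values.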
 
Tim Williams

Ah, a free-basic!
I did a google, and downloaded the Linux version.
But:
test9.bas(1) error 135: Only valid in -lang deprecated or fblite or qb, found 'DYNAMIC' in ''$DYNAMIC'

Ah yes, I think I use -lang qb, something like that. Call me old
skool. :p

Without the dynamic allocation, the rest of the errors figure...
found 'Start' in 'Start! =3D TIMER'

Say, what's the "3D" in this showing up for? Did you copy&paste
wrong, or did my message get encoded stupid, or did yours?

WOW! Have not used BASIC in ages...

I use it fairly regularly. Despite being DOS, an exorbitantly
inefficient environment, it still runs on XP, and does things that (in
lieu of a similarly powerful scripting language) are satisfactory for
my purposes. I could do C, but I don't know it nearly as well as I
know QuickBasic.

Tim
 