Jan Panteltje
Runtime in C on a 2.4 GHz Intel Core 2 Duo machine is about
3 seconds, about 2 when multithreaded.
Dunno about that, those beasts are mainly focused on floating-point
performance, and this is an integer problem.
Cray was (maybe still is?) a vector processing hardware platform.
By vector processing I mean this:
memory --+
         +--> add --> memory
memory --+
      address counter
So basically clock from one memory into the next via the adder.
That is very very fast for huge data sets that all need the same operation.
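In software terms, the memory-adder-memory arrangement described above corresponds to a plain element-wise loop. The following is only a rough sketch: the function name vector_add and its parameters are made up for illustration, and the loop index stands in for the address counter.

```c
#include <stddef.h>

/* Software model of the memory -> adder -> memory pipeline.
   The loop index i plays the role of the hardware address counter.
   vector_add and its parameter names are illustrative, not from the thread. */
void vector_add(const short *a, const long *b, long *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = b[i] + a[i];   /* same operation on every element */
}
```

Calling it with out and b pointing at the same array gives the in-place S[i] += A[i] form used in the C code in this thread.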
I have done some hardware like that.
No memory has 'float', it is all integers of some size.
What does your C code look like? Mine is in the other posting.
Here's the multithreaded one I used:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <pthread.h>

static long S[64000000];
static short A[64000000];

typedef struct argstype
{
    short *a_start;
    long *s_start;
    long count;
} argstype;

static void *thread_func(void *vptr_args)
{
    argstype *arg = vptr_args;
    long *sp = arg->s_start;
    short *ap = arg->a_start;
    long x = arg->count;

    do
        *sp++ += *ap++;
    while (--x);

    return NULL;
}

int main(int argc, char **argv)
{
    long y, n;

    n = 10;
    if (argc > 1)
        n = atoi(argv[1]);

    for (y = 0; y < n; ++y)
    {
        argstype instance_a = {A, S, 32000000};
        argstype instance_b = {A + 32000000, S + 32000000, 32000000};
        pthread_t thread_a;

        // start a thread to process half the data
        if (pthread_create(&thread_a, NULL, thread_func, &instance_a) != 0)
        {
            perror("pthread_create");
            return EXIT_FAILURE;
        }

        // process the other half in the foreground
        thread_func(&instance_b);

        // wait for the thread to finish.
        if (pthread_join(thread_a, NULL) != 0)
        {
            perror("pthread_join");
            return EXIT_FAILURE;
        }
    }
    return 0;
}
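The program above hard-codes two halves of 32000000 elements each. For anyone wanting to try more threads, the split could be computed instead. This is only a sketch under assumptions: chunk_bounds and its parameters are hypothetical names, not part of the posted program.

```c
#include <stddef.h>

/* Compute the slice of a 'total'-element array handled by worker k
   out of 'nthreads' workers (0 <= k < nthreads, nthreads > 0).
   The first (total % nthreads) workers each get one extra element,
   so the slices cover the array exactly once with no overlap.
   chunk_bounds is a hypothetical helper, not from the original post. */
void chunk_bounds(size_t total, size_t nthreads, size_t k,
                  size_t *start, size_t *count)
{
    size_t base = total / nthreads;
    size_t rem  = total % nthreads;

    *start = k * base + (k < rem ? k : rem);
    *count = base + (k < rem ? 1 : 0);
}
```

With total = 64000000 and nthreads = 2 this reproduces the two halves used above. Note that the do/while loop in thread_func assumes a count of at least 1, so a caller would have to skip any worker whose count comes out as 0.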
OK, I tried that: gcc -o test10 test10.c -lpthread
but it won't run on the eeePC, it simply says:
Killed
Looks like 512 MB is not enough
And only one core of course.
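For reference, the memory the two static arrays need can be worked out from the element sizes. A small sketch follows; footprint_bytes and N_ELEMS are names invented here, and the 4-byte long is an assumption about a 32-bit build, not something stated in the thread.

```c
#include <stddef.h>

#define N_ELEMS 64000000UL   /* array length used in the posted program */

/* Bytes needed for S (long) plus A (short), given the element sizes.
   footprint_bytes is an illustrative helper, not from the thread. */
unsigned long long footprint_bytes(size_t long_size, size_t short_size)
{
    return (unsigned long long)N_ELEMS * long_size
         + (unsigned long long)N_ELEMS * short_size;
}
```

Calling footprint_bytes(sizeof(long), sizeof(short)) gives the figure for the machine at hand. With a 4-byte long that is 384000000 bytes (375000k); with an 8-byte long it is 640000000 bytes. Either way it leaves little or no room beside the rest of the system on a 512 MB machine.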
On another PC with 385964k RAM it starts swapping:
~# nice -n -19 ./test10
nice -n -19 ./test10 19.25s user 0.63s system 99% cpu 20.046 total