A typical spurious Larkin just-so story. Arguing from ignorance.
Branch prediction and speculative execution on these CPUs are so good
that they only mispredict on the final termination test before loop exit.
Nice in theory, wrong in practice. Try it.
I did. Theory and practice agree just fine here. The differences between
CPUs, though, even within the Core 2 family, are startling, even on the
relatively small sample I have immediately to hand.
There is absolutely no measurable difference between a count-up and a
count-down assembler loop on a Core 2 Quad, Core 2 Duo or P4. There is
enough slack with the faster CPUs to hide three immediate-operand
instructions without altering the loop timing at all (just detectable on
a 3GHz P4).
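To make the comparison concrete, here is a minimal C sketch of the two
loop directions in question; the array, element type and summing body are
placeholders of mine, not the code actually timed below.

#include <stddef.h>
#include <stdint.h>

/* Count-up version: the index runs 0..n-1.  A compiler typically
   compares the index against n at the bottom of the loop (cmp/jb). */
uint32_t sum_up(const uint32_t *a, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += a[i];
    return acc;
}

/* Count-down version: the index runs n-1..0.  A compiler can test the
   decremented counter directly (dec/jnz) and skip the compare, which is
   the usual argument for counting down. */
uint32_t sum_down(const uint32_t *a, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = n; i-- > 0; )
        acc += a[i];
    return acc;
}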
Fastest on the Core 2 Q6600 was the modern array-style indexing,
mov eax, [edi+4*ecx] etc., at 2.159 +/- 0.008 (times in seconds).
This is typical of modern compiler output for x86.
Runner-up was the cache-aware algorithm at 2.190 +/- 0.007
The obvious simple pointer-based loop gave 2.233 +/- 0.007
Cutting out word-aligned 16-bit reads gave 2.215 +/- 0.008
Loop unrolling slowed it down to 2.250
SIMD was slowest of all at 2.288 (very disappointing)
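Likewise, hedged C sketches of the pointer-based and SIMD loop shapes
from the list above (the array-style indexed form is the sum_up shape in
the previous sketch, which compilers turn into [edi+4*ecx]-style
addressing); the SSE2 intrinsics, alignment assumptions and names are
mine rather than the benchmarked code.

#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 */

/* Pointer-based loop: bump the pointer instead of indexing. */
uint32_t sum_pointer(const uint32_t *a, size_t n)
{
    uint32_t acc = 0;
    for (const uint32_t *p = a; p < a + n; p++)
        acc += *p;
    return acc;
}

/* SSE2 loop: four 32-bit adds per iteration.  Assumes a is 16-byte
   aligned and n is a multiple of 4, purely to keep the sketch short. */
uint32_t sum_sse2(const uint32_t *a, size_t n)
{
    __m128i vacc = _mm_setzero_si128();
    for (size_t i = 0; i < n; i += 4)
        vacc = _mm_add_epi32(vacc, _mm_load_si128((const __m128i *)(a + i)));

    uint32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, vacc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}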
The big surprise was the portable Mobile Core 2 Duo T5200 @ 1.6GHz:
SIMD was fastest at 2.356 +/- 0.020
Array-style indexing next at 2.420 +/- 0.010
Pointer loop using addition at 2.450 +/- 0.012
Pointer loop using LOOP at 3.225 +/- 0.026 !!!!
*Big* surprise: using the LOOP instruction on this older Core 2 CPU
slowed things down by a massive 0.8s. I repeated it several times as I
didn't believe it at first, but it is a solid result.
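For anyone who wants to reproduce the LOOP result, a rough sketch of the
two inner-loop forms as GCC-style inline asm (LOOP cannot be expressed in
plain C); the summing body and the names are my own placeholders, not the
loop that was actually timed.

#include <stddef.h>
#include <stdint.h>

/* Count-down loop using the LOOP instruction: LOOP decrements ECX/RCX
   and branches while it is non-zero.  It is known to be slow on many
   Intel cores, which would fit the T5200 result above.  Assumes n > 0
   and an x86/x86-64 target built with GCC or Clang. */
uint32_t sum_loop_insn(const uint32_t *a, size_t n)
{
    uint32_t acc = 0;
    __asm__ volatile(
        "1:\n\t"
        "addl -4(%[a],%[n],4), %[acc]\n\t"  /* acc += a[n-1] */
        "loop 1b\n\t"                       /* --n; branch while n != 0 */
        : [acc] "+r"(acc), [n] "+c"(n)
        : [a] "r"(a)
        : "cc", "memory");
    return acc;
}

/* The same loop with an explicit dec/jnz pair - what a compiler
   normally emits and what the faster loops above boil down to. */
uint32_t sum_dec_jnz(const uint32_t *a, size_t n)
{
    uint32_t acc = 0;
    __asm__ volatile(
        "1:\n\t"
        "addl -4(%[a],%[n],4), %[acc]\n\t"
        "dec %[n]\n\t"
        "jnz 1b\n\t"
        : [acc] "+r"(acc), [n] "+r"(n)
        : [a] "r"(a)
        : "cc", "memory");
    return acc;
}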
On older and weaker CPUs the SIMD and cache-aware code did better.
Try it. I went from 0.225 seconds to 0.207 by counting down.
You have no idea what you are talking about. It is some coincidental
change in the generated code (possibly use of the LOOP instruction to
count down).
If you knew how to examine the generated code you would have done so by
now. But instead you keep harping on about the magical properties of
Power Basic.
I grant you its code generator must be fairly good - but the timing
pattern on the latest CPUs is extremely flat. In other words, any halfway
reasonable loop construct executes much faster than the memory subsystem
can supply the data to work on. The evidence is that it is the write-back
to main memory that is causing all the problems.
The code generated is all that matters; going up or down through memory
makes no difference at all. There is more than enough slack in the loop
timing; it is utterly dominated by memory access delays.
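If the write-back really is the limit, one way to probe it is with
non-temporal (streaming) stores, which bypass the cache and the
read-for-ownership an ordinary store triggers. A minimal SSE2 sketch,
with the fill workload purely a placeholder of mine:

#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 */

/* Fill a buffer with non-temporal stores.  _mm_stream_si128 writes
   through write-combining buffers straight to memory, avoiding the
   read-for-ownership and cache pollution an ordinary store incurs when
   the data will not be re-read soon.  Assumes dst is 16-byte aligned
   and n is a multiple of 4 words. */
void fill_streaming(uint32_t *dst, size_t n, uint32_t value)
{
    __m128i v = _mm_set1_epi32((int)value);
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_si128((__m128i *)(dst + i), v);
    _mm_sfence();   /* make the streamed stores globally visible */
}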
Regards,
Martin Brown