Maker Pro

How to develop a random number generation device


John Larkin

Doing it right isn't all that hard to ensure. XOR gates don't cost
much. If you XOR a pseudo-random bit string with a slightly random
bit string, you get a more random bit string.

If you clock everything with a good crystal, you can avoid having the
clock rate vary depending on the state of the output. The ring
oscillators can be sampled at the clock edge to make a slightly random
bit stream.
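Concretely, a minimal sketch of that idea in C might look like the following. The LFSR is the standard 16-bit Galois generator (taps 16, 14, 13, 11); read_ring_oscillator() is a hypothetical stand-in for whatever latches the ring oscillator state at the crystal clock edge.

#include <stdint.h>

extern int read_ring_oscillator(void);  /* hypothetical: 0 or 1, sampled at the clock edge */

static uint16_t lfsr = 0xACE1u;         /* any non-zero seed */

static int lfsr_step(void)              /* 16-bit maximal-length Galois LFSR, taps 16,14,13,11 */
{
    int out = lfsr & 1u;
    lfsr >>= 1;
    if (out)
        lfsr ^= 0xB400u;
    return out;
}

int whitened_bit(void)                  /* one output bit per crystal clock tick */
{
    return lfsr_step() ^ read_ring_oscillator();
}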

I suggested xor'ing a random bitstream *into* the shift register, not
merely with its output. That causes it to make random jumps in its
state space, so that the only statistical defects of a pseudo-random
register (repetition and the ability to recognize the algorithm)
become impossible to determine.

But xoring a mediocre random bit stream, say from a zener source, with
a long-sequence pseudo-random stream is pretty good, because it cleans
up most of the defects of the zener noise, namely DC bias and the
grosser autocorrelations. In theory the bias might be exploited to
discover the shift register polynomial, although that will be a chore
if the bit rates are similar.
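A sketch of the "XOR into the shift register" variant (same caveats, reusing the hypothetical read_ring_oscillator() and the LFSR above): the noisy bit perturbs the register state itself rather than just the output, so the generator keeps jumping around its state space instead of walking one fixed cycle. A side benefit is that the injected noise eventually rescues the register even from the all-zero lock-up state.

int state_injected_bit(void)
{
    int out = lfsr & 1u;
    lfsr >>= 1;
    if (out)
        lfsr ^= 0xB400u;                      /* normal LFSR feedback           */
    lfsr ^= (uint16_t)read_ring_oscillator(); /* random jump in the state space */
    return out;
}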

John
 

David Brown

MooseFET said:
On Sep 11, 4:58 pm, John Larkin

It's up to the linker to build the segments, and the run-time
link-loader picks the addresses - the compiler is not involved in the
process.
No, the problem isn't really with code mixed with data. It is data
mixed with data. The return addresses etc. are on the stack along with
the local arrays. This means that a routine can overwrite the return
address with data by walking off the end of an array. Once that
happens, the return instruction jumps you to the bad code.

That's a common sort of buffer overflow (it's not the only one, but it's
a good example). But as long as you can't execute code in the stack or
data segments, it's difficult to exploit such overflows for anything
more than crashing the program (since you can only jump to existing
code). When the system lets you execute from the stack or the data
sections, the attacker has more flexibility since they can insert their
own machine code into the program's data, then execute it.
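For anyone who hasn't seen one, the classic shape of such a bug looks something like this (names made up, and deliberately unsafe):

#include <string.h>

void greet(const char *name)
{
    char buf[16];        /* "auto" buffer on the stack, near the saved return address */
    strcpy(buf, name);   /* no length check: a long name walks off the end of buf     */
    /* ... use buf ... */
}                        /* the (possibly overwritten) return address is used here    */

With a non-executable stack the clobbered return address can still crash the program, but any machine code smuggled in through 'name' cannot be executed in place.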

Thus the main protection against malicious buffer overflow attacks is to
ensure that the stack and data segments are marked non-executable
(something Linux has had for ages, and Windows has caught up with - if
you enable it). Of course, that means that self-modifying code can't
work, which is a problem for just-in-time compilers and certain other
techniques such as trampolines.
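The usual way a JIT copes on such a system is to ask the OS explicitly for executable memory. A rough POSIX sketch of the pattern (not any particular JIT's code):

#include <string.h>
#include <sys/mman.h>

/* Copy generated machine code into fresh pages and make them executable. */
void *make_executable_copy(const void *code, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memcpy(p, code, len);                             /* emit the code    */
    if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0) /* then flip W -> X */
        return NULL;
    return p;
}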

The other protective mechanism is to randomise the addresses of the
segments (supported in Linux, and possibly now in Vista?). This makes
it almost impossible to pick suitable addresses when overwriting the
return address on a buffer overflow, so that at best you can cause a
crash but not specific malicious behaviour.


Traditionally on most architectures, you get the code first, then the
statically allocated data (initialised data followed by uninitialised
data), then the heap. The stack starts at the top of memory and grows
downwards - the heap and stack meet in the middle when you run out of
memory. On modern OS's with virtual memory, the code, data, heap and
stack are often separate.
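A quick way to see this for yourself is a throwaway test program that prints one address from each region - on a system with randomised addresses the numbers also move between runs:

#include <stdio.h>
#include <stdlib.h>

int initialised = 42;   /* initialised data segment */
int uninitialised;      /* uninitialised data (bss) */

int main(void)
{
    int local;                    /* stack */
    void *dynamic = malloc(16);   /* heap  */

    printf("code : %p\n", (void *)main);
    printf("data : %p\n", (void *)&initialised);
    printf("bss  : %p\n", (void *)&uninitialised);
    printf("heap : %p\n", dynamic);
    printf("stack: %p\n", (void *)&local);

    free(dynamic);
    return 0;
}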
 

David Brown

Nobody said:
The issue isn't about modifying code, related or otherwise. It's about
either injecting new code or executing existing code with
attacker-supplied data.

This isn't about protecting one process from another, but about protecting
a process from itself. Most of the existing mechanisms for mitigating
buffer overruns are implemented in either the compiler or libraries. The
only OS-level mechanisms (things that work on any executable, however it
was built) involve making it harder to exploit an overrun (e.g.
randomising memory locations) rather than actually preventing the overrun.


No, the issues apply to any OS. But binary compatibility is much more
important for Windows (and Mac) than for Linux.

If you try to run a 5-year old Linux binary on a current distribution,
you'll probably find that a lot of the interfaces on which it depends have
either disappeared or have changed in an incompatible manner. Lack of a
stable ABI is a simple fact of life on Linux.

Binary compatibility for the user-level API is extremely important for
Linux, and the API is very stable - new APIs and system calls are
added, but existing ones are seldom changed or removed, and never
without very good reason. Most five year old Linux binaries will run
fine on the same architecture as they were built for (assuming you have
the right libraries) - the only API calls that have changed are very
seldom used.

I think you are confusing the issue with the binary API internal to the
kernel, as used by device drivers. There the API is explicitly
flexible, and may be changed whenever it is convenient (*source*-level
API compatibility is kept high - no one wants to have to make extra
source-level changes without reason, but if all that's needed is a
recompile, that's okay).
 

Nobody

Binary compatibility for the user-level API is extremely important for
Linux, and the API is very stable - new APIs and system calls are
added, but existing ones are seldom changed or removed, and never
without very good reason.

That's only really true for "core" libraries, e.g. libc. Once you get up
to higher levels (e.g. X and GUI toolkits), the ABIs can change quite
frequently.

Anything which uses C++ tends to be especially fragile, partly due to
extensive use of structures (objects), partly due to the amount of inlined
code (which often relies upon structure layout and won't change along with
non-inlined functions in the library).
Most five year old Linux binaries will run
fine on the same architecture as they were built for (assuming you have
the right libraries)

IOW, a five year old binary will run on a five year old OS distribution.
If you want to run it on a newer distribution, you need to start chasing
down all of the five-years-ago versions of the libraries that it uses.
- the only API calls that have changed are very
seldom used.

It isn't so much the syntax of the calls (even a fairly slack programmer
can figure out that changing the argument list will change the API), but
the structures which they use (and, less frequently, their semantics).
 

MooseFET

It's up to the linker to build the segments, and the run-time
link-loader picks the addresses - the compiler is not involved in the
process.

I included the linker in the "compiler/linker"; in many cases they are
the same program. In many environments the segments end up in memory
in the same order as they were in the file.
 

MooseFET

The OS doesn't know where one "buffer" ends and the next one begins;

It could know in most cases. The common trick is to overrun on the
stack onto return addresses. The OS can defend the return addresses
at some overhead cost.



That prevents overflowing malloc()ed buffers, but it doesn't work for
those on the stack ("auto" local variables) or in the data segment (global
or "static" local variables).

Defending the stack can be as per the above. The data segment doesn't
have to contain code addresses, so it is much harder to end up
elsewhere in code space from changes to the data segment.
 

MooseFET

A very dangerous assumption. XOR is a linear operation.

I disagree. Note that I said "slightly random" and "pseudo-random".
These are two very different things.

LFSR(register ^ random) = LFSR(register) ^ LFSR(random)

Yes, and that doesn't apply to what I said.
Those things can exhibit a periodic behavior.

Crystals ideally are periodic. Hit and miss regulators never can be.

Sampling a ring oscillator is a "slightly random" thing. It has a
periodic component which doesn't increase the randomness and a random
component which does.
 

MooseFET

Assuming no correlation between two sources of data, XOR
cannot decrease the randomness of the output stream below that
of either input stream. If one of the streams is 100% true
random, the output of the XOR will be 100% true random, the
only exception being the case where the second input is
derived from the first.

You can take it a bit further: if one input is X% truly random, the
result must be at least X% random, provided no information can travel
between the sources.
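A quick check of the single-bit case: write the biases as $\epsilon_a$ and $\epsilon_b$, so $P(a=1)=\tfrac{1}{2}+\epsilon_a$ and $P(b=1)=\tfrac{1}{2}+\epsilon_b$, with the bits independent. Then

$$P(a \oplus b = 1) = (\tfrac{1}{2}+\epsilon_a)(\tfrac{1}{2}-\epsilon_b) + (\tfrac{1}{2}-\epsilon_a)(\tfrac{1}{2}+\epsilon_b) = \tfrac{1}{2} - 2\epsilon_a\epsilon_b,$$

so the output bias $2|\epsilon_a\epsilon_b|$ is never larger than the smaller of the two input biases, and it is exactly zero whenever either input is unbiased. (That only covers bias, not every statistical defect, but it is the heart of the claim.)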
 

MooseFET

I suggested xor'ing a random bitstream *into* the shift register, not
merely with its output. That causes it to make random jumps in its
state space, so that the only statistical defects of a pseudo-random
register (repetition and the ability to recognize the algorithm)
become impossible to determine.

But xoring a mediocre random bit stream, say from a zener source, with
a long-sequence pseudo-random stream is pretty good, because it cleans
up most of the defects of the zener noise, namely DC bias and the
grosser autocorrelations. In theory the bias might be exploited to
discover the shift register polynomial, although that will be a chore
if the bit rates are similar.

You can discover the pseudo-random part, but you have to process a
great deal more data to do it. When you have discovered it, you still
have the partly random zener noise, which you can't do anything about.

I was arguing a simple case because, I thought, what I said would be
obvious. Mixing in some real randomness at the output of a pseudo-random
generator is not the ideal way to do things, but it really is
quite good - assuming you don't tell people how long a shift register
you are using, that is.

The first step in figuring out the sequence is much like doing a very
long FFT on the data: you need to discover the repeat period of the
sequence. In a pseudo-random generator, there is no signal at lower
frequencies. When you add even a crappy zener-based noise, you mix
the high-frequency noise down to zero frequency, masking the repeat period.
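A brute-force version of that hunt - straight autocorrelation rather than an FFT, but it measures the same thing - slides the captured bit stream against itself and looks for the lag where the agreement jumps toward 100%. Once noise is XORed in, that peak is smeared out and far more data is needed before it stands out. A sketch:

#include <stddef.h>

/* Return the lag (1..max_lag) with the highest agreement fraction -
 * the best guess at the repeat period of bits[0..n-1]. */
size_t best_period(const unsigned char *bits, size_t n, size_t max_lag)
{
    size_t best_lag = 0;
    double best_frac = 0.0;

    for (size_t lag = 1; lag <= max_lag && lag < n; lag++) {
        size_t agree = 0;
        for (size_t i = 0; i + lag < n; i++)
            if (bits[i] == bits[i + lag])
                agree++;
        double frac = (double)agree / (double)(n - lag);
        if (frac > best_frac) {
            best_frac = frac;
            best_lag  = lag;
        }
    }
    return best_lag;
}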
 

Nobody

It could know in most cases. The common trick is to overrun on the
stack onto return addresses. The OS can defend the return addresses
at some overhead cost.

How? Neither CALL nor RET are privileged instructions.

The compiler can defend against this; that's what StackGuard and ProPolice
do. But I fail to see what the OS can do.
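For the record, that compiler-side defence works roughly like this: the compiler (e.g. gcc with -fstack-protector) plants a random "canary" between the local buffers and the saved return address and checks it before returning. A hand-written illustration only - __guard_value is a made-up stand-in for the real per-process guard, and in reality the compiler controls the stack layout, which plain C cannot:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

extern unsigned long __guard_value;   /* made-up name for the per-process random guard */

void greet_protected(const char *name)
{
    unsigned long canary = __guard_value;  /* intended to sit between buf and the return address */
    char buf[16];

    strcpy(buf, name);                     /* the unsafe copy */

    if (canary != __guard_value) {         /* epilogue check before returning */
        fputs("*** stack smashing detected ***\n", stderr);
        abort();
    }
}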
 

JosephKK

John Larkin [email protected] posted to
sci.electronics.design:
Ancient computers, PDP-11 and VAX certainly, had memory management
hardware that separated I and D space, where I space was read-only,
and D space could not be executed. And the OS's enforced those
rules. It was common to have many users running the exact same code,
but mapped into different data spaces.

Problem is, neither Intel nor Microsoft was in the mainstream of
computing when they kluged up x86 and Windows.

John

The hardware only became capable of the basics of worthwhile
implementation in early Pentiums, and became capable of really
worthwhile implementations with Opteron (AMD) and EM64T (Intel).
Microsnoft still has not caught up yet. Just try getting drivers
for 64-bit versions of MS OS's (since Windows 2000).
 

MooseFET

How? Neither CALL nor RET are privileged instructions.

The stack segment can be made privileged. This means that every
access to a local variable held on the stack would cause an
interrupt. For processors like the x86, this would add the overhead
as I stated.

I have run programs under "bochs" that have done useful things in
reasonable time. I see no reason, on modern processors, not to use
some CPU time to make code safe.
 

MooseFET

John Larkin [email protected] posted to
sci.electronics.design:

The hardware only became capable of the basics of worthwhile
implementation in early Pentiums, and became capable of really
worthwhile implementations with Opteron (AMD) and EM64T (Intel).


I disagree with what you may not have meant to say above. In the
microprocessor area, you are largely correct but in other machines,
there were many hardware systems that could protect against buffer
overflows getting evil code to run. Some of them used a different
stack for call and return than for the data. Some such as the IBM-360
didn't have a stack and required each routine to handle its "save
area".

Some of the more DSP-ish machines would also make it hard for a buffer
overflow to do anything evil. They are far from general-purpose
machines, so although they only show that it could have been done, we
can say that a well-defended general-purpose PC could have been made.
 

Martin Brown

Depends on the OS. OS/2 did a pretty good job of preventing anything
more than the faulty thread from going down - and that was in the bad
old 386 days. IBM hobbled its capabilities by insisting that it ran on
286s too (a fatal error). Intel x86 CPUs have had a (largely
unused) BOUND instruction ever since the iAPX186. You may not be able
to stop all buffer overruns, but you can still guard against a lot of
them. Almost nobody did :(

Some compilers of the '80s had built-in optional stack and array bounds
checking that was very useful during debugging but usually switched
off for production builds.

The 386 could do enough in protected mode to make exploits very
difficult with the right OS. OS/2 on a 386 was very robust, although
its acceptance was stymied by the sheer stupidity of IBM's marketing
department.

What nobody at the time could understand was why users loved their
Windoze even though it fell over hourly.
I disagree with what you may not have meant to say above. In the
microprocessor area, you are largely correct but in other machines,
there were many hardware systems that could protect against buffer
overflows getting evil code to run. Some of them used a different
stack for call and return than for the data. Some such as the IBM-360
didn't have a stack and required each routine to handle its "save
area".

Sadly that isn't a good example. It was quite possible to get into
Key0 on the old 360 and 370 series machines if you read the right
manuals.
Some of the more DSP-ish machines would also make it hard for a buffer
overflow to do anything evil. They are far from general-purpose
machines, so although they only show that it could have been done, we
can say that a well-defended general-purpose PC could have been made.

It is impossible to execute data on a Harvard architecture machine.
That is one hard-line solution.

Simulating that model by aggressively separating read-only code and
data segments, and only permitting genuine code segments to execute,
goes a long way towards making things secure. Self-modifying code is
not allowed.

Regards,
Martin Brown
 

Nobody

The stack segment can be made privileged. This means that every
access to a local variable held on the stack would cause an
interrupt. For processors like the x86, this would add the overhead
as I stated.

Yeah; a lot of overhead.
I have run programs under "bochs" that have done useful things in
reasonable time. I see no reason, on modern processors, not to use
some CPU time to make code safe.

Performing a context switch on every stack write may well be slower
than simply emulating everything in software (like bochs). I doubt that
the market would accept that kind of performance hit.
 

David Brown

MooseFET said:
MooseFET said:
On Sep 11, 4:58 pm, John Larkin
[... buffer overflow ...]
It sounds to me like C compilers/linkers tend to allocate memory to
code, buffers, and stack sort of anywhere they like.
It's up to the linker to build the segments, and the run-time
link-loader picks the addresses - the compiler is not involved in the
process.

I included the linker in the "compiler/linker"; in many cases they are
the same program. In many environments the segments end up in memory
in the same order as they were in the file.

The linker is not the same program (for C) in any environment I have
ever heard of - but it is generally *called* by the compiler
automatically, so it just looks like it is part of the compiler. The
point is, any linking issues are handled by linking directives and not
by anything you give to the compiler (i.e., the source code).

The link-loader is a different animal altogether - it is what the
operating system uses to actually load and run a program. It handles
the final linking of the binary with any required run-time libraries, it
allocates space and addresses for the different segments of the program,
and it links the parts together. It is at that stage that the addresses
are finalised. In particular, if you are using a system with randomised
addresses, each time a program is loaded it is linked to a different
random address.
 

David Brown

Nobody said:
That's only really true for "core" libraries, e.g. libc. Once you get up
to higher levels (e.g. X and GUI toolkits), the ABIs can change quite
frequently.

To be a little pedantic, things like X and GUI toolkits are not part of
Linux - the comment was about the Linux API (and I presume you mean
"API", not "ABI" - i.e., the interface for calling system functions from
languages like C, rather than the low-level specification for C function
calling conventions).

But more generally, the situation is fairly similar with these sorts of
libraries - they get extended, but there are not often major
incompatible changes. What changes is often catered for by the standard
"./configure" process, and when necessary Linux installations have
copies of the various versions of the libraries.
Anything which uses C++ tends to be especially fragile, partly due to
extensive use of structures (objects), partly due to the amount of inlined
code (which often relies upon structure layout and won't change along with
non-inlined functions in the library).

I'm not sure how valid that is - but it's also worth remembering that in
the world of open source, source code compatibility is more important
than binary compatibility. A great many binary incompatibilities can be
resolved by a simple recompile - standard practice when making a new
version of a Linux distribution, but alien to the world of Windows
programming.
IOW, a five year old binary will run on a five year old OS distribution.
If you want to run it on a newer distribution, you need to start chasing
down all of the five-years-ago versions of the libraries that it uses.

In the cases where that is necessary, distributions often have these
five year old libraries in their repositories.
It isn't so much the syntax of the calls (even a fairly slack programmer
can figure out that changing the argument list will change the API), but
the structures which they use (and, less frequently, their semantics).

Again, structures often have members added - but seldom removed or
changed. In this case, a recompile will be a simple fix. Windows APIs
often include the size of structures as part of their parameters (or as
a field in the structure itself), so that code can take into account
varying sizes for different versions of Windows (newer versions add to
the end of the structures). This allows better binary compatibility, at
the cost of run-time overhead (space needs to be allocated dynamically
rather than statically).
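An illustration of that convention, with made-up names: the caller records how big it thinks the structure is, and the callee uses that to stay compatible with binaries built against an older, shorter version of the struct.

#include <stddef.h>

struct widget_params {
    size_t cb_size;   /* caller sets this to sizeof(struct widget_params) */
    int    width;
    int    height;
    int    flags;     /* imagine this member was added in a later version */
};

int create_widget(const struct widget_params *p)
{
    int flags = 0;

    /* Only read 'flags' if the caller's structure is new enough to contain it. */
    if (p->cb_size >= offsetof(struct widget_params, flags) + sizeof(int))
        flags = p->flags;

    /* ... create the widget using p->width, p->height and flags ... */
    (void)flags;
    return 0;
}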

I have never heard of changing user-level binary API interfaces being a
serious problem on Linux (it is viewed by some to be a serious problem
at the device driver level, but mainly by people who simply don't
understand Linux), but I suppose it might be for some types of software.
 

Rich Grise

It isn't so much the syntax of the calls (even a fairly slack programmer
can figure out that changing the argument list will change the API), but
the structures which they use (and, less frequently, their semantics).

It's because there are about three competent C++ programmers in the
known Universe.

Thanks,
Rich
 

krw

Not by the *OS*.

Sure it can. Not in Windows and not with C++, perhaps. An OS can
surely make it impossible to write safe code, and a real OS is
required to make safe code possible.
 