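Your code runs in EL0, the kernel in EL1.
Once you understand why different privilege levels exist, you will also see why ARM added a further level, EL2, for so-called "hypervisors". Note that the exception levels EL0 to EL3 belong to the ARMv8-A architecture (the Cortex-A application processors). The recent Cortex-M controllers, Cortex-M23 and M33, are ARMv8-M parts and do not have EL2; what they add is TrustZone, which splits execution into Secure and Non-secure states.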
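To make the EL0/EL1 split concrete, here is a minimal sketch, assuming a Linux system with glibc. User code cannot touch kernel data directly. It asks the kernel through a system call, which on AArch64 Linux is an `svc` instruction that traps from EL0 into EL1. The helper name `raw_getpid` is made up for this illustration.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel for our process ID through the
 * generic syscall() entry point instead of the getpid() wrapper.
 * The call leaves user mode (EL0 on AArch64), the kernel handles it
 * in its privileged mode (EL1), and the result comes back to us. */
static pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Calling `raw_getpid()` should give the same value as the ordinary `getpid()` wrapper, since both end up at the same kernel entry.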
As to your last questions, "vead": as long as your system does not need to react with a deterministic latency well below 1 ms, Linux is fine. Once you hit the latency limits of a Linux release for embedded systems, there are real-time patches for certain kernel releases (the PREEMPT_RT patch set, for example) that shorten the worst-case latency.
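If you want a rough idea of where your own system stands, you can measure how late a periodic wake-up arrives. The following is only a sketch under assumptions, not a replacement for a proper tool such as cyclictest: the function names are made up for this example, it uses the POSIX calls `clock_gettime` and `clock_nanosleep`, and it runs at normal scheduling priority unless you arrange otherwise.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Difference b - a in nanoseconds. */
static int64_t timespec_diff_ns(const struct timespec *a,
                                const struct timespec *b)
{
    return (int64_t)(b->tv_sec - a->tv_sec) * NSEC_PER_SEC
         + (int64_t)(b->tv_nsec - a->tv_nsec);
}

/* Advance t by ns nanoseconds (ns >= 0), keeping tv_nsec normalized. */
static void timespec_add_ns(struct timespec *t, int64_t ns)
{
    int64_t total = (int64_t)t->tv_nsec + ns;
    t->tv_sec += (time_t)(total / NSEC_PER_SEC);
    t->tv_nsec = (long)(total % NSEC_PER_SEC);
}

/* Hypothetical helper: sleep until a series of absolute deadlines and
 * return the worst observed lateness in nanoseconds, or -1 on error
 * (including an interrupted sleep). */
static int64_t max_wakeup_latency_ns(int iterations, int64_t period_ns)
{
    struct timespec next, now;
    int64_t worst = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (int i = 0; i < iterations; i++) {
        /* Absolute deadlines avoid accumulating drift between cycles. */
        timespec_add_ns(&next, period_ns);
        if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) != 0)
            return -1;
        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;

        int64_t late = timespec_diff_ns(&next, &now);
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

For example, `max_wakeup_latency_ns(10000, 1000000)` wakes up every 1 ms for ten seconds and returns the worst lateness seen. Run it while the system is under load; a short idle run says little about the worst case.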