[Dprglist] Timing in Python on the RPi

Thu Dec 3 11:57:54 PST 2020

Murray, this is a follow-up to our discussion during the 12/1 RBNV meeting about timing in Python on the Pi.

(I can get into more details and/or show the tests in action on the command line during the next RBNV if there is interest)

I ran a few tests on my Pi 4 to see if there is anything unique or different about Python when it comes to real-time performance. I mainly wanted to check whether Python performs any worse than a normal process. (I don’t believe there is a way for it to do any better than other processes.)

You mentioned time.monotonic() on the call, so I played with that a bit but in the end I used time.perf_counter().  On the Pi the two timing methods in Python seem to be equivalent, but on Windows time.perf_counter() delivers sub-millisecond granularity whereas time.monotonic() gives you a granularity of 16ms or so.
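A quick way to compare the two clocks on a given machine is to ask Python what it uses underneath. This is only a small sketch: time.get_clock_info() reports the resolution the OS advertises for each clock, which is not necessarily what you would measure, and the helper name clock_summary is made up for this example.

```python
import time

def clock_summary(name):
    """Return (implementation, advertised resolution in seconds) for a named clock."""
    info = time.get_clock_info(name)
    return info.implementation, info.resolution

# Print both clocks side by side; the output differs between Linux and Windows.
for name in ("monotonic", "perf_counter"):
    impl, res = clock_summary(name)
    print(f"{name:13s} implementation={impl}  resolution={res:.9f} s")
```

If the two lines show the same implementation and resolution, the clocks are backed by the same OS call on that platform.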

I usually use a utility called “cyclictest” (apt-get install rt-tests) to evaluate real-time performance on a Linux system, and a tool called “stress” (apt-get install stress) to create a load. Cyclictest basically does a sleep() and then checks whether the thread woke up at the expected time.

It turns out that it only takes a few lines of Python code to replicate the essence of what cyclictest does, i.e. 

Loop:
       t1=timestamp(); 
       sleep(1 millisecond); 
       t2=timestamp();
       //keep track of min,avg,max 
//report results

Conclusion: Python is bound by the same scheduling rules and limitations as any other process under Linux, which means the following:

1. If you don’t explicitly control the scheduling class and the priority, a thread is scheduled under the standard policy, SCHED_NORMAL (called SCHED_OTHER in user space), which is implemented by the “completely fair scheduler”. This means a sleep(1 ms) will sometimes turn into a sleep(1 ms + N ms), where N is almost completely unbounded and depends heavily on what else is going on. Only on a system that is 100% idle will N be close to 0 most of the time.

On my Pi, with only pi-hole and htop running in the background, my Python test reported a latency (i.e. timing error) as high as 14 milliseconds, i.e. a sleep(1) turned into a sleep(15). Cyclictest, on the other hand, reported a worst-case latency of “only” 6 milliseconds.
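To confirm which policy a script is actually running under, something like the following should work on Linux. It is a sketch, not part of my test: the helper name current_policy is made up, and note that Python's os module calls the default policy SCHED_OTHER rather than SCHED_NORMAL.

```python
import os

# Map the policy constants this platform defines to readable names.
POLICY_NAMES = {
    getattr(os, n): n
    for n in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR", "SCHED_BATCH", "SCHED_IDLE")
    if hasattr(os, n)
}

def current_policy(pid=0):
    """Return (policy name, priority) for a process; pid 0 means the caller."""
    policy = os.sched_getscheduler(pid)
    # The kernel may OR the reset-on-fork flag into the policy value; strip it.
    policy &= ~getattr(os, "SCHED_RESET_ON_FORK", 0)
    priority = os.sched_getparam(pid).sched_priority
    return POLICY_NAMES.get(policy, str(policy)), priority

name, prio = current_policy()
print(f"policy={name} priority={prio}")
```

Started from a normal shell, this should report SCHED_OTHER with priority 0.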

2. If you do explicitly control the scheduling class and priorities, for example by using the chrt utility (chrt -f 60 python3 ./t.py), you can expect your timers to behave with A LOT more accuracy.

Both Python and cyclictest reported a worst-case timing error of about 250 microseconds during my tests.

The key here is to use scheduling class SCHED_FIFO (the -f in chrt) or SCHED_RR.
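As an alternative to chrt, a script can request SCHED_FIFO for itself. I did not use this in my tests, so treat it as a minimal sketch: the helper name try_set_fifo is made up, priority 60 just mirrors the chrt example, and the call normally needs root (or an rtprio limit that allows it).

```python
import os

def try_set_fifo(priority=60):
    """Try to switch the caller to SCHED_FIFO at the given priority.

    Returns True on success, False if the OS refuses for lack of privileges.
    """
    lo = os.sched_get_priority_min(os.SCHED_FIFO)
    hi = os.sched_get_priority_max(os.SCHED_FIFO)
    if not lo <= priority <= hi:
        raise ValueError(f"priority must be in {lo}..{hi}")
    try:
        # pid 0 refers to the caller
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except PermissionError:
        return False
    return True

if __name__ == "__main__":
    if try_set_fifo(60):
        print("running under SCHED_FIFO, priority 60")
    else:
        print("not permitted; run with sudo or start the script via chrt")
```

Calling this at the top of the timing script should have the same effect as starting it through chrt -f 60.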

3. Even if you use SCHED_FIFO with sufficiently high priority, the fact that standard Linux distributions are not configured for real-time becomes immediately evident as soon as you run certain stress tests. Basically, even a low priority process/thread that makes system calls can cause the Linux kernel to enter a critical section from which it won’t emerge until several milliseconds later and during which preemption (i.e. rescheduling) is disabled.

Both the Python test and cyclictest reported an error in the 5 millisecond range for the following memory allocation workload: “stress --vm 4 --vm-bytes 64M”

4. If you want to do better than that, you need a kernel that has the real-time patches applied and the configuration option PREEMPT_RT enabled. With this, the Linux kernel no longer enters these multi-millisecond critical sections, so it no longer gets in the way of good real-time performance. Also, the interrupt handlers of most device drivers get turned into prioritized threads, so even a poorly written device driver won’t get in the way much, provided you use a priority that is higher than that of the interrupt threads. PREEMPT_RT is required for ROS2 if you expect to benefit from the real-time improvements that ROS2 has made over ROS.

With PREEMPT_RT, latencies below 250 microseconds are quite doable, even under all types of stress conditions. I have also seen reports of less than 100 microseconds.
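To check from Python whether the running kernel is a PREEMPT_RT build, a sketch along these lines should do. It relies on two assumptions: that RT kernels expose /sys/kernel/realtime containing "1", and that the kernel version string (what uname -v prints) mentions PREEMPT_RT or, on older patch sets, "PREEMPT RT". The function names are made up for this example.

```python
import platform
from pathlib import Path

def version_mentions_rt(version_string):
    """True if a kernel version string (as in `uname -v`) advertises PREEMPT_RT."""
    return "PREEMPT_RT" in version_string or "PREEMPT RT" in version_string

def kernel_is_realtime():
    """Best-effort check for a PREEMPT_RT kernel on the running system."""
    try:
        # Assumed to exist only on RT kernels.
        if Path("/sys/kernel/realtime").read_text().strip() == "1":
            return True
    except OSError:
        pass  # file missing or unreadable: fall back to the version string
    return version_mentions_rt(platform.version())

print("PREEMPT_RT kernel:", kernel_is_realtime())
```

On a stock Raspberry Pi OS kernel this is expected to print False.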

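Here is the Python timing test code. The min, avg and max results are latencies in microseconds (measured sleep time minus the requested 1000 microseconds); the last value, the duration of each 1000-sample batch, is in seconds: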

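import time

SAMPLES = 1000    # sleeps per reported batch
SLEEP_S = 0.001   # requested sleep time: 1 millisecond

while True:
    # Per-batch statistics, in seconds. The names avoid shadowing the
    # built-ins min(), max() and sum().
    t_min = float("inf")
    t_max = 0.0
    t_sum = 0.0
    batch_start = time.perf_counter()
    for _ in range(SAMPLES):
        t1 = time.perf_counter()
        time.sleep(SLEEP_S)
        t2 = time.perf_counter()
        elapsed = t2 - t1
        t_max = max(t_max, elapsed)
        t_min = min(t_min, elapsed)
        t_sum += elapsed

    # Convert to microseconds and subtract the requested sleep time, so each
    # figure is the latency (oversleep) rather than the raw sleep duration.
    requested_us = SLEEP_S * 1000000
    avg_us = (t_sum / SAMPLES) * 1000000 - requested_us
    min_us = t_min * 1000000 - requested_us
    max_us = t_max * 1000000 - requested_us
    duration = time.perf_counter() - batch_start
    print("min,avg,max,duration = %7.1f , %7.1f , %7.1f , %3.1f" % (min_us, avg_us, max_us, duration))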
