Monday, April 27, 2020

Measuring AVR interrupt latency

One thing I like about AVR MCUs is that their datasheets are relatively short and simple.  It's also one of the things I don't like, because the datasheets often lack important details.  Understanding external interrupt latency is one things that is lacking complete and clear details.  I decided to investigate the interrupt latency of the ATtiny13 and the ATtiny85.  The datasheet's description of interrupt response time and external interrupts is identical for both parts.

Interrupt Response Time

The ATtiny13 datasheet section 4.7.1, under the heading "Interrupt Response Time", says, "The interrupt execution response for all the enabled AVR interrupts is four clock cycles minimum. After four clock cycles the Program Vector address for the actual interrupt handling routine is executed. [...] The vector is normally a jump to the interrupt routine, and this jump takes three clock cycles. [...] If an interrupt occurs when the MCU is in sleep mode, the interrupt execution response time is increased by four clock cycles."

While section 4.7.1 is reasonably detailed, it has one significant error, and another important omission.  The error is the sentence, "The vector is normally a jump to the interrupt routine, and this jump takes three clock cycles".  All AVRs with less than 8KB of flash, like the ATtiny13, have no jump instruction.  They only have a relative jump "rjmp", which takes two clock cycles.  This is obviously a copy/paste error from the datasheet of an AVR with  more than 8KB of flash.  Anyone familiar with the AVR instruction set would likely catch this simple error. The omission from section 4.7.1 is much harder to recognize until you carefully examine section 9.2 and figure 9-1 in the datasheet.

Figure 9-1 shows a circuit which appears to add a latency of two clock cycles to pin change interrupts.  There is no written description for the circuit, and the external interrupt details in section 9.2 of the datasheet state, "Pin change interrupts on PCINT[5:0] are detected asynchronously."  Since pin change interrupts can be used to wake the part from power-down sleep mode when all clocks are disabled, they must be detected asynchronously during power-down sleep.  To determine when they are detected synchronously requires testing.

To test the interrupt latency I wrote a program in assembler that can generate low pulses of different lengths using PWM.  I chose not to write the program in C because I want to be able to measure the interrupt latency down to a single cycle.  On the t13, PB1 is the pin for INT0, PCINT1, and  OC0B.  By using OC0B to generate a low pulse on PB1, I'll be able to trigger INT0 and PCINT1 without any external connections.  When the interrupt is triggered, it should take four cycles to execute the code at the interrupt vector.  That code is an rjmp to the interrupt function, and that rjmp takes two additional clock cycles.  For the best-case latency, the first instruction in the interrupt function will execute six cycles after the interrupt is triggered.

The first instruction of the interrupt function checks the state of the pin that triggered the interrupt (the "sbic" instruction).  If the pin is low, it skips the next instruction, then goes into an infinite loop.  If the pin is high, it toggles the LED pin.  Since the PWM is configured to generate a low pulse, if the pulse has ended before the sbic, the LED will light up to indicate the interrupt response time was too slow.  The length of the pulse is one cycle longer than the value stored in OCR0B, which is done at lines 28 and 29.  My testing consisted mainly of modifying the OCR0B value, then building and flashing the modified code to the AVR.

Results

As expected INT0 latency is 4 clock cycles from the end of the currently executing instruction.  This means that if the interrupt occurs during the first cycle of a call instruction which takes 3 cycles, the interrupt response time will be 6 cycles.  For pin change interrupts, the latency is 6 cycles, indicating the synchronizer circuit adds 2 cycles of latency.  In idle sleep mode, both INT0 and PCINT latency is 8 cycles, indicating pin change interrupts operate asynchronously when the CPU clock is not running.

Wednesday, April 8, 2020

Better asserts in C with link-time optimization

I've been a fan of link-time optimization for several years.  I've been a fan of efficient programming for even longer.  I was an early fan of C++ because features like function overloading made it easier to move decisions done at run-time in C to compile-time with C++.  As C++ has become more complex over the decades, I've become less of a C++ fan, and appreciate the simplicity of C.

For small embedded systems like 8-bit AVRs and ARM M0, run-time error checking with assert() has minimal usefulness compared to UNIX, where a a core dump will help pinpoint the error location and the state of the program at the time of the error.  Even if the usability problems were solved, real-time embedded systems may not be able to afford the performance costs of run-time error checking.

Both C++ and C support static assertions.   Anyone who has tried to use static_assert likely has encountered "expression in static assertion is not constant" errors for anything but the simplest of checks.  The limitations of static_assert is well documented elsewhere, so I will not go into further details in this post.

I had long understood that LTO allowed the compiler to evaluate expressions in code at build time,  I never realized it's potential for static error checking.  The idea came to me when looking at a fellow embedded developer's code for fast Arduino digital IO.  In particular, Bill's code introduced me to the gcc error function attribute.  The documentation describes the attribute as follows:

  • If the error or warning attribute is used on a function declaration and a call to such a function is not eliminated through dead code elimination or other optimizations, an error or warning (respectively) that includes message is diagnosed.  This is useful for compile-time checking ...
Despite the fact that it seems the error attribute was introduced to address some of the limitations of static asserts, it doesn't seem to be commonly used.  After some experimentation, I came up with a basic example.
pll.c:
__attribute((error("")))
void constraint_error(char * details);

volatile unsigned pll_mult;


void set_pll_mult(unsigned multiplier)
{
    if (multiplier > 8) constraint_error("multlier out of range");
    pll_mult = multiplier;
}

main.c:
extern void set_pll_mult(unsigned multiplier);

int main()
{
    set_pll_mult(9);
}

$ gcc -Os -flto -o main *.c
In function 'set_pll_mult.constprop',
    inlined from 'main' at main.c:6:5:
pll.c:9:25: error: call to 'constraint_error' declared with attribute error:
     if (multiplier > 8) constraint_error("multlier out of range");
                         ^
When set_pll_mult() is called with an argument greater than 8, a compile error occurs.  When it is compiled with a valid multiplier, the "if (multiplier > 8)" statement is eliminated by the optimizer.  One drawback to the technique is that the caller (main.c in this case) is not identified when the called function is not inlined.  Increasing the optimization level to O3 may help to get the function inlined.

Wednesday, February 12, 2020

Building a better bit-bang UART - picoUART

Over the past years, one of my most popular blog posts has been a soft UART for AVR MCUs.  I've seen variations of my soft UART code used in other projects.  When MicroCore recently integrated a modified version of my old bit-bang UART code, it got me thinking about how I could improve it.

There were a few limitations to my earlier UART code.  One was that it didn't support baud rates below 19.2kbps at 8Mhz or baud rates below 38.4kbps at 16Mhz.  It was also problematic for people that tried to integrate it into C/C++ libraries, as the code was written in AVR assembler.  Another problem that was recently brought to my attention by James Sleeman, was that the UART receive didn't work well at moderately high baud rates such as 57.6kbps.  Since my AVR skills had improved over time, I was confident I could make tangible improvements to the code I wrote in 2014.


The screen shot above is from picoUART running on an ATtiny13, at a baud rate of 230.4kbps.  The new UART has several improvements over my old code.  To understand the improvements, it helps to understand how an asynchronous serial TTL UART works first.


Most embedded systems use 81N communication, which means 8 data bits, 1 stop bit, and no parity.  Each frame begins with a low start bit, so the total frame is 1 start bit + 8 data bits + 1 stop bit for a total of 10 bits.  Frames can be sent back-to-back with no idle time between them.  The data is sent at a fixed baud rate, and when either the receiver or transmitter varies from the chosen baud rate, errors can occur.

When it comes to the timing error tolerance of asynchronous serial communications, I've often read that somewhere between 2% and 3.5% timing error is acceptable.  I've also read many "experts" claim that a micro-controller needs an accurate external crystal oscillator in order to avoid UART timing errors.  The truth is that UART timing can be off by a total of over 5% without encountering errors.  By total, I mean the sum of the errors for both ends, so if a transmitter is 2% fast, and the receiver is 2% slow, the 81N data frames can still be received error-free.  The timing on a USB-TTL UART adapter is usually accurate to within 0.1%, so if I am sending data from an AVR that is running 3% slow, my PL2303HX adapter still receives it error-free.

If a frame is being transmitted at 57.6kbps, each bit needs to last 1000/57.6 = 17.36us.  That means 17.36us after bringing the line low for the start bit, the least significant bit needs to be sent.  A receiver will wait for the start bit to begin, wait another 17.36, and then wait for the middle of the first bit to sample the line.  If the line is high, the bit is a 1, and it it is low, the bit is a zero.  So the receiver will sample the first bit 1.5 * 17.36 = 26.04us after the line goes low to signal the start bit.  The last(8th) bit will be sampled after 8.5 *17.36 = 147.56us.  If the transmitter is to slow, and is still transmitting the 7th bit, it will cause a communication error, as the receiver will interpret the 7th bit as actually being the 8th bit.  If the transmitter is still sending the 7th bit after 147.56us, then it is sending at 8/8.5 or 0.941 * 57.6 = 54.2kbps.  Since many UARTs check for a valid stop bit, the maximum timing error is usually 9/9.5 or 94.7% of the baud rate.

The transmit timing of my earlier soft UART implementations is accurate to within 3 clock cycles.  This was each iteration of the delay loop takes 3 clock cycles - one for decrement and two for the branch:
    ldi delayArg, TXDELAY
TxDelay:
    dec delayArg
    brne TxDelay

And since delayArg is an 8-bit register, the maximum delay added to the transmission of each bit is 2^8 * 3 = 768 cycles.  On a MCU running at 8Mhz, that limited the lowest baud rate to around 8000/768 or 10.4kbps.  To allow for lower bit rates, picoUART needed to support longer delays.  I also wanted to support more accurate timing, so picoUART uses __builtin_avr_delay_cycles during the transmission of each bit.  The exact number of cycles to wait is calculated by some inline functions, which is a better way of doing the calculations than the macros I had used before.  Writing picoUART in C made the timing calculations more difficult, since compiler has some flexibility in how the code is compiled to AVR machine instructions.  In order to get avr-gcc to generate the exact sequence of instructions that I wanted, I had to use one inline asm statement.  When I used a C "while" loop instead of the asm goto "brne" instruction, the loop was one cycle longer due to a superfluous compare instruction.  Future versions of the compiler may have improved optimization and omit the compare, which would slightly impact the timing.

As with the transmit code, picoUART's receive code is accurate to within one cycle.  Unlike my earlier UART code, picoUART returns after reading the 8th bit instead of waiting for the stop bit.  Because of this change, picoUART begins by waiting for the line to be high before waiting for the start bit.  Without the initial wait for high, back-to-back calls to purx() could lead an error when the 8th bit of one frame is 0(low) and gets interpreted as the start bit of the next frame.  This change approximately triples the amount of time for the AVR to process each byte in a continuous stream of data.

My earlier UART code had two incompatible versions.  One version used open-drain communication, where the transmit line is pulled high by an external resistor, and pulled low by the AVR.  This version supported using a single wire for both receive and transmit.  While it also worked with separate pins, some users found it inconvenient to add the pull-up resistor.  Instead they would choose the "push-pull" version, where the AVR drives the line high and pulls it low.  With picoUART a single version works for both use cases, because it works in "push-pull" mode only during transmit.  When not actively transmitting, the IO pin is set to input mode with the internal pull-up activated.

I've tried to help both the noobs and experienced AVR developers.  The noob can download a release zip file to add as an Arduino library.  If you are an old AVR developer like me that prefers a keyboard over a mouse, you'll find a basic Makefile with the echo example.  The default baud rate is 115.2kbps, although it is capable of accurate timing at much higher speeds such as 1mbps for an AVR running at 8Mhz.  The default transmit is on PB0, with PB1 for receive.  The defaults can be changed in pu_config.h, or with build flags like "-DPU_BAUD_RATE=230400L".

Saturday, January 11, 2020

Picoboot v3 with autobaud and timeout

Today I released v3 beta2 of picoboot.  Like the last release of picoboot, it takes up only 256 bytes, which is the minimum bootloader size supported on the ATmega88 and ATmega168.  This means picoboot will free up 256 bytes of flash if you currently use Optiboot.  Without any potential benefit from reduced size, this release focused on robustness and speed.


The above screen shot shows reading the 16kB flash memory of an ATmega168 in 1.32 seconds.  Using 500kbps instead of 250 will read the flash in under one second, and will read 32kB of flash from an ATmega328 in two seconds.  Not only is it fast, it is reliable, with no errors using CH340G and CP2102 adapters under Windows, and PL2303HX adapters under Linux.  So as long as your serial driver supports 250 or 500kbps and doesn't round them down to 230,400 and 460,800, you can have reliable and high speed uploading and verify of code on ATmega MCUs.

Earlier versions of picoboot supported a bootloader toggle mode, where resetting the MCU once entered the bootloader, and resetting again ran the application code.  I designed this with boards that don't support the auto-reset functionality of the Arduino bootloader.  However this turned out to be problematic with some boards that do have auto-reset, where picoboot could sometimes toggle out of bootloader mode when it was supposed to enter bootloader mode.  With v3, picoboot now implements a timeout where it will wait for a few seconds and if no communication is received from avrdude, the bootloader will exit.

Like the previous versions, picoboot does not use the watchdog timer, and will not impact application code that uses the watchdog reset.  To make picoboot useful for use with a standalone AVR on a breadboard, it does not rely on a user LED on PB5 to indicate bootloader activity.  Instead, when the bootloader starts, it lowers the TXD line (PD1).  This will light the RX LED on the attached serial adapter.  If the bootloader times out, PD1 will be left floating before the bootloader exits.

My recommended baud rate for picoboot is 250kbps.  This baud rate results in 0 timing error with the AVR USART when used with the common clock rates of 8, 12, and 16Mhz, as well as the less-common 20Mhz.  The faster 500kbps also results in 0 timing error with the USART, however poor design of some serial adapters  makes the higher speeds more susceptible to noise, particularly when long wires are used to connect to the AVR.  I didn't encounter problems at 500kbps, but I was a bit surprised by how much noise I saw on my oscilloscope when testing a CP2102.

If you are using the Arduino IDE rather than the command line, I explained how to change the boards.txt file in my blog post about picoboot v1.

I plan to test v3 beta 2 for about a month, so expect the final v3 in early February.  In addition to further testing on the mega168 and mega328, I'll test the mega88.  If there is enough interest in a build for the mega8, I'll look into supporting it too.