Thursday, December 17, 2015

Lightweight AVR assembler functions

Over the past few years, I've written a few posts about small problems in the way avr-gcc generates code. These are typically missed optimizations, and since they don't affect program correctness, they tend not to get fixed. I believe the biggest optimization opportunity is inter-procedure register analysis, which is being worked on in GCC 5 but is unlikely to support 8-bit AVR MCUs. Therefore I have developed a lightweight ABI for calling assembler functions:

static inline void eelog(char logdata)
{
    // no output, input in the X register (the byte lands in r26), no clobbers
    asm volatile (
    "rcall eelog_\n"
    :
    : "x" (logdata)
    : );
}

Before explaining the above code, it helps to understand the standard avr-gcc register usage. Functions are free to use registers r18-r27 and r30-r31, which means that if the code that calls a function is using any of these registers, they need to be saved before the call. This applies even if the called function doesn't modify any of them, since the compiler doesn't do inter-procedure register analysis. The following small program and its disassembly show what I mean:

void main(void)
{
    volatile uint8_t* ioreg;
    uint8_t offset = 0x3f;
    eelog(42);
    ioreg = (uint8_t*)0x0020;
    do {
        eelog(*(ioreg + offset));
    } while ( offset-- );
}

00000092 <main>:
  92:   cf 93           push    r28
  94:   8a e2           ldi     r24, 0x2A       ; 42
  96:   f4 df           rcall   .-24            ; 0x80 <eelog>
  98:   cf e3           ldi     r28, 0x3F       ; 63
  9a:   ec 2f           mov     r30, r28
  9c:   f0 e0           ldi     r31, 0x00       ; 0
  9e:   80 a1           ldd     r24, Z+32       ; 0x20
  a0:   ef df           rcall   .-34            ; 0x80 <eelog>
  a2:   c1 50           subi    r28, 0x01       ; 1
  a4:   d0 f7           brcc    .-12            ; 0x9a <main+0x8>
  a6:   cf 91           pop     r28
  a8:   08 95           ret

The compiler uses r28 for the offset counter, since the function being called is not allowed to use r28/r29 (aka the Y register).  Since r30/31 (Z) can be modified (clobbered) by the function, the Z register needs to be initialized each time through the loop (at 0x9a and 0x9c).  The single parameter is passed in r24.

Although I've been programming in AVR assembler for a few years now, it has taken me a while to learn inline asm. I still prefer writing in plain asm, but inline asm is the only way I've found to call assembler functions from C without having to follow the standard calling convention. The sample code at the start of this post defines a function named eelog that takes a single parameter passed in r26 (the low byte of the X register). Using this technique, the compiler can safely use r24 for the offset counter instead of r28:

00000080 <main>:
  80:   aa e2           ldi     r26, 0x2A       ; 42
  82:   08 d0           rcall   .+16            ; 0x94 <eelog_>
  84:   8f e3           ldi     r24, 0x3F       ; 63
  86:   e8 2f           mov     r30, r24
  88:   f0 e0           ldi     r31, 0x00       ; 0
  8a:   a0 a1           ldd     r26, Z+32       ; 0x20
  8c:   03 d0           rcall   .+6             ; 0x94 <eelog_>
  8e:   81 50           subi    r24, 0x01       ; 1
  90:   d0 f7           brcc    .-12            ; 0x86 <main+0x6>
  92:   08 95           ret

A careful reader will notice that better code would use r30 for the offset counter, but even recent versions of avr-gcc aren't that smart. Given that other examples of poor optimization have persisted for several years, it is unlikely avr-gcc will get as good as hand-written asm anytime soon. Sharp readers may also notice that the volatile keyword in the "asm volatile" statement is superfluous, because asm statements that have no output operands are implicitly volatile. I think it is best to leave it in, in case the function is later changed to have an output operand, or is used as a template for another function.

If anyone is wondering why I didn't make a simplified version of the standard calling convention that just uses r24/r25, it is because there is no inline asm constraint that selects only r24/r25. The constraint "w" might get the compiler to use r24/r25, but it could also use r26/r27 (X), r28/r29 (Y) or r30/r31 (Z).
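A possible alternative, not tested in this post, is GCC's local register variable extension, which the GCC manual describes as the way to place an asm operand in a specific register when no constraint letter exists for it. The sketch below is only an illustration: the names eelog24, eelog24_ and last_logged are made up, the assembler stub just stores its argument so the example is self-contained, and the non-AVR branch exists only so the file builds on a PC. Whether avr-gcc generates better code this way than with the "x" constraint is an open question.

```c
#include <stdint.h>

/* Hypothetical stand-in for the EEPROM log: holds the last byte logged. */
volatile uint8_t last_logged;

#ifdef __AVR__
/* Hypothetical assembler routine that expects its argument in r24
   and modifies no registers. */
asm (
    ".text\n"
    "eelog24_:\n"
    "    sts last_logged, r24\n"
    "    ret\n"
);
#endif

static inline void eelog24(uint8_t logdata)
{
#ifdef __AVR__
    /* Pin the operand to r24 with a local register variable; the "r"
       constraint then has to use that register for this operand. */
    register uint8_t arg asm("r24") = logdata;
    asm volatile (
        "rcall eelog24_\n"
        :
        : "r" (arg));
#else
    /* Host fallback so the sketch can be compiled and tried off-target. */
    last_logged = logdata;
#endif
}
```

Calling eelog24(42) should leave 42 in last_logged. GCC only promises that the variable is in r24 at the point where it is used as an asm operand, so the assignment and the asm statement should stay next to each other with no function call in between.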

Conclusion

Using this technique can significantly reduce register pressure on the compiler, which will reduce code size and increase speed.  Inline asm constraints could also be used for a function that returns multiple values, without having to use a C struct.  Code can be found in my new avrutils repository.
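As a rough sketch of the multiple-return-value idea, the wrapper below passes a byte in X and gets two bytes back, one in X and one in Z, with no struct involved. The routine split_nibbles_ and its wrapper are hypothetical and are not taken from the avrutils repository. The sketch also assumes the compiler puts an 8-bit operand in the low byte of each pair (r26 and r30), as it did in the disassembly above. The non-AVR branch is only there so the file builds on a PC.

```c
#include <stdint.h>

#ifdef __AVR__
/* Hypothetical assembler routine: input byte in r26; returns the low
   nibble in r26 and the high nibble in r30. No other register changes. */
asm (
    ".text\n"
    "split_nibbles_:\n"
    "    mov  r30, r26\n"
    "    swap r30\n"
    "    andi r30, 0x0f\n"
    "    andi r26, 0x0f\n"
    "    ret\n"
);
#endif

static inline void split_nibbles(uint8_t in, uint8_t *lo, uint8_t *hi)
{
#ifdef __AVR__
    uint8_t l, h;
    /* Two outputs (X and Z); the input is tied to operand 0, so it
       shares the X register with the first output. */
    asm volatile (
        "rcall split_nibbles_\n"
        : "=x" (l), "=z" (h)
        : "0" (in));
    *lo = l;
    *hi = h;
#else
    /* Host fallback with the same result as the assembler routine. */
    *lo = in & 0x0f;
    *hi = in >> 4;
#endif
}
```

Because the wrapper is inlined, the two pointer parameters should disappear and the caller can read the results straight out of r26 and r30. Now that the asm statement has outputs, the volatile keyword is no longer redundant: it keeps the call in place even if the compiler decides the results are unused.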

