Saturday, April 19, 2014

gcc link time optimization can fix bad programming practices

I like to write efficient code, but I realize that many people relatively new to C programming don't understand some of the simple ways to write fast and efficient code.  Even some popular sites, such as the Arduino blinking LED tutorial, don't demonstrate best practices in their code.  I've written a generic AVR C version:
#include <avr/io.h>
#include <util/delay.h>

int LEDPIN = 1;
void main(void)
{
  while (1) {
    PINB |= (1<<LEDPIN);   /* writing a 1 to a PINB bit toggles PORTB */
    _delay_ms(500);
  }
}

I build it with avr-gcc 4.8.0:
avr-gcc -mmcu=attiny85 -DF_CPU=8000000 -Os -Wall -Wno-main    blink.c   -o blink

Then I check the size:
$ avr-size blink
  text    data     bss     dec     hex filename
   122       2       0     124      7c blink

To show what the problem is here's part of the disassembled code (avr-objdump -D blink):
0000002a <__do_copy_data>:
  2a:   10 e0           ldi     r17, 0x00       ; 0
  2c:   a0 e6           ldi     r26, 0x60       ; 96
  2e:   b0 e0           ldi     r27, 0x00       ; 0
  30:   e4 e7           ldi     r30, 0x74       ; 116
  32:   f0 e0           ldi     r31, 0x00       ; 0
  34:   02 c0           rjmp    .+4             ; 0x3a <__do_copy_data+0x10>
  36:   05 90           lpm     r0, Z+
  38:   0d 92           st      X+, r0
  3a:   a2 36           cpi     r26, 0x62       ; 98
  3c:   b1 07           cpc     r27, r17
  3e:   d9 f7           brne    .-10            ; 0x36 <__do_copy_data+0xc>
  40:   02 d0           rcall   .+4             ; 0x46 <main>
  42:   16 c0           rjmp    .+44            ; 0x70 <_exit>

00000046 <main>:
  46:   21 e0           ldi     r18, 0x01       ; 1
  48:   30 e0           ldi     r19, 0x00       ; 0
  4a:   46 b3           in      r20, 0x16       ; 22
  4c:   c9 01           movw    r24, r18
  4e:   00 90 60 00     lds     r0, 0x0060
  52:   02 c0           rjmp    .+4             ; 0x58 <main+0x12>
  54:   88 0f           add     r24, r24
  56:   99 1f           adc     r25, r25
  58:   0a 94           dec     r0
  5a:   e2 f7           brpl    .-8             ; 0x54 <main+0xe>
  5c:   48 2b           or      r20, r24
  5e:   46 bb           out     0x16, r20       ; 22
  60:   4f ef           ldi     r20, 0xFF       ; 255
  62:   84 e3           ldi     r24, 0x34       ; 52
  64:   9c e0           ldi     r25, 0x0C       ; 12
  66:   41 50           subi    r20, 0x01       ; 1
  68:   80 40           sbci    r24, 0x00       ; 0
  6a:   90 40           sbci    r25, 0x00       ; 0
  6c:   e1 f7           brne    .-8             ; 0x66 <main+0x20>
  6e:   00 c0           rjmp    .+0             ; 0x70 <main+0x2a>
  70:   00 00           nop
  72:   eb cf           rjmp    .-42            ; 0x4a <main+0x4>

All the code in __do_copy_data copies initialized global variables (in this case LEDPIN) from flash to RAM.  The code from address 46 to 5e sets the bit in PINB based on the value of LEDPIN, which is stored at address 0x0060 in RAM.  This could be done with a single sbi (set bit) instruction, but the compiler doesn't do that, because the code from blink.c could be linked with other code that changes the global variable LEDPIN.  This is true even when the output is a fully linked elf file like blink, because gcc compiles blink.c on its own before the linker combines it with any other object files.
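To make the cost concrete, here is a host-side C sketch that models what the instructions from 46 to 5a compute.  runtime_mask is a hypothetical name, not part of avr-libc.  The mask starts at 1, the count is the low byte of LEDPIN, and the loop doubles the 16-bit mask until the decremented count goes negative, the same structure as the add/adc/dec/brpl loop.  When LEDPIN is a compile-time constant, gcc does this shift itself at compile time.

```c
#include <stdint.h>

/* Hypothetical host-side model of the shift loop at 0x46..0x5a.
   Valid for counts 0..127, since brpl tests the sign bit of r0. */
uint16_t runtime_mask(uint8_t ledpin_low_byte)
{
    uint16_t mask = 1;                      /* ldi r18,1 / ldi r19,0 / movw */
    int8_t count = (int8_t)ledpin_low_byte; /* lds r0, 0x0060 */

    /* rjmp goes straight to the dec/brpl test, so a count of 0
       leaves the mask at 1 */
    while (--count >= 0)
        mask = (uint16_t)(mask << 1);       /* add r24,r24 / adc r25,r25 */
    return mask;
}
```

runtime_mask(1) returns 2, the same value the compiler gets by folding (1<<1) when it knows LEDPIN can't change.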

Making the LEDPIN variable const tells the compiler that the bit to set will always be bit 1.  As expected, with that change it generates a single sbi instruction instead of the 12 instructions above:
  46:   b1 9a           sbi     0x16, 1 ; 22

However, it still copies the value of LEDPIN to RAM with the __do_copy_data code.  This is because some other code that gets linked in may use the global variable LEDPIN.  Making the variable static tells the compiler it is not used outside the current file.  So after changing the code as follows:
static const int LEDPIN = 1;
The code is much smaller:
   text    data     bss     dec     hex filename
     74       0       0      74      4a blink-static-const

But we can't easily make every new C programmer a really good programmer, which is why it helps to have a compiler that can figure out that, for the blink application, LEDPIN is only defined once, is never changed, and is only used in the main function.  That's one of the things gcc's link time optimization (LTO) can do.  After adding the -flto compiler flag, the original version now compiles to 74 bytes, the same as when defining LEDPIN as static const:
   text    data     bss     dec     hex filename
     74       0       0      74      4a blink-flto

A code-generation bug was introduced in avr-gcc 4.8.0, so I suggest using 4.7.3 or waiting for 4.8.3 which will contain a fix.

Friday, April 11, 2014

Tuning 433Mhz ASK modules

I purchased a transmit/receive pair of modules from DX, but although other people have been able to get 20m range with the same type of modules, I had no luck beyond 3-4m.

After doing some more research, I decided to take another shot at getting more distance out of the modules. The receiver supposedly has a wide bandwidth, and can be tuned with an adjusting screw (a variable inductor). I soldered some wires to a 3.5mm audio jack to listen to the output of the receiver, and wrote a small program to generate a 3.3kHz waveform:
/* output 3.3kHz tone on PORTB */
#define F_CPU 8000000L

#include <avr/io.h>
#include <util/delay.h>

#define TONEPIN 0     /*connect to transmitter data pin */
void main()
{
  DDRB = (1<<TONEPIN);    /* output mode */
  while (1) {
    PINB = (1<<TONEPIN);  /* writing a 1 to PINB toggles the output */
    _delay_us(150);       /* about half of one 3.3kHz period */
  }
}

I plugged the audio jack into my computer line-in and configured it to play the audio from line-in.  A set of external speakers with a 3.5mm audio jack would work just as well. I adjusted the tuning screw until I could clearly hear the 3.3kHz tone. I then took the transmitter (on a small breadboard with battery power) around the house, and outside. With the transmitter on top of my car in the driveway some 10-12m away, I could still clearly hear the tone (with a bit of static).
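The tone frequency follows from the toggle loop: each pass toggles TONEPIN once, so one full cycle of the tone takes two passes.  A minimal host-side sketch of that arithmetic, ignoring the few cycles of loop overhead (the function names are my own):

```c
/* Delay between toggles needed for a given tone: half of one period. */
double half_period_us(double tone_hz)
{
    return 1e6 / (2.0 * tone_hz);
}

/* Tone produced by a given delay between toggles. */
double tone_hz(double delay_us)
{
    return 1e6 / (2.0 * delay_us);
}
```

half_period_us(3300) is about 151.5us, and a 150us delay gives about 3333Hz, close enough for listening.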

At short range these modules worked at 9600bps, which suggests they don't have much of a low-pass filter.  I plan to add one with a cutoff around 2kHz, then do some tests with 1200bps data.
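As a sketch of the planned filter, assuming a simple single-pole RC low-pass (the post doesn't specify the topology), the -3dB cutoff is f = 1/(2πRC).  The 8.2kOhm and 10nF values below are only an illustrative pair that lands near 2kHz, not parts from the post.

```c
/* Cutoff (-3dB) frequency of a first-order RC low-pass filter. */
double rc_cutoff_hz(double r_ohms, double c_farads)
{
    const double pi = 3.14159265358979323846;
    return 1.0 / (2.0 * pi * r_ohms * c_farads);
}
```

rc_cutoff_hz(8200, 10e-9) is about 1.94kHz.  At 1200bps, an alternating 1010 bit pattern has a fundamental of 600Hz, well below that cutoff.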

The 315Mhz version of these modules appears to have a very similar circuit with a tuning coil, so this technique should work with them as well.

Saturday, April 5, 2014

ATtiny85 as a 433Mhz transmitter - fail

I bought a 433Mhz transmitter and receiver with the intention of using them for battery-powered wireless sensor nodes.  In small quantities they can be purchased for <$1/pair, they are easy to use with libraries like VirtualWire, and they can even be used for serial UART communications.

My intention is to have multiple intermittent transmitters per receiver, so if I purchased 10 pairs, I'd have 9 unused receiver modules.  I thought I might be able to get my ATtiny85s to transmit a 433Mhz signal.  I had read about spritesmod's FM transmitter hack; however, 433Mhz is well beyond the 85Mhz maximum frequency spec for the t85's PLL.  And the maximum square wave frequency output is half the PLL frequency, so even if I could overclock the PLL to 100Mhz, I could at best output a 50Mhz square wave.

I considered using multiple clock doubler circuits, but after some thought I remembered that it's still possible to generate a 433Mhz signal as a harmonic of a lower frequency square wave.  After trying some different multiples, I worked out that if I generated a 39.45Mhz square wave, the 11th harmonic would be very close to 433.92Mhz (11 x 39.45 = 433.95Mhz).  That would require the PLL to run at 78.9Mhz, and the internal RC oscillator to run at 78.9/8 = 9.86Mhz instead of the normal 8Mhz.
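The harmonic search can be written out as a short host-side C sketch.  It assumes an ideal 50% duty-cycle square wave, which contains only odd harmonics, and uses the relationships from the post: the square wave is half the PLL frequency, and the PLL runs at 8 times the RC oscillator.  The function names are illustrative.

```c
/* Lowest odd harmonic n such that target/n fits at or below the
   maximum square wave frequency. */
int lowest_odd_harmonic(double target_mhz, double max_square_mhz)
{
    int n = 1;
    while (target_mhz / n > max_square_mhz)
        n += 2;              /* an ideal square wave has only odd harmonics */
    return n;
}

/* Square wave output is half the PLL frequency. */
double pll_mhz_for_square(double square_mhz) { return 2.0 * square_mhz; }

/* The PLL runs at 8 times the RC oscillator. */
double rc_osc_mhz_for_pll(double pll_mhz) { return pll_mhz / 8.0; }
```

With the 85Mhz PLL limit (a 42.5Mhz square wave), lowest_odd_harmonic(433.92, 42.5) returns 11, and a 39.45Mhz square wave needs an RC oscillator of about 9.86Mhz.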

The RC oscillator frequency on the tiny85 is tuned by changing the value of the OSCCAL register.  Figure 22-42 of the datasheet shows it can be tuned to over 14Mhz:
I wrote a small C program which used timer1 to output a square wave, and used a logic analyzer to measure the frequency.  After a few tries, I found that adding 23 to the default OSCCAL value tuned the RC oscillator to approximately 9.86Mhz.  Once that was done, I modified the code to enable the PLL and output a square wave at 1/2 the PLL frequency, keyed on and off (ASK) for 30ms on and 50ms off.  For an antenna, I cut a 17cm long piece of 24AWG copper wire from some cat5 cable and plugged it into my breadboard, connected to pin 6 (OC1A) of the t85.  I hooked up my logic analyzer to the 433Mhz receiver's rx pin, and here's what I got:
The signal is recognizable, but the transmitter and receiver were only 15cm apart, and the signal wasn't completely clean.  Once I moved the t85 more than 50cm away, I could see only noise.  The 433Mhz transmitter that came with the receiver didn't work at the 10m range that some people have been able to get, but it did work well 2-3m from the receiver.
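The 17cm antenna length is close to a free-space quarter wavelength at 433.92Mhz, λ/4 = c/(4f).  A quick check in C, assuming a simple quarter-wave wire (the function name is my own):

```c
/* Free-space quarter wavelength in cm for a frequency in Hz. */
double quarter_wave_cm(double freq_hz)
{
    const double c = 299792458.0;   /* speed of light, m/s */
    return c / (4.0 * freq_hz) * 100.0;
}
```

quarter_wave_cm(433.92e6) is about 17.3cm; at 315Mhz it would be about 23.8cm.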

So why didn't it work?  I'm guessing that the ATtiny85 output drivers do not generate a sharp enough square wave to create strong harmonics at 433Mhz.  An EDN article I found states, "most digital output waveforms follow a nearly Gaussian profile", meaning the transitions do not have significant high-frequency components.

I think there is still potential in the idea.  I may purchase a 315Mhz receiver, or see if I can re-tune the 433Mhz receiver to 315Mhz (there is a screw on the board that looks like a variable capacitor).  If I output a 35Mhz square wave, the 9th harmonic at 315Mhz will be much stronger than the 11th harmonic I used for 433Mhz.  Running the PLL at 90Mhz would generate a 45Mhz square wave, and its 7th harmonic should be stronger still than the 9th harmonic of 35Mhz.  If that still doesn't work well enough, I might try fast switching MOSFETs or transistors rated above 500Mhz, in order to generate a sharper output waveform.  Comment if you have any other ideas.
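For comparison, in an ideal square wave the nth odd harmonic has 1/n the amplitude of the fundamental, and the even harmonics are zero.  Real output drivers with slow edges roll off faster than this, so the sketch below (illustrative names) only gives the best case for each harmonic.

```c
/* Amplitude of the nth harmonic of an ideal 50% duty square wave,
   relative to the fundamental: 1/n for odd n, 0 for even n. */
double ideal_harmonic_ratio(int n)
{
    return (n % 2) ? 1.0 / n : 0.0;
}

/* Ideal power advantage of harmonic a over harmonic b (b must be odd). */
double ideal_power_gain(int a, int b)
{
    double r = ideal_harmonic_ratio(a) / ideal_harmonic_ratio(b);
    return r * r;
}
```

Even in the ideal case, the 7th harmonic carries about 2.5 times the power of the 11th, and the 9th about 1.5 times.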

Wednesday, March 12, 2014

aspspi - USBasp SPI terminal

One of the things I've been working on in the past couple of months is controlling nrf24l01 modules.  Communication with the modules is via SPI, and debugging was somewhat tedious, as it involved modifying code on the AVR, downloading the code, and observing the results output via serial UART.

While writing the picoboot module for avrdude, I noticed avrdude has a terminal mode which allows for commands to be sent to an AVR.  The serial programming interface for AVR MCUs uses SPI, similar to the nrf modules.  I realized instead of an AVR, the programmer could be attached to any device that communicates via SPI.  A couple hours later I had finished writing aspspi for avrdude - which permits a USBasp to be used for interactive SPI communication.

Avrdude requires an avr part (-p), which is not applicable to aspspi, so just use any part listed in avrdude.conf.  The -F argument is to ignore signature checking (since there isn't an avr attached), -u turns off safe mode, and -t enters terminal mode.

In the screen shot above "send 3 0 0 0" sends the command to read register 3 on the nrf.  The response is the status (0e) and the contents of register 3 (03) repeated 3 times.  Sending 0x23 is the command to write to register 3.  Since it is a single-byte register, the first 2 zeros are ignored, and the last byte (1) is what gets written to the nrf.
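The command bytes above follow the nRF24L01 register commands: R_REGISTER is 000A AAAA and W_REGISTER is 001A AAAA, where AAAAA is the 5-bit register address.  A small C sketch that builds them (the macro and function names are mine):

```c
#include <stdint.h>

#define NRF_R_REGISTER 0x00u  /* 000A AAAA: read register */
#define NRF_W_REGISTER 0x20u  /* 001A AAAA: write register */
#define NRF_REG_MASK   0x1Fu  /* 5-bit register address */

uint8_t nrf_read_cmd(uint8_t reg)
{
    return (uint8_t)(NRF_R_REGISTER | (reg & NRF_REG_MASK));
}

uint8_t nrf_write_cmd(uint8_t reg)
{
    return (uint8_t)(NRF_W_REGISTER | (reg & NRF_REG_MASK));
}
```

nrf_read_cmd(3) gives 0x03 and nrf_write_cmd(3) gives 0x23, the two commands used in the terminal session; the bytes after the command clock data in or out over SPI.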

With aspspi I was able to interactively test nrf modules, and I even found an undocumented register 0x1e, which contains at least 3 bytes: 68 66 05.

The code for aspspi is in the picoboot-avrdude repository.

Wednesday, February 26, 2014

Picoboot beta1 release

Today I've released the beta-1 version of picoboot.  Requiring only 66 bytes of flash, picoboot is the smallest AVR bootloader, taking about an eighth of the space required by other "tiny" bootloaders, which start at around 512 bytes.  Not only is it the smallest available bootloader, it is the smallest possible bootloader for AVRs with a 64-byte page size.  Picoboot is also fast, taking less than 3 seconds to write 8126 bytes to flash.

Future plans include builds for the ATtiny2313a and support for zero-wire auto-reset.

$ avrdude -c picoboot -p t85 -P com16

avrdude.exe: AVR device initialized and ready to accept instructions

Reading | ################################################## | 100% 0.00s

avrdude.exe: Device signature = 0x1e2a00
avrdude.exe: programmer operation not supported
avrdude.exe: programmer operation not supported
avrdude.exe: programmer operation not supported

avrdude.exe done.  Thank you.

Monday, February 17, 2014

Breadboard programming cable for ATtiny85, ATtiny88, ATmega328, ATtiny2313, and other AVR MCUs

For programming AVR MCUs, I use a USBasp.  Initially, I would connect the USBasp header pins to header pins on the breadboard with individual jumper wires, then jumper the appropriate pins on the MCU.  Then I noticed a repeating pattern in the pinout of many AVRs.  Here's an example:
Notice the pattern?  MOSI, MISO, SCK, & VCC are all in the same order.  Then I found this page with instructions on building a programming cable for an Arduino mini.  I decided to build a simpler programming cable that would work on a number of AVR MCUs and Pro Minis with a minimum of jumper wires.  Here's the pin arrangement I decided on:

  • GND
  • RST
  • VCC
  • SCK
  • MISO
  • MOSI
The connector works on the ATtiny85 and ATtiny2313 with 2 jumpers (GND & RST).  It needs 3 on the pro mini (GND, RST, & VCC), and with the ATtiny88 and ATmega328, just 2 jumpers are needed - RST and one connecting AVCC to VCC.
Here's the finished result, and the 10-pin ribbon cable with a wire I used to key the connector so I won't plug it in the wrong way.
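The jumper counts given above can be checked with a small host-side model.  It lists the cable from the MOSI end (the reverse of the pin arrangement listed above) and compares each wire with the signal on the consecutive chip pin it lands on; a mismatch, or a wire that runs past the end of the row of pins, needs a jumper.  The pin names come from the DIP pinouts; the function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Cable wires listed from the MOSI end: the reverse of
   GND, RST, VCC, SCK, MISO, MOSI. */
static const char *const cable[6] = { "MOSI", "MISO", "SCK", "VCC", "RST", "GND" };

/* pins[i] is the signal on the chip pin under cable wire i, or NULL
   if the wire lands past the end of that row of pins.  Every
   mismatch needs a jumper. */
int jumpers_needed(const char *const pins[6])
{
    int jumpers = 0;
    for (int i = 0; i < 6; i++) {
        if (pins[i] == NULL || strcmp(pins[i], cable[i]) != 0)
            jumpers++;
    }
    return jumpers;
}
```

For the ATmega328 (pins 17 to 22: MOSI, MISO, SCK, AVCC, AREF, GND), the two mismatches are the RST wire landing on AREF and the VCC wire landing on AVCC, matching the two jumpers described above.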

Thursday, February 6, 2014

Zero-wire serial auto-reset for Arduino

Various versions of the Arduino will reset the board by toggling the serial DTR line, a feature called auto-reset.  Since it relies on the DTR line, it won't work with TTL serial adapters that don't break out the DTR line.  After writing my half-duplex serial UART, I thought of using the TTL serial break signal, which holds the line at 0V for several ms.  Normal serial communications would also send 0V, but at 57.6kbps the line would never stay low for more than 160us before returning to the idle high voltage state.  So what I needed was a circuit that would not reset when the line is low for 160us, but would reset when the line is low for 100ms or more.
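The 160us figure comes from the worst case of a 0x00 byte.  Assuming 8N1 framing, the start bit and 8 zero data bits keep the line low for 9 bit times before the stop bit returns it high.  A minimal sketch (the function name is my own):

```c
/* Longest low time in normal 8N1 traffic: a 0x00 byte keeps the line
   low for the start bit plus 8 data bits. */
double max_low_us(double baud)
{
    return 9.0 * 1e6 / baud;
}
```

max_low_us(57600) is 156.25us, which rounds up to the 160us used here.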

This can be done with a simple Resistor-Capacitor or RC circuit.  The time for a capacitor to discharge by 63% is equal to the resistance in ohms times the capacitance in farads.  Picking a common 0.1uF capacitor and a 10kOhm resistor gives an RC time constant of 1ms, high enough above the 160us time to provide a good safety margin.  Another reason for picking those values is that most pro mini boards, including the Baite Pro mini I recently purchased, use those values for the DTR auto-reset feature.  My plan was to make use of the existing components for my auto-reset circuit.  I also needed a way to quickly charge the capacitor, since the idle time between bytes won't be long enough to fully charge the capacitor through the resistor.  A diode serves that purpose, so the whole circuit takes just 3 parts.
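Comparing the time constant with that worst-case low pulse gives the safety margin.  A minimal sketch using the component values from the post (the function names are illustrative):

```c
/* RC time constant in seconds: resistance (ohms) times capacitance (farads). */
double rc_tau_s(double r_ohms, double c_farads)
{
    return r_ohms * c_farads;
}

/* How many times longer the time constant is than the longest low pulse. */
double rc_margin(double tau_s, double max_low_s)
{
    return tau_s / max_low_s;
}
```

10kOhm and 0.1uF give 1ms, about 6 times the 160us worst case; the 11kOhm resistor on the Baite board raises that to 1.1ms.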
The pro mini already has a 0.1uF capacitor in series between the DTR and RST line, so I shorted DTR to GND to make the connection between the capacitor and ground.  I used tweezers to move the tiny chip resistor (actually 11kOhm on the Baite board) from near the button to the space between the RXD and RST pins:

To finish off the circuit, I trimmed the leads on a diode and soldered it between the RXD and RST pins.  I chose to solder it to the reset on the left side of the board to avoid messing up the chip resistor I soldered on the right side.

To test it, I connected the pro mini to a TTL serial adapter, set up putty to open a serial connection, then pressed Ctrl-Break to send a break sequence.  I then saw the quick flashes from optiboot, indicating the board reset.  I also tested serial communications with a small program that echoes the characters typed, to make sure it wouldn't reset during normal communications.  The ASCII code for '@' is 0x40, so it has 7 zero bits and makes a good test character.  There were no resets while sending regular characters - so far so good.
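With 8N1 framing, '@' keeps the line low for the start bit plus its six low-order zero bits, 7 bit times in a row (about 122us at 57.6kbps).  This sketch (a hypothetical helper) counts the longest low run on the wire for any byte; 0x00, at 9 bit times, is the true worst case.

```c
#include <stdint.h>

/* Longest run of consecutive low bit times for one 8N1 frame:
   start bit (low), 8 data bits sent LSB first, stop bit (high). */
int longest_low_run(uint8_t byte)
{
    int run = 1;    /* the start bit is always low */
    int best = 1;
    for (int i = 0; i < 8; i++) {
        if (byte & (1u << i)) {
            run = 0;            /* a 1 bit returns the line high */
        } else if (++run > best) {
            best = run;
        }
    }
    return best;    /* the stop bit is high, so any run ends there */
}
```

longest_low_run(0x40) returns 7 and longest_low_run(0x00) returns 9.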

Having completed the hardware, the final part was the software.  I looked at Avrdude and found that the auto-reset feature is in the arduino_open function of arduino.c.  I generally do PC software development under Linux, so to learn something different I decided to modify the Windows version of avrdude to support my break auto-reset.  A quick Google search revealed how to send a break sequence, so I modified ser_set_dtr_rts() in ser_win32.c to send a break signal in addition to toggling DTR and RTS:

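        if (is_on) {
                /* DTR/RTS asserted: release TXD from the break state */
                EscapeCommFunction(hComPort, SETDTR);
                EscapeCommFunction(hComPort, SETRTS);
                EscapeCommFunction(hComPort, CLRBREAK);
        } else {
                /* DTR/RTS dropped: also hold TXD low (break) so the
                   RC circuit on the board's RXD pin triggers a reset */
                EscapeCommFunction(hComPort, CLRDTR);
                EscapeCommFunction(hComPort, CLRRTS);
                EscapeCommFunction(hComPort, SETBREAK);
        }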

Modifying the code was the easy part; building it was going to take more work.  I found a page explaining how to build avrdude with MinGW, and tried to follow the directions.  The current version of libusb-win32 is slightly different.  Instead of usb.h the include file is named lusb0_usb.h.  I was building on a Windows 7 64-bit machine, so I copied bin\ia64\libusb0.sys instead of libusb0_x86.dll to C:\MinGW\bin.  I also had to set up fstab as described here, by copying fstab.sample to fstab.  I thought I'd be ready to build, but when I ran configure I ran into the gcc.exe No Disk bug:

The machine has an internal USB flash memory reader, so rather than open up the computer and unplug it, I disabled "USB mass storage device" in the device manager.

Configure then ran without errors, and gave the following output:
Configuration summary:
DON'T HAVE libelf
DO HAVE    libusb
DON'T HAVE libusb_1_0
DON'T HAVE libftdi1
DON'T HAVE libftdi
DO HAVE    libhid
DON'T HAVE pthread
ENABLED    parport
DISABLED   linuxgpio

After compiling, I had a working avrdude.exe that supported my zero-wire auto-reset feature:
$ ./avrdude -c arduino -P com15 -p m328p -U flash:r:m328.bin:r

avrdude.exe: AVR device initialized and ready to accept instructions

Reading | ################################################## | 100% 0.01s

avrdude.exe: Device signature = 0x1e950f
avrdude.exe: reading flash memory:

Reading | ################################################## | 100% 4.00s

avrdude.exe: writing output file "m328.bin"

avrdude.exe: safemode: Fuses OK (H:00, E:00, L:00)

avrdude.exe done.  Thank you.

The patch for ser_win32.c and the avrdude.exe are here.  I've submitted the patch to Joerg Wunsch so it can be included in the next release of avrdude.
Edit: Joerg Wunsch seems to be a patch nazi, so it probably won't get added to the official avrdude while he's in control.