Wednesday, February 12, 2020

Building a better bit-bang UART - picoUART

Over the past few years, one of my most popular blog posts has been a soft UART for AVR MCUs.  I've seen variations of my soft UART code used in other projects.  When MicroCore recently integrated a modified version of my old bit-bang UART code, it got me thinking about how I could improve it.

There were a few limitations to my earlier UART code.  One was that it didn't support baud rates below 19.2kbps at 8MHz, or below 38.4kbps at 16MHz.  It was also problematic for people who tried to integrate it into C/C++ libraries, since the code was written in AVR assembler.  Another problem, recently brought to my attention by James Sleeman, was that the UART receive didn't work well at moderately high baud rates such as 57.6kbps.  Since my AVR skills had improved over time, I was confident I could make tangible improvements to the code I wrote in 2014.


The screen shot above is from picoUART running on an ATtiny13 at a baud rate of 230.4kbps.  The new UART has several improvements over my old code.  To understand the improvements, it helps to first understand how an asynchronous serial TTL UART works.


Most embedded systems use 8N1 communication: 8 data bits, no parity, and 1 stop bit.  Each frame begins with a low start bit, so the total frame is 1 start bit + 8 data bits + 1 stop bit, for a total of 10 bits.  Frames can be sent back-to-back with no idle time between them.  The data is sent at a fixed baud rate, and when either the receiver or the transmitter deviates from the chosen baud rate, errors can occur.

When it comes to the timing error tolerance of asynchronous serial communications, I've often read that somewhere between 2% and 3.5% timing error is acceptable.  I've also read many "experts" claim that a microcontroller needs an accurate external crystal oscillator in order to avoid UART timing errors.  The truth is that UART timing can be off by a total of over 5% without encountering errors.  By total, I mean the sum of the errors for both ends, so if a transmitter is 2% fast and the receiver is 2% slow, the 8N1 data frames can still be received error-free.  The timing on a USB-TTL UART adapter is usually accurate to within 0.1%, so if I am sending data from an AVR that is running 3% slow, my PL2303HX adapter still receives it error-free.

If a frame is being transmitted at 57.6kbps, each bit needs to last 1000/57.6 = 17.36us.  That means 17.36us after bringing the line low for the start bit, the least significant bit needs to be sent.  A receiver waits for the start bit to begin, waits another 17.36us for the start bit to end, and then waits until the middle of the first data bit to sample the line.  If the line is high, the bit is a 1, and if it is low, the bit is a 0.  So the receiver samples the first bit 1.5 * 17.36 = 26.04us after the line goes low to signal the start bit.  The last (8th) bit will be sampled after 8.5 * 17.36 = 147.56us.  If the transmitter is too slow and is still transmitting the 7th bit at that point, it will cause a communication error, as the receiver will interpret the 7th bit as actually being the 8th bit.  A transmitter that is still sending the 7th bit after 147.56us is running at 8/8.5 of the nominal rate, or 0.941 * 57.6 = 54.2kbps.  Since many UARTs check for a valid stop bit, the maximum timing error is usually 9/9.5, or 94.7% of the nominal baud rate.
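
To put numbers on it, here's a small stand-alone C program (just the arithmetic above, not part of picoUART) that computes the sample points and the slowest transmitter a stop-bit-checking receiver will tolerate:

  #include <stdio.h>

  int main(void) {
    double baud = 57600.0;
    double bit_us = 1e6 / baud;                        // 17.36us per bit
    printf("first sample: %.2fus\n", 1.5 * bit_us);    // middle of bit 0
    printf("last sample:  %.2fus\n", 8.5 * bit_us);    // middle of bit 7
    // with a stop-bit check, 9 bit times must fit in 9.5 nominal bit times
    printf("slowest tolerable tx: %.1fkbps\n", (9.0 / 9.5) * baud / 1000);
    return 0;
  }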

The transmit timing of my earlier soft UART implementations is accurate to within 3 clock cycles.  This is because each iteration of the delay loop takes 3 clock cycles - one for the decrement and two for the branch:
    ldi delayArg, TXDELAY
TxDelay:
    dec delayArg
    brne TxDelay

And since delayArg is an 8-bit register, the maximum delay added to the transmission of each bit is 2^8 * 3 = 768 cycles.  On an MCU running at 8MHz, that limited the lowest baud rate to around 8000/768, or 10.4kbps.  To allow for lower bit rates, picoUART needed to support longer delays.  I also wanted more accurate timing, so picoUART uses __builtin_avr_delay_cycles during the transmission of each bit.  The exact number of cycles to wait is calculated by some inline functions, which is a better way of doing the calculations than the macros I had used before.  Writing picoUART in C made the timing calculations more difficult, since the compiler has some flexibility in how the code is compiled to AVR machine instructions.  In order to get avr-gcc to generate the exact sequence of instructions that I wanted, I had to use one inline asm statement.  When I used a C "while" loop instead of the asm goto "brne" instruction, the loop was one cycle longer due to a superfluous compare instruction.  Future versions of the compiler may have improved optimization and omit the compare, which would slightly impact the timing.
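
A rough sketch of the approach (my simplification for illustration; picoUART's actual code compensates for the exact cycle count of every surrounding instruction) looks like this:

  #include <avr/io.h>

  #ifndef F_CPU
  #define F_CPU 8000000L
  #endif
  #define PU_BAUD_RATE 115200L
  #define BIT_CYCLES (F_CPU / PU_BAUD_RATE)

  // interrupts must be disabled for accurate timing
  static void tx_byte(uint8_t c) {
    uint16_t frame = ((uint16_t)c << 1) | 0x200;  // start bit (0), 8 data bits, stop bit (1)
    for (uint8_t i = 0; i < 10; i++) {
      if (frame & 1) PORTB |= _BV(PB0);
      else PORTB &= ~_BV(PB0);
      frame >>= 1;
      // assumes ~6 cycles of per-bit loop overhead; the real code counts exactly
      __builtin_avr_delay_cycles(BIT_CYCLES - 6);
    }
  }

Since __builtin_avr_delay_cycles only accepts a compile-time constant, the baud rate has to be fixed at build time, which is why picoUART is configured with headers and -D flags rather than at runtime.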

As with the transmit code, picoUART's receive code is accurate to within one cycle.  Unlike my earlier UART code, picoUART returns after reading the 8th bit instead of waiting for the stop bit.  Because of this change, picoUART begins by waiting for the line to be high before waiting for the start bit.  Without the initial wait for high, back-to-back calls to purx() could lead to an error when the 8th bit of one frame is 0 (low) and gets interpreted as the start bit of the next frame.  This change approximately triples the amount of time the AVR has to process each byte in a continuous stream of data.
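
The receive path has this shape (again a sketch, not the cycle-exact code, reusing the BIT_CYCLES macro from the transmit sketch and the default PB1 receive pin):

  static uint8_t rx_byte(void) {
    uint8_t c = 0;
    while (!(PINB & _BV(PB1)));        // wait for the line to be high (idle)
    while (PINB & _BV(PB1));           // wait for the start bit's falling edge
    // skip the rest of the start bit plus half of the first data bit
    __builtin_avr_delay_cycles(BIT_CYCLES + BIT_CYCLES / 2);
    for (uint8_t i = 0; i < 8; i++) {
      c >>= 1;
      if (PINB & _BV(PB1)) c |= 0x80;  // LSB is sent first on the wire
      __builtin_avr_delay_cycles(BIT_CYCLES - 7);  // assumed loop overhead
    }
    return c;                          // returns without waiting for the stop bit
  }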

My earlier UART code had two incompatible versions.  One version used open-drain communication, where the transmit line is pulled high by an external resistor and pulled low by the AVR.  This version supported a single wire for both receive and transmit.  While it also worked with separate pins, many users found it inconvenient to add the pull-up resistor.  Instead they would use my "push-pull" version, where the AVR drives the line both high and low.  With picoUART, a single version works for both use cases, because it operates in "push-pull" mode only during transmit.  When not actively transmitting, the IO pin is set to input mode with the internal pull-up activated.
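
The pin handling for that trick only takes a few lines (a sketch of the idea, using the default PB0 transmit pin):

  static void tx_enable(void) {
    PORTB |= _BV(PB0);   // idle level is high
    DDRB  |= _BV(PB0);   // drive the pin (push-pull) while transmitting
  }

  static void tx_disable(void) {
    DDRB  &= ~_BV(PB0);  // back to input mode...
    PORTB |= _BV(PB0);   // ...which, with the PORTB bit set, enables the pull-up
  }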

I've tried to help both noobs and experienced AVR developers.  The noob can download a release zip file to add as an Arduino library.  If you are an old AVR developer like me who prefers a keyboard over a mouse, you'll find a basic Makefile with the echo example.  The default baud rate is 115.2kbps, although picoUART is capable of accurate timing at much higher speeds, such as 1Mbps for an AVR running at 8MHz.  The default transmit pin is PB0, with PB1 for receive.  The defaults can be changed in pu_config.h, or with build flags like "-DPU_BAUD_RATE=230400L".
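
A minimal echo sketch looks something like this (purx() is the receive call; I'm assuming the transmit counterpart is named putx() - check the library headers for the exact names):

  #include <picoUART.h>

  void setup() {}

  void loop() {
    putx(purx());  // read a byte on PB1, echo it back on PB0
  }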

Saturday, January 11, 2020

Picoboot v3 with autobaud and timeout

Today I released v3 beta2 of picoboot.  Like the last release of picoboot, it takes up only 256 bytes, which is the minimum bootloader size supported on the ATmega88 and ATmega168.  This means picoboot will free up 256 bytes of flash if you currently use Optiboot.  Since it can't get any smaller than the 256-byte minimum, this release focused on robustness and speed.


The above screen shot shows reading the 16kB flash memory of an ATmega168 in 1.32 seconds.  Using 500kbps instead of 250 will read the flash in under one second, and will read 32kB of flash from an ATmega328 in two seconds.  Not only is it fast, it is reliable, with no errors using CH340G and CP2102 adapters under Windows, and PL2303HX adapters under Linux.  So as long as your serial driver supports 250 or 500kbps and doesn't round them down to 230,400 and 460,800, you can have reliable, high-speed uploading and verifying of code on ATmega MCUs.

Earlier versions of picoboot supported a bootloader toggle mode, where resetting the MCU once entered the bootloader, and resetting again ran the application code.  I designed this for boards that don't support the auto-reset functionality of the Arduino bootloader.  However this turned out to be problematic with some boards that do have auto-reset, where picoboot could sometimes toggle out of bootloader mode when it was supposed to enter bootloader mode.  With v3, picoboot now implements a timeout: it waits a few seconds, and if no communication is received from avrdude, the bootloader exits.

Like the previous versions, picoboot does not use the watchdog timer, and will not impact application code that uses the watchdog reset.  To make picoboot useful with a standalone AVR on a breadboard, it does not rely on a user LED on PB5 to indicate bootloader activity.  Instead, when the bootloader starts, it lowers the TXD line (PD1).  This will light the RX LED on the attached serial adapter.  If the bootloader times out, PD1 will be left floating before the bootloader exits.

My recommended baud rate for picoboot is 250kbps.  This baud rate results in zero timing error with the AVR USART when used with the common clock rates of 8, 12, and 16MHz, as well as the less-common 20MHz.  The faster 500kbps also results in zero timing error with the USART; however, the poor design of some serial adapters makes the higher speeds more susceptible to noise, particularly when long wires are used to connect to the AVR.  I didn't encounter problems at 500kbps, but I was a bit surprised by how much noise I saw on my oscilloscope when testing a CP2102.
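
The zero-error claim is easy to verify with the normal-speed USART divisor formula, UBRR = F_CPU/(16*baud) - 1.  This little host-side C program checks it:

  #include <stdio.h>

  // actual baud = F_CPU / (16 * (UBRR + 1))
  static void usart_error(long f_cpu, long baud) {
    long ubrr = (f_cpu + 8L * baud) / (16L * baud) - 1;  // round to nearest
    double actual = f_cpu / (16.0 * (ubrr + 1));
    printf("%ldHz @ %ldbps: UBRR=%ld, error=%+.2f%%\n",
           f_cpu, baud, ubrr, 100.0 * (actual - baud) / baud);
  }

  int main(void) {
    long clocks[] = {8000000, 12000000, 16000000, 20000000};
    for (int i = 0; i < 4; i++)
      usart_error(clocks[i], 250000);  // all four print +0.00%
    usart_error(16000000, 115200);     // about -3.5%, for contrast
    return 0;
  }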

If you are using the Arduino IDE rather than the command line, I explained how to change the boards.txt file in my blog post about picoboot v1.

I plan to test v3 beta 2 for about a month, so expect the final v3 in early February.  In addition to further testing on the mega168 and mega328, I'll test the mega88.  If there is enough interest in a build for the mega8, I'll look into supporting it too.



Tuesday, December 10, 2019

Arduino code on the ATtiny13

Optimization and minimalism are things I appreciate in technology.  Since the standard Arduino AVR core takes 1-2kB of flash memory, one might think that porting to an MCU with 1kB of flash and 64 bytes of RAM would be a futile effort.  However Hans [MCUDude] has managed to support most of the Arduino core API with MicroCore, while using only a tiny amount of flash and RAM.

Although you can't use Serial.print, digitalRead, digitalWrite, analogRead, and analogWrite are all implemented.  When the millis timer is not needed, it can be disabled to save space.  When it is used, it takes about 60 bytes of flash thanks to my AVR assembler implementation.  MicroCore also includes my optimized versions of shiftIn and shiftOut that are faster and smaller than the standard Arduino versions.

One thing that is missing from the MicroCore documentation is details on how small it is.  An empty sketch takes just 46 bytes of flash and no RAM.  A blink example using sleep and millis takes 154 bytes of flash and 5 bytes of RAM.  The buttons example for my TM1638 library takes 266 bytes of flash and 0 bytes of RAM.
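
The blink example is along these lines (a simplified version for illustration, not the exact code I measured):

  #include <avr/sleep.h>

  void setup() {
    pinMode(3, OUTPUT);                // PB3 on the ATtiny13
  }

  void loop() {
    digitalWrite(3, !digitalRead(3));  // toggle the LED
    uint32_t start = millis();
    while (millis() - start < 500)
      sleep_mode();                    // idle until the millis interrupt wakes us
  }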

While MicroCore has a software SPI implementation (TinySPI), it doesn't have I2C/Wire.  I've been working on a bit-bang I2C library that is optimized for size and speed.  On a 9.6MHz ATtiny13 it runs at 660kHz.  The next step before I release it is to make it as compatible as reasonably possible with the Arduino Wire library.

In addition to expanding the functionality of MicroCore, I'm hoping to simplify the use of millis so it is not linked in when it is not used.  This would avoid having to modify core_settings.h, or clutter up the IDE menus with an option to enable/disable millis.  I keep an eye on the GitHub issues list for MicroCore, and will implement requested improvements that I find interesting.

Wednesday, July 18, 2018

Reading extended signature bytes with AVRdude


AVR MCUs like the ATtiny85 and the ATtiny13 store their signature and RC oscillator data in a special page of flash.  Just like the flash for program storage, this special page of flash can be erased and reprogrammed.  If you are not careful with your ICSP connections, it's not hard to accidentally erase this special page of flash.  I have 2 ATtiny13 chips that had their signature erased when I forgot to connect the power wire from a USBasp.  The voltage on the MOSI and SCK lines was enough to power up the ATtiny13, but without a stable Vcc, the serial programming instructions were scrambled in such a way that the chips interpreted them as an undocumented command to erase the signature page.

Official documentation for the signature table is rather terse, but forum and blog posts about this special page of flash date back more than a decade.  Here is an example of the official documentation from the ATtiny85 datasheet:

The last row in the table has an error: since the page size of the t85 is 64 bytes, the signature table addresses go up to 0x3F, not 0x2A.  For devices like the ATtiny25 and ATtiny13 with a 32-byte page size, the signature table addresses only go up to 0x1F.  Most AVRs have the ability to read the signature table in software using the SPM instruction and the RSIG bit in SPMCSR.  The ATtiny13 does not support reading the signature in software, so the only way to read it is through the serial programming interfaces.  I even tried setting the reserved bits in SPMCSR on the t13 in case reading the signature through software is an undocumented feature, but that did not work.
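
On the AVRs that do support it, avr-libc wraps the SPM/RSIG access in the boot_signature_byte_get macro from avr/boot.h, so dumping the whole page should look something like this (an untested sketch for the t85):

  #include <avr/boot.h>
  #include <stdint.h>

  // read all 64 bytes of the t85 signature page (addresses 0x00-0x3F)
  static void read_sig_page(uint8_t *buf) {
    for (uint8_t addr = 0; addr < 64; addr++)
      buf[addr] = boot_signature_byte_get(addr);
  }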

For most users, the ability to read the extended signature bytes is likely academic.  Since each chip seems to have slightly different data in the reserved signature area, it could be used as a sort of serial number to keep track of different chips.  The practical use for reading the signature page is when you can also reprogram it.  Since the default OSCCAL value is stored in the signature table, it would be possible to tune the frequency to a different default value.  One of my goals for reprogramming the signature page is to have a UART-friendly clock rate for debugWire.  Having the default OSCCAL value correspond to a clock speed close to 7.3728MHz means debugWire will run at a baud rate of 7372.8/128 = 57.6kbps.  It would also be possible to store other calibration data, such as a more precise measurement of the internal voltage reference than what is specified in the datasheet.

While my ultimate goal is to create an AVR programmer that will read and write the signature table, a simple first step is to have AVRdude read the signature.  Fortunately, the programming sequences used by AVRdude are not hard-coded in the source.  The avrdude.conf file contains information on the command sequences to use for different protocols such as standard serial and high-voltage programming.  To get AVRdude to read 32 bytes of signature data instead of just 3, make the following changes to the ATtiny85 section of the avrdude.conf file:

  memory "signature"
     size    = 32;
     read    = "0  0  1  1   0  0  0  0   0  0  0  x   x  x  x  x",
               "x  x  x a4  a3 a2 a1 a0   o  o  o  o   o  o  o  o";

To read the 32 bytes of calibration data (which is just the odd bytes of the signature page), make the equivalent change for memory "calibration".
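
Based on the 0x38 read-calibration opcode in the serial programming command table (my reading of the datasheet - verify it before using), the equivalent section should look like:

  memory "calibration"
     size    = 32;
     read    = "0  0  1  1   1  0  0  0   0  0  0  x   x  x  x  x",
               "x  x  x a4  a3 a2 a1 a0   o  o  o  o   o  o  o  o";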

The next step in my plan is to write a program that tests undocumented serial programming instructions in order to discover the correct command for writing the signature page.  Since there are likely other secret serial programming instructions that do unknown things to the AVR, I could end up bricking a chip or two in the process.  If anyone already knows the opcode for programming the signature page, please let me know.

Saturday, July 7, 2018

Sonoff S26: OK hardware, bad app

After I read about the Sonoff S26, and that it is compatible with Google Home, I decided to order a couple of them.  In addition to using them as remote-controlled power switches, I'm interested in resuming my experimentation with the esp8266 that the devices use.

The physical construction is OK, though, as can be seen from the photo above, the logo is upside-down.  I know there is some debate over which way a NEMA 5-15R outlet should be oriented, with ground down being more common in residential settings, and ground up perhaps being more common in commercial ones.  The readability of the logo does not affect functionality, so that's not a big deal.  What does affect functionality is the size of the S26, which can partially block the ability to use both outlets.

Cnx-software has a teardown of the S26, so I won't bother repeating any of the details.  The unit I received was virtually indistinguishable from the one depicted in the teardown.  The one slight difference I noticed was that the esp8266 module in my S26 has a hot air solder leveling (HASL) finish instead of electroless nickel immersion gold (ENIG).

The eWeLink app is my biggest gripe about the S26.  In order to install the app, it requires numerous permissions that have nothing to do with controlling the S26.  Things like access to device location and contacts are an invasion of privacy, and a liability for Itead if someone hacks their database and gets access to all the personal information from eWeLink users.  The only permission the app should need is local wifi network access.

After holding my nose over the permission requirements, it didn't get much better once the app was installed.  It does not work in landscape mode, and once you create an account, it doesn't remember the email address when it asks you to log in to the newly created account.  The menus and some of the fonts are rather small, especially on an older phone with a 4" screen.

I also tested out the firmware update function in the app.  My S26 came with v1.6 firmware, which I updated to v1.8.  The new firmware version is supposed to support direct device control over the LAN without an active internet connection.  After the upgrade the device info indicated it was running the new firmware, but direct LAN control did not work.  Without an active internet connection, when I tried turning the S26 on, I got an "operation failed" message.

My last gripe is about the cumbersome process of setting up devices to work with Google Home devices.  Having to first set up an account for the Sonoff devices, then having to link that account with the Google Home account, is cumbersome and error-prone.  The Google Home app should be able to scan for devices the same way it can scan the local network and find a Chromecast.  And for people who are using Sonoff devices without Google Home or Amazon Alexa, the app should allow users to control devices locally without having to set up any account.

Friday, July 6, 2018

Testing the IBM Cloud


Although I have a Google Cloud free account, I recently decided to try out IBM's Cloud Lite account.  I wasn't just interested in learning; I was also wondering if it could be a viable backup to my Google Cloud account.  I'm not concerned with reliability so much as dependability, since free services could be discontinued or otherwise shut down.

The Cloud Lite description says it includes "256 MB of instantaneous Cloud Foundry runtime memory, plus 2 GB with Kubernetes Clusters".  I hadn't heard of Kubernetes before, but from a quick review of their web site, it appears to be a platform for deploying scalable Docker images.  For comparison, the Google compute platform provides a Linux (Ubuntu 16.04 in my case) VM with 512MB of RAM and 30GB of disk.  I prefer the simplicity and familiarity of a Linux VM with full root access, but I thought there still should be a way to run a LAMP image with Kubernetes.

The IBM Cloud dashboard allows you to choose from the available services based on your account options.  Choosing the IBM Cloud Kubernetes Service from the dashboard links to another page to create a cluster.  However, clicking on the create cluster button brings up a new page with the message: "Kubernetes clusters are not available with your current account type."  I opened a support ticket about it, but after 10 days there has been no action on the ticket.

Since Kubernetes wasn't working, I decided to try Cloud Foundry apps.  In order to use Cloud Foundry, it is necessary to download their command-line tools.  With the Google Cloud you get access to a development shell separate from your compute instance VM, and that shell has all the Google Cloud tools pre-installed.  This is one way the Google Cloud is easier and simpler to use.

Instead of setting up a bare Cloud Foundry app, I decided to use their boilerplate Flask application.  The setup process in the web dashboard lets you choose a subdomain of mybluemix.net, so I chose http://rd-flask.mybluemix.net/.  In order to modify the app, IBM's docs instruct you to download the sample code, make changes locally, then push the changes to the cloud using their CLI tools.  However, at least in the case of the Python Flask app, there were no download instructions, and no link to the sample code.  After going through the CLI docs, I found it is possible to ssh into the VM instance for your app using "bluemix cf ssh".  Yet any changes I made to the code online were wiped whenever I restarted the service.

After some more research, I realized this problem is several months old.  In the end, I was able to find an earlier version of the template code, fork it, and update it to match the code running in the app container.  The repo I forked specified "python-2.7.11" in runtime.txt, but Cloud Foundry only supports 2.7.12 & 2.7.13 (along with 3.x).  At first I changed it to 2.7.12 in my fork, but then I removed the runtime.txt so it will use the default version of Python.  I also added some brief instructions on uploading it to the IBM cloud.  You'll find the fork at https://github.com/nerdralph/Flask-Demo.

Saturday, June 23, 2018

Using shiftIn with a 74165 shift register


The Arduino shiftIn function is written to be used with a CD4021.  The 74165 shift register is another inexpensive and widely available parallel-input shift register that works slightly differently than the 4021.  The logic diagram of the 74HC165 is depicted in the figure above, and shows Qh is connected directly to the output of the 8th flip-flop.  This means that the state of Qh needs to be sampled before the first clock pulse.  With the 4021, the serial output is sampled after the rising edge of the clock.  This can also be determined by looking at the source for shiftIn, which sets the clock pin high before reading the serial data pin:
  uint8_t shiftIn(uint8_t dataPin, uint8_t clockPin, uint8_t bitOrder) {
    uint8_t value = 0;
    uint8_t i;
    for (i = 0; i < 8; ++i) {
      digitalWrite(clockPin, HIGH);
      if (bitOrder == LSBFIRST)
        value |= digitalRead(dataPin) << i;
      else
        value |= digitalRead(dataPin) << (7 - i);
      digitalWrite(clockPin, LOW);
    }
    return value;
  }

By setting the clock pin high, setting the latch (shift/load) pin high, and then calling shiftIn, the first call to digitalWrite(clockPin, HIGH) will have no effect, since it is already high.  Here's the example code:
  // set CLK high before shiftIn so it works with 74165
  digitalWrite(CLK, HIGH);
  digitalWrite(LATCH, HIGH);
  uint8_t input = shiftIn(DATA, CLK, MSBFIRST);
  digitalWrite(LATCH, LOW);

Like much of the Arduino core, the shiftIn function was not well written.  For a functionally equivalent but faster and smaller version, have a look at MicroCore.