Nerd Ralph

Don't use a $5 range outlet for EV charging

2024-07-06T17:20:00.000-07:00

Many level 2 home chargers plug into a NEMA 14-50R outlet, which is the same type a kitchen range would plug into. A 40A EVSE is considered a continuous load under the electrical code, and therefore needs a circuit rated for 25% more current. Even though the 14-50R outlet is technically a 50A outlet, when used for a kitchen range it is often wired on a 40A circuit. This is permitted by the Canadian Electrical Code rule 26-744 5). Cheap 14-50R outlets intended for a kitchen range are not designed for continuous 40A use, as a range is not a continuous load. A 40A EVSE needs an outlet designed for high-current continuous loads to avoid overheating and melting the outlet.
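The 25% rule can be sketched as a short calculation. The function name and the 32A example are illustrative assumptions, not anything taken from the code book:

```python
def min_circuit_rating(continuous_load_amps: float, margin: float = 0.25) -> float:
    """Circuit rating needed for a continuous load, with 25% extra by default."""
    return continuous_load_amps * (1 + margin)


# A 40 A EVSE needs a 50 A circuit.
print(min_circuit_rating(40))
# By the same rule, a 32 A continuous load fits on a 40 A circuit.
print(min_circuit_rating(32))
```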

I purchased two different heavy-duty 14-50R outlets to evaluate them for EVSE use. The Leviton 1450R is designed specifically for EVSE use. The legrand 3894 is a heavy-duty outlet for ranges and EV chargers. I paid $60 for the Leviton 1450R at Home Depot, and $23 for the 3894 from Wesco. Given the price difference, and the fact that the Leviton 1450R is designed specifically for EVSE use, I was expecting the Leviton to be a better quality outlet. I was not disappointed.

The Leviton 1450R, at 207g, is heavier than the legrand 3894 at 153g. The front steel plate is thicker at 2mm, while the plate on the legrand is 1.4mm thick. The circular receptacle has a diameter of 2 7/16" on the Leviton and 2 1/8" on the legrand. That will be important to note when choosing a face plate. The most obvious electrical difference with the Leviton is the solid copper back electrical terminals. The legrand appears to use common brass, which has a resistance about 3.5 times as high as copper. The lower resistance copper terminals mean lower temperatures while in use.

I was not able to measure the thickness of the metal used for the receptacle contacts where the plug fingers are inserted, as that would require a destructive teardown. Given the overall robust construction of the Leviton 1450R, I think it likely uses heavier gauge metal to reduce resistance, and to maintain a stronger mechanical contact with the plug.

For installation, the Leviton specifies 50 in-lbs of torque on the terminal screws, while the legrand specifies 20 in-lbs. Neither specifies the use of oxide inhibitor, but I'd suggest some DE-OX grease for extra protection.

Despite the high quality of the Leviton 1450R, it's not my preferred solution for an EV charger. A hard-wired charger is cheaper since it avoids the cost of the outlet, and will usually make better electrical contact than any plug-in solution. If you want a 14-50R outlet in your garage to support future plans for a home level 2 charger, then the Leviton 1450R would be a good choice.

2026 Update

I no longer recommend hard-wiring an EV charger. I think the best choice is a good quality 6-50R or 14-50R outlet, with the charging limited to 30A for daily use. Charging at 30A generates 44% less heat than charging at 40A, reducing the chances of overheating. If your charger dies, plugging in a new charger is easier than disconnecting and reconnecting a hardwired charger. And if you move, it's easier to take your charger with you.
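The 44% figure follows from resistive heating being proportional to the square of the current. A minimal sketch, assuming the contact and wire resistance stay the same at both currents:

```python
def relative_heat(amps: float, reference_amps: float) -> float:
    """Ratio of I^2 * R heating at two currents through the same resistance."""
    return (amps / reference_amps) ** 2


reduction = 1 - relative_heat(30, 40)
print(f"Heat reduction at 30 A vs 40 A: {reduction * 100:.2f}%")
```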

Hyundai Level 2 EV Charging Efficiency about 88%

2024-01-21T14:50:00.000-08:00

Unlike PV inverters, EV on-board chargers usually don't have efficiency specifications published by manufacturers. Studies done on charger efficiency are limited in the number of vehicles that can be tested. I decided to test the charging efficiency of a Canadian 2023 model Kia Sportage PHEV.

I performed the test with the Kyungshin EVSE that was provided with the vehicle, set to 12 amps. The EVSE was plugged into a 240V outlet via an 18m, 10 AWG extension cord. The L14-30R outlet is wired to the electrical panel with 6/3 aluminum cable. Measurements were made at the electrical panel using a Peacefair PZEM-016, logged with a Python program I wrote. I initially took measurements with a clamp-style meter, but the readings were too variable, and could not account for power factor.

Considering the connectors and wire resistance, I estimate the losses between the electrical panel and the EVSE to be about 1%. For a typical home level 2 30-amp charger installation, the losses would be much higher, likely over 2%.

The Sportage PHEV battery capacity is 13.8 kWh. To charge from 20% to 100%, the PZEM-016 recorded 12.55 kWh of energy. The energy stored in the battery was 80% of 13.8, or 11.04 kWh. The efficiency is therefore 11.04/12.55 = 0.8797, or about 88%.
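The same calculation as a small sketch. The function and parameter names are illustrative. It assumes, as the text does, that the state of charge maps linearly onto the 13.8 kWh capacity:

```python
def charging_efficiency(capacity_kwh: float, start_soc: float,
                        end_soc: float, energy_in_kwh: float) -> float:
    """Energy stored in the battery divided by energy measured at the panel."""
    stored_kwh = capacity_kwh * (end_soc - start_soc)
    return stored_kwh / energy_in_kwh


eff = charging_efficiency(13.8, 0.20, 1.00, 12.55)
print(f"Charging efficiency: {eff:.1%}")
```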

Level 2 EV Charging Deep Dive

2023-11-26T14:20:00.000-08:00

Electric vehicles have an on-board charger, which converts an AC input voltage to DC to charge the batteries. These chargers are designed to accept 120 volt input for level 1 charging, and 208/240 volt input for level 2 charging. The specs for the Hyundai OBC shown above indicate it can accept a wide range of 70 to 285 Vac, allowing it to work with almost any power grid in the world. Note that it is the OBC, not the EVSE, that rectifies and boosts the voltage to charge the battery. That's why an EVSE with a 120V 5-15P plug can be connected to a 240V source.

Although the Hyundai OBC is rated for 7.2 kW of output power, getting much more than 6 kW of input power has been difficult. The first reason is that power to commercial buildings is usually 3-phase 120/208V, so charging stations usually provide 208V. At the maximum input of 32 amps, that's 6656 watts. Having used both ChargePoint and Flo charging stations, I've noticed the majority of them are limited to 30 amps for level 2 charging. Those stations rated for 30 amps use 10 AWG flexible cord, which is limited to 30 amps according to table 12 of the Canadian Electrical Code. 30 amps at 208 volts is 6240 watts. Charging at more than 30 amps requires a more expensive larger cable.

I've also found the OBC doesn't seem to pull the full amperage advertised by the EVSE. The signalling used for J1772 charging doesn't communicate a precise amperage available to the OBC. It transmits a sequence of pulses, and the duty cycle timing of the pulses indicates the available amperage. I think the OBC reduces the current by a safety margin to allow for imprecise timing of the control signal pulses. When connected to a Kyungshin IC-CPD set to 12 amps, the on-board charger draws about 11 amps.
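For reference, here is a sketch of the duty-cycle mapping as it is commonly documented for the J1772 control pilot. The formulas are not from this post and should be checked against the standard. The function names are illustrative:

```python
def j1772_amps(duty_percent: float) -> float:
    """Available current signalled by a J1772 pilot duty cycle (commonly cited mapping)."""
    if 10 <= duty_percent <= 85:
        return duty_percent * 0.6
    if 85 < duty_percent <= 96:
        return (duty_percent - 64) * 2.5
    raise ValueError("duty cycle outside the normal AC charging range")


def j1772_duty(amps: float) -> float:
    """Duty cycle (percent) an EVSE would signal for 6 A to 51 A."""
    if not 6 <= amps <= 51:
        raise ValueError("this sketch only covers the 6 A to 51 A range")
    return amps / 0.6


print(j1772_duty(12))                  # duty cycle for a 12 A EVSE setting
print(j1772_amps(21) - j1772_amps(20)) # effect of a 1% timing error
```

With this mapping, a timing error of one percentage point corresponds to 0.6 amps, which is in line with an OBC leaving about an amp of margin.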

Lastly, some charging stations don't always provide the power that they advertise. Several of the ChargePoint chargers I've encountered are Leviton 4000 units. They support 16 amps per head, or charging from a single head at 30 amps. These stations are usually listed on the ChargePoint network as 6.6 kW, but with a 208V supply, you'll never see more than 6.2 kW. When both heads are being used, your vehicle will charge at no more than 3.3 kW.

I think home charging makes a lot of sense, but I see limited value in public level 2 chargers. I am not aware of any public chargers in Nova Scotia that accept payment by credit card. They require users to first set up an account and install an app in order to activate chargers. When you do get a charger working, at a charging rate of 6 kW, you can't get much of a charge while you shop at a store or eat at a restaurant. I've seen a few businesses that offer free charging for customers, but after the novelty factor of free charging wears off, I wonder how much use they will get. Since charging at home costs 18.5 c/kWh including GST, getting free charging while you shop for a half hour only saves you about 50c.

240V EV Charging for $5

2023-11-08T12:53:00.004-08:00

We recently purchased a PHEV which came with a portable home charger/EVSE. It plugs into a NEMA 5-15R outlet, and supports a maximum charging rate of 12 amps. At 120 volts, the maximum charge rate is 1440 watts, and the charge rate reported by the vehicle is usually 1.3 kW. The vehicle supports level 2 charging at up to 7.2 kW, but I didn't want to spend $400 to $500 for a good quality 30 amp level 2 EVSE.

The label on the portable EVSE listed an input of 12 amps and 120 volts, however I suspected 240 volts would be fine. The EVSE just passes through the AC power, generating a PWM signal on a control wire to indicate the amount of current the vehicle's on-board charger can draw. Of course, it's possible some home EVSEs for the North American market are built as cheaply as possible, and may not handle 240V. I am confident our Kyungshin IC-CPD is built to accept 240V. On the vehicle side, I checked the on-board charger label and saw that it has a wide input voltage, with a rating of 70-285Vac. I have a 14-30R 240 volt outlet in my garage, which is the same type of outlet an electric dryer uses, giving me an available source for 240V power.

To make an adapter for the portable EVSE, I used the cord I cut off a broken dryer, and a 5-15R connector. I used an Eaton 4887, which costs about $5 at local electrical suppliers. The Leviton 515CV is another option. The specs for the 4887 list an input wire size of 12 to 18 AWG; however, the 10 AWG stranded copper wires on the dryer cord were just able to fit.

With my portable EVSE adapter hack, the vehicle now charges twice as fast. It's probably more efficient too. The output of the on-board charger is 240-430 Vdc, and boost converter efficiency increases with a smaller difference between the input and output voltages.

Calculating Copper Wire Characteristics

2023-10-13T07:41:00.001-07:00

For the early part of my life I relied on tables or similar references to look up things like copper ampacities and resistance. Now I just remember a few constants, and can calculate what I need to know.

The photo above is a 12 AWG copper wire, commonly used in building wiring in the US and Canada. It has a diameter of .0808 inches or 2.052 mm. When used for building wire, it is typically limited to carrying 20 amps of current by breakers or fuses. For circuits carrying more current, 10 AWG wire with a diameter of 2.587 mm can be used. For a change of 2 AWG in wire size, the diameter always changes by a factor of 1.26, and therefore the cross-sectional area changes by a factor of 1.26^2.

Since the area changes by 1.26^2 for every 2 AWG, it changes by a factor of 1.26 for every 1 AWG. The cube of 1.26 is 2.0, so an increase in size of 3 AWG (a gauge number 3 lower) will double the cross-sectional area of the wire, and reduce the linear resistance by half. The resistance of 10 AWG wire is 1 ohm per thousand feet (304.8 m), so the resistance of 16 AWG wire, often used in extension cords, is 4 ohms per thousand feet at room temperature. The amount of heat generated in a wire is calculated with the formula P=I^2xR. If 10 amps is flowing through 1000 feet of 16 AWG wire, the power dissipated will be 10^2 x 4 or 400 watts. If the wire is 10 AWG, and the current is 20 amps, the power dissipated will be the same 400 watts. However, the voltage drop will be lower, since V = I x R. The resistance of 10 AWG wire is a quarter of 16 AWG, so the voltage drop in the 10 AWG wire with 20 amps will be half of the voltage drop of the 16 AWG wire with 10 amps.
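These rules of thumb can be checked against the standard AWG diameter formula. This sketch assumes a copper resistivity of 1.724e-8 ohm-metres at 20 C, which is a textbook value and not a figure from this post. The function names are illustrative:

```python
import math

RHO_CU = 1.724e-8  # ohm*m, annealed copper at 20 C (assumed textbook value)


def awg_diameter_mm(awg: int) -> float:
    """Diameter from the AWG definition: 36 AWG is 0.127 mm, 39 steps span a 92:1 ratio."""
    return 0.127 * 92 ** ((36 - awg) / 39)


def awg_area_mm2(awg: int) -> float:
    return math.pi / 4 * awg_diameter_mm(awg) ** 2


def ohms_per_1000ft(awg: int) -> float:
    length_m = 304.8
    return RHO_CU * length_m / (awg_area_mm2(awg) * 1e-6)


for gauge in (10, 12, 16):
    print(gauge, round(awg_diameter_mm(gauge), 3), round(ohms_per_1000ft(gauge), 2))

# Heat and voltage drop for the two cases in the text, using the rounded 1 and 4 ohm figures.
for amps, ohms in ((10, 4), (20, 1)):
    print(f"{amps} A through {ohms} ohm: {amps**2 * ohms} W, {amps * ohms} V drop")
```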

The resistance of copper increases with temperature, by 0.393% per degree C, so increasing the temperature by 25 C will increase the resistance by almost 10%. Calculating temperature increase due to power dissipation is quite complicated, so it is common in electrical codes to consider an ambient temperature of 30 C, and a temperature rise in the wire of no more than 30 C. Because of that, most wire sold in Canada that is CSA certified will use insulation rated for at least 60 C. The most common category of wire used in residential construction, NMD90, has an insulation temperature rating of 90 C.
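The temperature correction is a one-line linear formula. A minimal sketch using the 0.393% per degree coefficient from above. The 20 C reference temperature is an assumption:

```python
ALPHA_CU = 0.00393  # fractional resistance change per degree C


def resistance_at(r_ref_ohms: float, temp_c: float, ref_temp_c: float = 20.0) -> float:
    """Linear approximation of copper resistance at a given temperature."""
    return r_ref_ohms * (1 + ALPHA_CU * (temp_c - ref_temp_c))


# 1 ohm at 20 C, warmed by 25 C
print(round(resistance_at(1.0, 45.0), 4))
```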

On a final note, a wire labeled 16 AWG might not really be 16 AWG. I generally trust the electrical distributors like Rexel and Wesco, however before using some battery wire from a discount hardware store I'd inspect it carefully first.

MODBUS communication with Solis 4G-US inverters

2023-05-13T14:37:00.002-07:00

Solis single-phase inverters have a circular RS485 connector supporting MODBUS communication. The connectors can be difficult to find for sale outside China, so using a wifi data logger stick is a more straightforward way of communicating with the inverters. IGEN Tech is the OEM for the Solis wifi data loggers, which IGEN also sells under the SOLARMAN brand. While the same circular connector is used by many other inverter manufacturers such as Solax, RENAC, and KSTAR, the logger firmware is customized to read and report the MODBUS registers for a particular manufacturer.

The LSW-3 series of wifi loggers allows external programs to perform MODBUS queries via a TCP connection on port 8899. I believe this is a variant of the MODBUS/TCP protocol that is assigned TCP port 502.

To perform MODBUS queries, I used pysolarmanv5. To connect to the logger pysolarmanv5 requires the logger serial number and IP address. Initially I read the serial number off the label of the logger, and looked up the IP address from the admin page of my router. Later I noticed solarman_scan.py, which sends a broadcast UDP packet which the data logger replies to. I sometimes had to run it more than once before the logger responded to the scan packet.

The Solis 4G-US series inverters are SunSpec MODBUS certified, and have the well-known 32-bit ‘SunS’ identifier (0x53756e53) at address 40001. This means it should be possible to read the registers by decoding the SunSpec information models and inverter device IDs. SunSpec shares some example code; however, I haven't been able to figure it all out.

I couldn't figure out the Solis registers using SunSpec, but I was able to find the register documentation from Ginlong. The AC output power and DC input power are 32-bit registers starting at address 3005, read using MODBUS function code 4 (input registers). I wrote a Python program to read the output and input power and calculate the efficiency. It also reads the inverter temperature, and outputs the data every 5 minutes. I also wrote a small AWK program to calculate the weighted average of multiple samples. The code can be found in my GitHub repo.
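As an illustration of the arithmetic only (not the program from the repo), here is a sketch of combining two 16-bit MODBUS registers into a 32-bit value and computing a power-weighted efficiency. The high-word-first order and the sample readings are assumptions. Check the word order against the Ginlong register documentation:

```python
def u32_from_registers(high_word: int, low_word: int) -> int:
    """Combine two 16-bit register values, assuming the high word comes first."""
    return ((high_word & 0xFFFF) << 16) | (low_word & 0xFFFF)


def weighted_efficiency(samples: list[tuple[float, float]]) -> float:
    """Total AC output divided by total DC input, so high-power samples count more."""
    total_ac = sum(ac for ac, _ in samples)
    total_dc = sum(dc for _, dc in samples)
    return total_ac / total_dc


# Hypothetical (AC watts, DC watts) samples: a cloudy one and a sunny one.
samples = [(400.0, 500.0), (3800.0, 4000.0)]
print(u32_from_registers(0x0000, 0x0DAC))     # 3500
print(round(weighted_efficiency(samples), 4))
```

Weighting by power avoids letting low-output samples, where inverter efficiency is poor, drag down the result in the way a simple mean of per-sample efficiencies would.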

Over multiple days of output in the spring of 2023 including sunny and cloudy days, I observed an overall efficiency of 94% for a Solis 1P4K-4G-US. For a Solis 1P6K-4G-US I observed an average efficiency of 95%. This compares to respective advertised CEC weighted efficiencies of 97.5% and 97.0%.

KSTAR Single Phase String Inverters

2023-04-10T06:43:00.003-07:00

KSTAR New Energy makes single phase grid-tied inverters ranging from 1 kW to 10 kW. I tested a 3000S, a 5000D, and a 6000D that were produced in KSTAR's factory outside of Shenzhen. Their single phase inverters are marketed for locations with a 230 V line to neutral (L-N) grid. They also work with the split phase 240 V line to line grid that is typical in the US and Canada. They do not have UL 1741 certification, so they would require special engineering approval to be used for permanent installations with most US and Canada power utilities.

Residential inverters used in the US and Canada usually have an attached junction box with terminal connections for DC and AC wiring. In the rest of the world, inverters usually have MC4 connectors for the DC string input, and a watertight three-pin plug connection for the AC output. It is much more convenient having the plug connections when testing inverters and PV panels. It also avoids potential electrical code concerns when DC wiring up to 600 V and 240 Vac are in the same junction box.

The KSTAR inverters all included MC4 crimp connectors for terminating the DC strings. The AC connector will accept SOOW or SJOW cable with an outside diameter of up to 16 mm. I used 3-wire 12 AWG SOOW cable that is rated for up to 25 Amps.

The 3000S has a single string input, and a "nominal" output power of 3 kW. It is a light inverter, with a stated weight of 8 kg. Out of the box, the measured weight was 7.3 kg. The light weight makes it very easy for a single person to install. When hooked up to a test string of 10 72-cell panels, the efficiency was 85-86%. This is much lower than the spec efficiency of 97% or the 96% efficiency at nominal 380 V listed on the inspection and test sheet that was included with the inverter. With input power of 3070 W and input voltage of 367.7 V, the output power was 2620 W, for an efficiency of 85.3%. KSTAR sales and engineering were unable to explain the low efficiency.

The 5000D and 6000D have the same external dimensions and connections on the bottom. The weight of the 5000D is 11.74 kg, while the 6000D weighs 12.48 kg. This suggests the 6000D has different internal circuitry, likely larger inductors and capacitors, to support the higher power rating.

The efficiency of the 5000D and 6000D inverters ranged between 89 and 91%. The screenshot of monitor data below shows a total input power of 6240 W with AC output power of 5570 W, for an efficiency of 89.3%. This test was done with a large difference between the PV1 and PV2 voltages to represent typical residential PV installations which are not optimized for the inverter's 380 V nominal string voltage.

The KSTAR inverters are reasonably priced and easy to install, but the low efficiency makes them unattractive compared to Growatt and Ginlong Solis inverters.

DC Wiring Losses in String and Microinverter Solar PV Arrays

2022-10-05T11:39:00.000-07:00

There are two common ways of wiring solar PV arrays. Each panel can be connected to a microinverter, with each microinverter connected in parallel to an AC bus. Alternatively, panels can be connected in series, with one or more DC strings connected to an inverter. Although there is debate over which design is best, at Solar Si, we prefer string inverters. This is an analysis of DC wiring losses with an array of 8 72-cell LONGi PV modules of about 450 Watts each.

There are two sources of wiring resistance in the array. The first is from the wire itself, and the second is from the connectors. The 12 AWG wire used for the panel output cables has a resistance of 5.2 mOhm/m. The MC4 connectors are specified to have a contact resistance of less than 0.5 mOhm. While this may be the resistance when tested in a clean and dry factory, test results in warm and humid conditions show a much higher resistance. Reliability Model Development for Photovoltaic Connector Lifetime Prediction Capabilities indicates resistance in the field is likely to be around 2.5 mOhm.

For the string array, the panels are arranged in the portrait configuration, with the inverter situated 1m from the array. The panels are about 1.06 m wide, making the length of the array 8.5 m. Each panel has a 20cm and a 40cm negative and positive output cable. Unlike the 12 AWG wire used for the PV panel output cables, in Canada, field wiring for PV strings is almost always done with 10 AWG RPVU wire. This has a resistance of 3.28 mOhm/m, and a total of 10.5 m are used for the array.

With 8 panels, there are 7 connections between panels, plus two connections at the ends mating with the RPVU wire. The DC connections on the inverter are usually not MC4, but for simplicity the resistance is assumed to be the same. Adding the positive and negative connections to the inverter, the total comes to 11. Here are the calculations for the total resistance:

10.5 m * 3.28 mOhm/m = 34.4 mOhm
12 AWG 0.6 m panel cables * 8 = 4.8m, * 5.2 = 25 mOhm
11 contacts/string * 2.5 mOhm = 27.5 mOhm
total: 86.9 mOhm

For the microinverter array, the optional 1.4 m PV panel output cables will be needed in order for the cables to reach the corresponding microinverter. This increases the total length of 12 AWG wire to 22.4 m. Here are the calculations for the total resistance:

12 AWG 2.8 m panel cables * 8 = 22.4 m, * 5.2 = 116 mOhm
16 contacts * 2.5 mOhm = 40 mOhm
total: 156 mOhm

Although the microinverter configuration has higher resistance losses, they are not significant. During peak power output, DC current is about 10 Amps. Using P = I^2 * R, the loss is about 16 W for the microinverter wiring and about 9 W for the string wiring, which is under 0.5% of the array's 3600 W rating in both cases. Most of the time the array output current is much less than 10 Amps, so the average power loss is much lower. There are additional losses from the AC bus connectors, which are also not significant.
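The resistance totals and the resulting losses can be reproduced with a short sketch. All of the inputs are the figures given above. The function names are illustrative, and the 3600 W array rating is simply 8 panels at 450 W:

```python
WIRE_10AWG = 3.28   # mOhm per metre, RPVU field wiring
WIRE_12AWG = 5.2    # mOhm per metre, panel output cables
CONTACT = 2.5       # mOhm per connector contact in the field
PANELS = 8


def string_resistance_mohm() -> float:
    return 10.5 * WIRE_10AWG + PANELS * 0.6 * WIRE_12AWG + 11 * CONTACT


def micro_resistance_mohm() -> float:
    return PANELS * 2.8 * WIRE_12AWG + 16 * CONTACT


def loss_watts(resistance_mohm: float, amps: float) -> float:
    return amps ** 2 * resistance_mohm / 1000


array_watts = PANELS * 450
for name, r in (("string", string_resistance_mohm()), ("micro", micro_resistance_mohm())):
    watts = loss_watts(r, 10)
    print(f"{name}: {r:.1f} mOhm, {watts:.1f} W, {watts / array_watts:.2%} of array rating")
```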

In conclusion, power losses are higher with microinverters than string inverters, but they are not significant. The justification for choosing string inverters lies more with the cost savings in material and labor. For an array with 16 panels, the cost of a 6 kW inverter with 2 string inputs is less than half the cost of 16 Enphase IQ7A microinverters.

Pi ethernet gadget with reverse SSH proxy

2021-04-16T16:47:00.003-07:00

I love my Pi Zeros. I think every hacker should have one in their toolbox. When I got my first Pi Zero several years ago, I used a USB-TTL serial adapter to connect to the console UART on pins 8 and 10 of the Pi header. Once I learned how to set up the Zero as an ethernet gadget, things were a bit easier. However, updating software was still a cumbersome process of downloading files to the host computer and then using scp to transfer them to the Pi. This blog post documents how to set up the Pi to use an SSH reverse proxy so utilities like git and apt work.

When I got my first Pi Zero, I chose the Pi OS Lite image. I decided to update to the March 4, 2021 release, and this time I used the Pi OS with desktop because it includes development tools like git. I followed the ethernet gadget setup instructions, modifying config.txt, cmdline.txt, and creating an empty file called "ssh". The next step is to configure the multicast DNS component of Zeroconf. As mentioned in the Adafruit instructions, if you are using Windows, the easiest way to do this is installing Apple's Bonjour service.

To use a reverse proxy over ssh, Windows users can't use putty as that feature is not supported. OpenSSH supports reverse socks5 proxies as of version 7.6. For connecting from Windows, I installed MSYS2, including OpenSSH 8.4. On Windows 10, WSL is probably the easiest option. To connect to the Pi and enable a reverse socks5 proxy on port 1080, enter, "ssh -R 1080 pi@raspberrypi.local".

Once connected to the Pi, set "http_proxy" to "socks5h://localhost:1080". The "h" at the end is important as it means the client will do hostname (DNS) resolution through the proxy. I added the following line to .profile to set it every time I log in:

export http_proxy="socks5h://localhost:1080"

Programs such as git and curl will automatically use the socks proxy when the http_proxy environment variable is set. Note that github defaults to showing https URLs for repositories, which need to be changed to "http://" for the proxy to work.

The last configuration I recommend is setting the current date, since the Pi does not have a battery-backed RTC. I normally use ntpdate from the ntp project for manually setting the date and time on Linux, but it does not work with a socks proxy. After some searching I found a suggestion of using the HTTP Date: field from a reliable internet server. The command I use is:

date -s "`curl -sI google.com | grep "^Date:" | cut -d' ' -f3-7`"
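For anyone who would rather do the parsing in a script, here is a sketch of the same idea in Python using only the standard library. It is an alternative illustration rather than what I run. The function name and the sample header are hypothetical:

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def date_from_http_header(header_line: str) -> datetime:
    """Parse a 'Date: ...' HTTP response header line into a datetime."""
    name, _, value = header_line.partition(":")
    if name.strip().lower() != "date":
        raise ValueError("not a Date header")
    return parsedate_to_datetime(value.strip())


stamp = date_from_http_header("Date: Fri, 16 Apr 2021 23:47:00 GMT\r\n")
# Format suitable for passing to `date -u -s`
print(stamp.strftime("%Y-%m-%d %H:%M:%S"))
```

Matching the header name case-insensitively and stripping the trailing carriage return makes this a little more tolerant than the grep and cut pipeline.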

Once the Pi Zero is configured and has the proper date and time set, I recommend running "apt update". If everything is working properly, it will use the socks5 reverse proxy to connect to the raspbian servers and update the local apt repository cache.

Honey, I shrunk the Arduino core!

2021-04-03T15:16:00.002-07:00

One of my gripes about the Arduino AVR core is that it is not an example of efficient embedded programming. One of the foundations of C++ (PDF) is zero-overhead abstractions, yet the Arduino core has a very significant overhead. The Arduino basic blink example compiles to almost 1kB, with most of that space taken up by code that is never used. Rewriting the AVR core is a task I'm not ready to tackle, but after writing picoCore, I realized I could use many of the same optimization techniques in an Arduino library. The result is ArduinoShrink, a library that can dramatically reduce the compiled size of Arduino projects. In this post I'll explain some of the techniques I used to achieve the coding trifecta of faster, better, and smaller.

The Arduino core is actually a static library that is linked with the project code. As Eli explains in this post on static linking, libraries like libc usually have only one function per .o in order to avoid linking in unnecessary code. The Arduino core doesn't use that kind of modular approach; however, by making use of gcc's "-ffunction-sections" option, it does mitigate the amount of code bloat due to the non-modular approach.

With ArduinoShrink, I wrote more modular, self-contained code. For example, the Arduino delay() function calls micros(), which relies on the 32-bit timer0 interrupt overflow counter. I simplified the delay function so that it only needs the 8-bit timer value. If the user code never calls micros() or millis(), the timer0 ISR code never gets linked in. By using a more efficient algorithm and writing the code in AVR assembler, I reduced the size of the delay function to 12 instructions taking 24 bytes of flash.

In order to minimize code size and maximize speed, almost half of the code is in AVR assembler. Despite improvements in compiler optimization techniques over the past decades, on architectures like the AVR I can almost always write better assembler code than what the compiler generates. That's especially true for interrupt service routines, such as the timer0 interrupt used to maintain the counters for millis() and micros(). My assembler version of the interrupt uses only 56 bytes of flash, and is faster than the Arduino ISR written in C.

One part that is still written in C is the digitalWrite() function. The Arduino core uses a set of tables in flash to map a given pin number to an IO port and bit, making for a lot of code to have digitalWrite(13, LOW) clear PORTB5. Making use of Bill's discovery that these flash memory table lookups can be resolved at compile time, digitalWrite(13, LOW) compiles to a single instruction: "cbi PORTB, 5".

ArduinoShrink is also designed to significantly reduce interrupt latency. The original timer0 interrupt takes around 5us to run, during which time any other interrupts are delayed. The first instruction in my ISR is 'sei', which allows other interrupts to run, reducing the latency impact to a few cycles more than the hardware minimum. The official Arduino core disables interrupts in several places, such as when reading the millis counter. My solution is to detect if the millis counter has been updated and re-read it, thereby avoiding any interrupt latency impact.

The only limitation compared to the official AVR core is that the compiler must be able to resolve the pin number for the digital IO functions at compile time. Although the pin may be hard-coded, even with LTO enabled, avr-gcc is not always able to recognize the pin is a compile-time constant. Since AVR is not a priority target for GCC optimizations, I can't rely on compiler improvements to resolve this limitation. Therefore I plan to write a version of digitalWrite that is much smaller and faster, even when avr-gcc can't figure out the pin at compile time.

Although ArduinoShrink should be compatible with any Arduino sketch, given some of the compiler tricks I've used it's not unlikely I've missed a potential error. If you do find what you think is a bug, open an issue in the github repository.

Writing USB firmware on the CH55x MCUs

2021-03-02T18:53:00.002-08:00

Over the last several months, I've been familiarizing myself with the CH552 and CH551 MCUs. Most recently, I've been learning how to program the USB serial interface engine on these devices. The USB interface is powerful and flexible enough to implement many different kinds of USB devices, from HID to CDC serial. The highlights are:

support for endpoints 0 through 4, both IN and OUT
64-byte maximum packet size
DMA to/from xram only
multiple USB interrupt triggers

One of the first requirements for writing USB firmware is writing the descriptors. The examples from WCH are difficult to use as a template due to the descriptors being uint8_t arrays instead of structures. There are USB structure and constant definitions in ch554_usb.h, which I recommend using instead of arrays. For instance, I changed the CDC serial example from:

__code uint8_t DevDesc[] = {0x12,0x01,0x10,0x01,0x02,0x00,0x00,DEFAULT_ENDP0_SIZE,
    0x86,0x1a,0x22,0x57,0x00,0x01,0x01,0x02,
    0x03,0x01
};

to:

__code USB_DEV_DESCR DevDesc = {
    .bLength = 18,
    .bDescriptorType = USB_DESCR_TYP_DEVICE,
    .bcdUSBH = 0x01, .bcdUSBL = 0x10,
    .bDeviceClass = USB_DEV_CLASS_COMMUNIC,
    .bDeviceSubClass = 0,
    .bDeviceProtocol = 0,
    .bMaxPacketSize0 = DEFAULT_ENDP0_SIZE,
    .idVendorH = 0x1a, .idVendorL = 0x86,
    .idProductH = 0x57, .idProductL = 0x22,
    .bcdDeviceH = 0x01, .bcdDeviceL = 0x00,
    .iManufacturer = 1, // string descriptors
    .iProduct = 2,
    .iSerialNumber = 0,
    .bNumConfigurations = 1
};
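To see how the named fields map back onto the 18 raw bytes, here is a host-side sketch in Python that packs the same values with the standard USB device descriptor layout (little-endian 16-bit fields). DEFAULT_ENDP0_SIZE is assumed to be 8 here. Check the SDK headers for the actual value:

```python
import struct

DEFAULT_ENDP0_SIZE = 8  # assumption for illustration; defined in the SDK headers


def device_descriptor(serial_index: int = 0) -> bytes:
    """Pack the device descriptor fields in wire order (little-endian)."""
    return struct.pack(
        "<BBHBBBBHHHBBBB",
        18,                  # bLength
        0x01,                # bDescriptorType: device
        0x0110,              # bcdUSB 1.10
        0x02,                # bDeviceClass: communications
        0,                   # bDeviceSubClass
        0,                   # bDeviceProtocol
        DEFAULT_ENDP0_SIZE,  # bMaxPacketSize0
        0x1A86,              # idVendor
        0x5722,              # idProduct
        0x0100,              # bcdDevice
        1,                   # iManufacturer
        2,                   # iProduct
        serial_index,        # iSerialNumber
        1,                   # bNumConfigurations
    )


desc = device_descriptor()
print(len(desc), desc.hex(" "))
```

The low byte of each 16-bit field comes first on the wire, which is why the struct has separate H and L members and why the raw array lists 0x86 before 0x1a.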

Once the descriptors are written, the code to handle device enumeration is mostly boilerplate and can be copied from one of the examples. During the firmware development stage, I recommend adding a call to disconnectUSB() near the start of main(). It's a function I added to debug.h which forces the host to re-enumerate the device. This way I don't have to unplug and re-connect the USB module after flashing new firmware.

Setting up the DMA buffer pointers requires special attention when multiple IN and OUT endpoints are used. Even though five endpoints are supported, there are only four DMA buffer pointer registers: UEP[0-3]_DMA. When the bits bUEP4_RX_EN and bUEP4_TX_EN are set in the UEP4_1_MOD SFR, the EP4 OUT buffer is UEP0_DMA + 64, and the EP4 IN buffer is UEP0_DMA + 128. Endpoints 1-3 have even more complex buffer configurations, with optional double-buffering for IN and OUT using 256 bytes for four buffers starting from the UEPn_DMA pointer.

When I first started writing USB firmware for the CH551 and CH552, I was concerned that it may be difficult to meet the tight timing requirements, particularly for control and bulk transfers, which can have multiple packets in a single 1ms frame. For example, with small data packets, the time between the end of one OUT transfer and the end of the next OUT transfer can be less than 20us. If the USB interrupt handler is too slow, the 2nd OUT transfer could overwrite the DMA buffer before processing of the first has completed. This situation is avoided by setting bUC_INT_BUSY in the USB_CTRL SFR. When this bit is set, the SIE will NAK any packets while the UIF_TRANSFER flag is set. Therefore I recommend setting bUC_INT_BUSY, and clearing UIF_TRANSFER at the end of the interrupt handler.

I am currently working on the CMSIS_DAP example. It implements the DAPv1 (HID) protocol supporting SWD transfers, and works well with OpenOCD and pyOCD. I'm working on adding CDC/ACM for serial UART communication. The first step is creating the descriptors for the composite CDC + HID device. The second step will be integrating the usb_device_cdc code. The final step, although not absolutely necessary, will be optimizing the CDC code for baud rates up to 1 Mbps. The current code uses transmit and receive ring buffers with data copied to and from the IN and OUT DMA buffers. With double-buffering, the transmit and receive ring buffers can be omitted. The UART interrupt will copy directly between SBUF and the appropriate USB DMA buffer.

Quirks of the CH55x MCUs

2021-01-26T06:49:00.002-08:00

Over the past several months, I've been learning to use the CH551 and CH552 MCUs. Learning generic 8051 programming was the easy part, as there is lots of old documentation available, with Philips having written some of the best. The learning curve for WCH's additions to the MCS-51 architecture has been steeper, requiring careful reading of the datasheets, and reading the SDK headers and examples. I've found that the CH55x chips have some quirks that I've never encountered on any other MCUs.

The GPIO modes are controlled by two registers: MOD_OC and DIR_PU. The register values are explained in the datasheet and in ch554.h in the SDK. Figure 10.2.1 in the datasheet shows a schematic diagram for the GPIO. Modes 0, 1, and 2 are for high-Z input, push-pull, and open-drain respectively. Mode 3, "standard 8051 mode", is the most complicated. It's an open-drain mode with internal pullup, but with the output driven high for two cycles when the GPIO changes from a 0 to a 1. This ensures a fast signal rise time. The part that took me the longest to figure out was the operation of the pullup. The GPIO diagram shows 70k and 10k, but section 10 of the datasheet does not explain their operation. Therefore I've highlighted a part of the schematic in green. When the pin input Schmitt trigger output is 1, the inverter in the top right of the diagram will output a low signal to turn on the pFET, activating the 10k pullup. When the port input value is 0, only the weak 70k pullup is active.

The pullups aren't actually implemented as resistors on the IC. They are specially-designed FETs with a high drain-source resistance (RDS). Since RDS varies with gate-source voltage (Vgs), the pullup resistance will vary inversely with Vcc. Using a 5V supply, the pullup resistance will be close to the 70k shown in the schematic. Using a 3.3V supply, the pullup resistance is close to 125k. Although it is not obvious, this information can be found in section 18 of the datasheet, with the specifications for IUP5 and IUP3. These numbers are the amount of current a grounded pin will source when the pullup is enabled.
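To compare the resistances above with the IUP5 and IUP3 current specifications, Ohm's law converts each one to the current a grounded pin would source. This is only a rough sketch: the function name is made up for illustration, and the inputs are the approximate 70k and 125k figures from this post, not datasheet values.

```python
def pullup_current_ua(vcc_volts, pullup_ohms):
    """Current in microamps sourced by a pin shorted to ground through the pullup."""
    return vcc_volts / pullup_ohms * 1e6

# Approximate pullup resistances quoted in the post for each supply voltage.
for vcc, ohms in ((5.0, 70e3), (3.3, 125e3)):
    current = pullup_current_ua(vcc, ohms)
    print(f"{vcc}V supply, {ohms / 1e3:.0f}k pullup: about {current:.0f} uA")
```

Going the other way, dividing the supply voltage by the datasheet current gives the effective pullup resistance at that voltage.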

The reset pin has an internal pulldown, which seems to be weak like the GPIO pullups. At times when working with a CH552 running at 3V3, the chip reset when I inadvertently touched the RST pin with my finger. This was easily solved by keeping the RST pin shorted to ground.

The last issue I encountered is more of a documentation issue than a quirk. The maximum reliable clock speed of an IC depends on the supply voltage. All of the AVR MCUs I've worked with have a graph in the datasheet showing the voltage required to ensure safe operation at a given speed. For the CH55x MCUs, there is a subtle difference in the electrical specs in section 18 of the datasheet. At 5V, total supply current at 24MHz is specified, whereas the specs for 3.3V specify total operating current at 16MHz. When I tried running a CH552T at 24MHz with a 3.3V supply, it never worked. The same part worked perfectly at 16MHz.

Despite the quirks, I think the CH55x MCUs are still a good value. Current quantity 10 pricing at LCSC is 36c for the CH552T, and 26c for the CH551G. I recently purchased a small tube of the CH552T, and have plans to test the touch, ADC, PWM, and SPI peripherals.

GD32E230: a better STM32F0?

2021-01-19T12:09:00.002-08:00

On my last LCSC order, I bought a few GD32E230 chips, specifically the GD32E230K8T6. I chose the LQFP parts since I have lots of QFP32 breakout boards that I've used for other QFP32 parts. Gigadevice is much better than many other Chinese MCU manufacturers when it comes to providing English documents. After my past endeavors trying to understand datasheets from WCH and CHK, going through the Gigadevice documentation was rather pleasant.

Although Gigadevice makes no mention of any STM32 compatibility, the first clue is the matching pinouts of the STM32F030 and GD32E230. To prepare for testing, I tinned the pads on a couple of breakout boards, applied some flux, and laid the chips on the pads. I laid the modules on a cast-iron skillet, and heated it up to about 240C. The solder reflowed well, however I noticed some browning of the white silkscreen. Next time I'll limit the temperature to 220C. After testing for continuity and fixing a solder bridge, I was ready to try SWD. I connected 3.3V power and the SWD lines, and ran "pyocd cmd -v":

0000710:INFO:board:Target type is cortex_m
0000734:INFO:dap:DP IDR = 0x0bf11477 (v1 MINDP rev0)
0000759:INFO:ap:AHB5-AP#0 IDR = 0x04770025 (AHB5-AP var2 rev0)
0000799:INFO:rom_table:AHB5-AP#0 Class 0x1 ROM table #0 @ 0xe00ff000 (designer=43b part=4cb)
0000812:INFO:rom_table:[0]<e000e000:SCS-M23 class=9 designer=43b part=d20 devtype=00 archid=2a04 devid=0:0:0>
0000823:INFO:rom_table:[1]<e0001000:DWT class=9 designer=43b part=d20 devtype=00 archid=1a02 devid=0:0:0>
0000841:INFO:rom_table:[2]<e0002000:BPU class=9 designer=43b part=d20 devtype=00 archid=1a03 devid=0:0:0>
0000848:INFO:cortex_m_v8m:CPU core #0 is Cortex-M23 r1p0
0000859:INFO:dwt:2 hardware watchpoints
0000866:INFO:fpb:4 hardware breakpoints, 0 literal comparators

I did a little probing around the chip memory. The GD32E23x user manual shows SRAM at 0x20000000, like STM32 parts. The contents looked like random values, which I could overwrite using the pyocd "ww" command. Writing to 0x20002000 resulted in a memory fault, indicating the part does not have any "bonus" RAM beyond 8kB.

Next, I tried using the built-in serial bootloader. After connecting BOOT0 to VDD and connecting power, PA9 and PA10 were pulled high, indicative of the UART being activated. However my first attempt at using stm32flash was not successful:

After attaching my oscilloscope, and writing a small bootloader protocol test program, I was able to determine that the responses did seem to conform to the STM32 bootloader protocol. I did notice that the baud rate from the GD32E230 was only 110kbps, so it wasn't perfectly matching the 115.2kbps speed of the 0x7F byte sent for baud rate detection. To avoid the potential for data corruption, I switched to 57.6kbps. Before resorting to debugging the source for stm32flash, my test of stm32loader gave better results:

$ stm32loader -V -p com39

Open port com39, baud 115200

Activating bootloader (select UART)

*** Command: Get

Bootloader version: 0x10

Available commands: 0x0, 0x2, 0x11, 0x21, 0x31, 0x43, 0x63, 0x73, 0x82, 0x92, 0x6

Bootloader version: 0x10

*** Command: Get ID

Chip id: 0x440 (STM32F030x8)

Supply -f [family] to see flash size and device UID, e.g: -f F1

Next, I was ready to try flashing a basic program. I first checked for GD32E support in libopencm3. No luck. Then as I read through the user manual, I noticed GPIOA starts at 0x4800 0000 on AHB2, the same as STM32F0 devices. The register names didn't match the STM32, but the function and offsets were the same. For example on the GD32E, the register to clear individual GPIOA bits is called GPIOA_BC, rather than GPIOA_BRR as it is called on the STM32. The clock control registers, called RCU on the GD32E, also matched the STM32 RCC registers. Since it was looking STM32F0 compatible, I tried flashing my blink example with stm32loader, and it worked!

The LED was flashing faster than it did with the STM32F030. A little searching revealed that the ARM Cortex-M23, like the M0+, has a 2-stage pipeline. The STM32F030 with its M0 core has a 3-stage pipeline. My delay busy loop needs to be four cycles per iteration, and on the M23, the bne instruction only takes two cycles. My solution is adding a nop instruction based on an optional compile flag.
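The cycle counting behind that fix can be sketched as follows. The loop body is an assumption (a one-cycle subs followed by a taken bne), as are the names; the branch costs are three cycles on the M0 and two on the M23, which matches the pipeline difference described above.

```python
# Assumed delay loop body: "subs r0, #1" followed by "bne loop".
SUBS_CYCLES = 1
BNE_TAKEN_CYCLES = {"M0": 3, "M23": 2}  # 3-stage vs 2-stage pipeline

def cycles_per_iteration(core, extra_nops=0):
    """Cycles for one pass through the assumed delay loop."""
    return SUBS_CYCLES + extra_nops + BNE_TAKEN_CYCLES[core]

def speedup(core, extra_nops=0, target_cycles=4):
    """How much faster than intended the delay finishes at the same clock."""
    return target_cycles / cycles_per_iteration(core, extra_nops)

print(cycles_per_iteration("M0"))       # the loop the delay was tuned for
print(cycles_per_iteration("M23"))      # one cycle short per iteration
print(cycles_per_iteration("M23", 1))   # nop restores the intended timing
print(f"{speedup('M23'):.2f}x faster without the nop")
```

Under these assumptions the unpatched loop runs about a third faster on the M23 at the same clock, before counting any difference in clock speed.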

One problem I have yet to resolve with the GD32E is support for the bootloader Go/0x21 command. With the STM32F0, I left BOOT0 high, and used DTR to toggle nRST before uploading new code. The stm32flash "-g 0" option made the target run the uploaded code after flashing was complete. I went back to debugging stm32flash, and discovered that it is hard-coded to use the "Get Version"/0x01 command, and silently fails if the bootloader responds with a NAK. After a few mods to the source, I was able to build a version that works with the GD32E230, however the Go command still doesn't work. Perhaps a task for a later date will be to hook up a debug probe to see what the E230 is doing when it gets the Go command.

Overall, I'm quite happy with the GD32E230K8T6. They cost less than half the equivalent STM32 parts, and are even cheaper than other Chinese STM32 clones I've seen. They are lower power and their maximum clock speed is 50% faster than the STM32F0. In addition to the shorter 2-stage pipeline, the GD32E devices support single-cycle IO, making them faster for bit-banged communications than the STM32F0 which takes 2 cycles to write to a GPIO pin. The GD32E230 also has some new features, which might be worth discussing in a future blog post.

Trying to test a "ten cent" tiny ARM-M0 MCU part 2

2021-01-02T16:51:00.000-08:00

After my first look at the HK32F030MF4P6, I wondered if the HK part, unlike the STM32F030 it is modeled after, does not have 5V tolerant IO. I changed the solder jumpers to 3V3 on the CH552 module I'm using as a CMSIS-DAP adapter, which caused it to stop working. This was because the CH552 requires a 5V supply in order to run reliably at 24MHz. After re-flashing the CMSIS-DAP firmware set to run at 16MHz, the module worked, and I was finally able to talk to the HK MCU via SWD.

In the screen shot above, I chose the stm32f051 target because pyocd does not have the HK MCU nor the STM32F030 among its builtin targets. For basic SWD communications, the target option is not even necessary. With the target specified, it's possible to specify peripheral registers by name, rather than having to specify a memory address to read or write.

In the screen shot above, I'm using the "connect_mode" option to bring the nRST line low on the target device when entering debug mode. Usually this is not necessary for SWD, however some of the probing I did would cause the MCU to crash. This required a power cycle or reset to restore communications via SWD.

The first tests I did with the HK MCU were to probe the flash and RAM. The HK datasheet shows the flash at address 0. In the STM32F0, the flash is at address 0x8000000, and is mapped to address 0 when the boot0 pin is low. Although the HK MCU doesn't have a boot0 pin, data at address 0x8000000 is mirrored at address 0 as well. What was most unusual about the HK MCU is that the flash was not erased to all 0xFF as is typical with other flash-based MCUs. Most of the flash contents were zeros, except for some data at address 0x400, which was the same on the 2 MCUs I checked:

By writing to memory starting at 0x20000000 using the 'ww' command, I discovered that the MCUs I received have 4kB of RAM, rather than the 2kB specified in the datasheet. Writing to 0x20001000 (beyond 4kB) results in a crash.

For writing and erasing the flash, I initially tried using the pyOCD 'erase' and 'flash' commands. Since the MCU flash interface is not part of Cortex-M specification, the flash interface peripheral will vary from one MCU vendor to the next. The flash interface on the STM32F051 is almost identical to the flash interface on the STM32F030, however the 'erase' and 'flash' commands caused the HK MCU to crash when I ran them. Testing on a genuine STM32F030 crashed as well, and after some debugging and reading through the pyOCD code, I realized the STM32F051 flash routines need 8kB of RAM. Even after downloading and installing the STM32F0 device pack, I could not erase or flash the HK MCU.

Next I reviewed the STM32F030 programming manual, and tried to access the flash peripheral registers directly. This was when I found a pyOCD bug with the wreg command. I was able to unlock the flash by writing the magic sequence of 0x45670123 followed by 0xCDEF89AB to flash.keyr. I tried erasing the first page at address 0, and although flash.sr and flash.cr updated as expected, the memory contents did not change. What did work was erasing the page at address 0x8000000, which cleared the contents at address 0 as well. I still find it strange that the erase operation sets all bits to 0 instead of 1. The HK datasheet says a flash page is 128 bytes, and erasing a page resulted in 128 bytes set to all zero.
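The unlock and page erase steps can be written out as an ordered list of register writes. This is a sketch, not the sequence from my repository: the register offsets and bit positions are assumptions based on the STM32F0 flash controller layout in ST's documentation (FLASH base 0x40022000, PER as bit 1 and STRT as bit 6 of FLASH_CR), and the function name is made up for illustration. Verify them against the manual before using them on real hardware.

```python
# Assumed STM32F0-style flash controller register map.
FLASH_BASE = 0x40022000
FLASH_KEYR = FLASH_BASE + 0x04
FLASH_SR = FLASH_BASE + 0x0C   # BSY is bit 0; poll it after the last write
FLASH_CR = FLASH_BASE + 0x10
FLASH_AR = FLASH_BASE + 0x14

KEY1, KEY2 = 0x45670123, 0xCDEF89AB   # unlock sequence from the post
CR_PER = 1 << 1    # page erase
CR_STRT = 1 << 6   # start operation

def page_erase_writes(page_addr):
    """Ordered (address, value) 32-bit writes to unlock flash and erase one page."""
    return [
        (FLASH_KEYR, KEY1),
        (FLASH_KEYR, KEY2),
        (FLASH_CR, CR_PER),
        (FLASH_AR, page_addr),
        (FLASH_CR, CR_PER | CR_STRT),
    ]

# The page has to be addressed at 0x8000000, not at its alias at address 0.
for addr, value in page_erase_writes(0x08000000):
    print(f"ww 0x{addr:08x} 0x{value:08x}")
```

Each printed line has the shape of a pyOCD 'ww' command, so the output can be pasted into a pyocd session one line at a time while watching flash.sr between steps.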

I was only partially successful in writing data to the flash. Writing to 0x8000000 did not work, however writing 16 bits to address 0 using the 'wh' command was successful. Trying to write 16 bits to address 2 updated flash.ar and flash.sr as expected, but did not change the data. Writing to any 4-byte aligned address in the erased page worked, but writing to addresses that were only 2-byte aligned left all 16 bits at zero. I tried writing bytes with 'wb' and full words with 'ww', both of which crashed the MCU, likely from a hard fault interrupt. I even made sure there isn't a bug with the 'wh' command by writing 16 bits at a time to RAM.

While searching the CHK website for more documentation, I found a page with IAR device packs. Although pyOCD uses Keil device packs, I downloaded the HK32F0 pack. It is a self-extracting RAR file that saves the uncompressed files in AppData\Local\Temp\RarSFX0.

Since .pack files are just zip files with a different extension, I zipped the files back up as a .pack file. However pyOCD couldn't read it: "0000731:CRITICAL:__main__:CMSIS-Pack './HK32F0.pack' is missing a .pdsc file". Manually examining the files confirmed some of my earlier discoveries, such as flash at address 0x8000000, remapped to address zero. I found a file named HK32F030M.svd, which contains XML definitions of the peripheral registers. pyOCD's builtin devices appear to use svd files, so it may be possible to add HK32F0 support to pyOCD.
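Because a .pack file is an ordinary zip archive, Python's zipfile module can check for the missing .pdsc file before handing the pack to pyOCD. This is a minimal sketch using only the standard library; find_pdsc is a made-up helper, and it does not reproduce whatever checks pyOCD performs internally.

```python
import io
import zipfile

def find_pdsc(pack_file):
    """Return the names of any .pdsc entries in a CMSIS-Pack (zip) archive."""
    with zipfile.ZipFile(pack_file) as pack:
        return [name for name in pack.namelist() if name.lower().endswith(".pdsc")]

# Demo: an in-memory archive that holds only an SVD file, like my repacked pack.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("HK32F030M.svd", "<device/>")
buf.seek(0)

matches = find_pdsc(buf)
print(matches if matches else "no .pdsc file found")
```

The function also accepts a path, so find_pdsc("HK32F0.pack") would work on a file on disk.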

Copies of the IAR support pack, datasheet, and pyocd page erase sequence can be found in my github repository.

Trying to test a "ten cent" tiny ARM-M0 MCU

2020-12-13T16:24:00.007-08:00

A few months ago, while browsing LCSC, I found a surprisingly cheap ARM M0 MCU. At the time it was 16.6c in single-unit quantities, with no higher-volume pricing listed. From the datasheet LCSC has posted, there was enough information in English to tell that it has 2kB RAM, 16kB flash, and runs up to 32MHz with a 1.8V to 3.6V power supply. Although the part number suggests it may be a clone or is compatible with the STM32F030, it's not. The part number for the STM32F030 clone is HK32F030F4P6.

Some additional searching brought me to some Chinese web sites that advertised the chip as a 32-bit replacement for the STM8S003. The pinout matches the STM8S003F3P6, so in theory it is a drop-in replacement for the 8S003. Unlike the STM32F0, it has no serial bootloader, so programming has to be done via SWD. And with no bootloader support, there's no need to be able to remap the flash from 0x08000000 to 0x00000000 like the STM32. A small change to the linker script should be all it takes to handle that difference. Even though I wasn't sure how or if I'd be able to program the chips, I went ahead and ordered a few of them. I already had some TSSOP20 breakout boards, so the challenge would be in the software, and the programming hardware.

Since I'm cheap, I didn't want to buy a dedicated DAPlink programmer. I have a STM32F103 "blue pill", so I considered converting it to a black magic probe. But since I've been playing with the CH554 series of chips, I decided to try running CMSIS-DAP firmware on a CH552. If you're not familiar with CMSIS-DAP and SWD, I recommend Chris Coleman's blog post. Before I tried it with the HK32F030MF4P6, I needed to try it with a known good target. Since I had recently been working with a STM32F030, that's what I chose to try first.

The two main alternatives for open-source CMSIS-DAP software for downloading, running, and debugging target firmware are OpenOCD and pyOCD. pyOCD is much simpler to use than OpenOCD; after installing it with pip, 'pyocd list' found my CH552 CMSIS-DAP:

However that's as far as I could get with pyOCD. There seems to be a bug in the CMSIS-DAP firmware or pyOCD around the handling of the DAP_INFO message. Fixing the bug may be a project for another day, but for the time being I decided to figure out how to use OpenOCD.

To use OpenOCD, you need to create a configuration file with information about your debug adapter and target. It's all documented, however it's very complicated given that OpenOCD does a whole lot more than pyOCD. It's also complicated by the fact that since the release of v0.10.0, there have been updates that have made material changes to the configuration file syntax. I had a working configuration file on Windows that wouldn't work on Linux. On Linux I was running OpenOCD v0.10.0-4, but on Windows I was running v0.10.0-15. After installing the xPack project OpenOCD build on Linux, the same config file, which I named "cmsis-dap.cfg", worked on both Linux and Windows:

adapter driver cmsis-dap

transport select swd
adapter speed 100

swd newdap chip cpu -enable
dap create chip.dap -chain-position chip.cpu
target create chip.cpu cortex_m -dap chip.dap

init
dap info

With dupont jumpers connecting SWCLK, SWDIO, VDD, and VSS on my STM32F030 breakout board, here's the output from openocd.

After making the same connections (factoring the different pinout) to the HK32F030MF4P6, I was getting no response from the MCU. Before connecting, I had done the usual checks for shorts and continuity, making sure all my solder connections were good. Next I tried just connecting VDD and VSS, while I probed each pin. Pin 2, SWDIO, was pulled high to 3V3, as was nRST. All other pins were low, close to 0V. The STM32F030 pulls SWDIO and nRST high too. I tried reconnecting SWDIO and SWCLK, and connecting a line to control nRST. I added "reset_config trst_and_srst" to my config file, and still didn't get a response. Looking at the debug output from openocd (-d flag) shows the target isn't responding to SWD commands:

Debug: 179 99 cmsis_dap_usb.c:728 cmsis_dap_swd_read_process(): SWD ack not OK @ 0 JUNK
Debug: 180 99 command.c:626 run_command(): Command 'dap init' failed with error code -4

Since the datasheet says that after reset, pin 2 functions as SWDIO, and pin 11 functions as SWCLK, I'm at a bit of an impasse. I'll try hooking up my oscilloscope to the SWDIO and SWCLK lines to make sure the signals are clean. I've read that in some ARM MCUs, DAP works while the device is in reset, so I'll peruse the openocd docs to figure out how to hold nRST low while communicating with the target. And of course, suggestions are welcome.

Before I finish this post, I wanted to explain the reference to a "ten cent" MCU. LCSC does not list volume pricing for the part, but when I searched for the manufacturer's name, "Shenzhen Hangshun Chip Technology Development", I found an article about the company. In the article, the company president, Liu Jiping, refers to the 10c ($0.1) price. I suspect that pricing is for quantities over 1000. Assuming these chips can actually be programmed with a basic SWD adapter, then even paying 20c for a 20-pin, 32MHz M0 MCU looks like a good deal to me.

Read part 2 to find out how I got SWD working.

STM32 Starting Small

2020-12-07T14:40:00.001-08:00

For software development, I often prefer to work close to the hardware. Libraries that abstract away the hardware not only use up limited flash memory, they add to the potential sources of bugs in your code. For a basic test of STM32 library bloat, I compiled the buttons example from my TM1638NR library in the Arduino 1.8.13 IDE using stm32duino for a STM32F030 target. The flash required was just over 8kB, or slightly more than half of the 16kB of flash specified for the STM32F030F4P6 MCU. While I wasn't ready to write my own tiny Arduino core for the STM32F, I was determined to find a more efficient way of programming small ARM Cortex-M devices.

After a bit of searching, looking at Bill Westfield's Minimalist ARM project, libopencm3, and other projects, I found most of what I was looking for in a series of STM32 bare metal programming posts by William Ransohoff. However instead of using an ST-Link programmer, I decided to use a standard USB-TTL serial dongle to communicate with the ROM bootloader on the STM32.

To enable the bootloader, the STM32 boot0 pin must be pulled high during power-up. Then the bootloader will wait for communication over the USART Tx and Rx lines. On the STM32F030F4P6, the Tx line is PA9, and the Rx line is PA10. In order to reset the chip before flashing, I also connected the DTR line from my serial module to NRST (pin 4) on the MCU as shown in the following wiring diagram:

For flashing the MCU, I decided on stm32flash. While installation on Debian Linux is as simple as, "apt install stm32flash", I had some difficulty finding a recent Windows build. So I ended up building it myself. Although my build defaults to 115.2kbps, I found 230.4kbps completely reliable. At 460.8kbps and 500kbps, I encountered intermittent errors, so I stuck with 230.4kbps. After making the necessary connections, and before flashing any code to the MCU, do a test to confirm the MCU is detected.

One thing to note about stm32flash is that it does not detect the amount of flash and RAM on the target MCU. The numbers come from a hard-coded table based on the device ID reported. The official flash size in kB is stored in the system ROM at address 0x1FFFF7CC. On my STM32F030F4P6, the value read from that address is 0x0010, reflecting the spec of 16kB flash for the chip. My testing revealed that it actually has 32kB of usable flash.

I used William's STM32F0 GPIO example as a template to create a tiny blinky example that uses less than 300 bytes of flash. Most of that is for the vector table, which on the Cortex-M0 has 48 entries of 4 bytes each. To save space, I embedded the reset handler in an unused part of the vector table. Since the blinky example doesn't use any interrupts, all but the initial stack pointer at vector 0 and the reset handler at vector 1 could technically be omitted. I plan to re-use the vector table code for other projects, so I did not prune it down to the minimum.

The blinky example will toggle PA9 at a frequency of 1Hz. That is the UART Tx pin on the MCU, which is connected to the Rx pin on the USB-TTL dongle. This means when the example runs, the Rx LED on the USB-TTL dongle will flash on and off.

I think my next step in Cortex-M development will be to experiment with libopencm3. It appears to have a reasonably lightweight abstraction of GPIO and some peripherals, so it should be easier to write code that is portable across multiple different ARM MCUs.

LGT8F328P EDMINI board

2020-10-05T07:18:00.001-07:00

Earlier this year I purchased an EDMINI board from Electrodragon. It uses a LGT8F328P chip, which supports the AVR instruction set. The instruction set timings and peripheral registers vary slightly from the ATmega328P, so it is not 99% compatible as claimed by Electrodragon. I bought one to see just how compatible it is, and possibly to port some of my AVR libraries to the LGT MCU.

The module arrived in an anti-static bag, inside a padded envelope. After connecting 5V power to the board, the D13 LED blinked on and off every second, suggesting that it comes with the Arduino blink sketch pre-loaded. I then hooked up a USB-TTL adapter, installed the LGT board file in the Arduino IDE, and tried flashing a modified blink sketch to the board. The upload failed, and after some debugging I found that the reset was not working on the MCU. Neither pressing and holding the reset button nor grounding RST would reset the board. After I contacted Electrodragon, Chao agreed to replace the board with two new boards. He told me that they see a higher than average failure rate with the LGT8F328P chips.

In addition to Chao's frank comment about reliability, another concern I had about the LGT parts was the lack of markings on the chip. I suspect LGT sells the parts without markings so vendors can label them with their own brand. This also makes it easier for more nefarious manufacturers to label them as an ATmega328p.

When the new boards arrived, the first thing I did was make sure the reset button worked. After pressing reset the LED flashes quickly three times for the bootloader, and then flashes on and off every second. However when I tried uploading a sketch using the Arduino IDE, the upload still failed. After some more debugging, I found I could upload if I pressed the reset button just before uploading. This meant the bootloader was working, but auto-reset (toggling the DTR line) was not. These boards use the same auto-reset circuit as an Arduino Pro Mini:

A negative pulse on DTR will cause a voltage drop on RST, which is supposed to reset the target. When the target power is 5V and 3V3 TTL signals are used, toggling DTR will cause RST to drop from 5V to about 1.7V (5 - 3.3). With the ATmega328P and most other AVR MCUs, 2V is low enough to reset the chip. The LGT8F328P, however, requires a lower voltage to reset. In some situations this can be a good thing, as it means the LGT MCU is less likely to reset due to electromagnetic interference.
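The arithmetic behind the 1.7V figure is just the supply voltage minus the DTR swing. The sketch below assumes ideal coupling through the series capacitor, ignoring the pullup recharging it and any clamping inside the MCU; the function name is made up for illustration.

```python
def rst_dip_voltage(vcc, dtr_swing):
    """Lowest RST voltage right after DTR falls, assuming ideal capacitor coupling."""
    return max(vcc - dtr_swing, 0.0)

# 5V target with 3V3 serial signals, then a 3V3 target with the same adapter.
for vcc in (5.0, 3.3):
    print(f"{vcc}V target, 3.3V DTR swing: RST dips to about {rst_dip_voltage(vcc, 3.3):.1f}V")
```

With both the target and the serial adapter at 3V3, the same estimate puts the dip near 0V.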

The EDMINI board has a 3V3 regulator which can be selected by a solder jumper. This is mentioned on the Electrodragon site, but it is not clearly documented which pads need to be shorted to switch from 5V to 3V3. After a bit of debugging I was able to run the board at 3V3, and was able to use the auto-reset feature.

I do most of my AVR development using command line tools, not the Arduino IDE. I compiled a small program that toggles every pin on PORTB using avr-gcc 5.4.0, and flashed it to the EDMINI board using avrdude. Nothing happened. Since the Arduino blink sketch worked, I know that the LED on PB5 was working. My conclusion is that the LGT Arduino core must do some setup to enable PORTB. This is common on modern MCUs such as the ARM Cortex, but on AVRs like the ATmega328p, writing 255 to the PORTB and DDRB registers is all it takes to drive every pin on port B high.

I won't be doing any development work with the LGT MCUs. Although they are cheaper and can run a bit faster than authentic AVR parts, their compatibility is rather limited. Any code that relies on the standard AVR instruction set timing, such as my picoUART library, will not work. The 8F328P cannot be programmed with a USBasp, as the native programming interface is SWD, not Atmel's SPI-based protocol. For a cheap and powerful MCU, the CH551 looks much more interesting.

Recording the Reset Pin

2020-09-17T11:09:00.000-07:00

The AVR reset pin has many functions. In addition to being used as an external reset signal, it can be used for debugWire, and it is used for SPI and for high-voltage programming. Other than for when it is used as an external reset signal, the datasheet specifications are somewhat ambiguous. I recently started working on an updated firmware for the USBasp, and wanted to find out more details about the SPI programming mode. The image above is one of many recordings I made from programming tests of AVR MCUs.

When I first started capturing the programming signals, I observed seemingly random patterns on the MISO line before programming was enabled. Although the datasheet lists the target MISO line as being an output, it only switches to output mode after the first two bytes of the "Programming Enable" instruction, 0xAC 0x53, are received and recognized. Prior to that the pin floats, and the seemingly random patterns I observed were caused by the signals on the MOSI and SCK lines inducing a voltage on the MISO line. I enabled the pullup resistor on the programmer side in order to keep the MISO line high until the PE instruction was recognized by the target.

One of the steps in the datasheet's serial programming algorithm that doesn't make sense to me is step 2, which says, "Wait for at least 20 ms and enable Serial Programming by sending the Programming Enable serial instruction to pin MOSI." It's clear from the capture image above that a wait time of less than 100 us worked in this case. I did a number of experiments with different targets (t13, t85, m8a) with and without the CKDIV8 fuse set, and found a delay of 64 us was always sufficient. Nevertheless, I still used a 20 ms delay in the USBasp firmware.

Another observation I made was of a repeatable delay between the 8th rising edge of the SCK signal on the second byte and MISO going low. After multiple tests, I found that delay is between 2 and 3 of the target clock cycles. A close-up of the 0x53 byte shows this clearly:

The 2-3 clock cycle delay seems to correspond with the datasheet's specification of the minimum low and high periods for the SCK signal of 2 clock cycles when the target is running at less than 12MHz. However I found I couldn't consistently get a target running at 8MHz to enter programming mode with a SCK clock of 1.5MHz. Additional logs of the programming sequence revealed something interesting when multiple PE instructions are sent at less than 1/8th of the target clock rate, with a positive pulse on RST for synchronization. In those sequences, the delay was smaller between the 8th rising edge of the SCK signal on the second byte and MISO going low for the second and subsequent times the PE instruction is sent. It seems you need to use a slower SCK frequency to get the target into programming mode, but after that, the frequency can be increased to 1/4 of the target clock.

Using what I learned, I have implemented automatic SCK speed negotiation and a higher default SCK clock speed. The speed negotiation starts with 1.5MHz for SCK, and makes 3 attempts to enter programming mode. If that fails, the next slower speed (750kHz) is tried three times, and so on until a speed is found where the target responds. For subsequent communications with the target, the speed is doubled, since the slowest speed is only needed the first time the PE command is received after power-up. The firmware also supports a maximum SCK frequency of 3MHz, vs 1.5MHz for the original firmware.
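The negotiation logic can be modelled in a few lines. This is a sketch of the idea, not the firmware: try_enter stands in for sending the Programming Enable instruction at a given SCK frequency, and the min_hz floor is a made-up parameter so the loop terminates.

```python
def negotiate_sck(try_enter, start_hz=1_500_000, min_hz=1_000, attempts=3):
    """Return (entry_hz, run_hz), or None if the target never responds.

    try_enter(hz) is a hypothetical callback that returns True when the
    target enters programming mode at the given SCK frequency.
    """
    hz = start_hz
    while hz >= min_hz:
        for _ in range(attempts):
            if try_enter(hz):
                # The slow speed is only needed for the first PE command,
                # so later traffic runs at twice the entry speed.
                return hz, hz * 2
        hz //= 2  # fall back to the next slower speed
    return None

# Simulated target that only accepts the first PE command at 400kHz or less.
calls = []
def fake_target(hz):
    calls.append(hz)
    return hz <= 400_000

print(negotiate_sck(fake_target))
print(len(calls), "attempts made")
```

In the simulation the 1.5MHz and 750kHz steps each fail three times before 375kHz succeeds, after which the speed is doubled back to 750kHz.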

The higher speeds don't make a large difference in flash/verify times since the overhead of the vUSB code tends to dominate beyond a SCK frequency of 750kHz or so. Reading the 8kB of flash on an ATtiny85 takes around 3 seconds. By optimizing the low-speed USB code, such as was done by Tim with u-wire, it should be possible to double that speed.

Flashing AVRs at high speed

2020-09-06T10:26:00.003-07:00

I've written a few bootloaders for AVR MCUs, which necessarily need to modify the flash while running. The typical 4ms to write or erase a page depends on the speed of the internal RC oscillator. Here's a quote from section 6.6.1 of the ATtiny88 datasheet:

Note that this oscillator is used to time EEPROM and Flash write accesses, and the write times will be affected accordingly. If the EEPROM or Flash are written, do not calibrate to more than 8.8 MHz. Otherwise, the EEPROM or Flash write may fail.

I wondered how running the RC oscillator well above 8.8MHz would impact erasing and writing flash. In the past I read about tests showing the endurance of AVR flash and EEPROM is many times more than the spec, but I couldn't find any tests done while running the AVR at high speed. I did come across a post from an old grouch on AVRfreaks warning not to do it, so now I had to try.

The result is a program I called flashabuse, which you'll see later is a bit of a misnomer. What the program does is set OSCCAL to 255, then repeatedly erase, verify, write, and verify a page of flash. I chose to test just one page of flash for a couple reasons. First, testing all 128 pages of flash on an ATtiny88 would take much more time. The second is that I would only risk damaging one page, and an ATtiny88 with 127 good pages of flash is still useful.

The results were very positive. My little program was completing about 192 cycles per second, taking 2.6ms for each page erase or page write. I let it run for an hour and a half, so it successfully completed 1 million cycles. Not bad considering Atmel's design specification is a minimum of 10,000 cycles.
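The numbers above are easy to cross-check. The sketch assumes each cycle is dominated by one page erase and one page write, with the verify passes taking negligible time; the constant names are made up.

```python
CYCLES_PER_SECOND = 192   # measured rate from the test run
OPS_PER_CYCLE = 2         # one page erase plus one page write (verify time ignored)
RUN_SECONDS = 90 * 60     # an hour and a half

ms_per_op = 1000 / (CYCLES_PER_SECOND * OPS_PER_CYCLE)
total_cycles = CYCLES_PER_SECOND * RUN_SECONDS

print(f"{ms_per_op:.1f} ms per erase or write")
print(f"{total_cycles:,} cycles in {RUN_SECONDS // 60} minutes")
print(f"{total_cycles // 10_000}x the 10,000 cycle specification")
```

At 192 cycles per second, 90 minutes works out to slightly over a million cycles, or about 100 times the rated endurance.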

So why does the flash work fine at high speed? I think it has to do with how floating-gate flash memory works. Erasing and writing the flash requires removing and adding a charge to the floating gate using high voltages. Atmel likely uses timing margins well in excess of the 10% indicated in the datasheet, so even half the typical 4ms is more than enough to ensure error-free operation. I even think writing at high speed puts less wear on the flash because it exposes the gate to high voltages for a shorter period of time.

Addendum

I received some feedback questioning whether the faster write time may reduce retention due to reduced charge on the floating gate. As I mentioned above, Atmel likely used a very large timing margin when designing the flash memory. Chris Lamont, who tested flash retention on a PIC32, stated that retention failure is "extremely unlikely".

The retention specs for the ATtiny88 are, "20 years at 85°C / 100 years at 25°C". As this Micron technical note (PDF) shows, retention specs are based on models, not actual testing. Micron's JESD47I PCHTDR testing is done at 125C for 1000 hours, and requires 0 failures. TEKMOS states, "As a very rough rule of thumb, the data retention time halves for every 10C rise in temperature." Extrapolating from a 100-year retention at 25C, retention at 255C, a typical reflow soldering peak temperature, would be only 6 minutes.
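
The extrapolation above can be checked with a few lines of C. This is only a minimal sketch of the TEKMOS rule of thumb, assuming retention halves for each full 10C step above the reference temperature; the function name retention_minutes is a hypothetical helper, not something from a datasheet or library.

```c
#include <assert.h>

/* Rule-of-thumb model: retention halves for every 10C rise in temperature.
 * Assumption: only whole 10C steps above the reference are counted. */
double retention_minutes(double ref_years, int ref_temp_c, int temp_c)
{
    assert(temp_c >= ref_temp_c);

    /* Convert the reference retention from years to minutes. */
    double minutes = ref_years * 365.25 * 24.0 * 60.0;

    /* Halve once for each full 10C step above the reference temperature. */
    for (int t = ref_temp_c; t + 10 <= temp_c; t += 10)
        minutes /= 2.0;

    return minutes;
}
```

Calling retention_minutes(100.0, 25, 255) applies 23 halvings to 100 years and returns a little over 6 minutes, which matches the figure above.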

In an attempt to show that retention is not impacted by repeated fast flashing, I performed two additional tests. For the first test, I baked the subject MCU for 12 hours at 150C, then performed 100,000 fast write/erase cycles. Next, 0x55 was written to the test page, and repeatedly verified for 2 hours. This test passed with no errors. For the second test, I filled the 8kB of flash with zeros to put a charge on the floating gate for every bit. I then baked the subject MCU for 12 hours at 150C, then verified that all bits remained at zero. This test passed with all 65,536 bits reading zero. I did, however, have a failure of one solder joint, likely due to the stress of thermal cycling.

For those who are ~~particularly concerned~~ paranoid about flash retention, one solution is refreshing the flash. For an AVR MCU, it would be simple to refresh the flash on every bootup with a small segment of code in .init1. The code would copy each page into the page buffer, then perform a write on the page. This would refresh all the 0 bits, and extend the retention life for another 20 to 100 years.

Hacker's Intro to USB hardware

2020-08-27T08:04:00.003-07:00

Low-speed 1.5Mbps and full-speed 12Mbps USB, while more complicated than a UART, are still hacker-friendly. As the standard approaches 25 years old, I've decided to document some of the more useful highlights I've learned.

While some USB devices will have accessible PCB pads where you can probe signals, it's best to have some breakouts and pass-thru cables with test points. I've found broken micro-USB cables to be a cheap option. I cut the micro-b end off, strip the wires, and solder them to some protoboard with 4 pin headers for the ground, 5V, D+, and D- connections. A crude USB voltage tester can be made with a couple of silicon diodes and a white or blue LED in series, powered by the 5V line. In the 20mA range, a 1N4148 has a vF of about 0.8V, so a 3.4V LED will be brightly lit if 5V is present. I've also made a custom USB-A extension cable with a section of the D+ and D- wires exposed for easy attachment of alligator clips.

Although USB power is 5V, typically at up to 500mA, the signalling is 3.3V. At the host, the data pins are pulled to ground with a resistance between 15k and 22k, so a typical host will use 18.5k Ohms. At the device, the D+ (full-speed) or D- (low-speed) pin is pulled up to 3.3V to signal to the host that a device is attached. The spec (pdf) shows this being done with a 1.5k pullup to 3.6V, which creates an 18.5k/20k divider, resulting in 3.6V * 0.925 or 3.33V. I've found a 10k pullup to 5V works just fine, and many devices use a 1.5k pullup to 3.3V, since the spec requires a minimum of 2.7V for detection to work. For a connected low-speed device (like a mouse), D+ will be near 0V, and D- will be near 3.3V. For a full-speed device, the polarity will be reversed. High-speed devices use low-swing 400mV signalling with both D+ and D- at 0V when idle.
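
The pull-up arithmetic is just a resistor divider, and a short sketch in C makes it easy to try other resistor values. The function names idle_line_voltage and attach_detected are hypothetical, and the 2.7V threshold is simply the detection minimum mentioned above.

```c
#include <assert.h>

/* Idle voltage on the data line: the device pull-up and the host
 * pull-down form a simple resistor divider. */
double idle_line_voltage(double v_pullup, double r_pullup_ohms,
                         double r_pulldown_ohms)
{
    assert(r_pullup_ohms > 0.0 && r_pulldown_ohms > 0.0);
    return v_pullup * r_pulldown_ohms / (r_pullup_ohms + r_pulldown_ohms);
}

/* Assumption: 2.7V is the minimum line voltage for attach detection. */
int attach_detected(double v_line)
{
    return v_line >= 2.7;
}
```

With an 18.5k pull-down, idle_line_voltage(5.0, 10000.0, 18500.0) is about 3.25V. Even with a 15k pull-down at the low end of the range, the 10k pull-up to 5V gives 3.0V, which is still above the 2.7V minimum.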

The frequency counter on a multimeter can be used to tell if a device is alive, or if the host has failed to recognize it. For a device that has been enumerated by a host, the host will send a keepalive signal to the device. For a low-speed device, this is a single-ended 0 (SE0) where D- is pulled low for 1.3us every ms. Therefore, a frequency of at least 1kHz will be detected on the D- line.

You can get a USB device to reconnect without unplugging it by forcing a bus reset. This can be done by shorting the D+ (full-speed) or D- (low-speed) line to ground. To avoid releasing the magic smoke by accidentally shorting the wrong connection, I suggest using a 100-150 ohm resistor, which is still more than sufficient to reset the bus.

Getting started with the WCH CH551 and CH552

2020-07-02T08:11:00.004-07:00

When I first read about the CH554 series of MCUs, I thought it would be interesting to test out some day. Part of the attraction is that it's based on the 8051, which is a well-documented and widely used architecture. The first assembly language I learned almost 40 years ago was for the 6502, so learning to program the 8-bit CISC should be relatively easy.

Instead of purchasing the bare chips for pennies at LCSC and putting together a breakout board, I bought a couple modules from Electrodragon. I had learned that the CH551, CH552, and CH554 all used the same die. I bought the CH551 and CH552 modules with the intention of eventually trying to hack them into working as a CH554.

For testing the modules, in addition to the CH554 SDK for SDCC on Linux, I've used Ch55xduino on Windows. One thing not in the Ch55xduino documentation is driver setup. The windoze version I'm using is 7E, and when I first inserted the CH551 module, I got a driver error.

Using Zadig to set the driver to libusb-win32 solved the problem.

The CH55xduino documentation also lacks pinout documentation for anything other than the reference board. To help, I've copied the pinouts from the CH552 datasheet.

The CH55x bootloader supports DFU, which is what the CH55xduino uploader uses the first time code is uploaded to the module. Once the first sketch is uploaded, the CH55xduino core includes a CDC serial stack. With my CH551 module no longer appearing as a DFU device, I had to use Zadig again to change the CDC Serial device to use the USB Serial (CDC) driver. After that, the module appears as a COM port.

With the COM port selected in the Arduino IDE, subsequent uploads enter the bootloader by switching the baud rate to 1200bps. If no COM port is selected, the upload tool looks for a CH55x device in DFU bootloader mode. To enter the bootloader, it is necessary to pull the USB D+ pin up to 3.3V when power is applied. The Electrodragon boards have a pinout for an upload jumper, which when shorted will connect the D+ pin (P3.6/UDP) to 3.3V through a 10k resistor. On one of my modules I soldered pin headers and use a jumper to force it into upload mode. On the other, I just used a low-value (270 Ohm) through-hole resistor pushed into the holes.

Currently CH55xduino is not optimized for size, with a basic blink sketch requiring 5333 bytes of flash. Officially, the CH551 is only supposed to have 10kB of available flash, so the CH55xduino overhead means less than 5kB is left for user code. The CH551 actually seems to have 12kB available for flashing user code, which I think will be plenty if the CH55xduino core gets some optimization work. Since I like to do low-level embedded coding, I'll be using SDCC from the command line most of the time. The blink example in the CH554 SDK for SDCC compiles to 700 bytes, and I was able to get that down to 232 bytes after leaving out the UART initialization in debug.c. With a bit more optimization I think I can get the blink example down to 100 bytes or so.

One small surprise I found during my testing is that the Electrodragon CH551 and CH552 modules use different pins for the user LED. On the CH551, use P3.0, working in open-drain mode so the LED lights up when P3.0 is low. On the CH552, drive P1.4 high to light the LED. This is documented on the Electrodragon web site, but it is easy to forget when switching between the two modules.

I've already started to learn how to configure the standard MCS-51 UART, and have figured out how to directly manipulate the ports using the SFRs (Special Function Registers). Once I've mastered how to program these cheap little devices, I'll follow up with another blog post revealing the details.

Postscript

I recently found out that these modules do not fit well in a solderless breadboard. The row spacing for the 0.1" headers on the CH551 is about 0.47", so the pins have to bend out slightly to plug into the breadboard. On the CH552 module, the row spacing is about 0.52", so the pins have to bend in slightly to fit.

A full-duplex tiny AVR software UART

2020-06-11T21:09:00.000-07:00

I've written a few software UARTs for AVR MCUs. All of them have bit-banged the output, using cycle-counted assembler busy loops to time the output of each bit. The code requires interrupts to be disabled to ensure accurate timing between bits. This makes it impossible to receive data at the same time as it is being transmitted, and therefore the bit-banged implementations have been half-duplex. By using the waveform generator of the timer/counter in many AVR MCUs, I've found a way to implement a full-duplex UART, which can simultaneously send and receive at up to 115kbps when the MCU is clocked at 8Mhz.

I expect most AVR developers are familiar with using PWM, where the output pin is toggled at a given duty cycle, independent of the code execution. The technique behind my full-duplex UART is using the waveform generation mode so the timer/counter hardware sets the OC0A pin at the appropriate time for each bit to be transmitted. The TIM0_COMPA interrupt runs after each bit is output. The ISR determines if the next bit is a 0 or a 1. For a 1 bit, TCCR0A is configured to set OC0A on compare match. For a 0 bit, TCCR0A is configured to clear OC0A on compare match. The ISR also updates OCR0A with the appropriate timer count for the next bit. To allow for simultaneous receiving, the TIM0_COMPA transmit ISR is made interruptible (the first instruction is "sei").

The receiving is handled by PCINT0, which triggers on the received start bit, and TIM0_COMPB interrupt which runs for each received bit. I wrote this ISR in assembler in order to ensure the received bit is read at the correct time, taking into consideration interrupt latency. If any other interrupts are enabled, they must be interruptible (ISR_NOBLOCK if written in C). I've implemented a two-level receive FIFO, which can be queried with the rx_data_ready() function. A byte can be read from the FIFO with rx_read().

The code is written to work with the ATtiny13, ATtiny85, and ATtiny84. Only PCINT0 is supported, which on the t84 means that the receive pin must be on PORTA. With a few modifications to the code, PCINT1 could be used for receiving on PORTB with the t84. The total time required for both the transmit and the receive ISRs is 52 cycles. Adding an average interrupt overhead of 7 cycles for each ISR means that there must be at least 66 cycles between bits. At 8Mhz this means the maximum baud rate is 8,000,000/66 = 121kbps. The lowest standard baud rate that can be used with an 8Mhz clock is 9600bps.
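
The cycle budget above can be written out as a small calculation. This is a sketch using only the numbers from this post (52 cycles for both ISRs, 7 cycles average overhead per ISR); the function names are hypothetical and are not part of the published code.

```c
#include <assert.h>

enum {
    ISR_CYCLES_BOTH     = 52, /* transmit + receive ISR bodies */
    ISR_OVERHEAD_CYCLES = 7,  /* average interrupt overhead per ISR */
    ISR_COUNT           = 2   /* one transmit ISR, one receive ISR */
};

/* Minimum number of CPU cycles needed between bits. */
unsigned long wgm_min_cycles_per_bit(void)
{
    return ISR_CYCLES_BOTH + ISR_COUNT * ISR_OVERHEAD_CYCLES;
}

/* Highest baud rate the cycle budget allows at a given CPU clock. */
unsigned long wgm_max_baud(unsigned long f_cpu)
{
    assert(f_cpu > 0);
    return f_cpu / wgm_min_cycles_per_bit();
}

/* Non-zero if both ISRs fit within one bit time at this baud rate. */
int wgm_baud_fits(unsigned long f_cpu, unsigned long baud)
{
    assert(baud > 0);
    return f_cpu / baud >= wgm_min_cycles_per_bit();
}
```

At 8MHz, wgm_max_baud(8000000UL) works out to 121212, so 115200bps fits with about 69 cycles per bit, while 230400bps does not.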

The wgmuart application implements an example echo program running at the default baud rate of 57.6kbps. In addition to echoing back each character received, it prints out a period '.' every second along with toggling an LED.

I've published the code on github.

Measuring AVR interrupt latency

2020-04-27T14:38:00.000-07:00

One thing I like about AVR MCUs is that their datasheets are relatively short and simple. It's also one of the things I don't like, because the datasheets often lack important details. External interrupt latency is one area where complete and clear details are lacking. I decided to investigate the interrupt latency of the ATtiny13 and the ATtiny85. The datasheet's description of interrupt response time and external interrupts is identical for both parts.

Interrupt Response Time

The ATtiny13 datasheet section 4.7.1, under the heading "Interrupt Response Time", says, "The interrupt execution response for all the enabled AVR interrupts is four clock cycles minimum. After four clock cycles the Program Vector address for the actual interrupt handling routine is executed. [...] The vector is normally a jump to the interrupt routine, and this jump takes three clock cycles. [...] If an interrupt occurs when the MCU is in sleep mode, the interrupt execution response time is increased by four clock cycles."

While section 4.7.1 is reasonably detailed, it has one significant error, and another important omission. The error is the sentence, "The vector is normally a jump to the interrupt routine, and this jump takes three clock cycles". All AVRs with 8KB or less of flash, like the ATtiny13, have no jump instruction. They only have a relative jump "rjmp", which takes two clock cycles. This is obviously a copy/paste error from the datasheet of an AVR with more than 8KB of flash. Anyone familiar with the AVR instruction set would likely catch this simple error. The omission from section 4.7.1 is much harder to recognize until you carefully examine section 9.2 and figure 9-1 in the datasheet.

Figure 9-1 shows a circuit which appears to add a latency of two clock cycles to pin change interrupts. There is no written description for the circuit, and the external interrupt details in section 9.2 of the datasheet state, "Pin change interrupts on PCINT[5:0] are detected asynchronously." Since pin change interrupts can be used to wake the part from power-down sleep mode when all clocks are disabled, they must be detected asynchronously during power-down sleep. Determining when they are detected synchronously requires testing.

To test the interrupt latency I wrote a program in assembler that can generate low pulses of different lengths using PWM. I chose not to write the program in C because I want to be able to measure the interrupt latency down to a single cycle. On the t13, PB1 is the pin for INT0, PCINT1, and OC0B. By using OC0B to generate a low pulse on PB1, I'll be able to trigger INT0 and PCINT1 without any external connections. When the interrupt is triggered, it should take four cycles to execute the code at the interrupt vector. That code is an rjmp to the interrupt function, and that rjmp takes two additional clock cycles. For the best-case latency, the first instruction in the interrupt function will execute six cycles after the interrupt is triggered.

The first instruction of the interrupt function checks the state of the pin that triggered the interrupt (the "sbic" instruction). If the pin is low, it skips the next instruction, then goes into an infinite loop. If the pin is high, it toggles the LED pin. Since the PWM is configured to generate a low pulse, if the pulse has ended before the sbic, the LED will light up to indicate the interrupt response time was too slow. The length of the pulse is one cycle longer than the value stored in OCR0B, which is done at lines 28 and 29. My testing consisted mainly of modifying the OCR0B value, then building and flashing the modified code to the AVR.

Results

As expected, INT0 latency is 4 clock cycles from the end of the currently executing instruction. This means that if the interrupt occurs during the first cycle of a call instruction which takes 3 cycles, the interrupt response time will be 6 cycles. For pin change interrupts, the latency is 6 cycles, indicating the synchronizer circuit adds 2 cycles of latency. In idle sleep mode, both INT0 and PCINT latency is 8 cycles, indicating pin change interrupts operate asynchronously when the CPU clock is not running.
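
The measured results can be summarized as a small model in C. This is only a sketch of the numbers reported above for the ATtiny13 and ATtiny85, not a datasheet formula. The names are hypothetical, and applying the remaining cycles of the current instruction to pin change interrupts is an assumption I did not test separately.

```c
#include <assert.h>

enum irq_source { IRQ_INT0, IRQ_PCINT };

/* Cycles from the trigger until the interrupt vector executes.
 * cycles_left is how many cycles the currently executing instruction
 * still needs to finish; it is ignored in idle sleep. */
unsigned vector_latency_cycles(enum irq_source src, int idle_sleep,
                               unsigned cycles_left)
{
    assert(src == IRQ_INT0 || src == IRQ_PCINT);

    if (idle_sleep)
        return 8;               /* measured for both INT0 and PCINT */

    unsigned latency = 4 + cycles_left;
    if (src == IRQ_PCINT)
        latency += 2;           /* synchronizer delay when awake */
    return latency;
}

/* Cycles until the first ISR instruction: add the 2-cycle rjmp at the vector. */
unsigned isr_entry_cycles(enum irq_source src, int idle_sleep,
                          unsigned cycles_left)
{
    return vector_latency_cycles(src, idle_sleep, cycles_left) + 2;
}
```

For example, vector_latency_cycles(IRQ_INT0, 0, 2) gives the 6 cycles described for an interrupt arriving in the first cycle of a 3-cycle instruction, and isr_entry_cycles(IRQ_INT0, 0, 0) gives the best case of 6 cycles to the first ISR instruction.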

Better asserts in C with link-time optimization

2020-04-08T12:11:00.000-07:00

I've been a fan of link-time optimization for several years. I've been a fan of efficient programming for even longer. I was an early fan of C++ because features like function overloading made it easier to move decisions done at run-time in C to compile-time with C++. As C++ has become more complex over the decades, I've become less of a C++ fan, and appreciate the simplicity of C.

For small embedded systems like 8-bit AVRs and ARM M0, run-time error checking with assert() has minimal usefulness compared to UNIX, where a core dump will help pinpoint the error location and the state of the program at the time of the error. Even if the usability problems were solved, real-time embedded systems may not be able to afford the performance costs of run-time error checking.

Both C++ and C support static assertions. Anyone who has tried to use static_assert likely has encountered "expression in static assertion is not constant" errors for anything but the simplest of checks. The limitations of static_assert are well documented elsewhere, so I will not go into further details in this post.

Although I had long understood that LTO allowed the compiler to evaluate expressions in code at build time, I never realized its potential for static error checking. The idea came to me when looking at a fellow embedded developer's code for fast Arduino digital IO. In particular, Bill's code introduced me to the gcc error function attribute. The documentation describes the attribute as follows:

If the error or warning attribute is used on a function declaration and a call to such a function is not eliminated through dead code elimination or other optimizations, an error or warning (respectively) that includes message is diagnosed. This is useful for compile-time checking ...

Despite the fact that it seems the error attribute was introduced to address some of the limitations of static asserts, it doesn't seem to be commonly used. After some experimentation, I came up with a basic example.

pll.c:

    __attribute((error("")))
    void constraint_error(char * details);

    volatile unsigned pll_mult;

    void set_pll_mult(unsigned multiplier)
    {
        if (multiplier > 8) constraint_error("multlier out of range");
        pll_mult = multiplier;
    }

main.c:

    extern void set_pll_mult(unsigned multiplier);

    int main()
    {
        set_pll_mult(9);
    }

    $ gcc -Os -flto -o main *.c
    In function 'set_pll_mult.constprop',
        inlined from 'main' at main.c:6:5:
    pll.c:9:25: error: call to 'constraint_error' declared with attribute error:
         if (multiplier > 8) constraint_error("multlier out of range");
                             ^

When set_pll_mult() is called with an argument greater than 8, a compile error occurs. When it is compiled with a valid multiplier, the "if (multiplier > 8)" statement is eliminated by the optimizer. One drawback to the technique is that the caller (main.c in this case) is not identified when the called function is not inlined. Increasing the optimization level to -O3 may help to get the function inlined.

Building a better bit-bang UART - picoUART

2020-02-12T18:05:00.001-08:00

Over the past years, one of my most popular blog posts has been a soft UART for AVR MCUs. I've seen variations of my soft UART code used in other projects. When MicroCore recently integrated a modified version of my old bit-bang UART code, it got me thinking about how I could improve it.

There were a few limitations to my earlier UART code. One was that it didn't support baud rates below 19.2kbps at 8MHz or baud rates below 38.4kbps at 16MHz. It was also problematic for people that tried to integrate it into C/C++ libraries, as the code was written in AVR assembler. Another problem, recently brought to my attention by James Sleeman, was that the UART receive didn't work well at moderately high baud rates such as 57.6kbps. Since my AVR skills had improved over time, I was confident I could make tangible improvements to the code I wrote in 2014.

The screen shot above is from picoUART running on an ATtiny13, at a baud rate of 230.4kbps. The new UART has several improvements over my old code. To understand the improvements, it helps to understand how an asynchronous serial TTL UART works first.

Most embedded systems use 81N communication, which means 8 data bits, 1 stop bit, and no parity. Each frame begins with a low start bit, so the total frame is 1 start bit + 8 data bits + 1 stop bit for a total of 10 bits. Frames can be sent back-to-back with no idle time between them. The data is sent at a fixed baud rate, and when either the receiver or transmitter varies from the chosen baud rate, errors can occur.

When it comes to the timing error tolerance of asynchronous serial communications, I've often read that somewhere between 2% and 3.5% timing error is acceptable. I've also read many "experts" claim that a micro-controller needs an accurate external crystal oscillator in order to avoid UART timing errors. The truth is that UART timing can be off by a total of over 5% without encountering errors. By total, I mean the sum of the errors for both ends, so if a transmitter is 2% fast, and the receiver is 2% slow, the 81N data frames can still be received error-free. The timing on a USB-TTL UART adapter is usually accurate to within 0.1%, so if I am sending data from an AVR that is running 3% slow, my PL2303HX adapter still receives it error-free.

If a frame is being transmitted at 57.6kbps, each bit should last 1000us/57.6 = 17.36us. That means 17.36us after bringing the line low for the start bit, the least significant bit needs to be sent. A receiver will wait for the start bit to begin, wait another 17.36us for the start bit to end, and then wait for the middle of the first bit to sample the line. If the line is high, the bit is a 1, and if it is low, the bit is a zero. So the receiver will sample the first bit 1.5 * 17.36 = 26.04us after the line goes low to signal the start bit. The last (8th) bit will be sampled after 8.5 * 17.36 = 147.56us. If the transmitter is too slow, and is still transmitting the 7th bit, it will cause a communication error, as the receiver will interpret the 7th bit as actually being the 8th bit. If the transmitter is still sending the 7th bit after 147.56us, then it is sending at 8/8.5 or 0.941 * 57.6 = 54.2kbps. Since many UARTs check for a valid stop bit, the slowest tolerable transmit rate is usually 9/9.5 or 94.7% of the baud rate, a timing error of about 5.3%.
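
The sampling arithmetic generalizes to any baud rate. Here is a minimal sketch in C, assuming the receiver samples each bit exactly in the middle; the function names are hypothetical and are not taken from picoUART.

```c
#include <assert.h>

/* Duration of one bit in microseconds. */
double bit_time_us(double baud)
{
    assert(baud > 0.0);
    return 1e6 / baud;
}

/* Time after the start-bit edge at which data bit n (1..8) is sampled.
 * The start bit takes one bit time, and sampling is mid-bit. */
double sample_time_us(double baud, int data_bit)
{
    assert(data_bit >= 1);
    return (data_bit + 0.5) * bit_time_us(baud);
}

/* Slowest transmitter rate, as a fraction of the receiver's rate, before the
 * last checked bit is sampled one bit early. Use bits_checked = 8 for data
 * bits only, or 9 when the stop bit is also checked. */
double min_tx_rate_fraction(int bits_checked)
{
    assert(bits_checked >= 1);
    return bits_checked / (bits_checked + 0.5);
}
```

At 57600bps, sample_time_us(57600.0, 1) is about 26.04us, and min_tx_rate_fraction(9) is about 0.947, which is where the total tolerance of just over 5% comes from.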

The transmit timing of my earlier soft UART implementations is accurate to within 3 clock cycles. This was because the delay loop takes 3 clock cycles - one for decrement and two for the branch:

    ldi delayArg, TXDELAY
TxDelay:
    dec delayArg
    brne TxDelay

And since delayArg is an 8-bit register, the maximum delay added to the transmission of each bit is 2^8 * 3 = 768 cycles. On an MCU running at 8MHz, that limited the lowest baud rate to around 8000/768 or 10.4kbps. To allow for lower bit rates, picoUART needed to support longer delays. I also wanted to support more accurate timing, so picoUART uses __builtin_avr_delay_cycles during the transmission of each bit. The exact number of cycles to wait is calculated by some inline functions, which is a better way of doing the calculations than the macros I had used before. Writing picoUART in C made the timing calculations more difficult, since the compiler has some flexibility in how the code is compiled to AVR machine instructions. In order to get avr-gcc to generate the exact sequence of instructions that I wanted, I had to use one inline asm statement. When I used a C "while" loop instead of the asm goto "brne" instruction, the loop was one cycle longer due to a superfluous compare instruction. Future versions of the compiler may have improved optimization and omit the compare, which would slightly impact the timing.
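
The limit of the old 8-bit delay loop, and the per-bit cycle count that a cycle-accurate delay has to hit, can be sketched in C. The function names here are hypothetical, and the rounding is only an illustration; it is not the actual calculation in picoUART's inline functions.

```c
#include <assert.h>

enum {
    LOOP_CYCLES    = 3,   /* dec (1 cycle) + brne (2 cycles) */
    LOOP_MAX_COUNT = 256  /* an 8-bit counter allows up to 2^8 iterations */
};

/* Longest delay the 8-bit dec/brne loop can produce, in CPU cycles. */
unsigned long loop_max_delay_cycles(void)
{
    return (unsigned long)LOOP_CYCLES * LOOP_MAX_COUNT;
}

/* Approximate lowest baud rate reachable with that loop at a given clock. */
unsigned long loop_min_baud(unsigned long f_cpu)
{
    return f_cpu / loop_max_delay_cycles();
}

/* CPU cycles per bit, rounded to the nearest whole cycle. */
unsigned long cycles_per_bit(unsigned long f_cpu, unsigned long baud)
{
    assert(baud > 0);
    return (f_cpu + baud / 2) / baud;
}
```

At 8MHz, cycles_per_bit(8000000UL, 9600UL) is 833, which is more than the 768 cycles the old loop could delay, so 9600bps needs the longer delays described above.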

As with the transmit code, picoUART's receive code is accurate to within one cycle. Unlike my earlier UART code, picoUART returns after reading the 8th bit instead of waiting for the stop bit. Because of this change, picoUART begins by waiting for the line to be high before waiting for the start bit. Without the initial wait for high, back-to-back calls to purx() could lead to an error when the 8th bit of one frame is 0 (low) and gets interpreted as the start bit of the next frame. This change approximately triples the amount of time available for the AVR to process each byte in a continuous stream of data.

My earlier UART code had two incompatible versions. One version used open-drain communication, where the transmit line is pulled high by an external resistor, and pulled low by the AVR. This version supported using a single wire for both receive and transmit. While it also worked with separate pins, some users found it inconvenient to add the pull-up resistor. Instead they would choose the "push-pull" version, where the AVR drives the line high and pulls it low. With picoUART a single version works for both use cases, because it works in "push-pull" mode only during transmit. When not actively transmitting, the IO pin is set to input mode with the internal pull-up activated.

I've tried to help both the noobs and experienced AVR developers. The noob can download a release zip file to add as an Arduino library. If you are an old AVR developer like me that prefers a keyboard over a mouse, you'll find a basic Makefile with the echo example. The default baud rate is 115.2kbps, although it is capable of accurate timing at much higher speeds such as 1Mbps for an AVR running at 8MHz. The default transmit is on PB0, with PB1 for receive. The defaults can be changed in pu_config.h, or with build flags like "-DPU_BAUD_RATE=230400L".