Tuesday, January 26, 2021

Quirks of the CH55x MCUs

Over the past several months, I've been learning to use the CH551 and CH552 MCUs.  Learning generic 8051 programming was the easy part, as there is lots of old documentation available, with Philips having written some of the best.  The learning curve for WCH's additions to the MCS-51 architecture has been steeper, requiring careful reading of the datasheets, and reading the SDK headers and examples.  I've found that the CH55x chips have some quirks that I've never encountered on any other MCUs.

The GPIO modes are controlled by two registers: MOD_OC and DIR_PU.  The register values are explained in the datasheet and in ch554.h in the SDK.  Figure 10.2.1 in the datasheet shows a schematic diagram for the GPIO.  Modes 0, 1, and 2 are for high-Z input, push-pull, and open-drain respectively.  Mode 3, "standard 8051 mode", is the most complicated.  It's an open-drain mode with internal pullup, but with the output driven high for two cycles when the GPIO changes from a 0 to a 1.  This ensures a fast signal rise time.  The part that took me the longest to figure out was the operation of the pullup.  The GPIO diagram shows 70k and 10k, but section 10 of the datasheet does not explain their operation.  Therefore I've highlighted a part of the schematic in green.  When the pin input Schmitt trigger output is 1, the inverter in the top right of the diagram will output a low signal to turn on the pFET, activating the 10k pullup.  When the port input value is 0, only the weak 70k pullup is active.

The pullups aren't actually implemented as resistors on the IC.  They are specially-designed FETs with a high drain-source resistance (RDS).  Since RDS varies with gate-source voltage (Vgs), the pullup resistance will vary inversely with Vcc.  Using a 5V supply, the pullup resistance will be close to the 70k shown in the schematic.  Using a 3.3V supply, the pullup resistance is close to 125k.  Although it is not obvious, this information can be found in section 18 of the datasheet, with the specifications for IUP5 and IUP3.  These numbers are the amount of current a grounded pin will source when the pullup is enabled.
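
The link between a pullup current spec and an equivalent resistance is just Ohm's law.  Here is a minimal Python sketch of that conversion.  The two current constants are not datasheet figures; they are back-calculated from the approximate 70k and 125k values above, so substitute the real IUP5 and IUP3 numbers from section 18.

```python
def pullup_resistance_ohms(vcc_volts, pullup_current_amps):
    """Equivalent pullup resistance when a grounded pin sources the given current."""
    return vcc_volts / pullup_current_amps


# Assumed, illustrative currents (NOT copied from the datasheet):
ASSUMED_IUP5_AMPS = 71e-6  # roughly 5 V across 70k
ASSUMED_IUP3_AMPS = 26e-6  # roughly 3.3 V across 125k

R_AT_5V = pullup_resistance_ohms(5.0, ASSUMED_IUP5_AMPS)
R_AT_3V3 = pullup_resistance_ohms(3.3, ASSUMED_IUP3_AMPS)
```

With these assumed currents, R_AT_5V comes out near 70k and R_AT_3V3 near 125k, matching the figures quoted above.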

The reset pin has an internal pulldown, which seems to be weak like the GPIO pullups.  At times when working with a CH552 running at 3V3, the chip reset when I inadvertently touched the RST pin with my finger.  This was easily solved by keeping the RST pin shorted to ground.

The last issue I encountered is more of a documentation issue than a quirk.  The maximum reliable clock speed of an IC depends on the supply voltage.  All of the AVR MCUs I've worked with have a graph in the datasheet showing the voltage required to ensure safe operation at a given speed.  For the CH55x MCUs, there is a subtle difference in the electrical specs in section 18 of the datasheet.  At 5V, total supply current at 24MHz is specified, whereas the specs for 3.3V specify total operating current at 16MHz.  When I tried running a CH552T at 24MHz with a 3.3V supply, it never worked.  The same part worked perfectly at 16MHz.

Despite the quirks, I think the CH55x MCUs are still a good value.  Current quantity 10 pricing at LCSC is 36c for the CH552T, and 26c for the CH551G.  I recently purchased a small tube of the CH552T, and have plans to test the touch, ADC, PWM, and SPI peripherals.

Tuesday, January 19, 2021

GD32E230: a better STM32F0?


On my last LCSC order, I bought a few GD32E230 chips, specifically the GD32E230K8T6.  I chose the LQFP parts since I have lots of QFP32 breakout boards that I've used for other QFP32 parts.  Gigadevice is much better than many other Chinese MCU manufacturers when it comes to providing English documents.  After my past endeavors trying to understand datasheets from WCH and CHK, going through the Gigadevice documentation was rather pleasant.

Although Gigadevice makes no mention of any STM32 compatibility, the first clue is the matching pinouts of the STM32F030 and GD32E230.  To prepare for testing, I tinned the pads on a couple of breakout boards, applied some flux, and laid the chips on the pads.  I laid the modules on a cast-iron skillet, and heated it up to about 240C.  The solder reflowed well, however I noticed some browning of the white silkscreen.  Next time I'll limit the temperature to 220C.  After testing for continuity and fixing a solder bridge, I was ready to try SWD.  I connected 3.3V power and the SWD lines, and ran "pyocd cmd -v":

0000710:INFO:board:Target type is cortex_m
0000734:INFO:dap:DP IDR = 0x0bf11477 (v1 MINDP rev0)
0000759:INFO:ap:AHB5-AP#0 IDR = 0x04770025 (AHB5-AP var2 rev0)
0000799:INFO:rom_table:AHB5-AP#0 Class 0x1 ROM table #0 @ 0xe00ff000 (designer=43b part=4cb)
0000812:INFO:rom_table:[0]<e000e000:SCS-M23 class=9 designer=43b part=d20 devtype=00 archid=2a04 devid=0:0:0>
0000823:INFO:rom_table:[1]<e0001000:DWT class=9 designer=43b part=d20 devtype=00 archid=1a02 devid=0:0:0>
0000841:INFO:rom_table:[2]<e0002000:BPU class=9 designer=43b part=d20 devtype=00 archid=1a03 devid=0:0:0>
0000848:INFO:cortex_m_v8m:CPU core #0 is Cortex-M23 r1p0
0000859:INFO:dwt:2 hardware watchpoints
0000866:INFO:fpb:4 hardware breakpoints, 0 literal comparators

I did a little probing around the chip memory.  The GD32E23x user manual shows SRAM at 0x20000000, like STM32 parts.  The contents looked like random values, which I could overwrite using the pyocd "ww" command.  Writing to 0x20002000 resulted in a memory fault, indicating the part does not have any "bonus" RAM beyond 8kB.

Next, I tried using the built-in serial bootloader.  After connecting BOOT0 to VDD and connecting power, PA9 and PA10 were pulled high, indicative of the UART being activated.  However my first attempt at using stm32flash was not successful:

After attaching my oscilloscope, and writing a small bootloader protocol test program, I was able to determine that the responses did seem to conform to the STM32 bootloader protocol.  I did notice that the baud rate from the GD32E230 was only 110kbps, so it wasn't perfectly matching the 115.2kbps speed of the 0x7F byte sent for baud rate detection.  To avoid the potential for data corruption, I switched to 57.6kbps.  Before resorting to debugging the source for stm32flash, my test of stm32loader gave better results:
$ stm32loader -V -p com39
Open port com39, baud 115200
Activating bootloader (select UART)
*** Command: Get
    Bootloader version: 0x10
    Available commands: 0x0, 0x2, 0x11, 0x21, 0x31, 0x43, 0x63, 0x73, 0x82, 0x92, 0x6
Bootloader version: 0x10
*** Command: Get ID
Chip id: 0x440 (STM32F030x8)
Supply -f [family] to see flash size and device UID, e.g: -f F1
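
To put a number on the baud rate mismatch mentioned earlier, here is a rough Python sketch.  It assumes a standard 10-bit UART frame (start, 8 data, stop) with the receiver sampling mid-bit, and it ignores receiver oversampling details, so treat the result as an estimate rather than a pass/fail limit.

```python
def baud_error_percent(actual_bps, nominal_bps):
    """Signed difference of the actual rate from the nominal rate, in percent."""
    return 100.0 * (actual_bps - nominal_bps) / nominal_bps


def drift_in_receiver_bits(tx_bps, rx_bps, bits_until_sample=9.5):
    """Timing drift, in receiver bit periods, accumulated by the time the
    receiver samples the middle of the stop bit (9.5 bits after the start edge)."""
    return bits_until_sample * abs(rx_bps / tx_bps - 1.0)


GD32_ERROR = baud_error_percent(110_000, 115_200)
GD32_DRIFT = drift_in_receiver_bits(tx_bps=110_000, rx_bps=115_200)
```

For 110kbps against 115.2kbps this gives an error of about -4.5% and a drift of roughly 0.45 bit periods at the stop bit, which is uncomfortably close to half a bit.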

Next, I was ready to try flashing a basic program.  I first checked for GD32E support in libopencm3.  No luck.  Then as I read through the user manual, I noticed GPIOA starts at 0x4800 0000 on AHB2, the same as STM32F0 devices.  The register names didn't match the STM32, but the function and offsets were the same.  For example on the GD32E, the register to clear individual GPIOA bits is called GPIOA_BC, rather than GPIOA_BRR as it is called on the STM32.  The clock control registers, called RCU on the GD32E, also matched the STM32 RCC registers.  Since it was looking STM32F0 compatible, I tried flashing my blink example with stm32loader, and it worked!
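
The register equivalence can be written down as a small lookup table.  This is only a sketch: the BRR/BC pair comes from the paragraph above, while the other rows are assumptions to verify against the STM32F0 and GD32E23x manuals before relying on them.

```python
GPIOA_BASE = 0x48000000  # GPIOA base address, the same on both families per the text above

# offset -> (STM32F0 name, GD32E23x name)
# Only the 0x28 row is confirmed by the text; the rest are assumptions to verify.
GPIO_REGISTERS = {
    0x00: ("MODER", "CTL"),
    0x14: ("ODR", "OCTL"),
    0x18: ("BSRR", "BOP"),
    0x28: ("BRR", "BC"),
}


def gpio_reg_address(name, base=GPIOA_BASE):
    """Return the absolute address for a register given either vendor's name."""
    wanted = name.upper()
    for offset, names in GPIO_REGISTERS.items():
        if wanted in names:
            return base + offset
    raise KeyError(name)
```

Looking up either "BRR" or "BC" yields the same address, which is why code written for the STM32F0 registers can run unmodified.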

The LED was flashing faster than it did with the STM32F030.  A little searching revealed that the ARM Cortex-M23, like the M0+, has a 2-stage pipeline.  The STM32F030 with its M0 core has a 3-stage pipeline.  My delay busy loop needs to be four cycles per iteration, and on the M23, the bne instruction only takes two cycles.  My solution is adding a nop instruction based on an optional compile flag.
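
The cycle counting can be sketched in a few lines of Python.  It assumes the busy loop is a one-cycle subs followed by a taken bne, which is my guess at the loop body rather than something shown above; the three-cycle taken branch on the M0 and two-cycle branch on the M23 follow the pipeline depths just described.

```python
def loop_cycles(branch_taken_cycles, nops=0, subs_cycles=1):
    """Cycles per iteration of an assumed 'subs; [nop...]; bne' busy loop."""
    return subs_cycles + nops + branch_taken_cycles


def iterations_for_delay(f_cpu_hz, delay_ms, cycles_per_iteration=4):
    """Loop count needed to burn delay_ms at the given CPU clock."""
    return f_cpu_hz * delay_ms // (1000 * cycles_per_iteration)


M0_CYCLES = loop_cycles(branch_taken_cycles=3)            # Cortex-M0
M23_CYCLES = loop_cycles(branch_taken_cycles=2)           # Cortex-M23, no nop
M23_FIXED_CYCLES = loop_cycles(branch_taken_cycles=2, nops=1)  # with the extra nop
```

Without the nop the M23 loop runs in three cycles instead of four, so the LED blinks about a third faster; adding one nop restores the four-cycle iteration.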

One problem I have yet to resolve with the GD32E is support for the bootloader Go/0x21 command.  With the STM32F0, I left BOOT0 high, and used DTR to toggle nRST before uploading new code.  The stm32flash "-g 0" option made the target run the uploaded code after flashing was complete.  I went back to debugging stm32flash, and discovered that it is hard-coded to use the "Get Version"/0x01 command, and silently fails if the bootloader responds with a NAK.  After a few mods to the source, I was able to build a version that works with the GD32E230, however the Go command still doesn't work.  Perhaps a task for a later date will be to hook up a debug probe to see what the E230 is doing when it gets the Go command.

Overall, I'm quite happy with the GD32E230K8T6.  They cost less than half the equivalent STM32 parts, and are even cheaper than other Chinese STM32 clones I've seen.  They are lower power and their maximum clock speed is 50% faster than the STM32F0.  In addition to the shorter 2-stage pipeline, the GD32E devices support single-cycle IO, making them faster for bit-banged communications than the STM32F0 which takes 2 cycles to write to a GPIO pin.  The GD32E230 also has some new features, which might be worth discussing in a future blog post.

Saturday, January 2, 2021

Trying to test a "ten cent" tiny ARM-M0 MCU part 2

After my first look at the HK32F030MF4P6, I wondered if the HK part, unlike the STM32F030 it is modeled after, does not have 5V tolerant IO.  I changed the solder jumpers to 3V3 on the CH552 module I'm using as a CMSIS-DAP adapter, which caused it to stop working.  This was because the CH552 requires a 5V supply in order to run reliably at 24MHz.  After re-flashing the CMSIS-DAP firmware set to run at 16MHz, the module worked, and I was finally able to talk to the HK MCU via SWD.

In the screen shot above, I chose the stm32f051 target because pyocd does not have the HK MCU nor the STM32F030 among its builtin targets.  For basic SWD communications, the target option is not even necessary.  With the target specified, it's possible to specify peripheral registers by name, rather than having to specify a memory address to read or write.

In the screen shot above, I'm using the "connect_mode" option to bring the nRST line low on the target device when entering debug mode.  Usually this is not necessary for SWD, however some of the probing I did would cause the MCU to crash.  This required a power cycle or reset to restore communications via SWD.

The first tests I did with the HK MCU were to probe the flash and RAM.  The HK datasheet shows the flash at address 0.  In the STM32F0, the flash is at address 0x8000000, and is mapped to address 0 when the boot0 pin is low.  Although the HK MCU doesn't have a boot0 pin, data at address 0x8000000 is mirrored at address 0 as well.  What was most unusual about the HK MCU is that the flash was not erased to all 0xFF as is typical with other flash-based MCUs.  Most of the flash contents were zeros, except for some data at address 0x400, which was the same on the 2 MCUs I checked:

By writing to memory starting at 0x20000000 using the 'ww' command, I discovered that the MCUs I received have 4kB of RAM, rather than the 2kB specified in the datasheet.  Writing to 0x20001000 (beyond 4kB) results in a crash.

For writing and erasing the flash, I initially tried using the pyOCD 'erase' and 'flash' commands.  Since the MCU flash interface is not part of Cortex-M specification, the flash interface peripheral will vary from one MCU vendor to the next.  The flash interface on the STM32F051 is almost identical to the flash interface on the STM32F030, however the 'erase' and 'flash' commands caused the HK MCU to crash when I ran them.  Testing on a genuine STM32F030 crashed as well, and after some debugging and reading through the pyOCD code, I realized the STM32F051 flash routines need 8kB of RAM.  Even after downloading and installing the STM32F0 device pack, I could not erase or flash the HK MCU.

Next I reviewed the STM32F030 programming manual, and tried to access the flash peripheral registers directly.  This was when I found a pyOCD bug with the wreg command.  I was able to unlock the flash by writing the magic sequence of 0x45670123 followed by 0xCDEF89AB to flash.keyr.  I tried erasing the first page at address 0, and although flash.sr and flash.cr updated as expected, the memory contents did not change.  What did work was erasing the page at address 0x8000000, which cleared the contents at address 0 as well.  I still find it strange that the erase operation sets all bits to 0 instead of 1.  The HK datasheet says a flash page is 128 bytes, and erasing a page resulted in 128 bytes set to all zero.
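
The unlock and page erase steps can be summarized as an ordered list of register writes.  The sketch below uses the STM32F0 flash register addresses and bit positions; that they also apply to the HK part is an assumption based on the experiments above.  The helper names are hypothetical, and after the last write you still need to poll flash.sr until the busy bit clears.

```python
FLASH_BASE = 0x40022000          # STM32F0 flash interface base address
FLASH_KEYR = FLASH_BASE + 0x04   # key register
FLASH_SR = FLASH_BASE + 0x0C     # status register (bit 0 = BSY)
FLASH_CR = FLASH_BASE + 0x10     # control register
FLASH_AR = FLASH_BASE + 0x14     # address register

KEY1 = 0x45670123
KEY2 = 0xCDEF89AB
CR_PER = 1 << 1    # page erase
CR_STRT = 1 << 6   # start the operation


def page_erase_sequence(page_address):
    """Ordered (address, value) word writes to unlock the flash and erase one page."""
    return [
        (FLASH_KEYR, KEY1),
        (FLASH_KEYR, KEY2),
        (FLASH_CR, CR_PER),
        (FLASH_AR, page_address),
        (FLASH_CR, CR_PER | CR_STRT),
    ]


def as_ww_commands(writes):
    """Format the writes as 'ww ADDRESS VALUE' lines for the pyocd commander."""
    return ["ww 0x%08x 0x%08x" % (addr, value) for addr, value in writes]
```

For example, as_ww_commands(page_erase_sequence(0x08000000)) produces the five 'ww' lines for erasing the first page at its 0x8000000 alias, which is the address that worked in my tests.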

I was only partially successful in writing data to the flash.  Writing to 0x8000000 did not work, however writing 16 bits to address 0 using the 'wh' command was successful.  Trying to write 16 bits to address 2 updated the flash.ar and flash.sr as expected, but did not change the data.  Writing to any 4-byte aligned address in the erased page worked, but writing to addresses that were only 2-byte aligned left all 16 bits at zero.  I tried writing bytes with 'wb' and full words with 'ww', both of which crashed the MCU, likely from a hard fault interrupt.  I even made sure there isn't a bug with the 'wh' command by writing 16 bits at a time to RAM.

While searching the CHK website for more documentation, I found a page with IAR device packs.  Although pyOCD uses Keil device packs, I downloaded the HK32F0 pack.  It is a self-extracting RAR file that saves the uncompressed files in AppData\Local\Temp\RarSFX0.

Since .pack files are just zip files with a different extension, I zipped the files back up as a .pack file.  However pyOCD couldn't read it: "0000731:CRITICAL:__main__:CMSIS-Pack './HK32F0.pack' is missing a .pdsc file".  Manually examining the files confirmed some of my earlier discoveries, such as flash at address 0x8000000, remapped to address zero.  I found a file named HK32F030M.svd, which contains XML definitions of the peripheral registers.  pyOCD's builtin devices appear to use svd files, so it may be possible to add HK32F0 support to pyOCD.
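
Because a .pack file is an ordinary zip archive, checking for the missing file takes only a few lines of Python's standard zipfile module.  This is a quick sketch for inspecting an archive, not a reproduction of pyOCD's own check, and the function name is made up for illustration.

```python
import zipfile


def find_pdsc(pack_file):
    """Return the names of all .pdsc entries in a zip-format pack.

    pack_file may be a path or an open binary file object.
    """
    with zipfile.ZipFile(pack_file) as pack:
        return [name for name in pack.namelist() if name.lower().endswith(".pdsc")]
```

An empty list from find_pdsc("HK32F0.pack") would be consistent with the error message above.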

Copies of the IAR support pack, datasheet, and pyocd page erase sequence can be found in my github repository.

Sunday, December 13, 2020

Trying to test a "ten cent" tiny ARM-M0 MCU


A few months ago, while browsing LCSC, I found a surprisingly cheap ARM M0 MCU.  At the time it was 16.6c in single-unit quantities, with no higher-volume pricing listed.  From the datasheet LCSC has posted, there was enough information in English to tell that it has 2kB RAM, 16kB flash, and runs up to 32MHz with a 1.8V to 3.6V power supply.  Although the part number suggests it may be a clone or is compatible with the STM32F030, it's not.  The part number for the STM32F030 clone is HK32F030F4P6.

Some additional searching brought me to some Chinese web sites that advertised the chip as a 32-bit replacement for the STM8S003.  The pinout matches the STM8S003F3P6, so in theory it is a drop-in replacement for the 8S003.  Unlike the STM32F0, it has no serial bootloader, so programming has to be done via SWD.  And with no bootloader support, there's no need to be able to remap the flash from 0x08000000 to 0x00000000 like the STM32.  A small change to the linker script should be all it takes to handle that difference.  Even though I wasn't sure how or if I'd be able to program the chips, I went ahead and ordered a few of them.  I already had some TSSOP20 breakout boards, so the challenge would be in the software, and the programming hardware.

Since I'm cheap, I didn't want to buy a dedicated DAPlink programmer.  I have a STM32F103 "blue pill", so I considered converting it to a black magic probe.  But since I've been playing with the CH554 series of chips, I decided to try running CMSIS-DAP firmware on a CH552.  If you're not familiar with CMSIS-DAP and SWD, I recommend Chris Coleman's blog post.  Before I tried it with the HK32F030MF4P6, I needed to try it with a known good target.  Since I had recently been working with a STM32F030, that's what I chose to try first.

The two main alternatives for open-source CMSIS-DAP software for downloading, running, and debugging target firmware are OpenOCD and pyOCD.  pyOCD is much simpler to use than OpenOCD; after installing it with pip, 'pyocd list' found my CH552 CMSIS-DAP:

However that's as far as I could get with pyOCD.  There seems to be a bug in the CMSIS-DAP firmware or pyOCD around the handling of the DAP_INFO message.  Fixing the bug may be a project for another day, but for the time being I decided to figure out how to use OpenOCD.

To use OpenOCD, you need to create a configuration file with information about your debug adapter and target.  It's all documented, however it's very complicated given that OpenOCD does a whole lot more than pyOCD.  It's also complicated by the fact that since the release of v0.10.0, there have been updates that have made material changes to the configuration file syntax.  I had a working configuration file on Windows that wouldn't work on Linux.  On Linux I was running OpenOCD v0.10.0-4, but on Windows I was running v0.10.0-15.  After installing the xPack project OpenOCD build on Linux, the same config file, which I named "cmsis-dap.cfg", worked on both Linux and Windows:

adapter driver cmsis-dap

transport select swd
adapter speed 100

swd newdap chip cpu -enable
dap create chip.dap -chain-position chip.cpu
target create chip.cpu cortex_m -dap chip.dap

dap info

With dupont jumpers connecting SWCLK, SWDIO, VDD, and VSS on my STM32F030 breakout board, here's the output from openocd.

After making the same connections (factoring in the different pinout) to the HK32F030MF4P6, I was getting no response from the MCU.  Before connecting, I had done the usual checks for shorts and continuity, making sure all my solder connections were good.  Next I tried just connecting VDD and VSS, while I probed each pin.  Pin 2, SWDIO, was pulled high to 3V3, as was nRST.  All other pins were low, close to 0V.  The STM32F030 pulls SWDIO and nRST high too.  I tried reconnecting SWDIO and SWCLK, and connecting a line to control nRST.  I added "reset_config trst_and_srst" to my config file, and still didn't get a response.  Looking at the debug output from openocd (-d flag) shows the target isn't responding to SWD commands:

Debug: 179 99 cmsis_dap_usb.c:728 cmsis_dap_swd_read_process(): SWD ack not OK @ 0 JUNK
Debug: 180 99 command.c:626 run_command(): Command 'dap init' failed with error code -4

Since the datasheet says that after reset, pin 2 functions as SWDIO, and pin 11 functions as SWCLK, I'm at a bit of an impasse.  I'll try hooking up my oscilloscope to the SWDIO and SWCLK lines to make sure the signals are clean.  I've read that in some ARM MCUs, DAP works while the device is in reset, so I'll peruse the openocd docs to figure out how to hold nRST low while communicating with the target.  And of course, suggestions are welcome.

Before I finish this post, I wanted to explain the reference to a "ten cent" MCU.  LCSC does not list volume pricing for the part, but when I searched for the manufacturer's name, "Shenzhen Hangshun Chip Technology Development", I found an article about the company.  In the article, the company president, Liu Jiping, refers to the 10c ($0.1) price.  I suspect that pricing is for quantities over 1000.  Assuming these chips can actually be programmed with a basic SWD adapter, then even paying 20c for a 20-pin, 32MHz M0 MCU looks like a good deal to me.

Read part 2 to find out how I got SWD working.

Monday, December 7, 2020

STM32 Starting Small


For software development, I often prefer to work close to the hardware.  Libraries that abstract away the hardware not only use up limited flash memory, they add to the potential sources of bugs in your code.  For a basic test of STM32 library bloat, I compiled the buttons example from my TM1638NR library in the Arduino 1.8.13 IDE using stm32duino for a STM32F030 target.  The flash required was just over 8kB, or slightly more than half of the 16kB flash specification of the STM32F030F4P6 MCU.  While I wasn't ready to write my own tiny Arduino core for the STM32F, I was determined to find a more efficient way of programming small ARM Cortex-M devices.

After a bit of searching, looking at Bill Westfield's Minimalist ARM project, libopencm3, and other projects, I found most of what I was looking for in a series of STM32 bare metal programming posts by William Ransohoff.  However instead of using an ST-Link programmer, I decided to use a standard USB-TTL serial dongle to communicate with the ROM bootloader on the STM32.

To enable the bootloader, the STM32 boot0 pin must be pulled high during power-up.  The bootloader will then wait for communication over the USART Tx and Rx lines.  On the STM32F030F4P6, the Tx line is PA9, and the Rx line is PA10.  In order to reset the chip before flashing, I also connected the DTR line from my serial module to NRST (pin 4) on the MCU as shown in the following wiring diagram:

For flashing the MCU, I decided on stm32flash.  While installation on Debian Linux is as simple as, "apt install stm32flash", I had some difficulty finding a recent Windows build.  So I ended up building it myself.  Although my build defaults to 115.2kbps, I found 230.4kbps completely reliable.  At 460.8kbps and 500kbps, I encountered intermittent errors, so I stuck with 230.4kbps.  After making the necessary connections, and before flashing any code to the MCU, do a test to confirm the MCU is detected.

One thing to note about stm32flash is that it does not detect the amount of flash and RAM on the target MCU.  The numbers come from a hard-coded table based on the device ID reported.  The official flash size in kB is stored in the system ROM at address 0x1FFFF7CC.  On my STM32F030F4P6, the value read from that address is 0x0010, reflecting the spec of 16kB flash for the chip.  My testing revealed that it actually has 32kB of usable flash.

I used William's STM32F0 GPIO example as a template to create a tiny blinky example that uses less than 300 bytes of flash.  Most of that is for the vector table, which on the Cortex-M0 has 48 entries of 4 bytes each.  To save space, I embedded the reset handler in an unused part of the vector table.  Since the blinky example doesn't use any interrupts, all but the initial stack pointer at vector 0 and the reset handler at vector 1 could technically be omitted.  I plan to re-use the vector table code for other projects, so I did not prune it down to the minimum.
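
The vector table arithmetic is simple enough to check in a couple of lines.  The split into 16 core entries plus 32 external interrupts is the standard Cortex-M0 layout; the constant names here are only illustrative.

```python
CORE_VECTORS = 16      # initial stack pointer, reset, and the other core exception slots
EXTERNAL_IRQS = 32     # external interrupt vectors on the Cortex-M0
BYTES_PER_VECTOR = 4

VECTOR_ENTRIES = CORE_VECTORS + EXTERNAL_IRQS
VECTOR_TABLE_BYTES = VECTOR_ENTRIES * BYTES_PER_VECTOR

# Keeping only the stack pointer and the reset vector would need this many bytes
MINIMAL_TABLE_BYTES = 2 * BYTES_PER_VECTOR
```

That works out to 192 bytes for the full table, so roughly a hundred bytes are left for actual code in a sub-300 byte image, and a pruned table would need only 8 bytes.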

The blinky example will toggle PA9 at a frequency of 1Hz.  That is the UART Tx pin on the MCU, which is connected to the Rx pin on the USB-TTL dongle.  This means when the example runs, the Rx LED on the USB-TTL dongle will flash on and off.

I think my next step in Cortex-M development will be to experiment with libopencm3.  It appears to have a reasonably lightweight abstraction of GPIO and some peripherals, so it should be easier to write code that is portable across multiple different ARM MCUs.

Monday, October 5, 2020

LGT8F328P EDMINI board

Earlier this year I purchased an EDMINI board from Electrodragon.  It uses a LGT8F328P chip, which supports the AVR instruction set.  The instruction set timings and peripheral registers vary slightly from the ATmega328P, so it is not 99% compatible as claimed by Electrodragon.  I bought one to see just how compatible it is, and possibly to port some of my AVR libraries to the LGT MCU.

The module arrived in an anti-static bag, inside a padded envelope.  After connecting 5V power to the board, the D13 LED blinked on and off every second, suggesting that it comes with the Arduino blink sketch pre-loaded.  I then hooked up a USB-TTL adapter, installed the LGT board file in the Arduino IDE, and tried flashing a modified blink sketch to the board.  The upload failed, and after some debugging I found that the reset was not working on the MCU.  Neither pressing and holding the reset button nor grounding RST would reset the board.  After contacting Electrodragon, Chao agreed to replace the board with two new boards.  He told me that they see a higher than average failure rate with the LGT8F328P chips.

In addition to Chao's frank comment about reliability, another concern I had about the LGT parts was the lack of markings on the chip.  I suspect LGT sells the parts without markings so vendors can label them with their own brand.  This also makes it easier for more nefarious manufacturers to label them as an ATmega328p.  

When the new boards arrived, the first thing I did was make sure the reset button worked.  After pressing reset the LED flashes quickly three times for the bootloader, and then flashes on and off every second.  However when I tried uploading a sketch using the Arduino IDE, the upload still failed.  After some more debugging, I found I could upload if I pressed the reset button just before uploading.  This meant the bootloader was working, but auto-reset (toggling the DTR line) was not.  These boards use the same auto-reset circuit as an Arduino Pro Mini:

A negative pulse on DTR will cause a voltage drop on RST, which is supposed to reset the target.  When the target power is 5V and 3V3 TTL signals are used, toggling DTR will cause RST to drop from 5V to about 1.7V (5 - 3.3).  With the ATmega328P and most other AVR MCUs, 2V is low enough to reset the chip.  The LGT8F328P, however, requires a lower voltage to reset.  In some situations this can be a good thing, as it means the LGT MCU is less likely to reset due to electromagnetic interference.
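
The size of the reset dip can be estimated with simple subtraction.  The sketch below assumes an ideal coupling capacitor that passes the full DTR swing to RST, and ignores the pullup recharging the capacitor and any clamp diodes; the 2V threshold is the AVR figure mentioned above, not a datasheet limit.

```python
def rst_dip_volts(vcc_volts, dtr_swing_volts):
    """Lowest RST voltage during the DTR pulse, assuming ideal capacitive coupling."""
    return max(0.0, round(vcc_volts - dtr_swing_volts, 3))


def resets(vcc_volts, dtr_swing_volts, reset_threshold_volts):
    """True if the dip goes below the chip's reset threshold."""
    return rst_dip_volts(vcc_volts, dtr_swing_volts) < reset_threshold_volts


AVR_THRESHOLD_VOLTS = 2.0  # "low enough" for most AVRs, per the text above
DIP_AT_5V = rst_dip_volts(5.0, 3.3)
DIP_AT_3V3 = rst_dip_volts(3.3, 3.3)
```

With a 5V target the dip only reaches about 1.7V, while with a 3V3 target the same 3.3V swing pulls RST all the way to ground.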

The EDMINI board has a 3V3 regulator which can be selected by a solder jumper.  This is mentioned on the Electrodragon site, but it is not clearly documented which pads need to be shorted to switch from 5V to 3V3.  After a bit of debugging I was able to run the board at 3V3, and was able to use the auto-reset feature.

I do most of my AVR development using command line tools, not the Arduino IDE.  I compiled a small program that toggles every pin on PORTB using avr-gcc 5.4.0, and flashed it to the EDMINI board using avrdude.  Nothing happened.  Since the Arduino blink sketch worked, I knew that the LED on PB5 was working.  My conclusion is that the LGT Arduino core must do some setup to enable PORTB.  This is common on modern MCUs such as the ARM Cortex, but on AVRs like the ATmega328p, writing 255 to the PORTB and DDRB registers is all it takes to drive every pin on port B high.

I won't be doing any development work with the LGT MCUs.  Although they are cheaper and can run a bit faster than authentic AVR parts, their compatibility is rather limited.  Any code that relies on the standard AVR instruction set timing, such as my picoUART library, will not work.  The 8F328P cannot be programmed with a USBasp, as the native programming interface is SWD, not Atmel's SPI-based protocol.  For a cheap and powerful MCU, the CH551 looks much more interesting.

Thursday, September 17, 2020

Recording the Reset Pin


The AVR reset pin has many functions.  In addition to being used as an external reset signal, it can be used for debugWire, and it is used for SPI and for high-voltage programming.  Except when it is used as an external reset signal, the datasheet specifications are somewhat ambiguous.  I recently started working on an updated firmware for the USBasp, and wanted to find out more details about the SPI programming mode.  The image above is one of many recordings I made from programming tests of AVR MCUs.

When I first started capturing the programming signals, I observed seemingly random patterns on the MISO line before programming was enabled.  Although the datasheet lists the target MISO line as being an output, it only switches to output mode after the first two bytes of the "Programming Enable" instruction, 0xAC 0x53, are received and recognized.  Prior to that the pin floats, and the seemingly random patterns I observed were caused by the signals on the MOSI and SCK lines inducing a voltage on the MISO line.  I enabled the pullup resistor on the programmer side in order to keep the MISO line high until the PE instruction was recognized by the target.

One of the steps in the datasheet's serial programming algorithm that doesn't make sense to me is step 2, which says, "Wait for at least 20 ms and enable Serial Programming by sending the Programming Enable serial instruction to pin MOSI."  It's clear from the capture image above that a wait time of less than 100 us worked in this case.  I did a number of experiments with different targets (t13, t85, m8a) with and without the CKDIV8 fuse set, and found a delay of 64 us was always sufficient.  Nevertheless, I still used a 20 ms delay in the USBasp firmware.

Another observation I made was of a repeatable delay between the 8th rising edge of the SCK signal on the second byte and MISO going low.  After multiple tests, I found that delay is between 2 and 3 of the target clock cycles.  A close-up of the 0x53 byte shows this clearly:

The 2-3 clock cycle delay seems to correspond with the datasheet's specification of the minimum low and high periods for the SCK signal of 2 clock cycles when the target is running at less than 12MHz.  However I found I couldn't consistently get a target running at 8MHz to enter programming mode with a SCK clock of 1.5MHz.  Additional logs of the programming sequence revealed something interesting when multiple PE instructions are sent at less than 1/8th of the target clock rate, with a positive pulse on RST for synchronization.  In those sequences, the delay was smaller between the 8th rising edge of the SCK signal on the second byte and MISO going low for the second and subsequent times the PE instruction is sent.  It seems you need to use a slower SCK frequency to get the target into programming mode, but after that, the frequency can be increased to 1/4 of the target clock.

Using what I learned, I have implemented automatic SCK speed negotiation and a higher default SCK clock speed.  The speed negotiation starts with 1.5MHz for SCK, and makes 3 attempts to enter programming mode.  If that fails, the next slower speed (750kHz) is tried three times, and so on until a speed is found where the target responds.  For subsequent communications with the target, the speed is doubled, since the slowest speed is only needed the first time the PE command is received after power-up.  The firmware also supports a maximum SCK frequency of 3MHz, vs 1.5MHz for the original firmware.
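
The negotiation logic can be modeled in Python as follows.  This is a sketch of the behavior described above, not the firmware source: try_enter_programming is a hypothetical callback standing in for sending the PE instruction at a given SCK rate, and the lower limit min_hz is an assumed value.

```python
def negotiate_sck(try_enter_programming, start_hz=1_500_000, min_hz=500,
                  attempts=3, max_hz=3_000_000):
    """Find an SCK rate at which the target enters programming mode.

    Returns (entry_hz, working_hz), or None if the target never responds.
    entry_hz is the rate at which the PE instruction was accepted;
    working_hz is the doubled rate for later traffic, capped at max_hz.
    """
    sck_hz = start_hz
    while sck_hz >= min_hz:
        for _ in range(attempts):
            if try_enter_programming(sck_hz):
                return sck_hz, min(sck_hz * 2, max_hz)
        sck_hz //= 2  # step down to the next slower speed
    return None
```

For a target that only accepts the PE instruction at 400kHz or below, this tries 1.5MHz and 750kHz three times each, succeeds on the first attempt at 375kHz, and then runs the rest of the session at 750kHz.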

The higher speeds don't make a large difference in flash/verify times since the overhead of the vUSB code tends to dominate beyond a SCK frequency of 750kHz or so.  Reading the 8kB of flash on an ATtiny85 takes around 3 seconds.  By optimizing the low-speed USB code, such as was done by Tim with u-wire, it should be possible to double that speed.