Tuesday, January 19, 2021

GD32E230: a better STM32F0?

 

On my last LCSC order, I bought a few GD32E230 chips, specifically the GD32E230K8T6.  I chose the LQFP parts since I have lots of QFP32 breakout boards that I've used for other QFP32 parts.  Gigadevice is much better than many other Chinese MCU manufacturers when it comes to providing English documents.  After my past endeavors trying to understand datasheets from WCH and CHK, going through the Gigadevice documentation was rather pleasant.

Although Gigadevice makes no mention of any STM32 compatibility, the first clue is the matching pinouts of the STM32F030 and GD32E230.  To prepare for testing, I tinned the pads on a couple of breakout boards, applied some flux, and laid the chips on the pads.  I placed the modules on a cast-iron skillet and heated it up to about 240C.  The solder reflowed well, but I noticed some browning of the white silkscreen.  Next time I'll limit the temperature to 220C.  After testing for continuity and fixing a solder bridge, I was ready to try SWD.  I connected 3.3V power and the SWD lines, and ran "pyocd cmd -v":

0000710:INFO:board:Target type is cortex_m
0000734:INFO:dap:DP IDR = 0x0bf11477 (v1 MINDP rev0)
0000759:INFO:ap:AHB5-AP#0 IDR = 0x04770025 (AHB5-AP var2 rev0)
0000799:INFO:rom_table:AHB5-AP#0 Class 0x1 ROM table #0 @ 0xe00ff000 (designer=43b part=4cb)
0000812:INFO:rom_table:[0]<e000e000:SCS-M23 class=9 designer=43b part=d20 devtype=00 archid=2a04 devid=0:0:0>
0000823:INFO:rom_table:[1]<e0001000:DWT class=9 designer=43b part=d20 devtype=00 archid=1a02 devid=0:0:0>
0000841:INFO:rom_table:[2]<e0002000:BPU class=9 designer=43b part=d20 devtype=00 archid=1a03 devid=0:0:0>
0000848:INFO:cortex_m_v8m:CPU core #0 is Cortex-M23 r1p0
0000859:INFO:dwt:2 hardware watchpoints
0000866:INFO:fpb:4 hardware breakpoints, 0 literal comparators
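The DP IDR value in that log can be decoded by hand.  The following Python sketch splits it into the fields defined by the ARM Debug Interface (ADIv5) specification; the function name is my own, and the field layout is from my reading of that spec.  It shows where pyOCD's "v1 MINDP rev0" comes from, and that the designer code is the same ARM code (JEP106 continuation 4, ID 0x3B) that appears as "43b" in the ROM table lines.

```python
def decode_dp_idr(idr: int) -> dict:
    """Split an ADIv5 DP IDR value into its fields."""
    designer = (idr >> 1) & 0x7FF          # bits [11:1]: JEP106 designer code
    return {
        "revision": (idr >> 28) & 0xF,     # bits [31:28]
        "partno": (idr >> 20) & 0xFF,      # bits [27:20]
        "mindp": bool((idr >> 16) & 1),    # bit 16: MINDP (minimal debug port)
        "version": (idr >> 12) & 0xF,      # bits [15:12]: DP architecture version
        "jep106_cont": designer >> 7,      # number of JEP106 continuation codes
        "jep106_id": designer & 0x7F,      # JEP106 identity code
    }


fields = decode_dp_idr(0x0BF11477)
print(fields)
```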

I did a little probing around the chip memory.  The GD32E23x user manual shows SRAM at 0x20000000, like STM32 parts.  The contents looked like random values, which I could overwrite using the pyocd 'ww' command.  Writing to 0x20002000 resulted in a memory fault, indicating the part does not have any "bonus" RAM beyond 8kB.
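The probing described above (write a test value, read it back, step upward until the target faults) is easy to automate.  Below is a minimal Python sketch, assuming hypothetical write32/read32 callables that raise a MemoryFault exception on a bus fault; with real hardware they would wrap whatever debug-probe library you use.  A fake 8kB memory stands in for the GD32E230 so the logic can be checked without a target.

```python
class MemoryFault(Exception):
    """Raised by the (hypothetical) accessors when a bus fault occurs."""


def probe_ram_size(write32, read32, base=0x20000000, step=1024, limit=64 * 1024):
    """Return how many bytes starting at `base` pass a write/read-back test."""
    size = 0
    while size < limit:
        addr = base + size
        pattern = 0xA5A50000 | (size // step)   # a different value for each block
        try:
            write32(addr, pattern)
            if read32(addr) != pattern:
                break                           # write silently ignored
        except MemoryFault:
            break                               # past the end of RAM
        size += step
    return size


# Fake 8 kB RAM standing in for the GD32E230
_ram = {}

def fake_write32(addr, value):
    if not 0x20000000 <= addr < 0x20002000:
        raise MemoryFault(hex(addr))
    _ram[addr] = value

def fake_read32(addr):
    if not 0x20000000 <= addr < 0x20002000:
        raise MemoryFault(hex(addr))
    return _ram[addr]

print(probe_ram_size(fake_write32, fake_read32))   # 8192
```

This simple version can be fooled by RAM that is mirrored at a higher address, since writes to the mirror would also read back correctly.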

Next, I tried using the built-in serial bootloader.  After connecting BOOT0 to VDD and connecting power, PA9 and PA10 were pulled high, indicating the UART had been activated.  However my first attempt at using stm32flash was not successful.

After attaching my oscilloscope and writing a small bootloader protocol test program, I was able to determine that the responses did seem to conform to the STM32 bootloader protocol.  I did notice that the baud rate from the GD32E230 was only 110kbps, so it wasn't perfectly matching the 115.2kbps speed of the 0x7F byte sent for baud rate detection.  To avoid the potential for data corruption, I switched to 57.6kbps.  Before resorting to debugging the stm32flash source, I tried stm32loader, which gave better results:
$ stm32loader -V -p com39
Open port com39, baud 115200
Activating bootloader (select UART)
*** Command: Get
    Bootloader version: 0x10
    Available commands: 0x0, 0x2, 0x11, 0x21, 0x31, 0x43, 0x63, 0x73, 0x82, 0x92, 0x6
Bootloader version: 0x10
*** Command: Get ID
Chip id: 0x440 (STM32F030x8)
Supply -f [family] to see flash size and device UID, e.g: -f F1
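In the STM32 UART bootloader protocol (ST application note AN3155), each command is sent as an opcode followed by its bitwise complement, and the reply to Get (0x00) is an ACK (0x79), a count byte N, the bootloader version, N command opcodes, and a final ACK.  Here is a minimal Python sketch of that framing, fed with the values from the stm32loader log above; the function names are my own.

```python
from __future__ import annotations

ACK, NACK = 0x79, 0x1F


def command_frame(opcode: int) -> bytes:
    """A bootloader command is the opcode followed by its complement."""
    return bytes([opcode, opcode ^ 0xFF])


def parse_get_response(resp: bytes) -> tuple[int, list[int]]:
    """Parse the reply to Get: ACK, N, version, N opcodes, ACK."""
    if len(resp) < 2 or resp[0] != ACK:
        raise ValueError("Get was not acknowledged")
    n = resp[1]                    # bytes that follow, minus one (ACKs excluded)
    end = 3 + n                    # index of the trailing ACK
    if len(resp) <= end or resp[end] != ACK:
        raise ValueError("truncated or unterminated Get reply")
    return resp[2], list(resp[3:end])


# Reply reconstructed from the stm32loader log above
opcodes = [0x00, 0x02, 0x11, 0x21, 0x31, 0x43, 0x63, 0x73, 0x82, 0x92, 0x06]
reply = bytes([ACK, len(opcodes), 0x10, *opcodes, ACK])
version, supported = parse_get_response(reply)
print(command_frame(0x00).hex(), hex(version), 0x01 in supported)
```

Note that the command list reported by the GD32E230 does not include 0x01 (Get Version).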

Next, I was ready to try flashing a basic program.  I first checked for GD32E support in libopencm3.  No luck.  Then as I read through the user manual, I noticed GPIOA starts at 0x4800 0000 on AHB2, the same as STM32F0 devices.  The register names didn't match the STM32, but the function and offsets were the same.  For example on the GD32E, the register to clear individual GPIOA bits is called GPIOA_BC, rather than GPIOA_BRR as it is called on the STM32.  The clock control registers, called RCU on the GD32E, also matched the STM32 RCC registers.  Since it was looking STM32F0 compatible, I tried flashing my blink example with stm32loader, and it worked!
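Because only the names differ, code written for the STM32F0 register layout can drive the GD32E230.  The Python sketch below records a few of the name pairs and offsets as I read them from the two manuals (treat the table as an assumption and double-check it against the GD32E23x user manual), and computes the value for the set/reset register, where bits 0-15 set pins and bits 16-31 clear them.

```python
GPIOA_BASE = 0x48000000        # GPIOA on AHB2, same address on both families
RCC_BASE = 0x40021000          # RCC (STM32F0) / RCU (GD32E23x)

# STM32F0 name -> (GD32E23x name, offset from the GPIO port base)
GPIO_NAMES = {
    "MODER": ("CTL", 0x00),
    "ODR": ("OCTL", 0x14),
    "BSRR": ("BOP", 0x18),
    "BRR": ("BC", 0x28),
}
RCC_AHBENR = RCC_BASE + 0x14   # RCU_AHBEN on the GD32E23x
IOPAEN = 1 << 17               # GPIOA clock enable (PAEN on the GD32E23x)


def gpioa_addr(stm32_name: str) -> int:
    """Absolute address of a GPIOA register, valid on either part."""
    return GPIOA_BASE + GPIO_NAMES[stm32_name][1]


def set_reset_value(set_pins=(), reset_pins=()) -> int:
    """Value for BSRR/BOP: bits 0-15 set pins, bits 16-31 clear them."""
    value = 0
    for pin in set_pins:
        value |= 1 << pin
    for pin in reset_pins:
        value |= 1 << (pin + 16)
    return value


print(hex(gpioa_addr("BRR")), hex(set_reset_value(set_pins=[9])))
```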

The LED was flashing faster than it did with the STM32F030.  A little searching revealed that the ARM Cortex-M23, like the M0+, has a 2-stage pipeline.  The STM32F030, with its M0 core, has a 3-stage pipeline.  My delay busy loop needs to take four cycles per iteration, but on the M23 the bne instruction only takes two cycles instead of three, so each iteration finishes a cycle early.  My solution is adding a nop instruction based on an optional compile flag.
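The effect on a calibrated busy loop is easy to quantify.  This Python sketch assumes the loop is a one-cycle decrement plus a conditional branch, so an iteration takes four cycles on the M0 (taken branch: three cycles) and three on the M23 (taken branch: two cycles).  The 8MHz clock and 0.5s delay are illustrative values only.

```python
def loop_iterations(f_cpu_hz: int, delay_s: float, cycles_per_iter: int) -> int:
    """Iterations of a busy loop needed to burn `delay_s` seconds."""
    return round(f_cpu_hz * delay_s / cycles_per_iter)


F_CPU = 8_000_000                         # assumed clock for illustration
n = loop_iterations(F_CPU, 0.5, 4)        # count calibrated for a 4-cycle loop

delay_m0 = n * 4 / F_CPU                  # subs (1) + taken bne (3)
delay_m23 = n * 3 / F_CPU                 # subs (1) + taken bne (2)
delay_m23_nop = n * 4 / F_CPU             # with the extra nop
print(delay_m0, delay_m23, delay_m23_nop)
```

With the same iteration count, the M23 finishes the delay in three quarters of the time, which is why the LED blinked about a third faster until the nop was added.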

One problem I have yet to resolve with the GD32E is support for the bootloader Go/0x21 command.  With the STM32F0, I left BOOT0 high, and used DTR to toggle nRST before uploading new code.  The stm32flash "-g 0" option made the target run the uploaded code after flashing was complete.  I went back to debugging stm32flash, and discovered that it is hard-coded to use the "Get Version"/0x01 command, and silently fails if the bootloader responds with a NAK.  After a few mods to the source, I was able to build a version that works with the GD32E230, however the Go command still doesn't work.  Perhaps a task for a later date will be to hook up a debug probe to see what the E230 is doing when it gets the Go command.
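For reference, here is what the Go command looks like on the wire according to AN3155: the 0x21/0xDE command pair, then (after an ACK) the 4-byte start address, most significant byte first, followed by an XOR checksum of those four bytes.  This Python sketch only builds the frames; it says nothing about why the E230 does not act on the command.

```python
from __future__ import annotations

from functools import reduce


def go_frames(address: int) -> list[bytes]:
    """Frames sent for the bootloader Go command (0x21).

    The host waits for an ACK (0x79) after each frame before sending the next.
    """
    addr = address.to_bytes(4, "big")               # start address, MSB first
    checksum = reduce(lambda a, b: a ^ b, addr)     # XOR of the four address bytes
    return [bytes([0x21, 0x21 ^ 0xFF]), addr + bytes([checksum])]


for frame in go_frames(0x08000000):
    print(frame.hex())
```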

Overall, I'm quite happy with the GD32E230K8T6.  They cost less than half the equivalent STM32 parts, and are even cheaper than other Chinese STM32 clones I've seen.  They are lower power and their maximum clock speed is 50% faster than the STM32F0.  In addition to the shorter 2-stage pipeline, the GD32E devices support single-cycle IO, making them faster for bit-banged communications than the STM32F0 which takes 2 cycles to write to a GPIO pin.  The GD32E230 also has some new features, which might be worth discussing in a future blog post.

Saturday, January 2, 2021

Trying to test a "ten cent" tiny ARM-M0 MCU part 2

After my first look at the HK32F030MF4P6, I wondered whether the HK part, unlike the STM32F030 it is modeled after, lacks 5V tolerant IO.  I changed the solder jumpers to 3V3 on the CH552 module I'm using as a CMSIS-DAP adapter, which caused it to stop working.  This was because the CH552 requires a 5V supply in order to run reliably at 24MHz.  After re-flashing the CMSIS-DAP firmware set to run at 16MHz, the module worked, and I was finally able to talk to the HK MCU via SWD.

In the screen shot above, I chose the stm32f051 target because pyocd has neither the HK MCU nor the STM32F030 among its built-in targets.  For basic SWD communications, the target option is not even necessary.  With the target specified, it's possible to specify peripheral registers by name, rather than having to specify a memory address to read or write.

In the screen shot above, I'm using the "connect_mode" option to bring the nRST line low on the target device when entering debug mode.  Usually this is not necessary for SWD, however some of the probing I did would cause the MCU to crash.  This required a power cycle or reset to restore communications via SWD.

The first tests I did with the HK MCU were to probe the flash and RAM.  The HK datasheet shows the flash at address 0.  In the STM32F0, the flash is at address 0x8000000, and is mapped to address 0 when the boot0 pin is low.  Although the HK MCU doesn't have a boot0 pin, data at address 0x8000000 is mirrored at address 0 as well.  What was most unusual about the HK MCU is that the flash was not erased to all 0xFF as is typical with other flash-based MCUs.  Most of the flash contents were zeros, except for some data at address 0x400, which was the same on the 2 MCUs I checked.

By writing to memory starting at 0x20000000 using the 'ww' command, I discovered that the MCUs I received have 4kB of RAM, rather than the 2kB specified in the datasheet.  Writing to 0x20001000 (beyond 4kB) results in a crash.

For writing and erasing the flash, I initially tried using the pyOCD 'erase' and 'flash' commands.  Since the MCU flash interface is not part of the Cortex-M specification, the flash interface peripheral varies from one MCU vendor to the next.  The flash interface on the STM32F051 is almost identical to the flash interface on the STM32F030, but the 'erase' and 'flash' commands caused the HK MCU to crash when I ran them.  A genuine STM32F030 crashed as well, and after some debugging and reading through the pyOCD code, I realized the STM32F051 flash routines need 8kB of RAM.  Even after downloading and installing the STM32F0 device pack, I could not erase or flash the HK MCU.

Next I reviewed the STM32F030 programming manual, and tried to access the flash peripheral registers directly.  This was when I found a pyOCD bug with the wreg command.  I was able to unlock the flash by writing the magic sequence of 0x45670123 followed by 0xCDEF89AB to flash.keyr.  I tried erasing the first page at address 0, and although flash.sr and flash.cr updated as expected, the memory contents did not change.  What did work was erasing the page at address 0x8000000, which cleared the contents at address 0 as well.  I still find it strange that the erase operation sets all bits to 0 instead of 1.  The HK datasheet says a flash page is 128 bytes, and erasing a page resulted in 128 bytes set to all zero.

I was only partially successful in writing data to the flash.  Writing to 0x8000000 did not work, however writing 16 bits to address 0 using the 'wh' command was successful.  Trying to write 16 bits to address 2 updated flash.ar and flash.sr as expected, but did not change the data.  Writing to any 4-byte aligned address in the erased page worked, but writing to addresses that were only 2-byte aligned left all 16 bits at zero.  I tried writing bytes with 'wb' and full words with 'ww', both of which crashed the MCU, likely from a hard fault interrupt.  I even made sure there isn't a bug with the 'wh' command by writing 16 bits at a time to RAM.
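The register sequence I was poking at by hand follows the STM32F0 flash procedure: unlock with the two keys, then either erase a page with PER/AR/STRT or program a half-word with PG, waiting for BSY to clear after each operation.  The Python sketch below generates that sequence as pyOCD commander lines ('ww'/'wh' followed by address and value).  The register addresses are those of the STM32F0 flash interface at 0x40022000, the lines starting with '#' are reminders for a human rather than commands, and the sketch does not model the HK quirks described above, such as the erase only working through 0x8000000.

```python
FLASH_BASE = 0x40022000                   # STM32F0 flash interface
FLASH_KEYR = FLASH_BASE + 0x04
FLASH_SR = FLASH_BASE + 0x0C
FLASH_CR = FLASH_BASE + 0x10
FLASH_AR = FLASH_BASE + 0x14

CR_PG, CR_PER, CR_STRT = 1 << 0, 1 << 1, 1 << 6
SR_BSY = 1 << 0
KEY1, KEY2 = 0x45670123, 0xCDEF89AB


def unlock():
    return [("ww", FLASH_KEYR, KEY1), ("ww", FLASH_KEYR, KEY2)]


def erase_page(addr):
    return [("ww", FLASH_CR, CR_PER),             # select page erase
            ("ww", FLASH_AR, addr),               # page to erase
            ("ww", FLASH_CR, CR_PER | CR_STRT),   # start the erase
            ("wait", FLASH_SR, SR_BSY),           # poll until BSY clears
            ("ww", FLASH_CR, 0)]                  # clear PER


def program_halfword(addr, value):
    return [("ww", FLASH_CR, CR_PG),              # enable programming
            ("wh", addr, value & 0xFFFF),         # flash is programmed 16 bits at a time
            ("wait", FLASH_SR, SR_BSY),
            ("ww", FLASH_CR, 0)]                  # clear PG


def to_commander(ops):
    lines = []
    for op, addr, value in ops:
        if op == "wait":
            lines.append(f"# read {addr:#010x} until bit mask {value:#x} is clear")
        elif op == "wh":
            lines.append(f"wh {addr:#010x} {value:#06x}")
        else:
            lines.append(f"ww {addr:#010x} {value:#010x}")
    return lines


script = to_commander(unlock() + erase_page(0x08000000) + program_halfword(0x0, 0x1234))
print("\n".join(script))
```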

While searching the CHK website for more documentation, I found a page with IAR device packs.  Although pyOCD uses Keil device packs, I downloaded the HK32F0 pack, which is a self-extracting RAR file that saves the uncompressed files in AppData\Local\Temp\RarSFX0.

Since .pack files are just zip files with a different extension, I zipped the files back up as a .pack file.  However pyOCD couldn't read it: "0000731:CRITICAL:__main__:CMSIS-Pack './HK32F0.pack' is missing a .pdsc file".  Manually examining the files confirmed some of my earlier discoveries, such as flash at address 0x8000000, remapped to address zero.  I found a file named HK32F030M.svd, which contains XML definitions of the peripheral registers.  pyOCD's built-in targets appear to use svd files, so it may be possible to add HK32F0 support to pyOCD.
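According to the CMSIS-Pack specification, a pack is a zip archive with its .pdsc description file in the root directory, and that is the file pyOCD was complaining about.  The short Python sketch below lists top-level .pdsc files, which is a quick way to check a hand-built pack before handing it to pyOCD.  The in-memory archives and the file name Vendor.Device.pdsc are made-up stand-ins; note that a pack whose files sit inside a folder (for example if the RarSFX0 directory itself were zipped) would also fail this check.

```python
import io
import zipfile


def top_level_pdsc(pack) -> list:
    """Names of .pdsc files in the root of a CMSIS-Pack (a plain zip archive).

    `pack` may be a path or a binary file object.
    """
    with zipfile.ZipFile(pack) as zf:
        return [name for name in zf.namelist()
                if "/" not in name and name.lower().endswith(".pdsc")]


def make_pack(*names: str) -> io.BytesIO:
    """Build a throwaway in-memory pack containing empty files with these names."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, "")
    return buf


print(top_level_pdsc(make_pack("HK32F030M.svd")))                 # []
print(top_level_pdsc(make_pack("Vendor.Device.pdsc", "a.svd")))   # ['Vendor.Device.pdsc']
```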

Copies of the IAR support pack, datasheet, and pyocd page erase sequence can be found in my github repository.


Sunday, December 13, 2020

Trying to test a "ten cent" tiny ARM-M0 MCU

 

A few months ago, while browsing LCSC, I found a surprisingly cheap ARM M0 MCU.  At the time it was 16.6c in single-unit quantities, with no higher-volume pricing listed.  From the datasheet LCSC has posted, there was enough information in English to tell that it has 2kB RAM, 16kB flash, and runs up to 32MHz with a 1.8V to 3.6V power supply.  Although the part number suggests it may be a clone or is compatible with the STM32F030, it's not.  The part number for the STM32F030 clone is HK32F030F4P6.

Some additional searching brought me to some Chinese web sites that advertised the chip as a 32-bit replacement for the STM8S003.  The pinout matches the STM8S003F3P6, so in theory it is a drop-in replacement for the 8S003.  Unlike the STM32F0, it has no serial bootloader, so programming has to be done via SWD.  And with no bootloader support, there's no need to be able to remap the flash from 0x08000000 to 0x00000000 like the STM32.  A small change to the linker script should be all it takes to handle that difference.  Even though I wasn't sure how or if I'd be able to program the chips, I went ahead and ordered a few of them.  I already had some TSSOP20 breakout boards, so the challenge would be in the software and the programming hardware.

Since I'm cheap, I didn't want to buy a dedicated DAPlink programmer.  I have an STM32F103 "blue pill", so I considered converting it to a Black Magic Probe.  But since I've been playing with the CH554 series of chips, I decided to try running CMSIS-DAP firmware on a CH552.  If you're not familiar with CMSIS-DAP and SWD, I recommend Chris Coleman's blog post.  Before I tried it with the HK32F030MF4P6, I needed to try it with a known good target.  Since I had recently been working with an STM32F030, that's what I chose to try first.

The two main alternatives for open-source CMSIS-DAP software for downloading, running, and debugging target firmware are OpenOCD and pyOCD.  pyOCD is much simpler to use than OpenOCD; after installing it with pip, 'pyocd list' found my CH552 CMSIS-DAP adapter.

However that's as far as I could get with pyOCD.  There seems to be a bug in the CMSIS-DAP firmware or pyOCD around the handling of the DAP_INFO message.  Fixing the bug may be a project for another day, but for the time being I decided to figure out how to use OpenOCD.

To use OpenOCD, you need to create a configuration file with information about your debug adapter and target.  It's all documented, but it's very complicated, given that OpenOCD does a whole lot more than pyOCD.  It's also complicated by the fact that since the release of v0.10.0, there have been updates that have made material changes to the configuration file syntax.  I had a working configuration file on Windows that wouldn't work on Linux.  On Linux I was running OpenOCD v0.10.0-4, but on Windows I was running v0.10.0-15.  After installing the xPack project OpenOCD build on Linux, the same config file, which I named "cmsis-dap.cfg", worked on both Linux and Windows:

adapter driver cmsis-dap

transport select swd
adapter speed 100

swd newdap chip cpu -enable
dap create chip.dap -chain-position chip.cpu
target create chip.cpu cortex_m -dap chip.dap

init
dap info

With dupont jumpers connecting SWCLK, SWDIO, VDD, and VSS on my STM32F030 breakout board, here's the output from openocd.

After making the same connections (accounting for the different pinout) to the HK32F030MF4P6, I was getting no response from the MCU.  Before connecting, I had done the usual checks for shorts and continuity, making sure all my solder connections were good.  Next I tried just connecting VDD and VSS, while I probed each pin.  Pin 2, SWDIO, was pulled high to 3V3, as was nRST.  All other pins were low, close to 0V.  The STM32F030 pulls SWDIO and nRST high too.  I tried reconnecting SWDIO and SWCLK, and connecting a line to control nRST.  I added "reset_config trst_and_srst" to my config file, and still didn't get a response.  The debug output from openocd (-d flag) shows the target isn't responding to SWD commands:

Debug: 179 99 cmsis_dap_usb.c:728 cmsis_dap_swd_read_process(): SWD ack not OK @ 0 JUNK
Debug: 180 99 command.c:626 run_command(): Command 'dap init' failed with error code -4
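The "JUNK" in that message refers to the 3-bit acknowledge that follows every SWD request.  Per the ARM debug interface specification, a request is an 8-bit header (start, APnDP, RnW, A[2:3], parity, stop, park) sent LSB first, and the target answers with OK (0b001), WAIT (0b010) or FAULT (0b100).  The Python sketch below builds the header and classifies the ACK; the function names are my own.  When nothing drives SWDIO and the line sits at its pull-up, the host reads 0b111, which falls into the junk case.

```python
ACK_NAMES = {0b001: "OK", 0b010: "WAIT", 0b100: "FAULT"}


def swd_request(ap: bool, read: bool, addr: int) -> int:
    """8-bit SWD request header; bit 0 is the first bit on the wire."""
    apndp, rnw = int(ap), int(read)
    a2, a3 = (addr >> 2) & 1, (addr >> 3) & 1
    parity = (apndp + rnw + a2 + a3) & 1      # even parity over APnDP, RnW, A[2:3]
    return (1                # start
            | apndp << 1
            | rnw << 2
            | a2 << 3
            | a3 << 4
            | parity << 5
            | 0 << 6         # stop
            | 1 << 7)        # park


def decode_ack(ack_bits: int) -> str:
    """Classify the 3-bit ACK (first bit received in bit 0)."""
    return ACK_NAMES.get(ack_bits & 0b111, "JUNK")


print(hex(swd_request(ap=False, read=True, addr=0x0)))   # 0xa5: read DP IDCODE
print(decode_ack(0b111))                                  # JUNK: SWDIO never driven low
```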


Since the datasheet says that after reset, pin 2 functions as SWDIO, and pin 11 functions as SWCLK, I'm at a bit of an impasse.  I'll try hooking up my oscilloscope to the SWDIO and SWCLK lines to make sure the signals are clean.  I've read that in some ARM MCUs, DAP works while the device is in reset, so I'll peruse the openocd docs to figure out how to hold nRST low while communicating with the target.  And of course, suggestions are welcome.


Before I finish this post, I wanted to explain the reference to a "ten cent" MCU.  LCSC does not list volume pricing for the part, but when I searched for the manufacturer's name, "Shenzhen Hangshun Chip Technology Development", I found an article about the company.  In the article, the company president, Liu Jiping, refers to the 10c ($0.1) price.  I suspect that pricing is for quantities over 1000.  Assuming these chips can actually be programmed with a basic SWD adapter, then even paying 20c for a 20-pin, 32MHz M0 MCU looks like a good deal to me.


Read part 2 to find out how I got SWD working.


Monday, December 7, 2020

STM32 Starting Small

 

For software development, I often prefer to work close to the hardware.  Libraries that abstract away the hardware not only use up limited flash memory, they add to the potential sources of bugs in your code.  For a basic test of STM32 library bloat, I compiled the buttons example from my TM1638NR library in the Arduino 1.8.13 IDE using stm32duino for an STM32F030 target.  The flash required was just over 8kB, slightly more than half of the 16kB of flash specified for the STM32F030F4P6 MCU.  While I wasn't ready to write my own tiny Arduino core for the STM32F, I was determined to find a more efficient way of programming small ARM Cortex-M devices.

After a bit of searching, looking at Bill Westfield's Minimalist ARM project, libopencm3, and other projects, I found most of what I was looking for in a series of STM32 bare metal programming posts by William Ransohoff.  However instead of using an ST-Link programmer, I decided to use a standard USB-TTL serial dongle to communicate with the ROM bootloader on the STM32.

To enable the bootloader, the STM32 boot0 pin must be pulled high during power-up.  Then the bootloader will wait for communication over the USART Tx and Rx lines.  On the STM32F030F4P6, the Tx line is PA9, and the Rx line is PA10.  In order to reset the chip before flashing, I also connected the DTR line from my serial module to NRST (pin 4) on the MCU as shown in the following wiring diagram:

For flashing the MCU, I decided on stm32flash.  While installation on Debian Linux is as simple as, "apt install stm32flash", I had some difficulty finding a recent Windows build.  So I ended up building it myself.  Although my build defaults to 115.2kbps, I found 230.4kbps completely reliable.  At 460.8kbps and 500kbps, I encountered intermittent errors, so I stuck with 230.4kbps.  After making the necessary connections, and before flashing any code to the MCU, do a test to confirm the MCU is detected.

One thing to note about stm32flash is that it does not detect the amount of flash and RAM on the target MCU.  The numbers come from a hard-coded table based on the device ID reported.  The official flash size in kB is stored in the system ROM at address 0x1FFFF7CC.  On my STM32F030F4P6, the value read from that address is 0x0010, reflecting the spec of 16kB flash for the chip.  My testing revealed that it actually has 32kB of usable flash.

I used William's STM32F0 GPIO example as a template to create a tiny blinky example that uses less than 300 bytes of flash.  Most of that is for the vector table, which on the Cortex-M0 has 48 entries of 4 bytes each.  To save space, I embedded the reset handler in an unused part of the vector table.  Since the blinky example doesn't use any interrupts, all but the initial stack pointer at vector 0 and the reset handler at vector 1 could technically be omitted.  I plan to re-use the vector table code for other projects, so I did not prune it down to the minimum.
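On the Cortex-M0 (ARMv6-M), vector table entries 4 through 10 (offsets 0x10 to 0x2B) are reserved, so they are one natural place to tuck a small reset handler.  The Python sketch below builds such a table image: word 0 holds the initial stack pointer, word 1 points at the handler with the Thumb bit set, and the handler bytes are copied into the reserved words.  The 4kB stack top and the branch-to-self placeholder handler (Thumb "b .", 0xE7FE) are assumptions for illustration, not the code from my blinky example.

```python
import struct

ENTRIES = 48                      # 16 system vectors + 32 IRQs on the Cortex-M0
RESERVED = range(4, 11)           # exception numbers 4-10 are reserved on ARMv6-M


def build_vector_table(handler: bytes, origin: int = 0x08000000,
                       stack_top: int = 0x20001000) -> bytes:
    """Vector table image with `handler` stored in the reserved entries 4-10."""
    start, end = RESERVED.start * 4, RESERVED.stop * 4
    if len(handler) > end - start:
        raise ValueError("handler does not fit in the reserved entries")
    table = bytearray(ENTRIES * 4)            # unused vectors stay zero
    reset_vector = (origin + start) | 1       # Thumb bit must be set
    struct.pack_into("<II", table, 0, stack_top, reset_vector)
    table[start:start + len(handler)] = handler
    return bytes(table)


image = build_vector_table(b"\xfe\xe7")       # placeholder handler: "b ." (spin forever)
sp, reset = struct.unpack_from("<II", image, 0)
print(len(image), hex(sp), hex(reset))        # 192 0x20001000 0x8000011
```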

The blinky example will toggle PA9 at a frequency of 1Hz.  That is the UART Tx pin on the MCU, which is connected to the Rx pin on the USB-TTL dongle.  This means when the example runs, the Rx LED on the USB-TTL dongle will flash on and off.

I think my next step in Cortex-M development will be to experiment with libopencm3.  It appears to have a reasonably lightweight abstraction of GPIO and some peripherals, so it should be easier to write code that is portable across multiple different ARM MCUs.


Monday, October 5, 2020

LGT8F328P EDMINI board


Earlier this year I purchased an EDMINI board from Electrodragon.  It uses an LGT8F328P chip, which supports the AVR instruction set.  The instruction set timings and peripheral registers vary slightly from the ATmega328P, so it is not 99% compatible as claimed by Electrodragon.  I bought one to see just how compatible it is, and possibly to port some of my AVR libraries to the LGT MCU.

The module arrived in an anti-static bag, inside a padded envelope.  After connecting 5V power to the board, the D13 LED blinked on and off every second, suggesting that it comes with the Arduino blink sketch pre-loaded.  I then hooked up a USB-TTL adapter, installed the LGT board file in the Arduino IDE, and tried flashing a modified blink sketch to the board.  The upload failed, and after some debugging I found that the reset was not working on the MCU.  Neither pressing and holding the reset button nor grounding RST would reset the board.  After contacting Electrodragon, Chao agreed to replace the board with two new ones.  He told me that they see a higher than average failure rate with the LGT8F328P chips.

In addition to Chao's frank comment about reliability, another concern I had about the LGT parts was the lack of markings on the chip.  I suspect LGT sells the parts without markings so vendors can label them with their own brand.  This also makes it easier for more nefarious manufacturers to label them as an ATmega328p.  

When the new boards arrived, the first thing I did was make sure the reset button worked.  After pressing reset the LED flashes quickly three times for the bootloader, and then flashes on and off every second.  However when I tried uploading a sketch using the Arduino IDE, the upload still failed.  After some more debugging, I found I could upload if I pressed the reset button just before uploading.  This meant the bootloader was working, but auto-reset (toggling the DTR line) was not.  These boards use the same auto-reset circuit as an Arduino Pro Mini:

A negative pulse on DTR will cause a voltage drop on RST, which is supposed to reset the target.  When the target power is 5V and 3V3 TTL signals are used, toggling DTR will cause RST to drop from 5V to about 1.7V (5 - 3.3).  With the ATmega328P and most other AVR MCUs, 2V is low enough to reset the chip.  The LGT8F328P, however, requires a lower voltage to reset.  In some situations this can be a good thing, as it means the LGT MCU is less likely to reset due to electromagnetic interference.

The EDMINI board has a 3V3 regulator which can be selected by a solder jumper.  This is mentioned on the Electrodragon site, but it is not clearly documented which pads need to be shorted to switch from 5V to 3V3.  After a bit of debugging I was able to run the board at 3V3, and was able to use the auto-reset feature.

I do most of my AVR development using command line tools, not the Arduino IDE.  I compiled a small program that toggles every pin on PORTB using avr-gcc 5.4.0, and flashed it to the EDMINI board using avrdude.  Nothing happened.  Since the Arduino blink sketch worked, I know that the LED on PB5 was working.  My conclusion is that the LGT Arduino core must do some setup to enable PORTB.  This is common on modern MCUs such as the ARM Cortex, but on AVRs like the ATmega328p, writing 255 to the PORTB and DDRB registers is all it takes to drive every pin on port B high.

I won't be doing any development work with the LGT MCUs.  Although they are cheaper and can run a bit faster than authentic AVR parts, their compatibility is rather limited.  Any code that relies on the standard AVR instruction set timing, such as my picoUART library, will not work.  The 8F328P cannot be programmed with a USBasp, as the native programming interface is SWD, not Atmel's SPI-based protocol.  For a cheap and powerful MCU, the CH551 looks much more interesting.

Thursday, September 17, 2020

Recording the Reset Pin

 


The AVR reset pin has many functions.  In addition to being used as an external reset signal, it can be used for debugWire, and it is used for SPI and for high-voltage programming.  Other than its use as an external reset signal, the datasheet specifications for the pin are somewhat ambiguous.  I recently started working on an updated firmware for the USBasp, and wanted to find out more details about the SPI programming mode.  The image above is one of many recordings I made from programming tests of AVR MCUs.

When I first started capturing the programming signals, I observed seemingly random patterns on the MISO line before programming was enabled.  Although the datasheet lists the target MISO line as being an output, it only switches to output mode after the first two bytes of the "Programming Enable" instruction, 0xAC 0x53, are received and recognized.  Prior to that the pin floats, and the seemingly random patterns I observed were caused by the signals on the MOSI and SCK lines inducing a voltage on the MISO line.  I enabled the pullup resistor on the programmer side in order to keep the MISO line high until the PE instruction was recognized by the target.
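A programmer can tell whether the target recognized Programming Enable by looking at what comes back on MISO during the third byte: a target in sync echoes the second byte, 0x53.  With the programmer's pull-up enabled, an unsynchronized target reads back as 0xFF rather than induced noise.  A minimal Python check, with a hypothetical function name:

```python
PROGRAMMING_ENABLE = bytes([0xAC, 0x53, 0x00, 0x00])


def pe_recognized(miso: bytes) -> bool:
    """True if the target echoed 0x53 while the third PE byte was shifted out."""
    return len(miso) == len(PROGRAMMING_ENABLE) and miso[2] == PROGRAMMING_ENABLE[1]


print(pe_recognized(bytes([0xFF, 0xFF, 0x53, 0x00])))   # True: in sync
print(pe_recognized(bytes([0xFF, 0xFF, 0xFF, 0xFF])))   # False: MISO held high by the pull-up
```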

One of the steps in the datasheet's serial programming algorithm that doesn't make sense to me is step 2, which says, "Wait for at least 20 ms and enable Serial Programming by sending the Programming Enable serial instruction to pin MOSI."  It's clear from the capture image above that a wait time of less than 100 us worked in this case.  I did a number of experiments with different targets (t13, t85, m8a) with and without the CKDIV8 fuse set, and found a delay of 64 us was always sufficient.  Nevertheless, I still used a 20 ms delay in the USBasp firmware.

Another observation I made was of a repeatable delay between the 8th rising edge of the SCK signal on the second byte and MISO going low.  After multiple tests, I found that delay is between 2 and 3 target clock cycles.  A close-up of the 0x53 byte shows this clearly:


The 2-3 clock cycle delay seems to correspond with the datasheet's specification of the minimum low and high periods for the SCK signal of 2 clock cycles when the target is running at less than 12MHz.  However I found I couldn't consistently get a target running at 8MHz to enter programming mode with a SCK clock of 1.5MHz.  Additional logs of the programming sequence revealed something interesting when multiple PE instructions are sent at less than 1/8th of the target clock rate, with a positive pulse on RST for synchronization.  In those sequences, the delay between the 8th rising edge of the SCK signal on the second byte and MISO going low was smaller for the second and subsequent times the PE instruction was sent.  It seems you need to use a slower SCK frequency to get the target into programming mode, but after that, the frequency can be increased to 1/4 of the target clock.

Using what I learned, I have implemented automatic SCK speed negotiation and a higher default SCK clock speed.  The speed negotiation starts with 1.5MHz for SCK, and makes 3 attempts to enter programming mode.  If that fails, the next slower speed (750kHz) is tried three times, and so on until a speed is found where the target responds.  For subsequent communications with the target, the speed is doubled, since the slowest speed is only needed the first time the PE command is received after power-up.  The firmware also supports a maximum SCK frequency of 3MHz, vs 1.5MHz for the original firmware.
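The negotiation logic can be sketched in a few lines of Python.  The speed steps below 750kHz assume each step halves the previous one, try_enter stands in for pulsing RST, sending PE, and checking for the 0x53 echo, and the fake target models the observation above that entering programming mode needs SCK below 1/8 of the target clock.  This illustrates the approach; it is not the USBasp firmware code.

```python
from __future__ import annotations

from typing import Callable, Optional

MAX_SCK_HZ = 3_000_000
# 1.5 MHz and 750 kHz are from the text; the halving below that is assumed.
SCK_STEPS_HZ = [1_500_000, 750_000, 375_000, 187_500, 93_750, 46_875]


def negotiate_sck(try_enter: Callable[[int], bool],
                  attempts: int = 3) -> Optional[tuple[int, int]]:
    """Return (entry speed, working speed), or None if the target never answers."""
    for hz in SCK_STEPS_HZ:
        for _ in range(attempts):
            if try_enter(hz):
                # Only the first PE after power-up needs the slow clock.
                return hz, min(hz * 2, MAX_SCK_HZ)
    return None


def fake_target(f_cpu_hz: int) -> Callable[[int], bool]:
    """Models a target that only enters programming mode below f_cpu/8."""
    return lambda sck_hz: sck_hz < f_cpu_hz / 8


print(negotiate_sck(fake_target(8_000_000)))   # (750000, 1500000)
print(negotiate_sck(fake_target(1_000_000)))   # (93750, 187500)
```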

The higher speeds don't make a large difference in flash/verify times since the overhead of the vUSB code tends to dominate beyond a SCK frequency of 750kHz or so.  Reading the 8kB of flash on an ATtiny85 takes around 3 seconds.  By optimizing the low-speed USB code, such as was done by Tim with u-wire, it should be possible to double that speed.

Sunday, September 6, 2020

Flashing AVRs at high speed

 

I've written a few bootloaders for AVR MCUs, which necessarily need to modify the flash while running.  The typical 4ms to write or erase a page depends on the speed of the internal RC oscillator.  Here's a quote from section 6.6.1 of the ATtiny88 datasheet:

Note that this oscillator is used to time EEPROM and Flash write accesses, and the write times will be affected accordingly. If the EEPROM or Flash are written, do not calibrate to more than 8.8 MHz. Otherwise, the EEPROM or Flash write may fail.

I wondered how running the RC oscillator well above 8.8MHz would impact erasing and writing flash.  In the past I had read about tests showing the endurance of AVR flash and EEPROM is many times more than the spec, but I couldn't find any tests done while running the AVR at high speed.  I did come across a post from an old grouch on AVRfreaks warning not to do it, so now I had to try.

The result is a program I called flashabuse, which you'll see later is a bit of a misnomer.  What the program does is set OSCCAL to 255, then repeatedly erase, verify, write, and verify a page of flash.  I chose to test just one page of flash for a couple of reasons.  First, testing all 128 pages of flash on an ATtiny88 would take much more time.  Second, I would only risk damaging one page, and an ATtiny88 with 127 good pages of flash is still useful.

The results were very positive.  My little program was completing about 192 cycles per second, taking 2.6ms for each page erase or page write.  I let it run for an hour and a half, so it successfully completed 1 million cycles.  Not bad considering Atmel's design specification is a minimum of 10,000 cycles.

So why does the flash work fine at high speed?  I think it has to do with how floating-gate flash memory works.  Erasing and writing the flash requires removing and adding a charge to the floating gate using high voltages.  Atmel likely uses timing margins well in excess of the 10% indicated in the datasheet, so even half the typical 4ms is more than enough to ensure error-free operation.  I even think writing at high speed puts less wear on the flash because it exposes the gate to high voltages for a shorter period of time.

Addendum

I received some feedback questioning whether the faster write time may reduce retention due to reduced charge on the floating gate.  As I mentioned above, Atmel likely used a very large timing margin when designing the flash memory.  Chris Lamont, who tested flash retention on a PIC32, stated that retention failure is "extremely unlikely".

The retention specs for the ATtiny88 are, "20 years at 85°C / 100 years at 25°C".  As this Micron technical note (PDF) shows, retention specs are based on models, not actual testing.  Micron's JESD47I PCHTDR testing is done at 125C for 1000 hours, and requires 0 failures.  TEKMOS states, "As a very rough rule of thumb, the data retention time halves for every 10C rise in temperature."  Extrapolating from a 100-year retention at 25C, retention at 255C, a typical reflow soldering peak temperature, would be only 6 minutes.
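The arithmetic behind the 6-minute figure, using the halving-per-10C rule of thumb quoted from TEKMOS (not a qualified acceleration model), is shown in the Python sketch below.  Read the other way, the same rule puts a 12-hour bake at 150C, as used in the tests that follow, at roughly eight years at 25C.

```python
HOURS_PER_YEAR = 365.25 * 24


def retention_hours(base_years: float, base_temp_c: float, temp_c: float,
                    halving_step_c: float = 10.0) -> float:
    """Rule of thumb: retention halves for every `halving_step_c` degrees of rise."""
    doublings = (temp_c - base_temp_c) / halving_step_c
    return base_years * HOURS_PER_YEAR / 2 ** doublings


# 100 years at 25C, extrapolated to a 255C reflow peak
reflow_minutes = retention_hours(100, 25, 255) * 60
print(round(reflow_minutes, 1))   # 6.3

# The same rule read the other way: 12 hours at 150C expressed as time at 25C
bake_years_at_25c = 12 * 2 ** ((150 - 25) / 10) / HOURS_PER_YEAR
print(round(bake_years_at_25c, 1))   # 7.9
```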

In an attempt to show that retention is not impacted by repeated fast flashing, I performed two additional tests.  For the first test, I baked the subject MCU for 12 hours at 150C, then performed 100,000 fast write/erase cycles.  Next, 0x55 was written to the test page, and repeatedly verified for 2 hours.  This test passed with no errors.  For the second test, I filled the 8kB of flash with zeros to put a charge on the floating gate for every bit.  I then baked the subject MCU for 12 hours at 150C, then verified that all bits remained at zero.  This test passed with all 65,536 bits reading zero.  I did, however, have a failure of one solder joint, likely due to the stress of thermal cycling.

For those who are particularly concerned (or paranoid) about flash retention, one solution is refreshing the flash.  For an AVR MCU, it would be simple to refresh the flash on every bootup with a small segment of code in .init1.  The code would copy each page into the page buffer, then perform a write on the page.  This would refresh all the 0 bits, and extend the retention life for another 20 to 100 years.