I've written a few software UARTs for AVR MCUs. All of them have bit-banged the output, using cycle-counted assembler busy loops to time the output of each bit. The code requires interrupts to be disabled to ensure accurate timing between bits. This makes it impossible to receive data at the same time as it is being transmitted, and therefore the bit-banged implementations have been half-duplex. By using the waveform generator of the timer/counter in many AVR MCUs, I've found a way to implement a full-duplex UART, which can simultaneously send and receive at up to 115kbps when the MCU is clocked at 8Mhz.
I expect most AVR developers are familiar with using PWM, where the output pin is toggled at a given duty cycle, independent of the code execution. The technique behind my full-duplex UART is using the waveform generation mode so the timer/counter hardware sets the OC0A pin at the appropriate time for each bit to be transmitted. TIM0_COMPA interrupt runs after each bit is output. The ISR determines if the next bit is a 0 or a 1. For a 1 bit, TCCR0A is configured to set OC0A on compare match. For a 0 bit, TCCR0A is configured to clear OC0A on compare match. The ISR also updates OCR0A with the appropriate timer count for the next bit. To allow for simultaneous receiving, the TIM0_COMPA transmit ISR is made interruptible (the first instruction is "sei").
The receiving is handled by PCINT0, which triggers on the received start bit, and TIM0_COMPB interrupt which runs for each received bit. I wrote this ISR in assembler in order to ensure the received bit is read at the correct time, taking into consideration interrupt latency. If any other interrupts are enabled, they must be interruptible (ISR_NOBLOCK if written in C). I've implemented a two-level receive FIFO, which can be queried with the rx_data_ready() function. A byte can be read from the FIFO with rx_read().
The code is written to work with the ATtiny13, ATtiny85, and ATtiny84. Only PCINT0 is supported, which on the t84 means that the receive pin must be on PORTA. With a few modifications to the code, PCINT1 could be used for receiving on PORTB with the t84. The total time required for both the transmit and the receive ISRs is 52 cycles. Adding an average interrupt overhead of 7 cycles for each ISR means that there must be at least 66 cycles between bits. At 8Mhz this means the maximum baud rate is 8,000,000/66 = 121kbps. The lowest standard baud rate that can be used with an 8Mhz clock is 9600bps.
The wgmuart application implements an example echo program running at the default baud rate of 57.6kbps. In addition to echoing back each character received, it prints out a period '.' every second along with toggling an LED.
I've published the code on github.