Monday, July 28, 2014

RK2928 wireless TV dongle

I recently purchased a wireless TV dongle for $18 (10% off the regular $20 price).  Now they're even selling for $16 on Aliexpress.  For power a microUSB-USB cable is included to plug it into a USB port on the TV or into a USB power supply.  If your TV supplies 5V power to the HDMI connector (most TVs don't), the dongle will draw power directly from the HDMI port.

It came with a single sheet double-sided "user guide".  There's no reference to the manufacturer, though after some searching I found it is functionally identical to the Mocreo MCast.  I found the setup somewhat confusing, as the dongle works in either miracast or DLNA/AirPlay mode.  Pressing the Fn button switches between modes.

The miracast mode is used to mirror your tablet or phone display to the TV.  Android 4.2 and above supports miracast.  In 4.4, it's a display option called "Cast screen".  The dongle appears as a wifi access point (with an SSID of Lollipop), and to use miracast you must connect to this access point.  This would be quite useful for presentations.  I used to do corporate training, and a dongle that can plug into the back of a projector avoids the problems associated with long VGA or HDMI cables.  Miracast is not fast enough for smooth video playback - for that you need UPnP/DLNA.

To set up DLNA, it is necessary to first connect to the Lollipop access point, and then browse to the IP address (192.168.49.1) of the dongle.  The configuration page allows you to scan for your wifi router, and provide the password to connect.  When you are done, you'll have to switch your tablet connection to your wifi router.

At this point, if you don't have a UPnP/DLNA server and control point, you won't be able to do much with the dongle, since it's not Chromecast compatible.  XBMC is a popular DLNA server, and even Windows 7 includes a DLNA server.  Media players like the old Seagate Theatre+ will also work as a DLNA server if you have an attached hard drive.

Once you have a server, you'll also need a controller aka control point app for your android device.  The manual that came with my dongle recommended iMedia Share, but this app only supports sharing media that is already on your tablet.  Finding a decent app was rather frustrating, as the first couple free apps I tried, such as Allcast, are basically a teaser for the paid app.

After some searching I found Controldlna which does (mostly) work.  I was able to browse my DLNA server, and direct the dongle to stream video from the DLNA server.  The play/pause function in Controldlna was flaky (frequently stopping the video rather than pausing it), so I had to use the dongle's web page controls.  Similar to the dongle's setup page, there's a page that has play/pause/stop buttons.

Playback of a 1Mbps h.264 encoded HD video was very smooth.  There was a problem with the aspect ratio though.  The video was 2.25:1 aspect ratio, but the dongle displayed it at full screen 16:9, making the video look vertically stretched.

What is lacking is the ability to browse online videos (like youtube) and direct the dongle to play them.  The DLNA protocol supports arbitrary URLs, so the only barrier to playing online video is a control point app that allows selecting videos from the web.  If I can't find one, it may be time to see how my Java coding experience translates into writing Android apps.

The dongle has lots of potential, but the software is lacking at this point.  Although it's not something for your average person who wants to watch digital video on their TV, for the technical folks I think it's worth the money.

Tuesday, July 15, 2014

Testing 433Mhz RF antennas with RTL-SDR

A couple months ago I picked up an RTL2832U dongle to use with SDR#.  I've been testing 433Mhz RF modules, and wanted to figure out what kind of wire antenna works best.

Antenna theory is rather complicated, and designing an efficient antenna involves a number of factors including matching the output impedance of the transmitter.  Since I don't have detailed specs on the RF transmitter modules, I decided to try a couple different antenna designs, and use RTL-SDR to measure their performance.

I started with a ~16.5cm (6.5" for those who are stuck on imperial measurements) long piece of 24awg copper wire (from a spool of ethernet cable).  One-quarter wavelength would be 17.3cm (300m/433.9Mhz), however a resonant quarter-wave monopole antenna is supposed to be slightly shorter.  I started up SDR#, turned off AGC and set the gain fixed at 12.5dB.  The signal peaked at almost -10dB:

The next thing I tried was coiling the antenna around a pen in order to make it a helical antenna.  This made the performance a lot (>10dB) worse:

I also tried a couple uncommon variations like a loop and bowtie antenna.  All were worse than the monopole.

The last thing I tried was a dipole, by adding another 16.5cm piece of wire soldered to the ground pin on the module.  This gave the best performance of all, nearly 10dB better than the monopole.  An impedance-matched half-wave dipole is supposed to have about 3dB worse gain than a quarter-wave monopole.  Given the improvement, I suspect the output impedance on the 433Mhz transmit modules is much closer to the ~70Ohm impedance of a half-wave dipole than it is to the ~35Ohm impedance of a quarter-wave monopole.

Have any other ideas on how to improve the antenna design?  Leave a comment.

Last minute update: I tried a 1/4-wave monopole wire antenna on the RTL dongle, and got 2-3dB better signal reception at 433Mhz than the stock antenna.  I tried a full-wave (69cm) wire antenna, and it performed better than the stock antenna, but slightly worse than the 1/4-wave monopole.
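
The wire lengths used in this post all follow from wavelength = c / f.  Here is a minimal sketch of that arithmetic in C, using the same rounded 300 figure as above; the function names are made up for illustration.

```c
/* Free-space wavelength in cm for a frequency in MHz, using c ~= 300,000 km/s
   (the same rounding as the 300/433.9 figure in the post). */
double wavelength_cm(double freq_mhz)
{
    return 300.0 / freq_mhz * 100.0;
}

/* Length in cm of a wire that is `fraction` of a wavelength long
   (0.25 for a quarter-wave monopole, 1.0 for a full-wave wire). */
double wire_length_cm(double freq_mhz, double fraction)
{
    return wavelength_cm(freq_mhz) * fraction;
}
```

wire_length_cm(433.9, 0.25) comes out at about 17.3cm and wire_length_cm(433.9, 1.0) at about 69cm, so the 16.5cm wire is roughly 5% shorter than a true quarter wave.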

Controlling HD44780 displays

This post is a follow-on to my earlier post, What's up with HD44780 LCD displays?


A lesson in reading datasheets

A search for details on programming the HD44780 will result in many different ways of doing it.  I think one of the reasons is that datasheets are often ambiguous.  I'd say the HD44780U datasheet is not only ambiguous, it's not well organized.  When trying to understand the bus timing characteristics, you have to flip back and forth between the tables on pg. 52 and the diagrams on pg. After doing that too many times, I added the timing to the diagram:






When working with MCUs clocked up to 20Mhz, the minimum instruction time is 50ns.  Therefore if one instruction sets the R/W line low, and the next instruction sets E high, there will be at least 50ns between the two events.  Therefore when writing control code, it is safe to ignore tAS, tAH, and tH, which I've written in green.  If the next instruction after setting E high sets it back to low, the Pulse Width for E High (PW-EH) will be only 50ns.  To ensure the minimum pulse width is met, E should be kept high for at least 5 instruction times at 20Mhz or at least 4 instruction times at 16Mhz.
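
The instruction counts above can be double-checked with a little integer math.  This is only a sketch: it assumes the 230ns minimum E pulse width mentioned later in this post, and the function name is hypothetical.

```c
#include <stdint.h>

/* Minimum E-high pulse width in ns at 5V; 230ns is the figure used in this post. */
#define LCD_PW_EH_NS 230u

/* Smallest whole number of CPU cycles that lasts at least min_ns at f_cpu_hz. */
uint32_t cycles_for_ns(uint32_t f_cpu_hz, uint32_t min_ns)
{
    uint64_t scaled = (uint64_t)f_cpu_hz * min_ns;            /* cycles * 1e9 */
    return (uint32_t)((scaled + 999999999u) / 1000000000u);   /* round up */
}
```

cycles_for_ns(20000000, LCD_PW_EH_NS) gives 5 and cycles_for_ns(16000000, LCD_PW_EH_NS) gives 4, matching the numbers above.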

When analyzing the datasheet, it's helpful to engage in critical thinking, even if the author of the datasheet didn't!  The datasheet shows what timing is sufficient, however it's not completely clear on what timing is necessary.  For example, look at the RS line.  The timing diagram indicates it is sufficient to set the RS line 40ns before the E pulse, and hold it for 10ns after the E pulse.  Having a good idea of how the chip works based on the block diagram on pg 3, I'd say it's not necessary to assert RS until just before the falling edge of the E pulse.

One of the most frequent problems people seem to have controlling these devices is the initialization sequence.  In this matter the datasheet is not only unclear on what is necessary, it is even contradictory in places.  After reading about the problems people have encountered and experimenting with the devices myself, I believe I can condense what's sufficient to initialize the devices down to 6 steps:
  1. wait 15ms
  2. send command (1 E pulse) to set 8-bit interface mode
  3. wait 65us
  4. send command (1 E pulse) to set 8-bit interface mode
  5. wait 65us
  6. send command (1 E pulse) to set 4-bit interface mode
The first 15ms wait is from pg 23 of the datasheet referring to start-up initialization taking 10ms. This is probably dependent upon the internal oscillator frequency, which is typically 270kHz.  Page 55 of the datasheet shows how the frequency depends upon the voltage and the value of the external oscillation resistor Rf.
The modules have a 91k resistor on the back in the form of a small SMD part marked 913 (91 x 10^3).  With 5V power the minimum frequency would be 200kHz, or 35% slower than the typical timing listed in the datasheet.  I've put a red dot on the 3V graph to show what can happen if you try to run a module at 3V that has 91k for Rf; the frequency could be as low as 150kHz, so commands could take almost twice as long as the typical values listed in the datasheet.  I bet this is one of the reasons people sometimes have problems using these displays - if the controlling code is based on the minimum frequency at 5V and the device is run at 3V, it may fail to work.

Peter Fleury's HD44780 library, and some others wait longer than the datasheet specified times to cover these differences.  For instructions the typical time required is 37us, so waiting 65us should be a safe value.  I based this on the 200kHz minimum frequency at 5V with 30% added for an extra margin of error.
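
To show where the 65us comes from, here is the scaling written out as a small C sketch.  It assumes instruction time is inversely proportional to the oscillator frequency; the function names are hypothetical.

```c
/* Instruction time scaled from the typical oscillator frequency to a slower one. */
double scaled_time_us(double typical_us, double typical_khz, double actual_khz)
{
    return typical_us * typical_khz / actual_khz;
}

/* Scaled time with a fractional safety margin added (0.30 = 30%). */
double safe_delay_us(double typical_us, double typical_khz, double actual_khz,
                     double margin)
{
    return scaled_time_us(typical_us, typical_khz, actual_khz) * (1.0 + margin);
}
```

safe_delay_us(37, 270, 200, 0.30) works out to just under 65us.  By the same scaling, scaled_time_us(37, 270, 150) is about 66.6us before any margin, which suggests 65us may not be enough for a module running at 3V.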

Steps 2-6 are needed because the device could be in 4 or 8-bit mode when it powers up.  The datasheet says, "If the electrical characteristics conditions listed under the table Power Supply Conditions Using Internal Reset Circuit are not met, the internal reset circuit will not operate normally and will fail to initialize the HD44780U."  All the devices I've seen do not initialize as described on pg. 23 of the datasheet.  I suspect they use cheaper clones of the HD44780U that didn't bother with the internal reset circuit.

If the device starts up in 4-bit mode, it will take two pulses on the E line to read 8 bits of an instruction.  The lower 4 bits of the instruction to set 4 or 8-bit mode do not matter.  By setting the high nibble (D4-D7) to the instruction to set 8-bit mode and then toggling E twice, it will either be processed as the same instruction twice in 8-bit mode, or one instruction in 4-bit mode.
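
The six steps listed earlier can be sketched in C roughly as follows.  This is not the code from my library: the lcd_hal_t hooks and the trace_* stand-ins are hypothetical, and only the high nibble of the Function Set command is sent (0x3 for 8-bit, 0x2 for 4-bit).  The stand-ins just record the calls so the sequence can be checked on a PC.

```c
#include <stdint.h>
#include <stddef.h>

/* High nibble of the Function Set command: 0x3 selects 8-bit, 0x2 selects 4-bit. */
#define LCD_SET_8BIT 0x3u
#define LCD_SET_4BIT 0x2u

/* Hypothetical hardware hooks; real code would drive D4-D7 and E, and busy-wait. */
typedef struct {
    void (*pulse_nibble)(uint8_t nibble);  /* put nibble on D4-D7, pulse E once */
    void (*delay_us)(uint32_t us);
} lcd_hal_t;

/* The six-step sequence from the post. */
void lcd_init_4bit(const lcd_hal_t *hal)
{
    hal->delay_us(15000);             /* 1. wait 15ms after power-up */
    hal->pulse_nibble(LCD_SET_8BIT);  /* 2. set 8-bit mode */
    hal->delay_us(65);                /* 3. wait 65us */
    hal->pulse_nibble(LCD_SET_8BIT);  /* 4. set 8-bit mode again */
    hal->delay_us(65);                /* 5. wait 65us */
    hal->pulse_nibble(LCD_SET_4BIT);  /* 6. set 4-bit mode */
}

/* Host-side stand-ins that record each call so the order can be inspected.
   Nibble writes are tagged with 0x10000 to tell them apart from delays. */
static uint32_t lcd_trace[8];
static size_t lcd_trace_len;
static void trace_delay(uint32_t us)  { lcd_trace[lcd_trace_len++] = us; }
static void trace_nibble(uint8_t nib) { lcd_trace[lcd_trace_len++] = 0x10000u | nib; }
static const lcd_hal_t trace_hal = { trace_nibble, trace_delay };
```

On a real board the two hooks would be replaced with functions that set the port pins and call a delay routine.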

Another HD44780 AVR library was written by Joerg Wunsch.  The delay between instructions is 37us, so it is likely to have timing problems with displays that are running at less than the typical 270kHz frequency.  Both Peter's and Joerg's code can write a nibble of data at a time.  Here's the code from Peter's library:
        dataBits = LCD_DATA0_PORT & 0xF0;
        LCD_DATA0_PORT = dataBits |((data>>4)&0x0F);

Corruption can occur if an ISR runs which changes the state of one of the high bits of LCD_DATA0_PORT while that section of code executes.  In my LCD control code if LCD_ISR_SAFE is defined, interrupts are disabled while the nibble is written.  Another difference with my library code is it doesn't use the R/W (read/write) line.  Since the initialization code can't read the busy flag and has to use timed IO, there's almost no extra code to make all of the IO timed.  Overall the code is much smaller without reading the busy flag, and needs 6 instead of 7 IO pins to control the LCD.  Just short the RW line (pin 5 on the LCD module) to ground.  No power is wasted since there's no pull-up resistor on the RW line.
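
One way to make the read-modify-write safe is avr-libc's ATOMIC_BLOCK from util/atomic.h.  The sketch below is an illustration, not the code from my library: the function name is hypothetical, and the port is passed as a pointer so the same code also compiles on a PC, where the macro is stubbed out.

```c
#include <stdint.h>

#ifdef __AVR__
#include <util/atomic.h>          /* avr-libc: ATOMIC_BLOCK, ATOMIC_RESTORESTATE */
#else
#define ATOMIC_BLOCK(mode)        /* no interrupts to mask when built on a PC */
#endif

/* Write the high nibble of `data` to the low four bits of *port while
   leaving the upper four bits as they were.  On AVR, interrupts are
   disabled for the duration of the block and then restored. */
void lcd_put_high_nibble(volatile uint8_t *port, uint8_t data)
{
    ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
        uint8_t keep = *port & 0xF0;
        *port = keep | ((data >> 4) & 0x0F);
    }
}
```

With the block in place, an ISR can no longer change the upper port bits between the read and the write.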

I did not use any explicit delay between turning on and off the E line.  Looking at the disassembled code, the duration of the E pulse is 7 CPU cycles, which would ensure the E pulse is at least 230ns even on an AVR overclocked to 30Mhz.   In addition to the control code, I've written a small test program.

test_lcd.c
lcd.c

Thursday, July 10, 2014

What's up with HD44780 LCD displays?

There are lots of projects and code online for using character LCD displays based on these controllers, particularly the ones with 2 rows of 16 characters (1602).  They're low power (~1mA @5V), and for only $2 each, they're the cheapest LCD modules I've found.  The controllers are over 20 years old, so as a mature technology you might think there's not much new to learn about them.  Well after experimenting with them for a few days, I've discovered a few things that I haven't seen other people discuss.


Before getting into software, the first thing you need to do after applying power is set the contrast voltage (pin3).  The amount of contrast is based on the difference between the supply voltage(VDD) and VE.  The modules have a ~10K pullup resistor on VE (pin3), so with nothing attached to it there is no display.  If VE is grounded when VDD is 5V, the contrast can be too high and you may only see black blocks.  With a simple 1N4148 diode between VE and ground, there's 0.6V on VE, and a good combination of contrast and viewing angle.

Like many other projects, I chose to use the display in 4-bit (nibble) mode, saving 4 pins on the Pro Mini.  There's also more software available to drive these displays in nibble mode than byte mode.  I like to keep wiring simple, so I spent some time figuring out the easiest way to connect the LCD module to my Pro Mini board.  After noticing I could line up D4-D7 on the module with pins 4-7 of the Pro Mini, here's what I came up with (1602 module on the left and the Pro Mini on the right):
It fits on a mini-breadboard and only requires 3 jumper wires - one for ground, one for power, and one for RS (connecting to pin 2 on the Pro Mini).  If you use the pro mini bootloader to program the chip, you may have to temporarily unplug the LCD since it connects to the UART lines.  If you use a breadboard programming cable to flash the AVR using SPI, then you can leave the module in.

These modules are also available with LED backlights powered from pin 15 and 16.  Those pins line up with pins 8 and 9 on the Pro Mini, which could be used to control the backlight.

Power Usage

A datasheet I found for a 1602 LCD module lists the power consumption as 1.1mA at 3V.  To measure the actual power usage, I put a 68-Ohm resistor in series with the 5V supply, and connected a 270 Ohm resistor between Gnd & VE.  The voltage drop on the power line was 45mV, and solving for I in V=IR means 0.66mA of current.  The voltage drop across the VE resistor was 120mV, so 2/3 of the power consumption is from the VE current, with an internal pullup resistance of 11.2K.  Most circuits I've seen for these modules recommend a 10K trimpot for controlling VE, which would add another 500uA (5V/10K) to the power consumption.

The 10K pullup resistors on the data and RS lines are another factor in power consumption.  If the AVR pins are left in output mode, four data lines and RS set low will draw a total of 2.5mA.  A good HD44780 library will set the AVR pins on those lines high (or to input mode) when not in use.  Speaking of software, it's a good point to finish this post and start on my next post about AVR software to control these displays.
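
The current figures in this section are all Ohm's law.  As a quick sanity check, here is the arithmetic as a one-line C helper (the name is made up):

```c
/* Current in mA through a resistor, from the voltage across it in mV. */
double current_ma(double drop_mv, double ohms)
{
    return drop_mv / ohms;
}
```

current_ma(45, 68) is about 0.66mA for the whole module and current_ma(120, 270) is about 0.44mA for VE, which is where the 2/3 share comes from.  Five lines pulled low through 10K each draw 5 * current_ma(5000, 10000) = 2.5mA.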

Wednesday, July 9, 2014

Writing AVR interrupt service routines in assembler with avr-gcc

For writing AVR assembler code, there are two free assemblers to choose from: the Atmel AVR Assembler and the GNU assembler.  While they both support the same instruction set, there are some differences in syntax between the two.  The GNU assembler is included with gcc packages like the Atmel AVR toolchain, so if you're already writing AVR programs in C or C++, then you don't need to install anything extra in order to start writing in assembler.

Documents you should refer to for writing interrupts are the avr-libc manual entry for interrupt.h and the data sheet for the AVR MCU you are using.  I'll be using an Arduino Pro Mini clone to write a timer interrupt, which uses the Atmega328p MCU.

The purpose of the ISR I'm writing is to maintain a system clock that counts each second.  I'll use timer/counter2, which supports clocking off an external 32kHz watch crystal or the system clock oscillator.  For now I'll write the code based on running off the Pro Mini's 16Mhz external crystal.

With a 16Mhz system clock and an 8-bit timer, it's impossible to generate an interrupt every second.  Using a prescaler of 256 and a counter period of 250 ticks, an interrupt will be generated every 4ms, or 250 times a second.  Every 250th time the ISR gets called it will need to increment a seconds counter.  The efficiency of the setup code doesn't matter, so I've written it in C:
    // CTC mode, clear counter when compare value reached
    TCCR2A = (1<<WGM21);
    TCCR2B = CSDIV256;

    OCR2A = 249;                // counter runs 0..249, i.e. 250 ticks per interrupt
    TIMSK2 = (1<<OCIE2A);       // enable interrupt
    sei();
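
The timer numbers above can be checked on a PC with a couple of helper functions.  This is only a sketch with hypothetical names; the 1024 used in the note below is the largest prescaler timer/counter2 offers on the ATmega328P.

```c
#include <stdint.h>

/* Timer ticks between interrupts for a given clock, prescaler and interrupt rate. */
uint32_t ticks_per_interrupt(uint32_t f_cpu_hz, uint32_t prescaler, uint32_t rate_hz)
{
    return f_cpu_hz / prescaler / rate_hz;
}

/* Longest interval in microseconds an 8-bit timer can count before wrapping. */
uint32_t max_8bit_period_us(uint32_t f_cpu_hz, uint32_t prescaler)
{
    return (uint32_t)((uint64_t)prescaler * 256u * 1000000u / f_cpu_hz);
}
```

ticks_per_interrupt(16000000, 256, 250) is 250, and max_8bit_period_us(16000000, 1024) is 16384, so even with the largest prescaler the timer wraps after about 16ms, far short of one second.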

The first lines for the assembler source, as I do with all my AVR assembler programs, will be the following:
#define __SFR_OFFSET 0
#include <avr/io.h>

This will avoid having to use the _SFR_IO_ADDR macro when using IO registers.  So instead of having to write:
in r0, _SFR_IO_ADDR(GPIOR0)
I can write:
in r0, GPIOR0

The ISR needs to keep a 1-byte counter to count 250 interrupts before adding a second.  There's almost 32 million seconds in a year, so a 4-byte counter is needed.  These counters could be stored in registers or RAM.  In the avr-libc assembler demo project 3 registers are dedicated to ISR use, making them unavailable to the C compiler.  Instead of tying up 5 registers, the ISR will use the .lcomm directive to reserve space in RAM.  The seconds timer (__system_time) will be marked global so it can be accessed outside the ISR.
; 1 byte variable in RAM
.lcomm ovfl_count, 1

; 4 byte (long) global variable in RAM
.lcomm __system_time, 4
.global __system_time

As an 8-bit processor, the AVR cannot increment a 32-bit second counter in a single operation.  It does have a 16-bit add instruction (adiw), but not 32-bit.  So it will have to be done byte by byte.  Since it doesn't have an 8-bit add immediate instruction, the quickest way to add one to a byte is to subtract -1 from it.  For loading and storing the bytes between RAM and registers, the 4-byte lds and sts instructions could be used.  Loading Z with a pointer allows the 2-byte ld and st instructions to be used, and making use of the auto-increment version of the instructions allows a single load/store combination to be used in a loop.  With that in mind, here's the smallest (and fastest) code I could come up with to increment a 32-bit counter stored in RAM:
    ldi ZL, lo8(__system_time)
    ldi ZH, hi8(__system_time)
loop:
    ld r16, Z
    sbci r16, -1                    ; subtract -1 = add 1
    st Z+, r16
    brcc loop

Since the 8-bit overflow counter and the seconds counter are sequential in memory, a reload of the Z pointer can be avoided:
    ldi ZL, lo8(ovfl_count)
    ldi ZH, hi8(ovfl_count)
    ld r16, Z
    cpi r16, 250
    brne loop
    clr r16                    ; reset counter
loop:
    sbci r16, -1                    ; subtract -1 = add 1
    st Z+, r16
    ld r16, Z
    brcc loop

For testing the timer, the low byte of the seconds count is written to PORTB.  If it works, the LED on the Pro Mini's pin 13 (PB5) will toggle every 2^5 = 32 seconds.
    DDRB = 0xff;                // output mode
    while (1) {
        PORTB = __system_time & 0xff;
    }

After compiling, linking, and flashing the code to the Pro Mini, it didn't work - the LED on PB5 never flashed.  I looked at the disassembled code and couldn't find anything wrong.  To make sure the while loop was running, I added a line to toggle PB0 after the write to PORTB.  Flashing the new code and attaching an LED to PB0 confirmed the loop was running.  Adding code to set a pin inside the ISR confirmed it was running.  The problem was the counter wasn't incrementing.  After going over the AVR instruction set again, I realized the mistake had to do with the inverted math.  When the value of ovfl_count in r16 is less than 250, the branch if not equal is taken, continuing execution at the sbci instruction.  However, since the carry flag is set by the cpi instruction when r16 is less than 250, the sbci instruction subtracts -1 and subtracts carry for a net change of 0.  The solution I came up with was changing the loop to count from 6 up to 0:
    cpi r16, 0
    brne loop
    ldi r16, 6                    ; skip counts 1-6

With that change, it worked!
video
I cleaned up the code and posted t32isr.S and timer-asm.c to my Google Code repository.  In a future post I'll add some code to compute the date and time from the seconds count.
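
The carry behaviour behind the bug is easy to reproduce on a PC.  The sketch below is a small C model of sbci and cpi plus the fixed ISR loop; it is an illustration with made-up names, not the posted t32isr.S.  The memory layout (ovfl_count followed by the 4 seconds bytes) is taken from the post, and the loop is capped at 5 bytes, which the assembler version doesn't do.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { uint8_t value; bool carry; } sbci_result_t;

/* Model of AVR "sbci Rd, K": Rd = Rd - K - C, with C set when a borrow occurs. */
static sbci_result_t sbci(uint8_t rd, uint8_t k, bool carry_in)
{
    unsigned int subtrahend = (unsigned int)k + (carry_in ? 1u : 0u);
    sbci_result_t r;
    r.value = (uint8_t)(rd - subtrahend);
    r.carry = subtrahend > rd;
    return r;
}

/* Model of the carry flag after "cpi Rd, K": set when K > Rd (unsigned). */
static bool cpi_carry(uint8_t rd, uint8_t k)
{
    return k > rd;
}

/* One pass of the fixed ISR. mem[0] is ovfl_count, mem[1..4] the seconds count. */
static void isr_tick(uint8_t mem[5])
{
    size_t z = 0;
    uint8_t r16 = mem[z];
    bool c = cpi_carry(r16, 0);       /* cpi r16, 0 always leaves carry clear */
    if (r16 == 0)
        r16 = 6;                      /* ldi r16, 6 */
    for (;;) {
        sbci_result_t r = sbci(r16, 0xFF, c);   /* sbci r16, -1 */
        c = r.carry;
        mem[z++] = r.value;           /* st Z+, r16 */
        if (c || z == 5)
            break;                    /* brcc loop falls through once carry is set */
        r16 = mem[z];                 /* ld r16, Z */
    }
}
```

With carry clear, sbci(5, 0xFF, false) gives 6, but with carry set, as it is after cpi r16, 250 on a small value, sbci(5, 0xFF, true) leaves the 5 unchanged.  Calling isr_tick 250 times on a zeroed array leaves mem[0] at 0 and mem[1] at 1.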

Friday, July 4, 2014

3 outlet 2-port 2.1A USB Charger

Over the past years, I've bought many cheap USB chargers.  As Ken Shirriff points out, they're rather low quality.  I find they're too noisy for powering small microcontroller projects, but for phone charging I've been quite satisfied with them considering the low cost.  The only problem I have with plug-in USB wall chargers in general is they tend to disappear - kids misplace their phone chargers, and taking someone else's is easier than searching for theirs.  I've learned to have spares hidden away for this reason, but it's still a nuisance.  So I set out to find a more permanent solution.

A number of electrical equipment manufacturers offer AC outlets with USB charging ports.  At $20-$25, they are on the pricey side, and they're deeper than standard outlets and therefore likely require a bigger electrical box than the standard 12.5 cubic inch.  After some searching I decided on a plug-in model carried by Home Depot priced at CAD$12.88. It screws into the outlet after removing the cover plate, so it's unlikely to "walk away".

I tried to find other reviews of these, but was unsuccessful.  Newegg has a Rosewill branded version for $20, and I've seen a Targus branded version selling for $30!  Given the price I expected them to be better than the cheap USB chargers I had been buying, but you don't always get what you pay for.  Therefore I decided to do my own testing and post the results.

I don't have a nice Tek oscilloscope like Ken Shirriff, but my PC oscilloscope will suffice for detecting output noise.  I have a digital multimeter, so the only other thing I needed was a USB breakout cable and a power load.  I soldered some 24 awg wire to a USB male connector so I could plug the power lines into a breadboard.

I've seen a number of different cheap power load sources, from cement resistors to incandescent bulbs.  What I decided on instead was an old car stereo speaker.  It's large enough that I don't need to worry about heatsinks, and at around 4 ohms it would provide a 1.25 amp load on 5 volts.  I just soldered on a couple wires for plugging it into a small breadboard.

With no load, I measured 5.04V, and switching my multimeter to AC measured only 5mV of ripple.  Loaded I measured 4.9V, and AC ripple was still only 5mV.  After a minute or so the speaker magnet was barely warm to the touch, and measured 4.2Ohms resistance.  This means the charger easily puts out 1.17A of current.  The screen shots from Goldwave show the majority of the ripple around 36kHz:

I suspect that 36kHz is the switching frequency of the USB power supply.  Zooming in on the 36kHz noise shows it is a sine wave, with lower amplitude superimposed noise at 500Hz and 60Hz.

Overall I was satisfied with its performance.  If all you need is a dual-port USB charger there are cheaper options, but for something permanently attached, this looks like a good deal.

Thursday, July 3, 2014

GCC LTO call graph generation

Since my first post about gcc link-time optimization, I've been doing more investigation into how it works.  Since I've written some Arduino libraries as a way to help out newbies to embedded development, I follow the Arduino developers list.  At the request of a few developers on the list, LTO was included in the arduino nightly beta.  Due to LTO builds of the USB Host Shield library being extremely large, Christian announced on June 30th that LTO would be disabled until the issue was figured out.  I decided to investigate to see what was happening.

Arduino doesn't use makefiles for building, and instead has a build process which allows for configuration of some of the build flags in a platform.txt file which is read on startup of the IDE.  Testing build flags involves editing the file, starting the IDE, and doing a build.  To speed up the process and to have complete control over the compile, archive, and link process I worked with avr-gcc on the command line, using the temporary files compiled by Arduino.

My first discovery was that when a library's .o files are included on the command line (instead of an archive file), LTO didn't work.  All the code from each .o was included in the binary, even code that was never used.  That's when I started researching how gcc's callgraph generation works.  The gcc internals documentation is good, though it still leaves unanswered questions.

The big question is how does it pick the root of the call graph?  I thought the starting point for the call graph analysis would be the C Runtime Initialization object file, which references main.  From there the linker could look at all the .o's to find main, and build the graph.  After lots of experiments, I discovered LTO may generate multiple call graphs for the same executable - in other words there can be many root nodes.  Every .o file specified on the command line will be used for root node candidates, and if there are only .a files specified, every .o in the first .a will be used for root node candidates.

When all the .o files for an Arduino library are included on the build line, gcc's LTO includes many functions that never get called from main.  Creating an archive with the .o files solves that problem.  That still doesn't result in optimal performance of LTO.  Since the Arduino IDE compiles all of its core .o files into core.a, which is the first argument in the build line, all the .o files in core.a get included in the build.  Specifying main.cpp.o solves this problem.

$ avr-gcc -mmcu=atmega328p -Wl,-relax -flto -Os main.cpp.o hjrp.a core.a USB_Host_Shield_2.0-master/UHS.a -o USBHIDJ.elf
$ avr-size USBHIDJ.elf
   text    data     bss     dec     hex filename
  12530     194     460   13184    3380 USBHIDJ.elf

Building the same files without main.cpp.o specified (listing core.a first) was about 5.5KB larger:

$ avr-size USBHIDJ.elf
   text    data     bss     dec     hex filename
  18152     204     493   18849    49a1 USBHIDJ.elf

For some further details on LTO check out Honza Hubička's Blog: