Saturday, October 29, 2016

zcash mining

Zcash is the hottest coin this month, after going live on October 28th, following several months of testing.  Zcash promises private transactions that, unlike those of bitcoin or ethereum, cannot be viewed on the public blockchain.

I did not expect zcash mining to be immediately profitable, since mining rewards are being ramped up over the first month.  However the first hour of trading on Poloniex saw zcash (ZEC) trading at insane values of over 1000 bitcoin per ZEC.  Even after 24 hours, 1 ZEC is trading for about 6 BTC, or US$4300.  Despite the low mining reward rate, mining pool problems, and buggy mining software, I was able to earn 0.005 ZEC in one day with a couple rigs.

Zcash has both private addresses starting with "z", and public or transparent addresses starting with "t".  A bug in the zcash network software has meant problems with private transfers, so it is recommended for miners to use only transparent wallet addresses until the bug is fixed.  Miners using the "z" address have apparently had problems receiving their zcash payouts from mining pools.

I have been using eXtremal's miner version 0.2.2, which uses OpenCL kernels from the zcash open-source miner competition.  Windows and Linux binaries can be downloaded from, the pool the software is designed for.  I get the best performance with the silentarmy kernel, but with only one instance as running 2 instances results in a crash.  On Windows running driver version 16.10.1 I get about 26 solutions/s with a Rx 470.  Under Ubuntu with fglrx drivers I get about 11 solutions/s for both R7 370 and R9 380 cards.

I experimented with the worksize and threads values in config.txt, but was unable to improve performance compared to the default 256/8192.  Increasing the core clock on the R9 380 cards from 900Mhz to 1Ghz increased the performance by 3-4%.

Genoil has released a miner, but only Windows binaries with tromp's kernel at this time.  A version including silentarmy's kernel is in the works.

I was unable to find any zcash mining calculators, so I wrote a short python calculator.  Here's an example based on the network hashrate (in thousands) at block 1072, for a rig mining 140 solutions/s:
./ 1072 1840 140
Daily ramped mining reward in blocks: 308
Your estimated earnings: 0.0234347826087
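
The numbers above can be reproduced with a few lines of Python.  This is only a reconstruction that happens to match the output shown, not my original script, and it assumes a 20,000-block reward ramp, 576 blocks per day (2.5-minute blocks), and 10 ZEC per full block going to the miner:

```python
# Reconstruction of a slow-start earnings estimate; the three constants
# below are assumptions, chosen because they reproduce the output above.
SLOW_START_BLOCKS = 20000   # assumed length of the reward ramp, in blocks
BLOCKS_PER_DAY = 576        # assumed 2.5-minute block target
MINER_REWARD = 10           # assumed ZEC per full block paid to the miner


def daily_ramped_reward(block):
    """Total ZEC paid to miners per day at the ramp level of this block."""
    return block * BLOCKS_PER_DAY * MINER_REWARD // SLOW_START_BLOCKS


def estimated_earnings(block, network_ksols, my_sols):
    """Expected ZEC per day; network rate is in thousands of solutions/s."""
    share = my_sols / (network_ksols * 1000.0)
    return daily_ramped_reward(block) * share
```

Calling daily_ramped_reward(1072) and estimated_earnings(1072, 1840, 140) corresponds to the three arguments in the example run.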

At the current price of 6BTC/ZEC, the earnings work out to about US$100.  Even if the price drops to 3BTC/ZEC, the daily earnings are still more than double what the same hardware could make mining ethereum.  Apparently many other ethereum miners have realized this, since the ethereum network hashrate has dropped by about 25% in less than 30 hours.  I expect this trend to continue in the coming days, and eventually reach an equilibrium as the ZEC price continues to drop until it is below parity with BTC.

2016-10-30 update

Coinsforall is still having stability problems, and now 1 ZEC is worth about 1.2 BTC.  Therefore I've switched back to eth mining for all my cards except one Rx 470.  With Genoil's ZECminer I'm getting about 26 sol/s.  I started using, and after an hour of mining the pool has been stable.  Reported hashrate on the pool is about 12H/s, or half the solution rate as expected.

Sunday, September 18, 2016

Advanced Tonga BIOS editing

I recently decided to spend some time to figure out some of the low-level details of how the BIOS works on my R9 380 cards.  A few months ago I had found Tonga Bios Reader, but hadn't done anything more than modify the memory frequency table so the card would default to 1500Mhz instead of 1375.  My goal was to modify the memory timing and to reduce power usage.

The card I decided to test the memory timing mods on was a Club3D 4GB R9 380 with Elpida W4032BABG-60-F RAM.  Although the RAM is rated for 6Gbps/1.5Ghz, the default memory clock is 1475Mhz.  In my previous testing I found that the card was stable with the memory overclocked well above 1.5Ghz, but the mining performance was actually slower at 1.6Ghz compared to 1.5Ghz.  Unfortunately Tonga Bios Reader does not provide a way to edit the memory timings aka straps, so I'd have to use a hex editor.

I've highlighted the 1500Mhz memory timing in the screen shot above.  I found it by searching for the byte string F0 49 02.  Read as a little-endian value that is 0x249F0, or 150,000 in decimal, which is the clock expressed in increments of .01Mhz.  The timing for up to 1625Mhz (C4 7A 02) comes after it, and then 1750Mhz (98 AB 02).  The Club3D BIOS actually has 2 sets of timings, one for memory type 01 (the number after F0 49 02), and one for memory type 02 (not shown).  This is so the same BIOS can be used on a card that can be made with different memory.  Obviously one type of memory the BIOS supports is Elpida, and from comparing BIOS images from other cards, I determined that memory type 02 is for Hynix.
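
To avoid doing the little-endian conversion by hand, the byte patterns can be computed with a short Python sketch.  The function names are just for illustration, and it assumes the 3-byte, 0.01Mhz-unit encoding described above:

```python
def strap_clock_mhz(raw):
    """Decode a 3-byte little-endian clock limit stored in units of 0.01 MHz."""
    return int.from_bytes(raw, "little") / 100.0


def strap_search_bytes(mhz):
    """Encode a clock in MHz as the 3-byte pattern to look for in a hex editor."""
    return round(mhz * 100).to_bytes(3, "little")


for pattern in ("F0 49 02", "C4 7A 02", "98 AB 02"):
    print(pattern, "->", strap_clock_mhz(bytes.fromhex(pattern)), "Mhz")
```

Going the other way, strap_search_bytes(1500).hex() gives the bytes to search for when looking for the 1500Mhz entry.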

To reduce the chance of bricking my card, the first time I modified only the 1625Mhz memory timing.  Since the default memory clock is 1475Mhz, my modified timing would only be used when overclocking the memory over 1500Mhz.  So if the card crashed on the 1625Mhz timing, it would be back to the safe 1500Mhz timing after a reboot.  To actually make the change I copied the 1500Mhz timing (starting with 77 71) to the 1625Mhz timing.  After the change, the BIOS checksum is invalid, so I simply loaded the BIOS in Tonga Bios Reader and re-saved it in order to update the checksum.

I used Atiflash 2.71 to flash the BIOS since I have found no DOS or Linux flash utilities for Tonga GPUs.  After flashing the updated BIOS, I overclocked the RAM to 1625Mhz, and my eth mining speed went from just under 21Mh to about 22.5Mh.  To get even faster timings, I copied the 1375Mhz timings from a MSI R9 380 with Elpida RAM to the Club3D 1625Mhz memory timing.  That boosted my mining speed at 1625Mhz to slightly over 23Mh.

I then tried a number of ways to improve the timing beyond 1625Mhz, but I found nothing that was both stable and faster at 1700Mhz.  Different cards may overclock better, depending on both the GPU asic and the memory.  Hynix memory seems to overclock a bit better than Elpida, while Samsung memory, which seems rather rare on R9 380 cards, tends to overclock the best.  The memory controller on the GPU also needs to be able to overclock from 1475Mhz.  Unlike the simple voltage modding of the Hawaii BIOS, there is no easy way to modify the memory controller voltage (VDDCI) on Tonga.  The ability to over-volt the memory controller would make it easier to overclock the memory speed beyond 1625Mhz.

Since the Club3D BIOS supports both Elpida and Hynix memory, I improved the timing for both memory types.  This allows me to use a single BIOS image for cards that have either Elpida or Hynix memory.  It's also dependent on the card having a NCP81022 voltage controller, but all my R9 380 cards have the same voltage controller.  I've shared it on my google drive as 380NR.ROM if you want to try it (at the possible risk of bricking your card).  Atiflash checks the subsystem ID of the target card against the BIOS to be flashed, so it is necessary to use the command-line version of atiflash with the "-fs" option:
atiflash -p 0 380NR.ROM -fs

In addition to improving memory speeds, I wanted to reduce power usage of my 380 cards.  On Windows it is possible to use a tool like MSI Afterburner to reduce the core voltage (VDDC), but on Linux there is no similar tool.  To reduce the voltage in the BIOS, modify value0 in Voltage Table2 for the different DPM states.  After a lot of experimenting, I made two different BIOSes with different voltage levels since some cards under-volt better than others.  The first one has 975, 1050, and 1100 mV for dpm 5, 6, & 7, while the other has 1025, 1100, & 1150 mV.  These are also shared on my google drive as 380NR1100.ROM and 380NR1150.ROM.

With the faster RAM timing and voltage modifications I've improved my eth mining hashrates by about 10%, without any material change in power use.  I've tried my custom ROM on four different cards.  Although two of them seem to be OK with 900/1650Mhz clocks, I'm playing it safe and running all four at 885/1625Mhz.  If you are lucky and have a card that is stable at 925/1700Mhz, you can mine eth at almost 25Mh/s.  With most cards you can expect to get between 23 and 24Mh/s.

2021-09 Update

I no longer recommend BIOS modifications, as runtime modification of GPU settings is much safer.  For the past few years I have been using tools like UMR and ROCm-SMI.  For people who still wish to access my BIOS mod files for educational reasons, Google has changed shared drive links, so this is the new link.

Sunday, September 11, 2016

Hawaii BIOS voltage modding

When using Hawaii GPUs such as the R9 290 on Linux, aticonfig does not provide the ability to modify voltages.  Even under Windows, utilities such as MSI Afterburner usually have limits on how much the GPU voltage can be increased or decreased.  In order to reduce power consumption I decided to create a custom BIOS with lower voltages for my MSI R9 290X.

The best tool I have found for Hawaii BIOS mods is Hawaii Bios Reader.  For reading and writing the BIOS to Hawaii cards, I use ATIFlash 2.71.  It works from DOS, so I can use the FreeDOS image included in SystemRescueCD.

In the screen shot above, I've circled two voltages.  The first, VDDCI, is the memory controller voltage.  Reducing it to 950mV gives a slight power reduction.

The second voltage is the DPM0 GPU core voltage.  DPM0 is the lowest power state, when the GPU is clocked at 300Mhz, and powered at approximately 968mV.  I say approximately because the actual voltage seems to be close to the DPM0 value, but not always exact.  This may be related to the precision of the voltage regulator on the card, or the BIOS may be using more than just the DPM0 voltage table to control the voltage.  The rest of the DPM values are not voltages, but indexes into a table that has a formula for the BIOS to calculate the increase in voltage based on the leakage characteristics of the GPU.  I do not change them.

For reasons I have not yet figured out, the DPM0 voltage in each of the limit tables has to match the PowerPlay table.  After modifying the four limit tables, the BIOS can be saved and flashed to the card.

I've created modified BIOS files for a MSI R9 290X 4GB card with DPM0 voltages of 868, 825, and 775mV.  With the 775mV BIOS I was able to reduce power consumption by over 20% compared to 968mV.

Wednesday, September 7, 2016

Monero mining on Linux

With Monero's recent jump in price to over $10, it's the new hot coin for GPU mining.  Monero has been around for a couple years now, so there are a couple options for mining.  There's a closed-source miner from Claymore, and the open-source miner from Wolf that I used.

I used the same Ubuntu/AMD rig that I set up for eth mining.  Building the miner took a couple updates compared to building ethminer.  First, since stdatomic.h is missing from gcc 4.8.4, you need to use gcc 5 or 6.  Second, jansson needs to be installed.  On Ubuntu the required package is libjansson-dev.  The default makefile uses a debug build with no optimization, so I modified the makefile to use -O3 and LTO ("OPT = -O3 -flto").  I've shared the compiled binary on my google drive.

To mine with all the GPUs on your system, you'll have to edit the xmr.conf file and add to the "devices" list.  The "index" is the card number from the output of "aticonfig --lsa".  Although the miner supports setting GPU clock rates and fan speeds, I prefer to use my aticonfig scripts instead.  It is also necessary to modify  "rawintensity" and "worksize" for optimal performance.  The xmr.conf included in the tgz file has the settings that I found work best for a R9 380 card clocked at 1050/1500.  For a R7 370 card, I found a rawintensity setting of 640 worked best, giving about 400 hashes per second.

Although Monero was more profitable to mine than ethereum for a few days, the difficulty increase associated with more miners has evened it out.  Dwarfpool has a XMR calculator that seems accurate.  The pool I used was, and instead of running the monero client, I created an account online using

Sunday, August 21, 2016

Ethereum mining on Ubuntu Linux

For a couple months, I've been intending to do a blog post on mining with Ubuntu.  Now that I've been able to make a static build of Genoil's ethminer, that process has become much easier.  Since I have no Nvidia GPUs, this post will only cover how to mine with AMD GPUs like the R7 and R9 series.

The first step is to download a 64-bit Ubuntu 14.04 desktop release.  I use the desktop distribution since it includes X11, although it is possible to use Ubuntu server and then install the X11 packages separately.  I recommend installing Ubuntu without any GPU cards installed (use your motherboard's iGPU), in order to confirm the base system is working OK.  Follow the installation instructions, and at step 7, choose "Log in automatically".  This will make it easier to have your rig start mining automatically after reboot.

After the initial reboot, I recommend installing ssh server.  It can be installed from the shell (terminal) with: "sudo apt-get install openssh-server -y".  Ubuntu uses mDNS, so if you chose 'rig1' as the computer name during the installation, you can ssh to 'rig1.local' from other computers on your LAN.

Shut down the computer and install the first GPU card, and plug your monitor into the GPU card instead of the iGPU video port.  Most motherboards will default to using the GPU card when it is installed, and if not, there should be a BIOS setup option to choose between them.  If you do not even see a boot screen, try plugging the card directly into the motherboard instead of using a riser.  Also double-check your card's PCI-e power connections.

Once you are successfully booting into the Ubuntu desktop, edit /etc/init/gpu-manager.conf to keep gpu manager from modifying /etc/X11/xorg.conf.  Then install the AMD fglrx video drivers: "sudo apt-get install fglrx -y".  If the fglrx drivers installed successfully, running "sudo aticonfig --lsa" will show your installed card.  Next, to set up your xorg.conf file, run "sudo rm /etc/X11/xorg.conf" and "sudo aticonfig --initial --adapter=all".

After rebooting, if the computer does not boot into the X11 desktop, ssh into the computer and verify that /etc/modprobe.d/fglrx-core.conf was created when the fglrx driver was installed.  This keeps Ubuntu from loading the open-source radeon drivers, which will conflict with the proprietary fglrx drivers.  For additional debugging, look at the /var/log/Xorg.0.log file.

Continue with installing the rest of your cards one at a time.  Re-initialize your xorg.conf each time, by executing "sudo rm /etc/X11/xorg.conf" and "sudo aticonfig --initial --adapter=all".  Reboot one more time, and then execute "aticonfig --odgc --adapter=all".  This will display all the cards and their core/memory clocks.  If you are connecting remotely via ssh, you need to run "export DISPLAY=:0" or you will get the "X needs to be running..." error.  You can use aticonfig to change the clock speeds on your card.  For example, "aticonfig --od-enable --adapter=2 --odsc 820,1500" will set card #2 to 820Mhz core and 1500Mhz memory (a good speed for most R9 380 cards).  To simplify setting clock speeds on different cards, I created a script which reads a list of card types and clock rates from a clocks.txt file.
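
As a rough illustration of the clocks.txt idea (this is a sketch, not my actual script), the Python below turns a table of card types and clocks into aticonfig commands.  The "name = core,memory" file format, the adapter name strings, and the R7 370 clocks are all made up for the example; only the aticonfig options are the ones shown above:

```python
def parse_clocks(text):
    """Parse lines like 'R9 380 = 820,1500' into {card_type: (core, memory)}.

    The file format is hypothetical, invented for this sketch.
    """
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        name, clocks = line.split("=", 1)
        core, memory = (int(value) for value in clocks.split(","))
        table[name.strip()] = (core, memory)
    return table


def clock_commands(adapters, table):
    """Build one aticonfig command per adapter whose name contains a known type.

    `adapters` is a list of adapter names, with the list position used as the
    adapter number.
    """
    commands = []
    for index, name in enumerate(adapters):
        for card_type, (core, memory) in table.items():
            if card_type in name:
                commands.append(
                    "aticonfig --od-enable --adapter=%d --odsc %d,%d"
                    % (index, core, memory))
                break
    return commands


# Hypothetical clocks.txt contents and adapter names.
table = parse_clocks("R9 380 = 820,1500\nR7 370 = 1000,1400  # made-up clocks\n")
for command in clock_commands(["Radeon R9 380", "Radeon R7 370"], table):
    print(command)
```

The sketch only prints the commands, so they can be reviewed before being run from a shell.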

Once your cards are installed and configured, you can use my ethminer build:
tar xzf ethminer-1.1.9nr-OCL.tgz
cd ethminer-nr

Once you've confirmed that ethminer is working, you can edit the script to use your own mining pool account.  If you want your rig to start mining automatically on boot-up, edit your .bashrc and add "cd ethminer-nr" and "./" to the end of the file.

Sunday, July 31, 2016

Improving Genoil's ethminer

In my last post about mining ethereum, I explained why I preferred Genoil's fork of the Ethereum Foundation's ethminer.  After that post, I started having stability problems with one of the newer releases of Genoil's miner.  I suspected the problem was likely deadlocks with mutexes that had been added to the code.  They had been added to reduce the chance of the miner submitting stale or invalid shares, but in this case the solution was worse than the problem, since there is no harm in submitting a small number of invalid shares to a pool.  After taking some time to review the code and discuss my ideas with the author, I decided to make some improvements.  The result is ethminer-nr.

A description of some of the changes can be found on the issues tracker for Genoil's miner, since I expect most of my changes to be merged upstream.  The first thing I did was remove the mutexes.  This does open the possibility of a rare race condition that could cause an invalid share submit when one thread processes a share from a GPU while another thread processes a new job from the pool.  On Linux the threads can be serialized using the taskset command to pin the process to a single CPU.  On a multi-CPU system, use "taskset 1 ./ethminer ..."  to pin the process to the first CPU.

As described in the issues tracker, I added per-GPU reporting of hash rate.  I also reduced the stats output to accepted (A) and rejected (R), including stales, since I have never seen a pool submit fail, and only some pools will report a rejected share.  The more compact output helps the stats still fit on a single line, even with hashrate reporting from multiple GPUs:
  m  13:28:46|ethminer  15099 24326 15099 =54525Khs A807+6:R0+0

To help detect when a pool connection has failed, instead of trying to manage timeouts in the code, I decided to rely on the TCP stack.  The first thing I did was enable TCP keepalives on the stratum connection to the pool.  If the pool server is alive but just didn't have any new jobs for a while, the socket connection will remain open.  If the network connection to the pool fails, there will be no keepalive response and the socket will be closed.  Since the default timeouts are rather long, I reduced them to make network failure detection faster:
sudo sysctl -w net.ipv4.tcp_keepalive_time=30
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=5
sudo sysctl -w net.ipv4.tcp_keepalive_probes=3
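
The sysctl settings apply to every socket on the system.  For comparison, here is a minimal Python sketch of doing the same thing for a single socket.  It is not how ethminer (which is C++) does it; the function name is made up, the TCP_KEEP* options are Linux-specific, and the defaults assume a 30-second idle time, 5-second probe interval, and 3 probes:

```python
import socket


def enable_keepalive(sock, idle=30, interval=5, probes=3):
    """Turn on TCP keepalives for one socket and shorten its timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timer options below exist on Linux; skip any that
    # this platform's socket module does not define.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

A per-socket setting like this would avoid changing the behaviour of other programs on the rig, at the cost of having to be built into the miner itself.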

I wasn't certain if packets sent to the server will reset the keepalive timer, even if there is no response (not even an ACK) from the server.  Therefore I also reduced the default TCP retransmission count to 5, so the pool connection will close after a packet is sent (i.e. share submit) 5 times without an acknowledgement.
sudo sysctl -w net.ipv4.tcp_retries2=5

I was also able to make a stand-alone Linux binary.  Until now the Linux builds had made extensive use of shared libraries, so the binary could not be used without first installing several shared library dependencies like boost and json.  I had to do some of the build manually, so to make your own statically linked binary you'll have to wait a few days for some updates to the cmake build scripts.  If you want to try it now anyway, you can add "-DETH_STATIC=1" to the cmake command line.

As for future improvements, since I've started learning OpenCL, I'm hoping to optimize the ethminer OpenCL kernel to improve hashing performance.  Look for something in late August or early September.

Sunday, July 17, 2016

Diving into the OpenCL deep end

Programs for mining on GPUs are usually written in OpenCL.  It's based on C, which I know well, so a few weeks ago I decided to try to improve some mining OpenCL code.  My intention was to both learn OpenCL and better understand mining algorithms.

I started with simple changes to the OpenCL code for Genoil's ethminer.  I then spent a lot of time reading GCN architecture and instruction set documents to understand how AMD GPUs run OpenCL code.  Since I recently started mining Sia, I took a look at the gominer kernel code, and thought I might be able to optimize the performance.  I tested with the AMD fglrx drivers under Ubuntu 14.04 (OpenGL version string: 4.5.13399) with a r9 290 card.

The first thing I tried was replacing the rotate code in the ror64 function to use amd_bitalign.  The bitalign instruction (v_alignbit_b32) can do a 32-bit rotate in a single cycle, much like the ARM barrel shifter.  I was surprised that the speed did not improve, which suggests the AMD OpenCL drivers are optimized to use the alignbit instruction.  What was worse was that the kernel would calculate incorrect hash values.  After double and triple-checking my code, I found a post indicating a bug with amd_bitalign when using values divisible by 8.  I then tried amd_bytealign, and that didn't work either.  I was able to confirm the bug when I found that a bitalign of 21 followed by 3 worked (albeit slower), while a single bitalign of 24 did not.

It would seem there is no reason to use amd_bitalign any more.  Relying on the driver to optimize the code makes it portable to other platforms.  I couldn't find any documentation from AMD saying the bitalign and other media ops are deprecated, but I did verify that the pragmas make no difference in the kernel:
#pragma OPENCL EXTENSION cl_amd_media_ops : enable
#pragma OPENCL EXTENSION cl_amd_media_ops : disable

After finding a post stating the rotate() function is optimized to use alignbit, I tried changing the "ror64(x, y)" calls to "rotate(x, 64-y)".  The code functioned properly but was actually slower.  By using AMD_OCL_BUILD_OPTIONS_APPEND=-save-temps, I was able to view the assembler .isa files, and could tell that the calls to rotate with 64-bit values were using v_lshlrev_b32, v_lshrrev_b64, and v_or_b32 instead of a pair of v_alignbit_b32 instructions.  Besides using 1 additional instruction, the 64-bit shift instructions apparently take 2 or even 4 times longer to execute on some platforms.
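
To show why a 64-bit rotate only needs a pair of alignbit instructions, here is a small Python model.  It is not kernel code: alignbit() is my own emulation of what v_alignbit_b32 computes (the low 32 bits of a 64-bit pair shifted right), and the other function names are made up for the sketch:

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def alignbit(hi, lo, shift):
    """Model of v_alignbit_b32: low 32 bits of the 64-bit value hi:lo >> shift."""
    return (((hi << 32) | lo) >> (shift & 31)) & MASK32


def ror64_alignbit(x, n):
    """Rotate a 64-bit value right by n (0 < n < 64) using two 32-bit alignbits."""
    hi, lo = x >> 32, x & MASK32
    if n >= 32:               # rotating by 32 just swaps the two halves
        hi, lo = lo, hi
        n -= 32
    new_lo = alignbit(hi, lo, n)
    new_hi = alignbit(lo, hi, n)
    return (new_hi << 32) | new_lo


def ror64(x, n):
    """Reference rotate right using plain 64-bit shifts."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def rotl64(x, n):
    """Reference rotate left, the direction OpenCL's rotate() uses."""
    return ((x << n) | (x >> (64 - n))) & MASK64


x = 0x0123456789ABCDEF
for n in range(1, 64):
    assert ror64_alignbit(x, n) == ror64(x, n) == rotl64(x, 64 - n)
```

The loop checks that the two-alignbit version, the plain shift version, and a left rotate by 64-n all agree, which is why replacing ror64(x, y) with rotate(x, 64-y) gives the same hashes.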

In the end, I wasn't able to improve the kernel speed.  I think re-writing the kernel in GCN assembler is probably the best way to get the maximum hashing performance.

Monday, July 11, 2016

Mining Sia coin on Ubuntu

Sia is a hot crypto-currency for miners.  Just a week ago, the sia network hashrate was 6.5 Th/s, and the only way to mine was solo as there were no public pools.  In the last three days,, and started up and the network hashrate grew to 14.7 Th/s, with the two pools making up 80% of the total network hashrate.

Mining on Windows is relatively easy, with nanopool posting a binary build of siamining's gominer fork.  For Ubuntu, you need to build it from the source.  For that, you'll need to install go first.  If you type 'go' in Ubuntu 14.04, you'll get the following message:
The program 'go' is currently not installed. You can install it by typing:
apt-get install gccgo-go

I tried the similar package 'gccgo', which turned out to be a rabbit hole.  The version 1.4.2 referred to in the gominer readme is a version of the package 'golang'.  Neither gccgo-go nor gccgo has the latest libraries needed by gominer.  And the most recent version of golang in the standard Ubuntu repositories is 1.3.3.  However the Ethereum foundation publishes a 1.5.1 build of golang in their ppa.

Even with the golang 1.5.1, building gominer wasn't as simple as "go get".  The reason is that the gominer modifications to support pooled mining are in the "poolmod3" branch, and there is no option to install directly from a branch.  So I made my own fork of the poolmod3 branch, and added detailed install instructions for Ubuntu:
sudo add-apt-repository -y ppa:ethereum/ethereum
sudo apt-get update
sudo apt-get install -y git ocl-icd-libopencl1 opencl-headers golang
go get
Once I got it running on a single GPU, I wanted to find out if it was worthwhile to switch my eth mining rigs to sia.  I couldn't find a good sia mining calculator, so I pieced together some information about mining rewards and used the Sia Pulse calculator.  I wanted to compare a single R9 290 clocked at 1050/1125, which gets about 29Mh/s mining eth, earning $2.17/day.  For Sia, the R9 290 gets about 1100Mh, which if you put that into the Sia Pulse calculator along with the current difficulty of 4740Th, it will calculate daily earnings of 6015 SC/day.  Multiplying by the 62c/1000SC shown on will give you a total of $3.73/d, but that will be wrong.  The Sia Pulse calculator defaults to a block reward of 300,000, but that goes down by 1 for each block.  So at block 59,900, the block reward is 240,100, and the actual earnings would be $2.99/d.
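
The correction is simple enough to sketch in Python.  The function names are made up, and the only inputs are the figures quoted above (6015 SC/day from the calculator, 62c per 1000 SC, block 59,900):

```python
def sia_block_reward(height, initial=300000):
    """Block reward in SC, falling by 1 SC per block from the initial value."""
    return initial - height


def corrected_daily_usd(calc_sc_per_day, height, usd_per_1000_sc,
                        calc_reward=300000):
    """Scale a calculator estimate that assumed the initial block reward."""
    scale = sia_block_reward(height) / float(calc_reward)
    return calc_sc_per_day * scale * usd_per_1000_sc / 1000.0


print(sia_block_reward(59900))
print(corrected_daily_usd(6015, 59900, 0.62))
```

The result lands within a cent of the $2.99/d figure, with the small difference coming from rounding the intermediate $3.73.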

Since the earnings are almost 40% better than eth, I decided to switch my mining rigs from eth to sia.  I had to adjust the overclocking settings, as sia is a compute-intensive algorithm instead of a memory-intensive algorithm like ethereum.  After reducing the core clock of a couple cards from 1050 to 1025, the rigs were stable.  When trying out nanopool, I was getting a lot of "ERROR fetching work;" and "Error submitting solution - Share rejected" messages.  I think their servers may have been getting overloaded, as it worked fine when I switched to  I also find has more detailed stats, in particular % of rejected shares (all below 0.5% for me).

I may end up switching back to eth in the near future, since a doubling in network hashrate for sia will eventually mean a doubling of the difficulty, cutting the amount of sia mined in half.  In the process I'll at least have learned a bit about golang, and I can easily switch between eth and sia when one is more profitable than the other.

Friday, June 3, 2016

When does 18 = 26? When buying cheap cables.

I recently bought some cheap molex to PCI-e power adapters from a seller on AliExpress.  Although there are deals for quality goods on AliExpress, I was a bit suspicious when I ordered these given just how cheap they were.  PCI-e power connectors are supposed to be rated for 75W of power carried over 2 conductors at 12V, which means 3.1A per conductor.  In order to avoid a large voltage drop the wires used are usually 18AWG, although 20AWG wires (with 1.6x the resistance) would be reasonably safe.

When the package arrived, I inspected the adapter cables, which were labeled 18AWG.  Despite the label, they didn't feel like 18AWG wires, which have a conductor diameter of 1mm.  I decided to do a destructive test on one of the adapters by cutting and stripping one of the wires.  The conductor measured only 0.4mm in diameter, which is actually 26AWG.  The first photo above shows a real 18AWG wire taken from an old ATX PSU next to the fake 18AWG wire from the adapter cables.

When I opened a dispute through AliExpress, things got more amusing.  I provided the photo, as well as an explanation that real 18AWG wire should be 1mm in diameter.  The seller claimed "we never heard of this before", and after exchanging a couple more messages said, "you can't say it is fake just because it is thin".  At that point I realized I was dealing with one of those "you can't fix stupid" situations.

So what would happen if I actually tried to use the adapter cables on a video card that pulls 75W on the PCI-e power connector?  Well you can find posts on overclocking sites about cables that melted and burst into flames.  If you have a cheap PSU without short-circuit protection, when the insulation melts and the wires short, your power supply could be destroyed.  And if that happened I'm sure the AliExpress seller is not going to replace your power supply.  How much hotter the cables would get compared to genuine 18AWG cables is a function of the resistance.  Each gauge has 1.26 times more resistance than the previous, so 20AWG has 1.26^2 = 1.59 times the resistance of 18AWG.  The 26AWG wire used in these cheap adapter cables would have 1.26^8 or just over 6 times the resistance of 18AWG wire, and would have a temperature increase 6 times greater than 18AWG for a given level of current.
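
The gauge numbers can be checked with the standard AWG diameter formula.  This short Python sketch (function names are mine) computes conductor diameters and the resistance ratio, assuming the same copper and the same length so that resistance depends only on cross-sectional area:

```python
def awg_diameter_mm(gauge):
    """Conductor diameter in mm from the standard AWG formula."""
    return 0.127 * 92 ** ((36 - gauge) / 39.0)


def resistance_ratio(thin_gauge, thick_gauge):
    """How many times more resistance the thinner wire has per unit length."""
    # Resistance is inversely proportional to area, i.e. to diameter squared.
    return (awg_diameter_mm(thick_gauge) / awg_diameter_mm(thin_gauge)) ** 2


print(awg_diameter_mm(18), awg_diameter_mm(26))
print(resistance_ratio(20, 18), resistance_ratio(26, 18))
```

The diameters come out at roughly 1mm and 0.4mm, matching the measurements, and the 26AWG to 18AWG ratio is a bit over 6.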

It could make for a fun future project; create a resistive load of 75W, take an old ATX PSU, hook up the adapter cables, and see what happens.  People do seem to like pictures and videos of things bursting into flames posted on the internet...

Thursday, May 26, 2016

Installing Python 3.5.1 on Linux

Perl has been my go-to interpreted language for over 20 years now, but in the last few years I've been learning (and liking) python.  Python 2.7 is a standard part of Linux distributions, and while many recent distributions include Python 3.4, Python 3.5.1 is not so common.  I'm working on some code that will use the async and await primitives, which are new in Python 3.5.  I've searched Extra Packages for Enterprise Linux and other repositories for Python 3.5 binaries, but the latest I can find is 3.4.  That means I have to build it from source.

While the installation process isn't very complicated, it does require installing gcc and associated build tools first.  Since I'm installing it on a couple servers (devel and prod), I wrote a short (10-line) install script for rpm-based Linux distributions.  Download the script, then run "sh".  The python3.5 binary will be installed in /usr/local/bin/.

When installing pip packages for python3, use "pip3", while "pip" will install python2 packages.  And speaking of pip, you may want to update it to the latest version:
sudo /usr/local/bin/pip3 install --upgrade pip

Friday, April 22, 2016

More about mining

In my last post, I gave a basic introduction to ethereum mining.  Since there is not much information available about eth mining compared to bitcoin mining, and some of the information I have found is even wrong, I decided to go into more detail on eth mining.

Comparing the bitcoin protocol to ethereum, one of the significant differences is the concept of uncle blocks.  When two miners find a block at almost the same time, only one of them can be the next block in the chain, and the other will be an uncle.  They are equivalent to stale blocks in bitcoin, but unlike bitcoin where the stale blocks go unrewarded, uncle blocks are rewarded based on how "fresh" they are, with the highest reward being 4.375 eth.  An example of this can be found in block 1,378,035. Each additional generation that passes (i.e. each increment of the block count) before an uncle block gets included reduces the reward by .625 eth.  An example of an uncle that was 2 generations late getting included in the blockchain can be found in block 1,378,048.  The miner including the uncle in their block gets a bonus of .15625 eth on top of the normal 5 eth block reward.

Based on the current trend, I expect the uncle rate to be in the 6-7% range over the next few months.  With the average uncle reward being around 3.5 eth (most uncles are more than one generation old), uncles provide a bonus income to miners of about 4%.  Since uncles do not factor into ethereum's difficulty formula, when more uncles are mined the difficulty does not increase.  The mining calculators I've looked at don't factor in uncle rewards, so real-world returns from mining in an optimal setup should be slightly higher than the estimates of the mining calculators.

Another thing the calculators do not factor is the .15625 eth uncle inclusion reward, but this is rather insignificant, and most pools do not share the uncle inclusion reward.  Assuming a 6% uncle rate, the uncle inclusion reward increases mining returns by less than 0.2%.  If your pool is down or otherwise unavailable for 3 minutes of the day, that would be a 0.21% loss in mining rewards.  So a stable pool with good network connections is more important than a pool that shares the uncle inclusion reward.  Transaction fees are also another source of mining revenue, but most pools do not share them, and they amount to even less than the uncle inclusion reward in any case.
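
The uncle arithmetic above can be written out as a short Python sketch.  The function names are made up; the formulas are simply the ones implied by the figures in this post (5 eth block reward, 4.375 eth for the freshest uncle, .625 eth less per generation, .15625 eth inclusion reward):

```python
BLOCK_REWARD = 5.0   # eth per block, as quoted above


def uncle_reward(generations):
    """Reward for an uncle included `generations` blocks after it was mined."""
    return BLOCK_REWARD * (8 - generations) / 8.0


def inclusion_reward():
    """Bonus paid to the miner whose block includes an uncle."""
    return BLOCK_REWARD / 32.0


def uncle_income_bonus(uncle_rate, average_uncle_reward):
    """Extra income from uncles as a fraction of normal block rewards."""
    return uncle_rate * average_uncle_reward / BLOCK_REWARD


print(uncle_reward(1), uncle_reward(2), inclusion_reward())
print(uncle_income_bonus(0.06, 3.5))            # bonus from uncle rewards
print(0.06 * inclusion_reward() / BLOCK_REWARD)  # bonus from inclusion rewards
print(3 / 1440.0)                                # loss from 3 minutes of downtime
```

Comparing the last two values shows why three minutes of pool downtime per day costs slightly more than the uncle inclusion reward is worth.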

Finding a good pool for ethereum mining has been much more difficult than bitcoin, where it is pretty hard to beat Antpool.  For optimal mining returns, you need to use stratum mode, and there are two main variations of the stratum protocol for eth mining: dwarf and coinotron.  Coinotron's stratum protocol is directly supported by Genoil's ethminer, which avoids the need to run eth-proxy in addition to the miner. and support coinotron's stratum protocol, while nanopool, f2pool, and miningpoolhub support dwarf's protocol.  Miningpoolhub is able to support both on the same port since the json connection string is different. and coinotron only have servers in Europe, and half the time I've tried to go to coinotron's web site it doesn't even load after 15 seconds.  Miningpoolhub has servers in the US, Europe, and Asia, and has had reasonable uptimes.  As well, the admin responds adequately to issues, and speaks functional English.  They have a status page that shows enough information to be able to confirm that your mining connection to the pool is working properly.  I have a concern over how the pool reports rejected shares, but the impact on mining returns does not appear to be material.  Rejected shares happen on other pools too, and since I am still investigating what is happening with rejected shares, there is not much useful information I can provide about it.

So for now my recommended pool is   My recommended mining program is v1.0.7 of Genoil's ethminer, which added support for stratum connection failover where it can connect to a secondary pool server if the first goes down.  The Ethereum Foundation is supporting the development of open-source mining pool software, so we may see an ideal eth mining pool in the near future, and maybe even improvements to the official ethminer supporting stratum protocol.

Saturday, April 16, 2016

Digging into ethereum mining

After bitcoin, ethereum (eth) has the highest market capitalization of any cryptocurrency.  Unlike bitcoin, there are no plug-and-play mining options for ethereum.  As was done in the early days of bitcoin, ethereum mining is done with GPUs (primarily AMD) that are typically used for video gaming.

The first ethereum mining I did was with an AMD R9 280x card using the ethereum foundation's ethminer program under Windows 7e/64.  The installer advised that I should use a previous version of AMD's Catalyst drivers, specifically 15.7.1.  Although the AMD catalyst utilities show some information about the installed graphics card, I like GPU-z as it provides more details.  After setting up the software and drivers, I started mining using dwarfpool since it was the largest ethereum mining pool.

As an "open" pool, dwarf does not require setting up an account in advance.  One potential problem with that is the eth wallet address used for mining does not get validated.  I found this out because I had accidentally used a bitcoin wallet address, and dwarfpool accepted it.  After fixing it, I emailed the admin and had the account balance transferred to my eth wallet.

Dwarf recommends the use of their eth-proxy program, which proxies between the get-work protocol used by ethminer, and the more efficient stratum protocol which is also supported by dwarfpool.  Even using eth-proxy, I wasn't earning as much ethereum as I expected.

The ethereum network is running the homestead release as of 2016/03/14, which replaced the beta release called frontier.  The biggest change in homestead was the reduction in the average block time from 17 seconds to 14.5 seconds, moving half way to the ultimate target of a 12-second block time.  I wasn't sure if the difference in the results I was getting from mining was due to the calculators not having been updated from frontier or some other reason.  After reading a comment in the ethereum mining forum, I realized returns can be calculated with a bit of basic math.

The block reward in ethereum is 5 eth, and with an average block generation time of 14.5 seconds, there is 86400/14.5 * 5 = 29793 eth mined per day.  Ethereum blockchain statistics sites report the network hash rate, which is currently around 2,000 gigahashes per second.  An R9 280x card does about 20 megahashes per second, or 1/100,000th of the network hashrate, and therefore should earn about 29,793/100,000 or 0.298 eth per day.  The manual calculations are in line with my favorite eth mining calculator (although it can be a bit slow loading at times).  Due to the probabilistic nature of mining, returns will vary by 5-10% up or down each day, but in less than a week you can tell if your mining is working optimally.
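The same math works for any card or network hash rate.  Here is a small sketch of the calculation; the function name and defaults are mine, and it ignores pool fees, uncles, and transaction fees.

```python
SECONDS_PER_DAY = 86400


def eth_per_day(miner_hashrate, network_hashrate,
                block_time_s=14.5, block_reward=5.0):
    """Expected daily earnings for a miner with a given share of the network.

    Both hash rates must use the same unit (hashes per second here).
    """
    network_eth_per_day = SECONDS_PER_DAY / block_time_s * block_reward
    return network_eth_per_day * miner_hashrate / network_hashrate


# R9 280x at about 20 MH/s against a network of about 2,000 GH/s
print(round(eth_per_day(20e6, 2000e9), 3))
```

Plugging in the current network hash rate and block time from a statistics site gives a quick check on what a pool should be paying.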

Using the regular ethminer, or even using eth-proxy, I was unable to get pool returns in line with the calculations.  However using Genoil's ethminer, which natively supports the stratum protocol, I have been able to get the expected earnings from  Dwarf uses an unsupported variation of the stratum protocol, so I could not use Genoil's ethminer with it.  I briefly tried nanopool, but had periods where the pool stopped sending work for several minutes, even though the connection to the pool was still live.

Both the official ethminer and Genoil's version were built using MS Visual C++, so if your system doesn't already have it installed, you'll need MS Visual Studio redistributable files.  Getting the right version of the AMD Windows catalyst drivers for ethminer to work and work well can be problematic.  Version 15.12 works at almost the same speed as 15.7.1, however the crimson version 16 drivers perform about 20% slower.

For me, as a Linux user for over 20 years, the easiest setup for eth mining was with Linux/Ubuntu.  I plan to do another post about mining on Ubuntu.

Sunday, March 27, 2016

Hacking GPU PCIe power connections

Until recently, I never thought much about PCIe power connectors.  Three 12V power and three ground wires was all I thought there was to them.  I thought it was odd that the 8-pin connectors just added two more ground pins and not another power pin, but never bothered to look into it.  That all changed when I got a new GPU card with a single 8-pin connector.

My old card had two 6-pin connectors, which I had plugged a 1-2 18AWG splitter cable into.  That was connected to a 16AWG PCIe power cable, which is good for about 200W at a drop of under 0.1V.  My new card with the single 8-pin connector wouldn't power up with just a 6-pin plug installed.  Using my multi-meter to test for continuity between the pins, I realized that it's not just a row of 12V pins and a row of ground pins.  There was continuity between the three 12V pins, and between three of what I thought were five ground pins.  After searching for the PCIe power connector pinout, I found out why.
Diagram edited from

Apparently some 6-pin PCIe cables only have 2 12V wires, 2 ground, and a grounded sense wire (blue in the diagram above).  With just two 12V wires, a crap 18" 20AWG PCIe power cable would have a drop of over 0.1V at 75W.  Since the 8-pin connector has three 12V pins, it can provide 50% more power.  My 6-pin 16AWG PCIe cable would have a voltage drop of only 40mV at 75W, so I just needed to figure out a way to trick the GPU card into thinking I had an 8-pin connector plugged in.  The way to do that is to ground the 2nd sense pin (green in diagram above).
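For anyone who wants to estimate the drop on their own cables, here is a rough sketch.  The per-metre resistances are standard copper AWG table values at 20C; the function name, the two-wires-each-way layout, and ignoring connector contact resistance are my simplifying assumptions.

```python
# Copper wire resistance in ohms per metre at 20C (standard AWG table values).
OHMS_PER_METRE = {16: 0.01317, 18: 0.02095, 20: 0.03331}
INCH = 0.0254  # metres


def cable_voltage_drop(power_w, awg, length_m,
                       supply_wires, return_wires, supply_v=12.0):
    """Total drop over the 12V and ground wires, ignoring contact resistance."""
    current_a = power_w / supply_v
    wire_ohms = OHMS_PER_METRE[awg] * length_m
    # Parallel wires share the current in each direction.
    loop_ohms = wire_ohms / supply_wires + wire_ohms / return_wires
    return current_a * loop_ohms


drop = cable_voltage_drop(75, awg=20, length_m=18 * INCH,
                          supply_wires=2, return_wires=2)
print(f"{drop * 1000:.0f} mV")
```

For the 18" 20AWG cable the wire alone comes out a little under 0.1V, and the contact resistance of the connector pins adds to that.  The result scales directly with cable length, so measure your own cable before trusting the number.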

I didn't want the modification to be permanent, so soldering a wire to the sense pin was out.  The PCIe power connectors use the same kind of pins as ATX power connectors, and I had an old ATX power connector I had cut from a dead PSU.  To get one of the female contacts out of the ATX connector, I used a hack saw to cut apart the ATX connector.  Not pretty, but I'm no maker, I'm a hacker. :-)  I stripped the end of the wire (red in the first photo), wrapping the bare part of the wire around the screw that holds the card bracket in the case.  I powered up the computer, and the video card worked perfectly.

Looking for a cleaner solution, I decided to make a jumper wire to go between the sense pin and the adjacent ground.  I also did some searching on better ways to remove the female contacts from the connectors.  For this, has a good technique using staples.  When the staples aren't enough to get the contacts out, I found a finish nail counter-sink punch helps.

Here's the end result, using a marrette (wire nut) to make the jumper:

See my related post Powering GPU Mining Rigs.

Wednesday, January 20, 2016

LED low power limbo: light below 1uA

Anyone reading this blog has likely noticed how LED efficiency has significantly improved in the last decade.  If you follow the old rule-of-thumb and use a 330-Ohm series resistor to power a modern LED from a 5V supply, looking directly at the LED will leave a dot floating in your vision for a few minutes like a camera flash.  Series resistors for 0603 SMD LEDs like those on the Baite Pro Mini board are often around 1K-Ohm, and even then I find them too bright.  Even when powered through an MCU's ~40K-Ohm pull-up resistor, I find LEDs clearly visible.  This got me wondering, how low can you go?

I started with a cheap (<$2 for a bag of 100) 5mm blue LED with a 470K resistor, powered from a 3.3V supply.  The room was lit with two 800-lumen CFL bulbs, and the LED was still clearly visible.  The voltage across the resistor (measured with a meter that has 10M input impedance) was 856mV, so solving for I in V=IR means that the current was only 1.8uA.  The next step was to try two 470K resistors, and although it was dim, it was still clearly visible, especially when looking directly into the LED.  In the photo above the LED looks brighter than it does with the naked eye due to the light sensitivity of the camera being different than the human eye.
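The current calculation is just Ohm's law, sketched below.  The optional meter correction is my own addition: while the 10M meter is connected across the resistor, the LED current flows through the two in parallel, so the LED sees slightly more current during the measurement.

```python
def led_current(v_resistor, r_series, r_meter=None):
    """Current through the LED from the voltage measured across the resistor.

    If r_meter is given, the meter's input impedance is treated as being in
    parallel with the series resistor while the reading is taken.
    """
    if r_meter is None:
        r_effective = r_series
    else:
        r_effective = r_series * r_meter / (r_series + r_meter)
    return v_resistor / r_effective


print(f"{led_current(0.856, 470e3) * 1e6:.2f} uA")
print(f"{led_current(0.856, 470e3, r_meter=10e6) * 1e6:.2f} uA with meter attached")
```

The correction is only a few percent with 470K, but it grows as the series resistance approaches the meter's input impedance.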

The largest resistors I have in my collection are 1M-Ohm, and I really wanted to try at least 4.7M-Ohm.  Without an obvious solution, I put my breadboard away in a drawer for a few days.  My inspiration came when I thought of my TL431A voltage references.  With a 270Ohm resistor and a TL431 I made a simple low-current 2.5V supply.  With the 2.5V supply and a 470K series resistor, the LED was still visible!  The current through the LED was about 0.2uA, and the voltage drop across the LED, which normally is around 3V with 20mA of current, was slightly above 2.3V.  I found that the LED was brighter when I didn't look directly at it, which is to be expected with dim objects since the most light sensitive cells in our retina (the rods) are located outside the center of our vision.  After adding the 2nd 470K resistor back into the circuit, with just 144nA of current, the LED was still faintly perceptible.

Rather than digging out the 1M resistors, I grabbed a 1n4148 diode and connected it in place of one of the 470K resistors.  If you didn't read my diodes, diodes everywhere post, you might think a 1n4148 would drop about 0.6V from the supply, leaving too little to light the LED.  But with all diodes, the lower the current, the lower the voltage drop.  With the 1n4148 and a single 470K resistor, the blue LED was no longer obviously visible (sometimes I thought I noticed some blue out of the corner of my eye), and the voltage drop across the 1n4148 was just 157mV.  The current through the LED was now 103nA.  With my camera pointed directly at the LED and the room lights still on, I could still clearly see blue light.  To be sure I cycled the power on the circuit a few times, and the blue light came on and off as expected.

Since even lithium coin cell batteries that you might use for a low-power project have an internal leakage current in the hundreds of nano-amperes, I had reached the end of the practical application of the experiment.  But like any good hacker (hat tip to Jamie and Adam), I wanted to see how far I could take the experiment.  A human is able to detect when a few dozen photons enter their dark-adjusted eye.  The next step was pretty simple; turn off the lights.

Once I turned off the lights, the blue LED was again visible.  My next step was adding the 1n4148 back into the circuit, along with the two 470K resistors (after turning the room lights back on).  At this point the current was only 17nA, and I was questioning whether I would be able to see anything, even after my eyes adjusted to the dark.  I turned out the lights and went to bed.

While in bed I wondered how much light, quantitatively, was being emitted by the LED.  The light is the result of electromagnetic waves emitted when atoms in the LED are stimulated by electrons.  Super-high efficiency LEDs can supposedly emit 1 quantum of light (i.e. a photon) per 3 electrons.  My cheap LEDs are nowhere near as efficient, perhaps emitting 1 quantum of light for every 300 electrons.  Dust off the old physics texts, and you can figure out that 100nA of current is about 600 billion electrons per second, and 17nA is a bit more than 100 billion electrons per second.  The chance of seeing light at 17nA was seeming more likely.
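For those without an old physics text handy, the electron count is the current divided by the charge of one electron.  A quick sketch follows; the 300 electrons per photon figure is just my guess from above, not a measured value.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per electron


def electrons_per_second(current_a):
    """Number of electrons passing through the LED each second."""
    return current_a / ELEMENTARY_CHARGE_C


def photons_per_second(current_a, electrons_per_photon=300):
    """Rough photon output, assuming a guessed electrons-per-photon ratio."""
    return electrons_per_second(current_a) / electrons_per_photon


for nano_amps in (100, 17):
    amps = nano_amps * 1e-9
    print(f"{nano_amps}nA: {electrons_per_second(amps):.2e} electrons/s, "
          f"{photons_per_second(amps):.2e} photons/s")
```

Even at 17nA with that poor efficiency guess, the LED would be emitting hundreds of millions of photons per second, though only a tiny fraction of them will enter an eye.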

After about 10 minutes in bed letting my eyes adjust to the dark, I got up to look at the test circuit.  With only the faint green glow of the ethernet link lights from my router a couple meters away, I stared at the blue LED.  I thought I could make out a fuzzy ball of light when I got very close (~10cm) to the LED.  I cycled the power on the circuit a couple times, and sure enough the light came and went.

Getting back to the practical applications of this experiment, think of a wireless sensor running on a CR2032 coin cell.  Using the internal 35K Ohm pull-up on an AVR MCU to power a blue indicator LED will use 10-15uA of current while still making the LED easily visible.  Blinking the LED for 100ms out of every 5s will consume an average of just 300nA, while making a useful heartbeat indicator.
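The average current of a blinking LED is the on-current scaled by the duty cycle.  A minimal sketch, with a function name of my own choosing:

```python
def average_current(on_current_a, on_time_s, period_s):
    """Average current of a load that is on for on_time_s out of every period_s."""
    return on_current_a * on_time_s / period_s


# 100ms blink every 5s, at the low and high ends of the 10-15uA range
for micro_amps in (10, 15):
    avg = average_current(micro_amps * 1e-6, 0.1, 5.0)
    print(f"{micro_amps}uA on-current: {avg * 1e9:.0f}nA average")
```

The 300nA figure corresponds to the 15uA end of the range; at 10uA the average is lower still.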

Thursday, January 14, 2016

Lessons in buying Bitcoin

While bitcoin is far from mainstream, with it making headlines like Mark Zuckerberg's nemesis twins Tyler and Cameron launching Gemini, I figured I'd learn how to use bitcoin.  Aside from nerds using it to tip on reddit and github, bitcoin doesn't have much practical use.  Personally my interest is primarily educational, so if any bitcoin related business opportunities arise in the future, I may be able to capitalize on them.

With their logo at the start of this post, you can probably guess that I recommend coinbase for Canadians and Americans looking to buy small amounts of bitcoin (under US$500 worth of bitcoin).  This recommendation comes after I've looked at the offerings from several bitcoin exchanges including bitstamp, bitfinex, Kraken,, and Canadian exchange QuadrigaCX.  I registered for an account at bitstamp and coinbase, and traded bitcoin on the latter.

Bitstamp supports funding (sending money to bitstamp so you can buy bitcoin) from Canadian bank accounts.  Any funds are converted to USD, but finding out the exchange rate takes some work.  I emailed bitstamp on Christmas eve asking for their foreign exchange fees, and received a reply back on the 28th:
to view our exchange rates, please see the following link and click on the "Corporate exchange rates" for the correct rates: .Please note that all currencies are converted to USD free of charge by our bank.
I checked the exchange rates, and found that their bank adds about 0.6% to the spot rate for CAD/USD.  On top of that you'd have to add their trading fee of 0.1% for a limit order or 0.2% for a market order.  Adding that to their $1 minimum e-check fee means that the total fees to buy $100 in bitcoin would be $1.70.  That's reasonable compared to most other exchanges, but you'll have to pass their account verification first.  Despite providing a 300dpi high-quality jpeg scan of my driver's license, my account verification request was denied with the message, "the quality of the image/scan cannot be accepted according to UK AML standards."  In other words, no bitstamp for me!
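The bitstamp fee estimate can be reproduced with a short sketch.  It treats the $1 minimum e-check fee as a flat charge, which I am assuming holds for small purchases like this one; the function name is mine.

```python
def bitstamp_total_fee(amount_usd, fx_markup=0.006, trading_fee=0.001,
                       echeck_fee=1.00):
    """Estimated total cost of funding from CAD and buying bitcoin."""
    return amount_usd * (fx_markup + trading_fee) + echeck_fee


print(f"limit order:  ${bitstamp_total_fee(100):.2f}")
print(f"market order: ${bitstamp_total_fee(100, trading_fee=0.002):.2f}")
```

On a $100 purchase the flat e-check fee is the biggest part of the cost, so larger purchases work out cheaper in percentage terms.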

For coinbase I couldn't completely figure out their fees until I actually set up an account.  Some of their support pages refer to 0% maker and 0.25% taker fees on their exchange.  Other support pages will refer to a 1% fee for buying bitcoin.  In the end I figured out both are correct, since there are two ways to buy and sell bitcoin.

With coinbase, unlike bitstamp, you can fund your account without ID verification.  You will need a cell phone with a US or Canadian number for a basic residency check.  Once your account is setup and you choose to deposit funds to your "CAD wallet", you are presented with the following options:

When I chose "Deposit with Interac", I wasn't able to proceed and was given an error message that I need to verify my account.  A good account interface design wouldn't have presented the option, and instead would have it greyed out with a note that the option is available after ID verification.  To use the bank account, you need to provide your bank, transit, and account number.  After a day or two you'll see a small deposit to your account (for me it was under 50c).  You need to log into your online banking to see the amount of the deposit, and then enter the exact amount in your coinbase account to link with your bank account.  Once that is done, you can take funds from your bank account, and after a few days the funds will become available for buying bitcoin.

Being a bit impatient, I decided to provide ID verification so I could use Interac Online in order to get funds instantly into my account.  A few minutes after uploading the same jpeg file of my license that bitstamp refused to accept, I got an email stating my identity had been verified.  In addition to making Interac Online available, verifying my account increased my limits from $500/day to $3000/day (not that the $500 limit was a problem for me).  However I still couldn't use Interac Online because, "Interac online is not available for visa debit card holders."  If you have a relatively new bank card (issued in the last couple years) with the Visa debit logo in the corner, you're out of luck.

My wife's account, however, doesn't have the Visa debit logo.  After noticing this, and reading about the referral program that gives $10 in BTC for the new account and for my account, I set up another account for my wife.  Now that I knew how coinbase worked, the process was a lot quicker.  I helped her set up the account, and had her account verified with a copy of her license.  Then I used Interac Online to withdraw C$149 from her account, leaving C$148 in her CAD wallet after the $1 fee was deducted.  I then chose to buy bitcoin, entered $148, which left $146.52 after the 1% fee, and completed the transaction at the quoted exchange rate (about C$620/BTC).  A few minutes later I got an email about my invitation bonus, and at the same time an additional US$10 worth of BTC showed up in my wife's account.

So what about those 0% maker and 0.25% taker fees?  For that you need to use the coinbase exchange, which you can do by clicking on "exchange" from your account, and then click on log in with coinbase.  The interface is similar to a discount stock broker, with a list of bid and ask prices along with a calculation of the current spread.  With a Canadian account you can only trade CAD/BTC, but you can view the USD/BTC order book.  The spreads on the USD/BTC exchange are usually only 1-2c, while spreads of $1-$2 are common for CAD/BTC.   Because of that I was able to get better prices than I could if I was trading USD/BTC.  I tried a couple small (<0.1BTC) limit sell orders, with a price a few cents below the lowest listed sell order price.  The first one filled in about 10 minutes, and the second filled in less than a minute.  Both, as expected, had no fees as "maker" orders.

As long as the invitation program continues, the net fees to get C$160 in bitcoin are actually negative.  After paying $2.50 in fees to buy $146 in bitcoin, you'll get US$10 (C$14) in bitcoin as a bonus.  Probably not worth the trouble for most people, but certainly worth it for the nerds and geeks that want to give out bitcoin tips.
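Here is the coinbase purchase worked through as a small sketch, using the $1 Interac Online fee and 1% buy fee from my wife's transaction.  The function name is mine, and the C$14 bonus value depends on the exchange rate of the day.

```python
def coinbase_purchase(deposit_cad, deposit_fee=1.00, buy_fee_rate=0.01):
    """Return (value of bitcoin bought, total fees) for an Interac deposit."""
    wallet = deposit_cad - deposit_fee
    buy_fee = wallet * buy_fee_rate
    return wallet - buy_fee, deposit_fee + buy_fee


bought, fees = coinbase_purchase(149)
bonus_cad = 14.00  # US$10 referral bonus, approximate value in CAD
print(f"bitcoin bought: C${bought:.2f}")
print(f"fees paid:      C${fees:.2f}")
print(f"net fees:       C${fees - bonus_cad:.2f}")
```

With the bonus counted, the net cost of the fees comes out well below zero.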

Monday, January 11, 2016

ESP8266 reset and booting

When I got my first esp-01 module, I found there was a lot of mis-information about how to reset and flash the modules.  For the esp-01, pulling GPIO0 low while pulsing reset low is the easy and common way to get it to enter the bootloader for flashing.  When using modules like the ESP-12 that have most of the available pins broken out, things get a little more complicated.

I noticed the issue when I tried wiring a module for deep sleep mode by connecting GPIO16 to RST.  In deep sleep mode only the RTC is running, and instead of an internal software wake-up timer like some other MCUs, the esp8266 has to be woken up by a pulse on RST generated by the RTC on GPIO16.  With the GPIO16 to RST connection, I had difficulty resetting the module with a 270 Ohm resistor to ground.  The reason is that GPIO16 is not a pull-up, it is in output high mode.  The solution is to reboot the module using the EN pin (also labeled CH_PD on some modules) instead of RST.  The difference between RST and EN is that bringing EN low powers down the whole chip including the RTC.

I did some additional testing of the chip boot-up by holding RST low and EN high.  While in reset, the chip keeps GPIO0, GPIO2, and GPIO15 high with an internal 33K pull-up to Vcc.  If reset is then released, the chip will boot in SDIO mode, which is really only good for hooking it to a raspberry pi.  ESP-01 modules have GPIO15 connected to ground, but for modules like the ESP-12 it is necessary to pull GPIO15 low during boot.

After booting up, I checked the default state of most of the pins.  TX0 is high, which is the UART idle state, while RX0 has a weak pull-up so that it is not floating.  GPIO0 has a weak pull-up, GPIO2 is high, GPIO4 is low, GPIO5 is high, and GPIO15 stays low, even with a pull-down resistor.

With ESP-01 modules, I was using my zero-wire auto-reset circuit with DTR connected to GPIO0 in order to flash the modules.  For projects that were using GPIO0, this meant having to disconnect the DTR line after flashing.  I'm now working on a circuit that doesn't require DTR, and will hold GPIO0 and GPIO15 low for a short while after the module is rebooted by toggling EN.

I haven't found any information that indicates how long GPIO0 and GPIO15 need to be held low in order to enter the bootloader, so I'll need to experiment with the resistor and capacitor values.  Using a 15K resistor and 0.1uF capacitor keeps GPIO0 and GPIO15 low for about 1ms after startup.
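To pick trial values for the resistor and capacitor, the hold time can be estimated from the usual RC charging curve.  This sketch assumes the pins are read as low until the capacitor charges through the resistor to some fraction of Vcc; the 50% threshold is a placeholder of mine, not a figure from the esp8266 datasheet.

```python
import math


def rc_hold_time(r_ohms, c_farads, threshold_fraction=0.5):
    """Time for an RC node charging toward Vcc to reach threshold_fraction * Vcc."""
    return -r_ohms * c_farads * math.log(1.0 - threshold_fraction)


hold = rc_hold_time(15e3, 0.1e-6)
print(f"{hold * 1e3:.2f} ms")
```

With a 50% threshold the 15K and 0.1uF combination works out to roughly 1ms, and the hold time scales linearly with either value.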

Sunday, January 3, 2016

A 3.6V LiFePO4 charger for under 50c

I like LiFePO4 batteries.  They have a rather flat discharge at around 3.2V, which is ideal for powering 3.3V devices without a regulator.  You can also use them in devices that take 2 standard AA cells by using a blank shunt in the 2nd battery slot since 2 fresh alkaline cells in series provide 3.2-3.3V.  And since they are readily available in the 14x50mm AA size, you can use cheap AA holders for them in electronics projects.

When it comes to chargers, things can be a bit problematic.  LiFePO4 batteries should be charged to 3.6V, rather than 4.2V like regular lithium-ion batteries.  A good charger costs $10-$15, but charging at a high current will reduce the number of recharge cycles.  The Soshine batteries I bought indicate on the label a standard charging current of 300mA to 3.6V.  Rather than search for a charger to fit the bill, I decided to make one.

From my experiments with TL431 regulators, I remembered the regulator circuit above.  My idea was to take a 5V USB supply, and regulate it to 3.6V for charging the LiFePO4 cells.  For the NPN transistor, I wanted to use something more powerful than the usual 2N3904 which has a collector current rating of 200mA.  I found some old PN2222a transistors which are rated for 600mA, more than enough for the 300mA I needed.  Another thing to keep in mind when choosing a transistor is power dissipation.  TO92 packages are good for about 500mW before they get hot enough to burn flesh.  With the voltage drop across the transistor around 1.45V (5.05V - 3.6V), up to 350mA should be fine with a TO92 transistor.
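The power dissipation check is simple enough to sketch.  The 500mW figure is my rule of thumb from above rather than a datasheet limit, and the function names are mine.

```python
def pass_transistor_power(v_in, v_out, current_a):
    """Power dissipated in the series pass transistor."""
    return (v_in - v_out) * current_a


def max_current(v_in, v_out, max_power_w=0.5):
    """Largest current that keeps the transistor under max_power_w."""
    return max_power_w / (v_in - v_out)


print(f"dissipation at 300mA: {pass_transistor_power(5.05, 3.6, 0.3) * 1e3:.0f} mW")
print(f"max current at 500mW: {max_current(5.05, 3.6) * 1e3:.0f} mA")
```

Dissipation is highest at the start of charging, when the battery voltage is lowest and the drop across the transistor is larger than 1.45V, so some margin is worthwhile.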

Having chosen the transistor, I needed to figure out R1 and R2 such that I would get the desired 3.6V at Vout.  With Vref = 2.5, R1/R2 = 0.44, I needed to find 2 resistors close to that ratio.  After sorting through my resistor collection, I found some 1.2K and 2.7K resistors, which gives a ratio of 0.444.  The capacitor between the reference and cathode of the tl431 can be omitted, so the only other component I needed to figure out was the unlabeled resistor between the transistor collector and base.  The amount of base current required will depend on the gain of the transistor.  With 300mA of collector current the gain could be anywhere from 15 to 50.  I had some 68Ohm resistors handy which I decided to try.  Vbe of the transistor is around 0.7V, so with V+ at 5.05 and Vout at 3.6, using a 68Ohm resistor would give me a base current of 11mA.  I tested it with a partially discharged battery, and measured 290-300mA!  Things weren't quite perfect though, as after 10 minutes the voltage passed 3.6, and kept rising toward 4V.  I disconnected the power, and after some debugging I realized I had R1 and R2 mixed up.  I discovered this by measuring the reference voltage, and finding it was around 1.3V when Vout was 4.2.  After swapping the resistors, with no battery connected I was getting 3.61 for Vout and 2.5V on the reference - just what I wanted.
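The resistor math can be sketched in a few lines.  It uses the standard TL431 relation Vout = Vref * (1 + R1/R2), with R1 between Vout and the reference pin and R2 from the reference pin to ground, and ignores the small reference input current.  The function names are mine.

```python
V_REF = 2.5  # nominal TL431 reference voltage


def tl431_vout(r1, r2, v_ref=V_REF):
    """Regulated output voltage for a given R1/R2 divider."""
    return v_ref * (1 + r1 / r2)


def ref_pin_voltage(v_out, r1, r2):
    """Voltage at the reference pin for a given output voltage."""
    return v_out * r2 / (r1 + r2)


def base_current(v_in, v_out, v_be, r_base):
    """Current through the collector-to-base resistor of the pass transistor."""
    return (v_in - (v_out + v_be)) / r_base


print(f"Vout with 1.2K/2.7K:      {tl431_vout(1200, 2700):.2f} V")
print(f"Vref pin, swapped, 4.2V:  {ref_pin_voltage(4.2, 2700, 1200):.2f} V")
print(f"base current with 68 Ohm: {base_current(5.05, 3.6, 0.7, 68) * 1e3:.1f} mA")
```

With the resistors swapped, the reference pin never gets near 2.5V from a 5V supply, which is why the output kept rising instead of stopping at 3.6V.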

One other word of warning - be careful using the cheap little 170-hole breadboards for projects with more than 20-30mA of current.  I had started with one of them until I remembered the problematic high-resistance contacts on them.

Before getting out the soldering iron and making the circuit permanent, I wanted a way to tell when charging was complete.  After toying with a couple ideas, my final solution is a 2V green LED in series with the TL431 cathode.  When Vout reaches 3.6, the tl431 shunts almost all of the base current (a small base current is still required due to the ~1mA of leakage through R1 + R2).  When I tested it out, I found an unexpected benefit of the LED glowing dimly as charging starts, and then turning bright when charging completes.  This can be explained by looking at the cathode current graph from the datasheet:

With a fully-discharged battery, Vout will start at around 3V, and Vref will be about 2.1V.  The cathode current will be about 250uA, and since the LED is in series with the cathode, that will be the LED current as well.  This is enough to make a dim but visible glow.  When Vout reaches 3.6V, the tl431 will shunt most of the 11mA from the transistor base, making the LED glow brightly.

The last thing I tried while I had everything in the breadboard was substituting a 2N3904 for the PN2222a.  Surprisingly, I was able to get 200mA of charging current.  So if all you have are 3904s, you can still get a reasonable amount of current.  You could probably even use 2 3904s in parallel for close to 400mA of current.

For the permanent version of the circuit I used a small 7x4 piece of stripboard.  By planning out the layout on paper first, I was able to use 7 strips without any breaks.  I got a little too rushed doing the assembly and put the tl431 in the wrong way around (cathode and reference swapped).  After fixing it, I tested for continuity on all my solder connections. One needed to be re-worked, and with a bit of flux, a dab more solder, everything was good.  After some hot glue to attach the battery holder, I plugged it into a USB power port, and it worked perfectly.

Here's my final BOM:
AA battery holder: 15c
1/12th of a stripboard: 6c
USB male connector: 22c (or 10c from other sources)
Old pn2222a: 5c (replacement cost; paid $3 for a package of 15 at RadioShack almost 30 yrs ago)
TL431A: 1.5c
resistors: 3c
3mm green led: 1c

Satisfaction of building your own battery charger: priceless :-)

2016/01/05 update

I tried doing the stripboard layout in fritzing, but it doesn't have a tl431, and I couldn't get a good view given the perspective it uses.  Instead I tried drawing my layout plan in a more readable form: