Wednesday, May 10, 2017

GDDR5 memory timing details



In my Advanced Tonga BIOS editing post, I discussed some basic memory timing information, but did not get into the details.  GDDR5 memory is much more complex than the asynchronous DRAM of 20 years ago.  There are many sources of information on SDRAM, while GDDR information is harder to come by.  Although a thorough description of GDDR5 can be found in the spec published by JEDEC, neither nVIDIA nor AMD share information on how their memory controllers are programmed with memory timing information.  By analyzing the AMD video driver source, and with help from people contributing to a discussion on bitcointalk, I have come to understand most of the workings of AMD BIOS timing straps.

When a modern (R9 series and Rx series) AMD GPU card boots up, memory timing information (the straps) is copied from the BIOS to registers in the memory controller.  Some timing information, such as refresh frequency, does not depend on the memory speed and therefore is not contained in the memory strap table, but much of the important timing information is.  The memory controller registers are 32 bits wide, so each 48-byte memory strap maps to 12 different memory controller registers.  The shift masks in the Linux driver source are non-functional, and can only be taken as hints as to the meaning of the individual bits.  Due to an apparently bureaucratic process for releasing open-source code, AMD engineers are generally reluctant to update such code.

Jumping right to the code, here's a C structure definition for the Rx memory straps:
SEQ_WR_CTL_D1_FORMAT SEQ_WR_CTL_D1;
SEQ_WR_CTL_2_FORMAT SEQ_WR_CTL_2;
SEQ_PMG_TIMING_FORMAT SEQ_PMG_TIMING;
SEQ_RAS_TIMING_FORMAT SEQ_RAS_TIMING;
SEQ_CAS_TIMING_FORMAT SEQ_CAS_TIMING;
SEQ_MISC_TIMING_FORMAT SEQ_MISC_TIMING;
SEQ_MISC_TIMING2_FORMAT SEQ_MISC_TIMING2;
uint32_t SEQ_MISC1;
uint32_t SEQ_MISC3;
uint32_t SEQ_MISC8;
ARB_DRAM_TIMING_FORMAT ARB_DRAM_TIMING;
ARB_DRAM_TIMING2_FORMAT ARB_DRAM_TIMING2;
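
As a rough illustration of that mapping, the Python sketch below splits a strap (given as the 96-character hex string that BIOS editors display) into twelve 32-bit words and pairs them with the member names above. It assumes the words are stored little-endian; the helper name split_strap is made up for this example, and the strap used is the 1750MHz Samsung K4G4 strap shown later in this post.

```python
import struct

# Member names in the order they appear in the strap structure.
REGISTERS = [
    "SEQ_WR_CTL_D1", "SEQ_WR_CTL_2", "SEQ_PMG_TIMING", "SEQ_RAS_TIMING",
    "SEQ_CAS_TIMING", "SEQ_MISC_TIMING", "SEQ_MISC_TIMING2", "SEQ_MISC1",
    "SEQ_MISC3", "SEQ_MISC8", "ARB_DRAM_TIMING", "ARB_DRAM_TIMING2",
]

def split_strap(strap_hex):
    """Return {register name: 32-bit value} for a 96-character hex strap."""
    raw = bytes.fromhex(strap_hex)
    if len(raw) != 48:
        raise ValueError("expected a 48-byte strap, got %d bytes" % len(raw))
    # "<12I" = twelve little-endian unsigned 32-bit integers (assumed byte order).
    return dict(zip(REGISTERS, struct.unpack("<12I", raw)))

STRAP = ("777000000000000022CC1C0010626C49D0571016B50BD509"
         "004AE700140514207A8900A003000000191131399D2C3617")

for name, value in split_strap(STRAP).items():
    print("%-17s 0x%08X" % (name, value))
```

With this byte order, the SEQ_MISC1 bytes 14 05 14 20 come out as 0x20140514.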

The RAS timing register consists of 6 fields: RCDW, RCDWA, RCDR, RCDRA, RRD, and RC.  The full field definitions can be found in my fork of Kristy-Leigh's code.  Many of the "pad" fields are likely the high bits of the preceding field that are not currently used.  I have already tested a couple of pad fields (MISC RP_RDA & RP), confirming that the pad bits are actually the high bits of those fields.
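
To make the field layout concrete, here is a hedged Python sketch that unpacks SEQ_RAS_TIMING. The bit widths (5, 5, 5, 5, 4 and 7 bits plus one pad bit, starting from bit 0) are an assumption based on the structure definitions in the linked code and should be checked against it. decode_fields is a hypothetical helper, not part of any existing tool.

```python
# Assumed layout of SEQ_RAS_TIMING, least significant field first: (name, bits).
RAS_TIMING_FIELDS = [
    ("RCDW", 5), ("RCDWA", 5), ("RCDR", 5), ("RCDRA", 5),
    ("RRD", 4), ("RC", 7), ("pad", 1),
]

def decode_fields(word, layout):
    """Unpack a 32-bit register value into named bit fields."""
    fields = {}
    shift = 0
    for name, width in layout:
        fields[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return fields

# 0x496C6210 is the fourth little-endian word (bytes 10 62 6C 49) of the
# 1750MHz Samsung K4G4 strap shown later in this post.
for name, value in decode_fields(0x496C6210, RAS_TIMING_FIELDS).items():
    print(name, value)
```

For that word the sketch reports RRD = 6, which matches the "old" RRD value printed by strapmod for the same strap.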


For GDDR5, some timing values have both Long and Short versions, which apply to accesses within a bank group and to different bank groups respectively.  The RRD field of RAS timing is likely RRDL, because the values typically seen for this field are 5 and 6.  If RRDS were 5, at most one page could be opened every five cycles, limiting 32-byte random read performance to 2/5 or 40% of the maximum interface speed.  From my work with Ethereum mining, I know that RRDS can be no more than 4.  In addition, performance tests with RRD timing reduced from 6 to 5 are consistent with it being RRDL.  The actual value of RRDS used by the memory controller does not seem to be contained in the timing strap.  The default 1750MHz strap for Samsung K4G4 memory has a value of 10 for FAW, and a four-activate window is only meaningful when it is at least 4 * RRDS.  Therefore RRDS is most likely less than 4, and possibly as low as 2.
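
The 2/5 figure follows from simple arithmetic, sketched below in Python. The model is an assumption: each 32-byte random read needs its own activate and keeps the data bus busy for two cycles (which is what the 2/5 figure implies), and FAW is read in the usual sense of at most four activates in any window of FAW cycles. random_read_ceiling is a hypothetical helper.

```python
BURST_CYCLES = 2  # assumed: one 32-byte read occupies the data bus for 2 cycles

def random_read_ceiling(rrd, faw=None):
    """Upper bound on bus utilisation when every 32-byte read opens a new page."""
    # One activate every `rrd` cycles, each feeding BURST_CYCLES of data.
    limit = min(1.0, BURST_CYCLES / rrd)
    if faw is not None:
        # At most four activates in any window of `faw` cycles.
        limit = min(limit, 4 * BURST_CYCLES / faw)
    return limit

for rrd in (5, 4, 2):
    print("RRDS =", rrd, "ceiling =", random_read_ceiling(rrd))
print("RRDS = 2, FAW = 10, ceiling =", random_read_ceiling(2, faw=10))
```

Under this model an RRDS of 5 caps random reads at 40% and an RRDS of 4 at 50%, while a FAW of 10 on its own would cap them at 80% even with an RRDS of 2.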

To simplify the process of modifying memory straps for improved performance, I wrote strapmod.  I also wrote a CGI wrapper for the program, which you can run from my server: http://45.62.227.192/cgi-bin/strapmod.  For example, this is the output with the 1750MHz strap for Samsung K4G4 memory:
Rx strap detected
Old, new RRD: 6 , 5
Old, new FAW: A , 0
Old, new 32AW: 7 , 0
Old, new ACTRD: 19 , 0x10
777000000000000022CC1C0010626C49D0571016B50BD509004AE700140514207A8900A003000000191131399D2C3617
777000000000000022CC1C0010625C49D0571016B50BD50900400700140514207A8900A003000000101131399D2C3617
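
The four changes in that output can be reproduced with a few lines of Python. This is a sketch, not strapmod's actual code. It assumes little-endian 32-bit words in the structure order given earlier. The field positions are also assumptions, chosen because they line up with the nibbles that differ between the two straps: RRD at bits 20-23 of SEQ_RAS_TIMING, FAW at bits 8-12 and 32AW at bits 21-24 of SEQ_MISC_TIMING2, and ACTRD in the low byte of ARB_DRAM_TIMING. The new values are simply hard-coded from the output; how strapmod chooses them is not shown here.

```python
import struct

OLD = ("777000000000000022CC1C0010626C49D0571016B50BD509"
       "004AE700140514207A8900A003000000191131399D2C3617")
NEW = ("777000000000000022CC1C0010625C49D0571016B50BD509"
       "00400700140514207A8900A003000000101131399D2C3617")

def set_field(word, shift, width, value):
    """Return `word` with the `width`-bit field at `shift` replaced by `value`."""
    mask = ((1 << width) - 1) << shift
    return (word & ~mask) | ((value << shift) & mask)

def modify_strap(strap_hex):
    """Apply the four changes from the output above (hypothetical helper)."""
    regs = list(struct.unpack("<12I", bytes.fromhex(strap_hex)))
    regs[3] = set_field(regs[3], 20, 4, 5)       # SEQ_RAS_TIMING: RRD -> 5
    regs[6] = set_field(regs[6], 8, 5, 0)        # SEQ_MISC_TIMING2: FAW -> 0
    regs[6] = set_field(regs[6], 21, 4, 0)       # SEQ_MISC_TIMING2: 32AW -> 0
    regs[10] = set_field(regs[10], 0, 8, 0x10)   # ARB_DRAM_TIMING: ACTRD -> 0x10
    return struct.pack("<12I", *regs).hex().upper()

print(modify_strap(OLD))
print("matches new strap:", modify_strap(OLD) == NEW)
```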

34 comments:

  1. Good work, Ralph! How much of a boost did you see using the customized timings vs copying the 1500MHz timings?

    Replies
    1. The benefit depends on the type and the base strap used (I wasn't using the 1500MHz strap though). The biggest benefit is with Rx cards running high memory clocks. For R9 (i.e. Tonga) running Hynix memory at 1625 with the 1375 strap, it doesn't need much tuning since it already has tight values for RRD and FAW. Elpida 1375 isn't as tight, so my strapmod utility can help.

  2. Wow! It works great on Samsung memory of MSI Armor RX 470 series. 0.7 to 0.9 Mhash increase with your timings.

  3. Truly amazing: I went from 28.4Mh/s to 31.7Mh/s using it. Great job.

  4. Is there any disadvantage to customizing the straps with your tool? I have not found any, but I always overclock and undervolt first with stock straps. I have tried to understand how this affects my testing, with no luck (I am still reading). I was thinking of just customizing my straps with your tool and then testing undervolting and overclocking. It's incredible how much time I spend running experiments just to understand how it works. I already have very decent results, but I feel the need to keep testing. Thanks in advance.

    Replies
    1. I suppose using the custom straps could cause stability problems, but I did a lot of testing on Tonga and Polaris to find the tweaks that improve performance without impacting stability. A couple times I came close to bricking a card. To play it safe, always test custom straps above the boot-up strap. So if your BIOS memory clock is 1750, just change the strap after 1750 like 2000, and the strap will only get used when you overclock the memory beyond 1750.

  5. By the way, I am a fan of your blog. It's very refreshing.

  6. nice, good job!
    btw the site is offline :(

    Replies
    1. My virtualhost provider changed service terms on me and suspended my service. I'm working on getting it back online.

  7. Great job!
    Does it work only with Samsung memory?
    Not with Elpida, Hynix etc?

    Replies
    1. I've tested strapmod with Hynix and Samsung memory on Rx cards. With R9 cards I've tested it with Hynix and Elpida memory.

    2. So this technique works for each type of memory... hmm, will try right now. I have 3 cards with Elpida.

  8. Thanks so much for the detailed analysis and work around it!

    Does it work for 4GB cards as well? I noticed the reference values are different for the 4GB and 8GB versions.

  9. Do you have the Hynix Memory straps? I can't seem to get your files anymore..

  10. Hi Ralph,

    Your straps do increase the hashrate reported by the mining program, but the effective hashrate is much lower. I suspect these timings are too tight and as a result generate a high number of stale shares, which would explain why I'm seeing fewer shares reported by the pool. Am I right?

  11. This comment has been removed by the author.

  12. Hi, I'm really interested in this, but I have no idea what the discussion is about. I have an MSI RX580 8G Gaming X (Hynix); this GPU has 2 sets of timings (1: Samsung and 2: Hynix). I edited the BIOS by copying from 1:1750 and pasting to 2:2000 and 2:2250. If you don't mind, can you check whether the details below are the best customization, or correct them for me so I get the highest hashrate the GPU is capable of? Thanks so much.

    Samsung :-

    TRCDW=14 TRCDWA=14 TRCDR=24 TRCDRA=24 TRRD=5 TRC=69 Pad0=0 TNOPW=0 TNOPR=0 TR2W=28 TCCDL=3 TR2R=5 TW2R=16 Pad0=0 TCL=22 Pad1=0 TRP_WRA=51 TRP_RDA=25 TRP=20 TRFC=157 Pad0=0 PA2RDATA=0 Pad0=0 PA2WDATA=0 Pad1=0 TFAW=0 TCRCRL=2 TCRCWL=7 TFAW32=0
    MC_SEQ_MISC1: 0x20140514
    MC_SEQ_MISC3: 0xA000897A
    MC_SEQ_MISC8: 0x00000003
    ACTRD=16 ACTWR=16 RASMACTRD=49 RASMACTWR=57 RAS2RAS=150 RP=44 WRPLUSRP=54 BUS_TURN=23

    Hynix :-

    TNOPW=0 TNOPR=0 TR2W=25 TCCDL=2 TR2R=5 TW2R=17 Pad0=0 TCL=18 Pad1=0 TRP_WRA=48 TRP_RDA=22 TRP=19 TRFC=148 Pad0=0 PA2RDATA=0 Pad0=0 PA2WDATA=0 Pad1=0 TFAW=10 TCRCRL=2 TCRCWL=6 TFAW32=7
    MC_SEQ_MISC1: 0x20140174
    MC_SEQ_MISC3: 0xA000896A
    MC_SEQ_MISC8: 0x20310002
    ACTRD=21 ACTWR=15 RASMACTRD=41 RASMACTWR=47 RAS2RAS=148 RP=39 WRPLUSRP=49 BUS_TURN=22

    Replies
    1. Forgot to mention that I use SRBPolaris v3 on Windows 10. HeHe.

    2. I share the knowledge, but I don't do the work for other people. Read datasheets and the discussions on bitcointalk, look at the Linux AMD driver source, and you should be able to figure it out.

  13. Hi,

    My RX460 with Hynix memory has a few memory corruption errors with the 2000MHz strap: 999000000000000022559D0031625C489055131339CDD50A00408600740114206A8900A00200312019123037AD2C3A16

    Can someone help with a few tweaks and point out what is wrong? I tried incrementing and decrementing values following many docs, but the relationships between a few of the parameters (SRB Polaris) are fuzzy.


    thanks anyway for the articles and shared knowledge!

    P.S. Maybe I'll post my strap experiments somewhere...

  14. Hello! I wanted to use your script to modify the straps. The link to the script does not work. When can we expect the recovery of the script?

    Replies
    1. The script can still be found on my github account. It's the cgi version that I haven't moved to my new server, as it is not a priority at this time.

  15. Hi, is there any way I can run the script or apply these changes in Windows? I really know nothing about Linux at this point. Thanks!!!

    Replies
    1. Python is available for Windoze, and downloads can be found on https://www.python.org/.

  16. I'm not proficient in Python, but I installed it and don't know how to use the script. It says prynt rx straps missing. Can you help?

    Replies
    1. I'm no longer improving strapread, and have updated the README with pointers to other tools.
      https://github.com/nerdralph/strapread

  17. Ralph Doncaster, do you know how to read a strap from a .rom file, or from the GPU over PCIe, in Linux, without using Windows and a BIOS editor?

    Replies
    1. My fork of amdmeminfo directly reads the GPU memory registers.
      https://github.com/nerdralph/amdmeminfo

  18. Can you make a voltage offset mod :D

  19. Dear Ralph,
    I have Samsung 8GB memory. Do you have any advice on what I can change in my strap compared with yours?
