Optimizing HPC Server Memory Configurations

Choosing the right memory configuration for your next server can have a significant impact on application performance. Long gone are the days when all you needed to do was specify memory size based on simple “GB per core” rules. Yet I am surprised how many customer RFPs still have little in the way of memory requirements besides capacity. The DIMM type, number of memory channels, DIMMs per channel (DPC), DIMM size, DIMM speed, and channel speed can all affect application performance, often by 33% or more, even between two identical servers configured with exactly the same memory capacity. In this blog I’ll cover some of the important things to remember when specifying server memory configurations.

For starters, you need to understand some basics about UDIMMs (also referred to as unbuffered or unregistered memory) and RDIMMs (registered or buffered memory). To this alphabet soup you should also add LRDIMMs, or Load-Reduced DIMMs.

Most modern servers support either three or four memory channels per CPU socket, and four channels per socket will soon become the dominant standard. This part of memory configuration is pretty simple: be sure you are specifying at least one memory DIMM per channel. A server with four 4 GB DIMMs (one per channel) will almost always perform better than a server with two 8 GB DIMMs that leave two channels empty. Bigger is not always better when it comes to memory DIMMs.
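To make that concrete, here is a minimal sketch of the arithmetic (Python, illustrative only; it assumes a 4-channel socket, 1333 MHz DIMMs, and the standard 8-byte-wide DDR3 channel, and looks only at theoretical peak bandwidth):

    # Minimal sketch: why populating every channel matters.
    # Assumes a 4-channel socket and the standard 64-bit (8-byte) DDR3 channel width.
    def describe(dimm_count, dimm_gb, channels=4, mt_per_s=1333):
        populated = min(dimm_count, channels)         # one DIMM per channel until channels run out
        capacity_gb = dimm_count * dimm_gb
        peak_gbs = populated * mt_per_s * 8 / 1000.0  # MT/s x 8 bytes per populated channel
        print("%d x %d GB: %d GB total, %d channels populated, ~%.1f GB/s peak"
              % (dimm_count, dimm_gb, capacity_gb, populated, peak_gbs))

    describe(4, 4)  # four 4 GB DIMMs, one per channel
    describe(2, 8)  # two 8 GB DIMMs, same capacity, two channels left idle

Both configurations hold 16 GB, but the second one leaves half the available memory bandwidth on the table.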

In addition to the number of memory channels, DPC is important. Most servers support one to three DPC. A cut-rate server that supports only one DPC may seem like a good buy, until you need to expand your memory and find there is no way to do so without throwing away your DIMMs and purchasing higher-capacity ones. But more DPC is not always better. Many servers clock the memory down to a lower speed when you populate a second or third DIMM per channel. Buying two state-of-the-art 1600 MHz DIMMs per channel does you little good if the server clocks the memory down to 1333 MHz; in that case you might as well purchase the less expensive 1333 MHz memory. So be sure to specify both the speed of the DIMM and the speed of the memory channel if purchasing more than one DPC.
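Here is a small sketch of how that plays out. The speed-versus-DPC table is illustrative only (it matches the DDR3-1600 RDIMM behavior discussed in the comments below: full speed at 1 and 2 DPC, 1066 MHz at 3 DPC); the real numbers must come from your server’s quickspecs:

    # Illustrative only: effective memory speed as a function of DIMMs per channel (DPC).
    # The table mirrors the DDR3-1600 RDIMM figures discussed in the comments;
    # check the quickspecs for your actual server.
    SPEED_BY_DPC = {1: 1600, 2: 1600, 3: 1066}  # MT/s for a hypothetical server

    def effective_speed(dimm_rated_mt_s, dpc, table=SPEED_BY_DPC):
        # The channel runs at the slower of the DIMM's rated speed and the server's DPC limit.
        return min(dimm_rated_mt_s, table[dpc])

    for dpc in (1, 2, 3):
        print("%d DPC: 1600 MT/s DIMMs run at %d MT/s" % (dpc, effective_speed(1600, dpc)))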

DIMM size is still fairly straightforward, at least until you consider the impact of the factors above. All other things being equal (DIMM type, DIMM speed, DPC, channel speed), a larger DIMM simply gives you more capacity. Memory prices fluctuate widely based on market conditions, and thanks to Moore’s law we continue to see regular density (although not speed) increases in memory DIMMs. Today, while 4 GB DIMMs are still sold in some servers, the lowest cost per MB is typically achieved with 8 GB DIMMs, with a small price penalty to move up to 16 GB DIMMs. Larger 32 GB DIMMs are still quite rare in general use because of their high cost, but of course this will change over time.
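Comparing options on a cost-per-gigabyte basis is trivial arithmetic. The prices below are placeholders I made up purely for illustration, so substitute current street prices before drawing any conclusions:

    # Cost per GB by DIMM size. Prices are invented placeholders, for illustration only.
    hypothetical_prices_usd = {4: 40.0, 8: 70.0, 16: 150.0, 32: 800.0}  # per DIMM

    for size_gb in sorted(hypothetical_prices_usd):
        price = hypothetical_prices_usd[size_gb]
        print("%2d GB DIMM: $%.2f per GB" % (size_gb, price / size_gb))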

Recently introduced 1600 MHz DIMMs are generally the fastest available today. Compared to slower 1333 MHz DIMMs, you will see a fairly linear decrease in latency and increase in throughput with 1600 MHz DIMMs. For any performance-sensitive application, you should avoid the slower 1066 MHz DIMMs or any server configuration that clocks the memory bus down to 1066 MHz. Again, be sure to ask not only about the maximum memory and channel speed, but also about the channel speed your server will operate at as configured; when you add a second or third DIMM per channel, many servers will clock down the memory bus.
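The throughput side of that claim is easy to sanity-check from first principles: a DDR3 channel is 64 bits (8 bytes) wide, so its theoretical peak bandwidth is simply the transfer rate times 8 bytes. A quick sketch (theoretical ceilings only, not measured STREAM numbers; the per-socket figure assumes a 4-channel Xeon E5-2600-class socket):

    # Theoretical peak bandwidth per DDR3 channel = transfer rate (MT/s) x 8 bytes.
    # These are ceilings, not measured throughput.
    for mt_s in (1066, 1333, 1600):
        per_channel_gbs = mt_s * 8 / 1000.0
        per_socket_gbs = per_channel_gbs * 4  # 4 channels per Xeon E5-2600-class socket
        print("DDR3-%d: %.1f GB/s per channel, %.1f GB/s per 4-channel socket"
              % (mt_s, per_channel_gbs, per_socket_gbs))

Moving from 1333 MHz to 1600 MHz buys roughly 20% more peak bandwidth per channel; dropping to 1066 MHz gives up about 20%.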

In researching this article, I worked with HP’s HPC benchmarking lab to measure the latency and throughput of various memory DIMMs and memory configurations using HP’s Cluster Platform 3000 SL6500 with Xeon E5 (Sandy Bridge-EP) 8C 2.60GHz CPUs and FDR InfiniBand, along with other HP servers. While this server has not yet been officially launched, some lucky customers like Purdue University already have similar systems up and running and on the Top500 list. We tested a variety of 4 GB, 8 GB, 16 GB, and 32 GB DIMMs, using UDIMMs, RDIMMs, and LRDIMMs, in a variety of configurations including 1 and 2 DPC. Some HP servers that we did not test, including the HP ProLiant DL360, also support 3 DPC. In general, the best combination of latency and throughput was achieved with 16 GB RDIMMs running at 1600 MHz with 2 DPC.

Of course, if all of this sounds a bit confusing, don’t worry: HP’s HPC Competency Center is standing by, ready to help you configure your next HPC solution and optimize the memory configuration along with every other aspect of your system.


About Marc Hamilton

Marc Hamilton – Vice President, Solutions Architecture and Engineering, NVIDIA. At NVIDIA, the Visual Computing Company, Marc leads the worldwide Solutions Architecture and Engineering team, responsible for working with NVIDIA’s customers and partners to deliver the world’s best end to end solutions for professional visualization and design, high performance computing, and big data analytics. Prior to NVIDIA, Marc worked in the Hyperscale Business Unit within HP’s Enterprise Group where he led the HPC team for the Americas region. Marc spent 16 years at Sun Microsystems in HPC and other sales and marketing executive management roles. Marc also worked at TRW developing HPC applications for the US aerospace and defense industry. He has published a number of technical articles and is the author of the book, “Software Development, Building Reliable Systems”. Marc holds a BS degree in Math and Computer Science from UCLA, an MS degree in Electrical Engineering from USC, and is a graduate of the UCLA Executive Management program.

35 Responses to Optimizing HPC Server Memory Configurations

  1. Pingback: Speed Tips for Optimizing HPC Server Memory Configurations | insideHPC.com

  2. Gopal Raghavan says:

    Marc
    What are the results of your testing on LRDIMMs vs. RDIMMs? Does the extra capacity mitigate the extra latency?

  3. Gopal,
    There is no simple answer on LRDIMM vs. RDIMM. As noted, RDIMMs give you better memory latency, and many HPC applications are sensitive to memory latency. We will be announcing our Gen8 server products soon, but the current HP ProLiant SL390s servers most often used in HPC applications already support two DIMMs per channel, and with the small price premium now commanded by 16 GB DIMMs, most customers can achieve the memory capacities they require using standard RDIMM technology. That being said, LRDIMMs provide a compelling new option for customers who need to support the largest memory capacities, especially when their applications are not sensitive to the increased latency. Here is a good link to a video that talks more about LRDIMMs.

    http://lrdimmblog.inphi.com/

  4. Jo Cohen says:

    Hi Marc,
    From what I gathered, LRDIMM requires a special BIOS in order to achieve its load-reduction functionality, which in turn takes some functionality away from the MCH (memory controller).
    Does that also conflict with HP Advanced Memory Technology, which is a combination of BIOS code and RDIMM code?
    Have you by any chance tested HyperCloud from Netlist? It is a special type of load-reduced memory that is plug-and-play and does not require BIOS changes. I also read that HyperCloud (per Netlist’s claims) has only a one-clock-cycle penalty compared to regular RDIMM latency.
    According to testing Netlist ran, their HyperCloud is capable of running 3 DPC at 1333 MT/s (see the Cirrascale/Netlist announcement).

  5. Hi Jo,
    HP is testing both 2DPC at 1600 MHz and 3DPC at 1333 MHz using standard HP memory. What we support in any particular future Gen8 server will be documented as we announce each server. I am aware of the Netlist Hypercloud memory and have reviewed their web page. Our ProLiant SL line of servers designed for HPC typically support 2DPC while some of our standard rack mount ProLiant DL servers support 3DPC. There are very few HPC customers that require more than 256 GB per 2-socket x86 server, and that is quite affordable today using 2DPC and 16 GB DIMMs.

  6. Jo Cohen says:

    Hi Marc,
    The key point in your answer is that HP is testing 3 DPC at 1333 MT/s using standard HP memory!
    It would be very interesting to see if they are able to do so without load-reduction technology.
    If I understood your post above, 1600 MHz memory running at 2 DPC may be clocked down to 1333 MHz. But 1600 MHz memory should cost more than the 1333 MHz equivalent in load-reduced memory. And if the latter is also capable of running 3 DPC at that speed, then what’s the point of buying 1600 MHz RDIMMs?
    Listening to both the Inphi and Netlist conference calls, they appear to target the high performance computing (HPC) market specifically with their memory solutions. Their solutions tout higher density at higher speed. This does not fit with what you said about HPC customers not requiring more than 256GB per 2-socket server. From what I read (e.g. from SAS), developers of HPC applications (in-memory DB, CFD, FEA, EDA, etc.) need both huge amounts of memory and speed.
    Outside of HPC, data centers could also benefit from more memory per server to accommodate more VMs and to run each virtual machine faster.
    Thanks

  7. HPC_fan says:

    Is this 2 DPC at 1600 MHz only achievable with the new HP memory, or can one achieve it with standard 1600 MHz RDIMMs bought off the market and plugged into HP Gen8 servers?

  8. There is nothing that prevents customers from using 3rd-party memory in HP ProLiant servers, including ProLiant Gen8 servers. HP supports customers’ right to choose, but we are confident in our quality and innovation. Features that are supported today for 3rd-party memory, such as pre-failure alerts and error event logging in the IML, will remain the same in Gen8. Enhanced memory performance features above the processor/chipset manufacturer’s POR specifications for memory speed and power will be unique to HP memory, as they are a benefit of our system-level engineering.
    As to specific support for HP or non-HP memory running at 1600 MHz in future Gen8 servers, sorry, I can’t provide any specific information about future Gen8 servers until the individual servers are announced. But based on the interest in the comment thread here, I expect I’ll be writing another blog soon.

  9. HPC_fan says:

    Obviously the 2 DPC at 1600MHz is something very new, since historically it has been:

    http://h18004.www1.hp.com/products/servers/options/memory-description.html

    Single-rank and Dual-rank RDIMM
    1 DIMM Per Channel @ 1333MHz
    2 DPC @ 1333MHz (1.5V), 2DPC @ 1066MHz (1.35V)
    3 DPC @ 800MHz

    Is 2 DPC at 1600 MHz planned for HP Gen8, or for the Ivy Bridge-based servers at the end of the year?

    Thanks.

  10. Jo Cohen says:

    Marc,
    Referring to your comment about HPC customers not requiring more than 256GB servers, I just noted that even a 256GB system requires that all three DIMM slots per channel (3 DPC) be populated:
    12 slots x 16GB = 192GB (2 DPC)
    18 slots x 16GB = 288GB (3 DPC)
    If 1600 MHz “regular” RDIMMs can only do 2 DPC without reverting to 1333 MHz, then such a system will only provide 192GB of memory.

    Again, I’m not sure why anyone would buy 1600 MHz memory to populate a 288GB system capable of running only at 1333 MHz, when 16GB HyperCloud has already been demonstrated (at Cirrascale) running 3 DPC at 1333 MT/s (for 288GB) and costs less.

    Quote >> (related to announcement made 11/16/11): Netlist, a designer and manufacturer of high-performance memory subsystems, today is demonstrating its 288GB HyperCloud DRAM memory running at a breakthrough speed of 1333MT/s on an industry standard server <<

    BTW, can you tell whether 32GB RDIMM has also been tested on Gen 8?

  11. HPC_fan says:

    Jo,

    Many HPC applications require CPU power over memory capacity.

    The 3 DPC applications are more likely to be in the virtualization/cloud computing space, where moderate amounts of CPU power are required but you need ever more total memory to keep up with the increased processing power of each generation of Intel processor (more cores per CPU).

    For virtualization/cloud computing there is also the compulsion to fit more virtual machines in a server.

    On the other hand, I don’t think every HPC scientific app would be requiring huge amounts of memory.

    The trend of increasing cores per processor compels the use of more memory per processor for best processor utilization. That’s bad for server OEMs (fewer boxes can do the same job), but cloud computing, mobile internet, and iCloud-type applications will greatly expand demand, so I don’t think server OEMs need to worry there, especially when every person on the planet will eventually want Siri-like cloud capability on their iPhone (you get the point of how the demand is also scaling).

  12. Jo Cohen says:

    Marc,
    Your answer to Gopal suggests that HP had to incorporate BIOS functionality from an LRDIMM provider specific to an HP motherboard (or that HP developed such functionality itself) just to enable systems to run with LRDIMM memory.

    HP’s Smart/Advanced Memory Technology requires its own specific BIOS, doesn’t it?

    Was an effort made to make these two versions of the BIOS compatible?

    Thanks

  13. HPC_fan says:

    Jo,

    LRDIMMs require BIOS updates so that the load-reduction features can work, so obviously HP would have to modify the BIOS, or keep LRDIMM functionality in mind when designing its own HP BIOS (which does the DRAM error-logging work).

    But that is the usual work most OEMs will have to do to support LRDIMMs (if they choose to bother).

    What you are asking is whether there is some more serious conflict between what the HP BIOS wants to do and the constraints LRDIMMs may impose on how the BIOS is designed?

  14. HPC_fan says:

    Obviously, if HP has access to Netlist IP, they wouldn’t need to bother with LRDIMM-style BIOS conflicts (since HyperCloud doesn’t require BIOS updates and is “plug and play” and “interoperable with regular RDIMMs”).

    In effect, an HP “RDIMM” that uses Netlist IP would be practically indistinguishable from a regular RDIMM.

  15. Jo Cohen says:

    HPC_fan,
    I’m not disputing that most HPC applications demand speed over capacity. That explains why many top universities run clusters of compute nodes in parallel for the most complex simulations (weather, earthquake, aerodynamics, etc.).
    However, given that today’s new processors from Intel and AMD have an increasing number of cores, and thus an increasing number of threads that can run in parallel, more memory will be needed for them to run efficiently.
    For example, Microsoft says that “CFD (Computational Fluid Dynamics) simulations have extremely high computer memory requirements, ranging from a few GBs to hundreds of GB of RAM”. As the amount of detail in a simulation model rises, so do the memory requirements.
    If you read the HPC publications, you’ll see that many talk about memory being the bottleneck that holds back better and faster simulations.

  16. HPC_fan says:

    Well, the other view (contradicting Marc) is that 3 DPC is not just of use to high-memory users; in general, 3 DPC is useful when you want to buy cheaper, less-dense memory to reach the same total capacity.

    I think we are currently in a memory pricing anomaly (in that you would consider buying one 16GB module instead of two 8GB modules), because the 4Gbit DRAM die producers are pricing it cheap to drive adoption of the new 4Gbit DRAM dies. In addition, most of the “low power” varieties appear in 4Gbit DRAM. However, there is an expectation that 4Gbit DRAM will increase in price later in the year.

    Because of this anomaly you may not find people flocking to buy 8GB memory modules; instead they choose the 16GB modules.

    But in general, being able to run memory at 3 DPC at full speed is a valuable capability.

  17. HPC_fan says:

    Marc,

    This blog suggests that “HP Smart Memory” includes some type of “rank multiplication” technology.

    Would this technology be included in ALL “HP Smart Memory”?

    I would think so. If HP doesn’t want to manage multiple types of memory, and also wants to avoid upgrade issues (“you do not have the right kind of memory to take this to 768GB on a 2-socket server”), it would make sense for HP to include this technology in ALL of the “HP Smart Memory”.

    http://hansdeleenheer.blogspot.com/2012/03/hp-proliant-gen8-technical-deepdive.html

    Wednesday, March 21, 2012
    HP ProLiant Gen8 | technical deepdive

    quote:
    —-
    I do presales for a Gold partner so we get in depth sessions on new products that dig deeper especially on the differences to the former models and why some of the choices are made as they are.
    Today I joined such a technical deepdive, lead by Colin Taylor. Colin is a “Gen8 Master”. This means he is one of those few guys that reads ones and zeroes in your system when 1st, 2nd, 3rd and 4th line support just don’t know anymore where to go with your issues.
    Colin has worked with the Gen8 servers since summer 2011 so is thé man for the job. Today we were the Padawans to this Master.

    There are also checks if the disk drives are in fact a genuine HP disk drives. This is the same for the memory.
    The DDR3 memory of a G7 server is also not interchangeable in Gen8 because of some extra features. One of them is that there is a 32Gb Quad Rank DIMM that shows to the CPU as a Dual Rank type just for the fact of being able to use all the lanes.
    Another cool memory aspect is that when using HP genuine DIMMs (no Kingston for example) you’ll be able to run higher speed then in industry standards even on fully equipped servers.
    —-

  18. HPC_fan says:

    I suppose the 4-rank to 2-rank comment may have been about the 32GB LRDIMM.

    However, HP seems to be advertising a “25% higher speed” for its RDIMMs as well:

    3 DPC at 1066MHz at 1.35V

  19. HPC_fan says:

    The second wave of the Romley “tiered rollout” now has HP listing Netlist HyperCloud for the HP Gen8 servers.

    It is the only memory that delivers:

    3 DPC at 1333MHz at 1.5V

    That is, for heavy memory loading applications.

    It is available as a Factory Installed Option (FIO).

    http://h18004.www1.hp.com/products/quickspecs/14225_na/14225_na.html

    quote:
    —-
    Load Reduced DIMMs (LRDIMM)
    HP 32GB (1x32GB) Quad Rank x4 PC3L-10600L (DDR3-1333) Load Reduced CAS-9 Low Voltage Memory Kit 647903-B21

    HyperCloud DIMMs (HDIMM)
    HP 16GB (1x16GB) Dual Rank x4 PC3-10600H (DDR3-1333) HyperCloud CAS-9 FIO Memory Kit 678279-B21
    NOTE: This is a Factory Installed Option (FIO) only.

    Performance
    Because HP SmartMemory is certified, performance tested and tuned for HP ProLiant, certain performance features are unique with HP SmartMemory. For example, while the industry supports DDR3-1333 RDIMM at 1.5V, today’s Gen8 servers support DDR3-1333 RDIMM up to 3 DIMMs per channel at 1066MT/s running at 1.35V. This equates to up to 20% less power at the DIMM level with no performance penalty and now with HyperCloud Memory on DL360p Gen8 and the DL380p Gen8 servers will support 3 DIMMs per channel at 1333MT/s running at 1.5 V. In addition, the industry supports UDIMM at 2 DIMMs per channel at 1066MT/s. HP SmartMemory supports 2 DIMMs per channel 1333MT/s, or 25% greater bandwidth.
    —-

    “HP Smart Memory HyperCloud” or “HP HDIMM” is the name they are giving it.

  20. HPC_fan says:

    What is the significance of something being a Factory Installed Option (FIO)? When are things made FIOs?

    Or why is something made an FIO: because it is thought that a particular combination will be ordered often, or because the fit is good, or because ordering it separately would be a problem for the end customer?

  21. HPC_fan says:

    Maybe it is because if you are doing 3 DPC then you would get that type of memory.

    Or, alternatively, if you get HP Smart Memory HyperCloud (HDIMM), there is no point in populating it at less than 3 DPC (since 3 DPC is what it is useful for).

    So the FIO forces the customer to choose the optimal configuration for that type of memory.

  22. There are many different memory configurations supported by HP’s Gen8 servers, and these vary from server to server based on design criteria for the server. If in doubt, best to consult an HP representative to help configure memory to meet your specific requirements. In the HPC space, most customers today are opting for the fastest 1600 MHz memory.

  23. HPC_fan says:

    For HPC that makes sense – i.e. stay within 2 DPC so one can run the 1600MHz RDIMM memory modules at full speed.

    1 DPC at 1600MHz at 1.5V
    2 DPC at 1600MHz at 1.5V
    3 DPC at 1066MHz at 1.5V

    For 3 DPC, the HP HDIMMs (NLST HyperCloud) clearly dominate: CAD, virtualization types of applications.

    This seems like a niche (i.e. 3 DPC); however, it is indicative of things to come.

    The figures above for RDIMMs are for 16GB 2-rank RDIMMs using 4Gbit DRAM die.

    When 32GB RDIMMs arrive, they will be 4-rank. 32GB RDIMMs that are 2-rank will not be producible until 8Gbit DRAM dies appear (which could be years away, or never; the investment required to go to 8Gbit DRAM die is huge, and supposedly only Samsung may be capable of it, from what I gathered from Netlist comments in conference calls).

    If you use 4-rank, the speed slowdowns are more severe and start showing up at 2 DPC.

    The speed tables in the HP docs do not have a column for quad-rank, but for illustrative purposes, here is one from IBM.

    http://www.redbooks.ibm.com/abstracts/tips0850.html

    IBM System x3650 M4
    IBM Redbooks Product Guide

    quote:
    —-
    Table 5. Maximum memory speeds:

    RDIMM – dual-rank (2-rank) – at 1.5V
    – 1 DPC at 1333MHz
    – 2 DPC at 1333MHz
    – 3 DPC at 1066MHz

    RDIMM – quad-rank (4-rank) – at 1.5V
    – 1 DPC at 1066MHz
    – 2 DPC at 800MHz
    – 3 DPC not supported (because 4 ranks x 3 DPC = 12 ranks, which exceeds the 8-ranks-per-memory-channel limit of current systems)

    LRDIMM – at 1.5V
    – 1 DPC at 1333MHz
    – 2 DPC at 1333MHz
    – 3 DPC at 1066MHz

    HCDIMM – at 1.5V
    – 1 DPC at 1333MHz at 1.5V
    – 2 DPC at 1333MHz at 1.5V
    – 3 DPC at 1333MHz at 1.5V
    —-
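
    (To make the table above easy to play with, here is a tiny Python sketch that simply encodes those maximums as a lookup. The numbers are transcribed from the IBM figures quoted above; it is not an official tool.)

        # Max DDR3 speed (MT/s) at 1.5V by DIMM type and DPC, transcribed from the
        # IBM x3650 M4 table quoted above. None means "not supported".
        MAX_SPEED = {
            "RDIMM 2-rank": {1: 1333, 2: 1333, 3: 1066},
            "RDIMM 4-rank": {1: 1066, 2: 800,  3: None},
            "LRDIMM":       {1: 1333, 2: 1333, 3: 1066},
            "HCDIMM":       {1: 1333, 2: 1333, 3: 1333},
        }

        print(MAX_SPEED["RDIMM 4-rank"][2])  # -> 800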

    So with 32GB RDIMMs (4-rank), it becomes impossible to ignore the speed slowdowns – even at 2 DPC.

    For this reason Intel was pushing LRDIMMs because of the “load-reduction” feature (which is a Netlist IP).

    Right now LRDIMM buffer chipsets are only produced by Inphi.

    Inphi seems to be repeating the same mistakes that led to the downfall of MetaRAM (stealing someone else’s IP).

    Inphi has employed some former MetaRAM executives (including MetaRAM’s former CEO) and seems to be going down the same path. Its litigation with Netlist is not going well: Inphi’s challenge of Netlist patents at the USPTO has made those patents stronger, as they have been revalidated by the patent reexamination process. Inphi has withdrawn its retaliatory lawsuit, while the case against it looks like a walkover given the patent strength just gained.

    Inphi is also not a patent powerhouse in this area; it seems to be just a component supplier doing the bidding of bigger players.

    The other companies that could make LRDIMMs have prudently excused themselves: IDTI for the time being, by delaying its LRDIMM rollout to later this year, and Texas Instruments, which has notably not been interested in LRDIMMs following the settlement in Netlist vs. Texas Instruments.

    This leaves LRDIMMs in a dubious position.

    To add insult to injury, LRDIMMs have “high latency issues” and have had problems achieving 1333 MHz at 3 DPC at 1.5V (the HP docs above list them running at 1066 MHz).

    In addition, they do not scale well to DDR4 (which is an extension of the Netlist HyperCloud IP).

    For these reasons, we are starting to see greater visibility for the Netlist memory, and this may become more prominent at 32GB (and at the higher speeds required for DDR4).

    Some time before DDR4 mainstreaming, I expect JEDEC to license the Netlist IP.

    Netlist says this will be the first “proprietary” memory to gain mainstream acceptance in the industry. The word “proprietary” is deceptive; it suggests use of Netlist IP, but the reason this memory can be used is that it presents itself to the system as standard memory (unlike LRDIMMs, which require a BIOS update, something many Romley systems have implemented in order to support Intel’s push for LRDIMMs).

  24. HPC_fan says:

    From above HP docs link:

    quote:
    —-
    Performance

    Because HP SmartMemory is certified, performance tested and tuned for HP ProLiant, certain performance features are unique with HP SmartMemory. For example, while the industry supports DDR3-1333 RDIMM at 1.5V, today’s Gen8 servers support DDR3-1333 RDIMM up to 3 DIMMs per channel at 1066MT/s running at 1.35V. This equates to up to 20% less power at the DIMM level with no performance penalty and now with HyperCloud Memory on DL360p Gen8 and the DL380p Gen8 servers will support 3 DIMMs per channel at 1333MT/s running at 1.5 V. In addition, the industry supports UDIMM at 2 DIMMs per channel at 1066MT/s. HP SmartMemory supports 2 DIMMs per channel 1333MT/s, or 25% greater bandwidth
    —-

    HP is currently advertising 16GB RDIMMs and 16GB HDIMMs (NLST HyperCloud) and UDIMMs.

    The only 32GB memory module shown is the 32GB LRDIMM.

    This is possibly because the 32GB RDIMM would show impaired performance at 2 DPC (as outlined in the previous post).

    HP/Samsung, in the IDF conference video on LRDIMMs (see the Inphi LRDIMM blog), say in answer to a question that they will not be pushing the 16GB LRDIMM, because it cannot outperform the 16GB 2-rank RDIMM built from 4Gbit DRAM die, which will be plentifully available, owing to the same LRDIMM “high latency issues” mentioned in the previous post. They said they would push the 32GB LRDIMM, and the reason is that the 32GB RDIMM can only be made at 4-rank, which allows 32GB LRDIMMs to shine. However, 32GB LRDIMMs will not outperform the 32GB HyperCloud.

    Here is a rough summary of the latencies I have collected (the Cisco UCS info is taken from a comment on a blog and may not be accurate, but it sounds right; it refers to the Catalina ASIC-on-motherboard approach, which Cisco has since dropped, presumably throwing in its lot with the mainstream crowd, i.e. using LRDIMM/HyperCloud from now on as well).

    http://messages.finance.yahoo.com/Stocks_%28A_to_Z%29/Stocks_N/threadview?m=te&bn=51443&tid=42437&mid=42484&tof=1&frt=2#42484

    Re: HC potential based on IPHI data .. huge latency difference Inphi LRDIMMs HyperCloud CSCO UCS 14-Jan-12 12:55 am

    quote:
    —-
    So in summary if you compare the latency advantages for Netlist:

    – LRDIMMs have a “5 ns latency penalty” compared to RDIMMs (from Inphi LRDIMM blog)
    – CSCO UCS has a “6 ns latency penalty” compared to RDIMMs.
    – NLST HyperCloud have similar latency as RDIMMs (a huge advantage) and have a rather significant “4 clock latency improvement” over the LRDIMM (quote from Netlist Craig-Hallum conference)
    —-

    So you can see that 16GB LRDIMMs are non-viable vs. the 16GB RDIMMs (2-rank) – because of the “high latency issues” with LRDIMMs.

    The 32GB LRDIMMs are viable vs. the 32GB RDIMMs (4-rank), but not viable vs. the 32GB HyperCloud – because of worse latency than the HyperCloud.

    The IDF conference reasoning may explain why you do not see any 16GB LRDIMMs advertised in the HP docs.

    Yet both IBM and HP list the 32GB LRDIMM in their docs, even though the IBM docs had the 32GB LRDIMM as “Available later in 2012″ (why not also list the 32GB HyperCloud, then, which will also be available later?).

    I suspect this was because the OEMs were under pressure from Intel to push LRDIMMs; they couldn’t push the 16GB LRDIMM, so they pushed a then non-existent 32GB LRDIMM.

    Perhaps 32GB LRDIMMs are available for HP now; if they are, please let us know.

    Netlist has said in a recent conference call that the 32GB HyperCloud is still under qualification. We will get more information on the claims discussed above when the 32GB HyperCloud becomes available at IBM and HP.

  25. Thanks HPC_fan for all the additional comments.

  26. HPC_fan says:

    Trying to simplify the use case in light of the information in the last two posts.

    Choosing between RDIMMs and HyperCloud

    On a 2-socket server, each Romley processor has 4 memory channels (up from 3 of pre-Romley). On each memory channel if you use 3 DPC (3 DIMMs per channel) you have:

    2 sockets x 4 memory channels per socket x 3 DIMMs per channel = 24 DIMM sockets

    or

    8 x 3 DPC = 24 DIMM sockets.

    Using 16GB or 32GB memory modules at 1 DPC, 2 DPC and 3 DPC (i.e. various levels of loading on the memory bus):

    8 x 3 DPC = 24 DIMM sockets – 384GB (16GB) – 768GB (32GB)
    8 x 2 DPC = 16 DIMM sockets – 256GB (16GB) – 512GB (32GB)
    8 x 1 DPC = 8 DIMM sockets – 128GB (16GB) – 256GB (32GB)
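
    (A trivial Python sketch that just reproduces the arithmetic above:)

        # 2-socket Romley server: sockets x channels x DPC x DIMM size.
        SOCKETS, CHANNELS = 2, 4
        for dpc in (3, 2, 1):
            slots = SOCKETS * CHANNELS * dpc
            for dimm_gb in (16, 32):
                print("%d DPC: %2d DIMM sockets -> %3d GB with %d GB modules"
                      % (dpc, slots, slots * dimm_gb, dimm_gb))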

    Using 16GB memory modules (RDIMM/HyperCloud)

    Currently, with 16GB RDIMMs (2-rank), loading the memory bus at 2 DPC still allows 1600 MHz RDIMMs to run at 1600 MHz (according to the HP docs above), for a total of 256GB.

    When you need more than 256GB, you have to load the memory bus more, at 3 DPC, and slowdowns start appearing.

    At 3 DPC using 16GB modules, you would be better off getting the 16GB HyperCloud.

    So in summary, using 16GB RDIMM/HyperCloud: you would use RDIMMs up to 256GB (1600 MHz achievable), and HyperCloud for 384GB (1333 MHz achievable).

    Using 32GB memory modules (RDIMM/HyperCloud)

    Once 32GB RDIMMs are available, they will be 4-rank (for the reasons in the previous post).

    And speed slowdowns will happen at 2 DPC (and possibly even at 1 DPC; the IBM docs in the previous post show even 1 DPC affected, running at 1066 MHz!).

    So in summary, using 32GB RDIMM/HyperCloud: you may not want to use RDIMMs at all (1066 MHz or lower speeds achievable), but only the 32GB HyperCloud (1333 MHz achievable).

    Other options

    If Netlist could supply a 1600 MHz HyperCloud, that would simplify the comparison at 16GB even more, for example 3 DPC at 1600 MHz.

    LRDIMMs (load-reduced DIMMs) are the other option, and one can see why Intel wanted a “load-reduction” solution to appear with Romley. However, LRDIMMs underperform the HyperCloud: as mentioned in previous posts, they have “high latency issues” and do not deliver 1333 MHz at 3 DPC (HP docs above).

    LRDIMMs also face the questions about their legality mentioned in the previous post.

    LRDIMMs use a “centralized buffer chipset”, while the industry is moving toward Netlist’s “distributed buffer chipset” approach:

    http://www.theregister.co.uk/2011/11/30/netlist_32gb_hypercloud_memory/

    Netlist puffs HyperCloud DDR3 memory to 32GB
    DDR4 spec copies homework
    By Timothy Prickett Morgan
    Posted in Servers, 30th November 2011 20:51 GMT

    Another comparison is available in the CMTL test labs comparison of LRDIMMs vs. HyperCloud:

    http://www.netlist.com/products/hypercloud/whitepapers/hcdimm_vs_lrdimm_whitepaper_march_2012.pdf

    HyperCloud HCDIMM Outperforms LRDIMM in  ‘Big Data’ & ‘Big Memory’ Applications 
    Whitepaper 
    March 2012  

  27. HPC_fan says:

    Monolithic and DDP memory packages

    The reason the 32GB LRDIMM cannot deliver 1333 MHz at 3 DPC (HP docs) MAY have to do with its use of 4Gbit x 2, i.e. dual-die package (DDP), memory packages. The design of the DDP, which puts two DRAM dies in one package, may impede the ability to do “load-reduction” properly.

    The 32GB RDIMMs (4-rank) also use 4Gbit DDP.

    In contrast, the Netlist 32GB HyperCloud uses 4Gbit monolithic DRAM die memory packages, just like the ones used on the 16GB RDIMMs (2-rank).

    Netlist uses its “Planar-X” IP to open up real estate for the greater number of memory packages on a single memory module.

    More on this in this thread:

    http://messages.finance.yahoo.com/Stocks_%28A_to_Z%29/Stocks_N/threadview?m=tm&bn=51443&tid=47909&mid=48037&tof=1&frt=2

    Re: Now Wall Street knows Via Seeking Alpha .. comment .. 32GB LRDIMMs 4Gbit x 2 (DDP) 1-Apr-12 02:40 pm

    quote:
    —-

    http://www.netlist.com/products/hypercloud/

    quote:
    —-
    NMD4G7G31G0DHDxx 32GB 1333MHz 2Rx4 4Gb Planar-X LP – NEW
    —-

    The description suggests it is a 2-rank (virtual) i.e. just like the 16GB HCDIMM. And it uses 4Gbit DRAM die (monolithic). And it uses NLST’s Planar-X IP.

    Here is the indication that 32GB LRDIMMs are using 4Gbit x 2 (DDP) memory packages:

    http://www.samsung.com/global/business/semiconductor/support/brochures/downloads/memory/samsung_LRDIMM.pdf

    quote:
    —-
    Lineup: 32GB (4Gb DDP), 16GB (2Gb DDP)
    —-

    http://www.inphi.com/lrdimm/images/pdfs/LRDIMM-whitepaper.pdf

    pg. 5:

    quote:
    —-
    LRDIMM capacities up to 32GB are possible today with 4Rx4 modules using
    4 Gb, DDP (dual-die package) DRAM.
    —-

    —-

  28. HPC_fan says:

    Marc,

    I can understand that 3 DPC at 1333MHz is the realm of virtualization, CAD etc.

    And HPC apps have access to 2 DPC at 1600MHz.

    But what happens when 32GB RDIMMs (4-rank) roll around (IBM shows 4-rank delivering 1066 MHz at 1 DPC, and a pitiful 800 MHz at 2 DPC)?

    I therefore suspect that HPC apps may take a slight bandwidth hit at 1 DPC, and a more severe hit at 2 DPC, which would make “load-reduction” start to become relevant to HPC as well?

    Does this make sense?

    I am surprised there is not more awareness of this issue, though.

  29. HPC_fan says:

    quote:
    —-
    We tested a variety of 4 GB, 8 GB, 16 GB, and 32 GB DIMMs, using UDIMMs, RDIMMs, and LRDIMMs, in a variety of configurations including 1 and 2 DPC. Some HP servers which we did not test including the HP ProLiant DL360 also support 3 DPC.
    —-

    Ok, I see that your benchmarks did not include the servers on which “HP Smart Memory HyperCloud” is shipping at 3 DPC (factory installed option):
    – HP ProLiant DL360
    – HP ProLiant DL380

    These are servers for high-volume, virtualization-type applications (with Romley’s faster processors you can fit more VMs, but then you need the appropriate memory to hold the greater number of VMs per server).

  30. ddr4memory says:

    I have written up some instructions on memory choices for the HP DL360p and DL380p virtualization servers and for the similar offering from IBM, the IBM System x3630 M4 server.

    Hope they are simple to understand:

    http://ddr4memory.wordpress.com/2012/05/23/installing-memory-on-2-socket-servers-memory-mathematics/

    May 23, 2012
    Installing memory on 2-socket servers – memory mathematics

    For HP:

    http://ddr4memory.wordpress.com/2012/05/23/memory-options-for-the-hp-dl360p-and-dl380p-servers-16gb-memory-modules/

    May 23, 2012
    Memory options for the HP DL360p and DL380p servers – 16GB memory modules

    http://ddr4memory.wordpress.com/2012/05/23/memory-options-for-the-hp-dl360p-and-dl380p-servers-32gb-memory-modules/

    May 23, 2012
    Memory options for the HP DL360p and DL380p servers – 32GB memory modules

    For IBM:

    http://ddr4memory.wordpress.com/2012/05/23/memory-options-for-the-ibm-system-x3630-m4-server-16gb-memory-modules/

    May 23, 2012
    Memory options for the IBM System x3630 M4 server – 16GB memory modules

    http://ddr4memory.wordpress.com/2012/05/23/memory-options-for-the-ibm-system-x3630-m4-server-32gb-memory-modules/

    May 23, 2012
    Memory options for the IBM System x3630 M4 server – 32GB memory modules

  31. HPC_fan says:

    An article on memory choices for the HP DL360p and DL380p virtualization servers. Hope it is simple to understand.

    I’ll get to the IBM System x3630 M4 server shortly.

    http://ddr3memory.wordpress.com/2012/05/24/installing-memory-on-2-socket-servers-memory-mathematics/

    May 24, 2012
    Installing memory on 2-socket servers – memory mathematics

    For HP:

    http://ddr3memory.wordpress.com/2012/05/24/memory-options-for-the-hp-dl360p-and-dl380p-servers-16gb-memory-modules/

    May 24, 2012
    Memory options for the HP DL360p and DL380p servers – 16GB memory modules

    http://ddr3memory.wordpress.com/2012/05/24/memory-options-for-the-hp-dl360p-and-dl380p-servers-32gb-memory-modules/

    May 24, 2012
    Memory options for the HP DL360p and DL380p servers – 32GB memory modules

  32. Marc,
    HPC_fan appears to be a Netlist employee who is touting HyperCloud relentlessly on this and other blogs. Maybe it would be best if you removed all his postings and replaced them with a single posting summarizing HyperCloud’s benefits (or not)? This blog is getting very hard to read when a single poster occupies over 75% of it!

  33. vicl2012v says:

    Netlist is understandably understated; they have to partner with these same people eventually, so no wonder they are not going to point out problems.

    However, that does not mean everybody else should keep quiet about it. If LRDIMMs have problems, why not discuss them openly?

    The OEMs, especially, have no axe to grind.

  34. HPC_fan says:

    I have been a shareholder for 3 years (and am not an employee) and have followed the ins and outs of this company’s trajectory, so I am intimately aware of the court docs, the activities at JEDEC as evidenced in those docs (GOOG, SMOD, Inphi), and the patent reexaminations the company has had to go through (and has survived). The patents that Inphi challenged have survived reexamination (those who know patents know what this means).

    HyperCloud has arrived and is underpinning next-gen memory, and given how LRDIMM talk blankets the current discussion about revolutionizing the memory industry, it does no harm for people to know what is actually underpinning this movement.

    Right now there is a hysteresis in how people are reacting, as many folks have hitched their wagon to the directive Intel gave, and it will take time for them to turn. Others have moved faster; I believe IBM and HP have moved fast and will benefit from this.

    Here is my analysis of the pressures Intel was under when it chose LRDIMMs (I might be wrong, but maybe your feedback will add color to the discussion):

    http://ddr3memory.wordpress.com/2012/05/24/the-need-for-high-memory-loading-and-its-impact-on-bandwidth/

    May 24, 2012
    The need for high memory loading and its impact on bandwidth

    http://ddr3memory.wordpress.com/2012/05/24/lrdimm-buffer-chipset-makers/

    May 24, 2012
    LRDIMM buffer chipset makers

    http://ddr3memory.wordpress.com/2012/05/24/intels-need-for-lrdimms-on-roadmap-to-ddr4/

    May 24, 2012
    Intel’s need for LRDIMMs on roadmap to DDR4

    I wonder if you complained as much when LRDIMMs were being publicized as the best thing. Did you examine what the issues might be with LRDIMMs?

  35. Marc says:

    Thanks everyone for all your comments on this blog. Time to move on to other topics, so I am closing this entry to further comments.
    Marc

Comments are closed.