
Intel buys QLogic InfiniBand business



Interesting article.

Difficult for me to analyse - usually you sell your business when it's a
success, or when you want to run away.
Not sure which of the two it is here.

Maybe some years from now, with some support from Intel, QLogic
can also roll out FDR. Right now they're stuck with QDR,
which their homepage announces as 40 gigabit per second.

http://www.qlogic.com/Products/adapters/Pages/InfiniBandAdap[..]

Showing the Qlogic 7300 series.

Mellanox is slam-dunking with FDR now, the new generation network,
which I suppose has double the bandwidth of QDR,
and which was already rolled out a few months ago and should be shipping by
now.

QLogic AFAIK didn't even announce their next generation network yet,
let alone display it,
and still toys with QDR, which is what I toy with at home. The fact that they
announced 'improving' the oldie QDR
I would interpret as bad news for innovating to FDR.

Maybe someone from Mellanox wants to comment on FDR and whether it's  
double the bandwidth of QDR,
as i suppose some will be monitoring this list.


Vincent Diepeveen Mon, 23 Jan 2012 11:58:36 -0800
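
On the FDR-versus-QDR bandwidth question raised above, a rough check is possible from the per-lane signalling rates and line encodings of the two generations. The sketch below assumes a standard 4x link and uses the nominal InfiniBand figures (10 Gbit/s per lane with 8b/10b encoding for QDR, 14.0625 Gbit/s per lane with 64b/66b encoding for FDR). These numbers are not from the thread, and the names in the code are illustrative only.

```python
# Nominal per-lane signalling rate (Gbit/s) and line-encoding efficiency.
# Assumption: a 4x link, the usual width for host adapters.
LANES = 4
GENERATIONS = {
    "QDR": (10.0, 8 / 10),       # 8b/10b encoding
    "FDR": (14.0625, 64 / 66),   # 64b/66b encoding
}


def data_rate_gbps(name):
    """Usable data rate of a 4x link after line-encoding overhead."""
    signal_rate, efficiency = GENERATIONS[name]
    return LANES * signal_rate * efficiency


qdr = data_rate_gbps("QDR")
fdr = data_rate_gbps("FDR")

print(f"QDR 4x: {qdr:.1f} Gbit/s usable")
print(f"FDR 4x: {fdr:.1f} Gbit/s usable")
print(f"FDR / QDR = {fdr / qdr:.2f}")
```

Under these assumptions the usable rate goes from 32 Gbit/s to roughly 54.5 Gbit/s, a factor of about 1.7 rather than a full 2x. The 40 Gbit/s advertised for QDR is the raw signalling rate before encoding overhead.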

wonder what Intel's thinking - could do some very interesting stuff,
but it would take a bit of charisma.  QPI-over-IB anyone?

I'm not crazy about Intel being a vertically-integrated HPC supplier
(chips, systems, interconnect, mpi, compilers - I guess they still
don't have their own scheduler or sexy cloud branding ;)

the world is a better place when each level has internal competition
based on useful, open (free), multi-implementation standards.


Mark Hahn Mon, 23 Jan 2012 13:21:07 -0800

Markets always go through these full on vertical integration phases (for
a while) before the assets are sold off (either voluntarily or via
bankruptcy court).  It's a natural part of the business cycle.

Cisco is building servers now.  Oracle, the whole stack.  Pretty soon,
some whipper snapper of a company is going to come along and eat their
lunches, and then they will get competitive pressure to change.

single vendor" (that is until they get eventually screwed over by that
one vendor, or realize that the "great deal" they are getting really
isn't as great as it sounded ... ).  Which is part of the reason it's so
hard getting into accounts other vendors have locked up.  Sadly, lots of
this works around the spirit (and probably skates very close to the
edge of the letter) of the law surrounding most public acquisition
processes, but that's life I guess.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman*******
web  :  http://scalableinformatics.com
         http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


Joe Landman Mon, 23 Jan 2012 13:34:44 -0800

That's what I'm thinking!


Prentice Bisbal Mon, 23 Jan 2012 13:46:58 -0800

forget it

maybe they just want a new generation Ethernet NIC dirt cheap for
their motherboards;

if you produce it in the numbers they do, probably anything gets
dirt cheap.
This doesn't bite the high end, yet it might then be cheaper to buy QLogic
than to pay royalties to
any of the InfiniBand vendors, which would be either Mellanox or QLogic.

Also they bought QLogic for 125 million dollars, though in cash, which
doesn't seem exceptionally much
from Intel's viewpoint, whereas they might intend to sell some of their
upcoming line of vector CPUs, which badly
need a network of course.

125 million is just a few supercomputers. maybe it was just a cheap  
buy, as qlogic doesn't have FDR yet, who knows?

What I wonder about is how Wall Street knew in advance about QLogic
getting taken over. If we look carefully we see that
since roughly December 19th 2011 the NASDAQ rose roughly 10.5%,
and QLogic rose quite a lot more, by several percent.

So it was significantly more in demand than the index, which is weird
if we realize that QLogic has rolled out nothing in those months
whereas its competitor Mellanox has rolled out FDR.

It's obvious some traders knew this deal was coming, but real  
fingerpointing is not my job.

Vincent


Vincent Diepeveen Mon, 23 Jan 2012 13:48:23 -0800

I remember way back hearing that IB was going to be the technology to
replace all those various buses (PCI, etc) on a motherboard [1], then it
all went quiet and then it re-emerged as an interconnect.  So perhaps
Intel (who were part of one of the two groups that merged to create IB)
have thoughts again on this?

cheers,
Chris

[1] interestingly a similar comment appears on the IB Wikipedia page
under history, but sadly without references..

http://en.wikipedia.org/wiki/InfiniBand#History

--
    Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel*******
Phone: +61 (0)3 903 55545
          http://www.vlsci.unimelb.edu.au/


Christopher Samuel Mon, 23 Jan 2012 15:01:28 -0800

Do you mean IB over QPI ?
Either way, High Node Count Coherence will be an issue.
In any case, by acquiring their IP it is a step forward towards SoC (System on
Chip). A preliminary step (building block) for the Exascale strategy and for
low cost enterprise/cloud solutions.

Joshua


Joshua Mora Acosta Mon, 23 Jan 2012 15:03:14 -0800

Just ignore his statement - it's total nonsense.

QPI has nanosecond latency using 2 rings, versus something that has a
latency up to a factor 1000 slower,
with PCI-e as the slowest delaying factor.

Doing cache coherency over that: forget it.

From what I understand a big problem in modern CPUs is the
crossbar. In the latest chip displayed,
the Bulldozer, it takes a significant number of transistors.

If you suddenly confront that crossbar with latencies a factor 4000
slower, that's not gonna let it perform better
of course.

Not with intel. Intel sells fast equipment yet it has a huge price  
always,
about the opposite of infiniband which is a dirt cheap technology.

I guess we must see this much simpler. At such a giant as intel,  
paying a bit over 100 million is peanuts.
Probably less than what they would need to pay for royalties to a  
manufacturer owning a bunch of patents
in the ethernet NIC area; the HPC intel gets 'for free'.

Allows them to produce maybe a 10 gigabit ethernet NIC dirt cheap  
without needing to pay royalties to qlogic.
It will not be a big performer such 10 gigabit ethernet nic, yet  
price matters a lot of course when integrating. Every penny counts then.

What you typically see with intel is that for them the mass market is  
so important, read that's the 1 gigabit ethernet market right now,
that all other products suffer there, as they will give their mass  
market products always, of course, priority.

Itanium is a good example; it always was process generations behind
their main products. It never was given a fair chance to compete.

So where they win with Sandy Bridge because it soon has an edge of a process
generation or 2 on AMD,
Intel's other products suffer from this, as they don't get that
process technology.

Meanwhile it is totally crucial for Ethernet to have low latency in the
financial world, as they can make dozens of billions a year by being
faster
than others at exchanges.

Now back to that mass market and integration of a good and especially  
cheap 10 gigabit nic into intels mainboards,
this buy might be pretty interesting to intel.

Yet that's a market so big, it has nothing to do with HPC i'd argue.

From HPC viewpoint i wouldn't see this takeover as a threat to  
anyone in HPC,
i guess it basically means intel won't challenge for the crown in  
HPC, giving Mellanox monopoly for a while at FDR.

It's about ethernet i bet.


Vincent Diepeveen Mon, 23 Jan 2012 15:23:35 -0800

Hear that Shai F?  Stop work on vSMP now, cause Vincent says it can't
work!!!

More seriously, with this acquisition, I could see serious contention
for ScaleMP.  SoC type stuff, using IB between many nodes, in smaller boxen.

Yes.

Must use Shakespeare for this takedown:  Methinks thou dost protesteth
too much ...

So ... exactly what are the existing intel 10GbE NIC's then ... Swiss
Cheese?  I see a fair number of vendors licensing Intel's IP, or, more
to the point, using Intel silicon (hint: this might be a good reason for
the acquisition) to build their stuff...

... which they have been doing for years ...

... not sure they were, but it's possible Qlogic has 10GbE IP that Intel
licenses, but this transaction was about ... Infiniband ...

Errr ... given that this is one of our core markets, don't mind if I
note that latency is critical to these players, so proximity to the
exchange, and reliable and deterministic latency is absolutely critical.
  There are switches that are doing 300ns port to port in the Ethernet
space now.  With the NICs, you are looking in the 2-ish microsecond
regime.  These are not cheap.

Compare this to QDR.  1 microsecond +/- some.

Which has lower latency?

There are many reasons why exchanges (mostly) aren't on IB.  A few of
them are even valid technical reasons.  Historical momentum, and
conservative approaches to new technology rank pretty high.  So does the
inability to generally export IB far and wide.  And the complexity of
the stack.  Ethernet is (almost) plug and play.  It's just a network.

IB is sort of kind of plug, install OFED, and play for a while over
IPoIB until you can recode for some of the RDMA bits.  And don't try to
run file systems and other things with lots of traffic over IPoIB.  It
leaks and gradually you will catch some cool ... surprises.

Honestly, it's a shame that IPoIB never really got the attention it
deserved like the other elements of the IB stack did.  Getting a rock
solid IP implementation atop a fast/low latency net could have driven
many design wins outside of HPC.  And would have been a gateway
drug^H^H^H^Htechnology for using the other stack elements.



Joe Landman Mon, 23 Jan 2012 16:04:09 -0800

There is an implicit /sarc tag here BTW.  vSMP does a wonderful job
(where Vincent claims that things won't work ... they do work, and very
well at that).

Serious contention to buy ScaleMP (as in potential acquirers)

Must be getting too much blood in the coffee stream.  Can't communicate ...



Joe Landman Mon, 23 Jan 2012 16:06:58 -0800

That would be some BlueGene type machine you speak about that intel  
would produce with a low power SoC.

This where at this point the bluegene type machines simply can't  
compete with the tiny processors
that get produced by the dozens of millions.

"The tiny processors have won"
    Linus Thorvalds

Intel has themselves a second law of Moore. You can google for it.  
Every new generation of factory that
can produce this machine with double the number of transistors, that  
factory also is 2x more expensive.

A few years ago intel projected that by 2020 building a single  
factory would have a cost of 20 billion dollar.

Now Obama might contribute to this by overspending 40-50%, more  
overspending than the overspending of
Greece, Spain, UK and Portugal combined.

So that will cause massive inflation, which will hurt the poor most,  
and it sure will help the 2nd law of Moore become sooner a reality
rather than later; yet if we move away from politics to money and  
mass production;
i hope you realize that a few HPC cpu's won't pay back for 20 billion  
dollar.

In short only cpu's that get mass produced can.

A good example of massproduced processors are gpu's.

If we look at the leading gpu's, which have by now thousands of  
cores, there is no way to compete with that with SoC's.

What's price of producing 1 gpu versus 200 SOC's with a small core?

Furthermore intel never really could compete in the SOC world so far  
with the low power cpu's that get produced by the billion a year,
so betting on that would be quite surprising, though not impossible  
gamble.

Intel always has been good in low latency designs. yet obviously  
further integration of logics into the cpu means of course you also
need a capable ethernet chip in your cpu. Qlogics can provide that.

Mass produce half a billion of those and then it's cheaper to buy a  
company with such technology than to pay royalties.

Another HPC problem with the bluegene type designs:

all those soc's basically spread the calculation power over a bigger  
area than 1 big power eating chip will.
Bigger area means bigger distance to transfer massive data, and  
that's in itself a very expensive thing.

Overall, BlueGene machines never really had a low power usage,
despite some stupid professors shouting that.
Per gflop it never was the performance king; they just
compared with totally hopeless designs, and IBM usually
delivered on time, something that is very important in HPC as well.

IMHO the only reason BlueGene could be competitive is because it was
fighting dinosaur type HPC CPUs.

Now SoCs might be mighty interesting in the gamers' world and in
telecom to build new phones with,
which makes it mighty interesting for Intel to produce those
dirt cheap, and maybe even put a more capable Ethernet
chip on it, again dirt cheap; as for the HPC world, I don't see
this SoC competing in any way with a GPU or even a CPU.

Better write some code in CUDA or OpenCL i'd argue.

The latest AMD GPU, the Radeon HD 7970, is delivering 1 teraflop or so?

With a 2 GPU version coming soon on 1 card that's gonna deliver close
to 2 Tflop a card, double precision yes.
Multiply by 4 for single precision. 8+ Teraflop single precision.

For a couple of hundreds of dollars. Nvidia will undoubtedly follow
with their 1 teraflop GPU.

If you take a washing machine and pack it with cheapo socks, creating a 2
Tflop machine, do you guess you can SELL that for a couple of
hundreds of dollars?

Just transport costs already will be more expensive than a single gpu  
card...

Intel cannot compete with that in HPC for the stuff that needs
bandwidth and doesn't care about latency, as at a new process technology
they first go produce a few FPGA CPUs, and after that they produce
the world's fastest CPU. So there is simply no window in
time to use the latest process technology for an HPC vector type chip.
That's why AMD-ATI and Nvidia will win that contest hands down.

And we sure hope intel will keep selling its cpu's very well, which  
if it is the case means that this won't change.

After all they already make cash on majority of supercomputers as  
each node also usually has 2 Xeon cpu's which go for a multiple of  
the price
of the GPU that's in the box...


Vincent Diepeveen Mon, 23 Jan 2012 16:39:29 -0800

So that's why the top 5 places on the last Green500 are all BlueGene..



Christopher Samuel Mon, 23 Jan 2012 16:52:53 -0800

I wondered about that as well.

When I see 1 GPU get nearly 1 teraflop, probably eating a tad more
power than
official, say it'll consume 250 watt. I already use more power now
than the specs, in
fact.

Yet even then that's 4 gflop per watt.

Last time I calculated BlueGene, sure that's probably the previous
generation,
it was 3 watts per gflop, or a factor 12 more power than a Radeon HD 7970.

Please note that in the statements of most HPC centers claiming blue  
gene to be energy efficient,
usually they do not release numbers.

But now the important question, what's price of bluegene per teraflop?

Let's have a look: it's around 500 euro or so for a Radeon HD 7970
card.

Vincent


Vincent Diepeveen Mon, 23 Jan 2012 16:59:47 -0800
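
The power comparison in the post above is simple arithmetic and can be reproduced directly. The sketch takes the poster's own figures at face value (about 1 teraflop at an assumed 250 W for the GPU, 3 W per gflop for the older BlueGene). These are peak, back-of-envelope numbers, not measured Linpack results of the kind the Green500 ranks, so the outcome says nothing about delivered efficiency.

```python
# Figures as quoted in the post; peak estimates, not measurements.
GPU_GFLOPS = 1000.0               # "nearly 1 teraflop"
GPU_WATTS = 250.0                 # poster's guess at real power draw
BLUEGENE_WATTS_PER_GFLOP = 3.0    # poster's figure for an older BlueGene


def gflops_per_watt(gflops, watts):
    """Energy efficiency as gflop/s delivered per watt consumed."""
    return gflops / watts


gpu_eff = gflops_per_watt(GPU_GFLOPS, GPU_WATTS)
bluegene_eff = gflops_per_watt(1.0, BLUEGENE_WATTS_PER_GFLOP)
ratio = gpu_eff / bluegene_eff

print(f"GPU:      {gpu_eff:.2f} gflop/W")
print(f"BlueGene: {bluegene_eff:.2f} gflop/W")
print(f"ratio:    {ratio:.1f}x")
```

The factor of 12 follows from the quoted inputs alone; whether those inputs hold for a sustained workload on either machine is a separate question.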

What does that matter if you can't power or cool a similar performance
GPU system?   Let alone have any applications that will actually take
advantage of it.

cheers,
Chris


Christopher Samuel Mon, 23 Jan 2012 17:07:03 -0800

Numascale does this already with SCI

--
Doug



Douglas Eadline Mon, 23 Jan 2012 17:07:31 -0800

There were some exascale goals mentioned. I wonder if there are
some plans for a MIC based exascale beast

--
Doug



Douglas Eadline Mon, 23 Jan 2012 17:14:03 -0800

For...chess?  ;D

*Torvalds, and if Linux (or any well-supported kernel/OS for that
matter) currently had data structures designed for extremely high
parallelism on a single MoBo (i.e. 100s to 10,000s of cores) then I
would agree with this statement.  As I currently see it, all we can
really say is that someday, probably, perhaps even hopefully:

"The tiny processors will win."

That's after we work out all the nasty nuances involved with designing
new data structures for OSes that can handle that number of cores, and
probably design new applications that can use these new OS features.
And no, GPU support in Linux doesn't count as this already having been
done.  We just farm out very specific code to run on those things.  If
somebody has an example of a full-blown, usable OS running on a GPU
ALONE, I would stand (very interestingly) corrected.

Thanks, for a moment there, I almost used AskJeeves.

Was waiting for the hook.  Inevitable really.  I think if we were
discussing the efficacy and quality of resultant bread from various
bread machines versus the numerous methods for making bread by hand
somehow, someway, a GPU would make better bread.  Might be a wholesome
cyber-loaf of artisan wheat, but nonetheless, it would be better in
every way.

Best,

ellis


Ellis H. Wilson III Mon, 23 Jan 2012 17:20:59 -0800

They sold 300 systems, is the claim on their homepage. Not exactly what Intel
aims for. I bet they instead aim to sell half a billion CPUs with
built-in Ethernet - let's face it, their NICs started to get outdated.

For HPC it won't be a slamming success, let alone give you any
performance.

After all what's price of 1000 SoC's with 1000 tiny cpu's on it, that  
together produce you 1 teraflop,
versus 1 manycore that produces 1 teraflop?

This is not what you buy QLogic for.

Maybe it was just a cheap buy for the number of patents they possess,
and the big need within Intel for some engineers
that can improve their CPUs with connectivity that the average user
will like; as for HPC,
with those engineers moving within Intel to the areas where Intel can make
most cash, that's with CPUs and not with HPC
hardware, it seems Mellanox gets a monopoly on HPC network performance.


Vincent Diepeveen Mon, 23 Jan 2012 17:54:15 -0800

I figured out the main why:

http://seekingalpha.com/news-article/2082171-qlogic-gains-ma[..]

That's the whole market, and QLogic says they are #1 in the FCoE
adapter segment of this market, and #2 in the overall 10 gig adapter
market (see http://seekingalpha.com/article/303061-qlogic-s-ceo-discusse[..] )

Historically, QLogic had a fibre channel adapter business that was a
huge cash cow, and they bought their way into various markets and had
limited success with them: iscsi, fibre channel switches, and yes,
InfiniBand, where QLogic managed to get some large sales (TriLabs 3 PF
procurement) yet was at only 15%-20% market share.

I'm surprised that QLogic could succeed in 10gige adapters given all
the competition, but hey, I never understood why fibre channel was
popular, either.

Now that QLogic has found what the next best thing after fibre channel
adapters is, they might as well concentrate on it. It'll be
interesting what Intel plans to do in the exascale market. I've
thought for a long time that non-cache-coherent processors like MIC
ought to have InfiniPath-like hardware queues for sending and
receiving short messages efficiently, even on-chip.

Not to mention that whole exascale thing.

-- greg


Greg Lindahl Mon, 23 Jan 2012 20:56:22 -0800

it's easy to source and build pretty big IB systems;
how much so with SCI?

I actually like the idea of high-fanout-distributed-router systems,
but they seem perpetually exotic.  where are the hypercubes, FNNs?
afaict, commodification of IB has snuffed topology as a design issue,
except for cray/BG/k machine-level projects.


Mark Hahn Mon, 23 Jan 2012 21:33:38 -0800

Inevitably, though, massively parallel interconnects (all boxes connected
to all other boxes) won't scale.


Jim (337C) Lux Mon, 23 Jan 2012 21:54:08 -0800

Indeed, when thinking about scale I always end up thinking about
the masters of scale -- ants

--
Doug



Douglas Eadline Tue, 24 Jan 2012 04:47:33 -0800

You can soup up a local 3d torus with a small network
like connectivity. That keeps the node connectivity
and number of wires still manageable.

Moreover, the universe does it with local connectivity
(even quantum entanglement needs a relativistic channel
to tell it from RNG) just fine. A 3d grid/torus would
be a good match for anything that can do long-range
by iterating short-range interactions.


Eugen Leitl Tue, 24 Jan 2012 05:21:10 -0800

s/small network/small world network

--
Eugen* Leitl  http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820  http://www.ativel.com   http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE


Eugen Leitl Tue, 24 Jan 2012 05:23:43 -0800

Unfortunately, ants only run a small set of specialized codes, and are not
the generalized computing resource that we're looking for (and, frankly,
don't yet know how to effectively use, if it were to exist)


Jim (337C) Lux Tue, 24 Jan 2012 08:23:55 -0800

Greg,

That can explain why QLogic is selling, but not why Intel is buying.

10 years ago, Intel went _out_ of the InfiniBand market, see  http://www.neworkworld.com/newsletters/servers/2002/01383318[..]

So has the IB business evolved so incredibly well compared to what Intel expected back in 2002? I do not think so.

I would guess that we will see message passing/RDMA over Thunderbolt or similar.

Håkon


Håkon Bugge Fri, 27 Jan 2012 11:30:31 -0800

Qlogic offers that QDR.
Mellanox is a generation newer there with FDR.

Both in latency as well as in bandwidth a huge difference.


Vincent Diepeveen Fri, 27 Jan 2012 12:06:20 -0800

I found that statement interesting.   I've actually not known anything
about their 10GbE products.  My bad.

Intel buying makes quite a bit of sense IMO.  They are in 10GbE silicon
and NICs, and being in IB silicon and HCAs gives them not only a hedge
(10GbE while growing rapidly, is not the only high performance network
market, and Intel is very good at getting economies of scale going with
its silicon ... well ... most of its silicon ... ignoring Itanium here
...).  It's quite likely that Intel would need IB for its PetaScale
plans.  Someone here postulated putting the silicon on the CPU.  Not
sure if this would happen, but I could see it on an IOH, easily.  That
would make sense (at least in terms of the Westmere designs ... for the
Romley et al. I am not sure where it would make most sense).

But Intel sees the HPC market growth, and I think they realize that
there are interesting opportunities for them there with tighter high
performance networking interconnects (Thunderbolt, USB3, IB, 10GbE
native on all these systems).

Haven't looked much at FDR or EDR latency.  Was it a huge delta (more
than 30%) better than QDR?  I've been hearing numbers like 0.8-0.9 us
for a while, and switches are still ~150-300ns port to port.  At some
point I think you start hitting a latency floor, bounded in part by "c",
but also by an optimal technology path length that you can't shorten
without significant investment and new technology.  Not sure how close
we are to that point (maybe someone from Qlogic/Mellanox could comment
on the headroom we have).

Bandwidth wise, you need E5 with PCIe 3 to really take advantage of FDR.
  So again, it's a natural fit, especially if it's LOM ....

Curiously, I think this suggests that ScaleMP could be in play on the
software side ... imagine stringing together bunches of the LOM FDR/QDR
motherboards with E5's and lots of ram into huge vSMPs (another thread).
  Shai may tell me I'm full of it (hope he doesn't), but I think this is
a real possibility.  The Qlogic purchase likely makes this even more
interesting for Intel (or Cisco, others as a defensive acq).

We sure do live in interesting times!



Joe Landman Fri, 27 Jan 2012 12:20:22 -0800
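
The remark that FDR needs PCIe 3 can be checked against nominal link rates. The sketch below assumes an x8 slot, which is an assumption rather than something stated in the thread, and ignores packet and protocol overhead on both the PCIe and InfiniBand side. The function name is hypothetical.

```python
def link_gbytes_per_s(gbit_per_lane, payload_bits, total_bits, lanes):
    """Nominal one-direction bandwidth in GB/s after line encoding."""
    return gbit_per_lane * payload_bits / total_bits * lanes / 8


# Host side: PCIe generations, assuming an x8 slot.
pcie2_x8 = link_gbytes_per_s(5.0, 8, 10, lanes=8)      # 5 GT/s, 8b/10b
pcie3_x8 = link_gbytes_per_s(8.0, 128, 130, lanes=8)   # 8 GT/s, 128b/130b

# Network side: 4x InfiniBand links.
qdr_4x = link_gbytes_per_s(10.0, 8, 10, lanes=4)       # 8b/10b
fdr_4x = link_gbytes_per_s(14.0625, 64, 66, lanes=4)   # 64b/66b

for name, rate in [("PCIe 2.0 x8", pcie2_x8), ("PCIe 3.0 x8", pcie3_x8),
                   ("QDR 4x", qdr_4x), ("FDR 4x", fdr_4x)]:
    print(f"{name:12s} {rate:5.2f} GB/s")
```

With these nominal figures a 4x QDR link (4 GB/s) already matches a PCIe 2.0 x8 slot, while 4x FDR (about 6.8 GB/s) only fits inside PCIe 3.0 x8 (about 7.9 GB/s).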

Why buy previous generation IB in such case?
It's about the ethernet of course...

They produce tens of millions of cpu's each quarter and also
announced a SoC (socket on chip).

The market actually produces billions of SoCs a year. So it's
a lucrative market, yet highly competitive.

Having 10 gigabit ethernet on such SoC and the total at a low price
would give intel a huge lead there
worth dozens of billions a year.

It's not clear to me where all their SoC plans go, but i bet right
now they are open to any market needing SoC's.

Note that many SoC's are dirt cheap. Even in very low volume we speak
about some tens of dollars, cpu included
and other connectivity included.

Price is everything there, yet i guess intel will be offering the
'top' SoC's there with faster cpu's and 10 GigE.

Then they produce a bunch of mainboards.

Think also of upcoming generation of consoles, ipad 3's and similar
products etc - it's not clear
yet which company gets the contracts for upcoming consoles, it's all
wide open for now.

Yet they might sell also a 100  million of those.

Intel is an attractive company to do business with for console
manufacturers now.

IBM's cell kind of lost momentum there and has nothing new to offer
that really outperforms as it seems.
Also power usage of cell was kind of disappointing.

Initial version PS3 was 220 watts on average and 100% usage it could
go up to 380  watt.
Try to put that on your couch.

Don't confuse this with the later crunching CELL version, a much
improved chip, used for some supercomputers.

Yet if i remember well, some reports, was it Aad v/d Steen (?)
already predicted it would be not interesting for upcoming
supercomputers
as it is some kind of hybrid chip - which has no long term future.

He was right.

Undoubtedly they'll try something in the HPC market.

If you already have put lots of cash in development of a product it's
better to put it
on the market.

Based upon their name they'll sell some.

And some years from now they should have something bigtime improved.
Yet realize how complicated it is to tape out a GPU at a new process
technology
if you aren't sure you're gonna sell 100 million of them.

Such massive projects have to pay back for factories. A product
that doesn't even have the potential to sell for over a few dozens
of billions of dollars is not even interesting to develop.

Just startup costs for a GPU at a new process technology are some
dozens of millions for each run, and the more complex it is and the
newer the process technology, the more expensive it is.

Realize IBM produces its POWER7 and upcoming BlueGene/Q CPU in 45 nm
technology.

GPUs are released now in 28 nm. That gives a theoretical advantage
of a tad less than (45 / 28) ^ 2 = 2.58

So a GPU of Intel's needs to be a factor 2.58 better in the same process
technology than today's GPUs of
AMD (already released 28 nm) and Nvidia (coming soon 28 nm I'd expect).

This where with CPUs, Intel's big advantage is always that they are
better at getting newer process technologies to work sooner than the
competition.

Ivy Bridge will be 22 nm so i heard rumours.

Posting here some months ago from Gilad Shainer was it's 0.85 us RDMA
for FDR versus 1.3 us or so for the other;
more importantly for clusters is the bandwidth.

I guess that pci-e 3.0 allows simply much higher speeds whereas the
QDR is PCI-E 2.0 stuff.

Isn't PCI-e 3.0 about 2x higher bandwidth than PCI-e 2.0?

Now I might be happy with that last, but I guess that for big FFTs
or be it matrices,
you still need massive bandwidth.

Even if n is big in O ( k *  n log n )

Where k in the case of matrices is a tad bigger than n and in the case of
Number Theory is usually around the number of bits,
so 3.32 times n or so, that means you still need k steps of n log n.

That's massive bandwidth.

There is a lot of headroom for better latencies from software viewpoint,
as cpu's keep getting faster yet latency of years ago networks was
just marginally
worse than what's there now.

In case of hardware i really am no expert there.

All the socket2011 boards that are in the shops now are PCI-e 3.0 and
a wave of
mainboards with 2 sockets will release a few days before or at the
same day that
intel finally releases the Xeon version of Sandy Bridge.

Seems it didn't release yet as it's not too high clocked, if i look
at this sample cpu :)

It's 2Ghz to be precise (8 cores Xeon).

A technology that just sold into 300 machines is not an interesting
market for Intel.

They have very expensive factories that each cost many billions of
dollars.
These need to produce nonstop and sell products, to pay back for the
factories and to make a profit.

Intel used to be worth over a 100 billion dollar at NASDAQ.

Wasting your most clever engineers, from which each company always
has too few, to products that can't keep busy your
factories, is a total waste of time. So your huge base of B-class
engineers, let me not quote some mailing list names,
that's the ones you move to Qlogic then for the HPC.

That's enough to keep it afloat for a while in combination with
'intel inside'.

Intels profit is too huge to be busy toying with tiny markets with a
handful of customers,
from which majority forgot to take their medicine when you propose
rewriting the software to some new hardware platform
you are gonna unroll. A habit intel is not exactly excited about of  
course, as they like to sell each time new technology.

Also each larrabee intel would sell means they sell a bunch of xeons
less of course.

Not for everyone i guess - many lost their job and as i predicted
some years ago a guy with a
nobel prize might be carpet bombing a huge nation this summer.

Intel has 3 huge factories in Israel last time i checked.

It sure can give unpredicted results for future.


Vincent Diepeveen Fri, 27 Jan 2012 13:41:38 -0800
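
The (45 / 28) ^ 2 figure in the post above is the ideal area-scaling ratio between two process nodes. A minimal sketch of that calculation follows. It assumes transistor density scales with the inverse square of the node name, which real processes only approximate, and the function name is illustrative.

```python
def ideal_density_gain(old_nm, new_nm):
    """Ideal transistor-density gain when shrinking from old_nm to new_nm.

    Assumes area per transistor scales with the square of the node size.
    """
    return (old_nm / new_nm) ** 2


gain_45_to_28 = ideal_density_gain(45, 28)
gain_28_to_22 = ideal_density_gain(28, 22)

print(f"45 nm -> 28 nm: {gain_45_to_28:.2f}x")
print(f"28 nm -> 22 nm: {gain_28_to_22:.2f}x")
```

The second call applies the same formula to the 22 nm node mentioned for Ivy Bridge, purely as a comparison point under the same idealised assumption.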

[... merciful trimming ...]

IP.  It's all about IP.  It's always about IP.  If ever you think it's not
about IP, you should remember "Landman's N+1th rule of M&A:
It's the IP man ... just da IP!"

... no it's not.  Intel has its own ethernet.  It's had it for a LONG
time, and it did not buy Qlogic ethernet ... It's not about the ethernet.
  Say it with me ... IT'S NOT ABOUT THE ETHERNET ... There, don't you
feel better now?  I do ...

SoC is "System On a Chip".  Socket on a chip is ... er ... cart before
the horse?

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman*******
web  :  http://scalableinformatics.com
         http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


Joe Landman Fri, 27 Jan 2012 13:47:51 -0800

That's right. This was probably bought, not sold. If you look at the
press release Intel put out, it's all about Exascale computing.

http://newsroom.intel.com/community/intel_newsroom/blog/2012[..]

If you want to put an IB HCA in a CPU or a {north,south}bridge,
TrueScale nee InfiniPath is a much smaller implementation than others,
and most of the chip is memory, which Intel knows how to shrink
drastically compared to the usual way people implement memory.

Also, keep in mind that Intel's benchmarking group in Moscow has a lot
of experience with benchmarking real apps for bids using TrueScale
head-to-head against other HCAs, and I wouldn't be surprised if it was
the case that TrueScale QDR is faster than that other company's FDR on
many real codes, for the usual reason that TrueScale's MPI-oriented
InfiniBand extension is more suited for MPI than the standard
InfiniBand has-more-features-than-MPI-requires protocols.

Finally, I haven't seen it mentioned whether or not QLogic's IB switch
was part of the purchase. If it is, then you should note that it's not
hard to make that chip speak ethernet, and Intel could probably
dramatically improve it with their superior serdes technology.

-- greg


Greg Lindahl Fri, 27 Jan 2012 14:14:08 -0800

So I wonder why multiple OEMs decided to use Mellanox for on-board solutions and no one used the QLogic silicon...

Surprise surprise... this is no more than FUD. If you have real numbers to back it up please send. If it was so great, how come more people decided to use the Mellanox solutions? If QLogic was doing so great with their solution, I would guess they would not be selling the IB business...

Beowulf mailing list, Beowulf*******sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit  http://www.beowulf.org/mailman/listinfo/beowulf


Gilad Shainer Fri, 27 Jan 2012 14:26:57 -0800

I'm not surprised, as this 10ge adapter is aimed at the same part of
the market that uses fibre channel, which isn't that common in HPC. It
doesn't have the kind of TCP offload features which have been
(futilely) marketed in HPC; it's all about running the same fibre
channel software most enterprises have run for a long time, but having
the network be ethernet.

Are you talking about the latency of 1 core on 1 system talking to 1
core on one system, or the kind of latency that real MPI programs see,
running on all of the cores on a system and talking to many other
systems? I assure you that the latter is not 0.8 for any IB system.

Last time I did the computation, we were 10X that floor. And, of
course, each increase in bandwidth usually makes latency worse, absent
heroic efforts of implementers to make that headline latency look
better.

-- greg


Greg Lindahl Fri, 27 Jan 2012 14:27:35 -0800

today announced a definitive agreement to sell the product lines ... associated with its InfiniBand business to Intel Corporation ..."

So "the product lines" means both the switch and HCA product lines.

Last summer Intel acquired an Ethernet switch business: http://newsroom.intel.com/community/intel_newsroom/blog/2011[..]
so it is not unprecedented that they are interested in switching as well as host technologies.

-Tom

This message and any attached documents contain information from QLogic Corporation or its wholly-owned subsidiaries that may be confidential. If you are not the intended recipient, you may not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message.


Tom Elken Fri, 27 Jan 2012 15:09:57 -0800

I was a bit surprised that the entire transcript had only one
sideways mention of IB.  also interesting that they seem quite
heavily into the heavily-offloaded adapter market (which is sort
of the opposite of the original infinipath stuff.)

has there been any mention of Thunderbolt in a switched context?
afaict it's just a weird "let's do faster USB and throw in video" thing.

weird to have redundant/competing parts in many of the same markets though.
afaik, intel 10G has a reasonable rep; they presumably won't be junking
their own products.

I can't quite tell whether Qlogic's IB switches use Mellanox chips or not.
afaik, Qlogic has their own adapter chips (and perhaps FC/eth).

mellanox qdr systems I've tested are about 1.6 us half-rtt pingpong.
I don't think the switch latency is a big deal, since with 36x fanout,
you don't need a very tall fat-tree.
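
To put rough numbers on the fanout point, here is a minimal sketch. It assumes an idealized full-bisection fat-tree built entirely from identical switch chips, where each tier uses half its ports downward and half upward (the top tier uses all of them downward). The function names are made up for illustration, and real products may be cabled or oversubscribed differently.

```python
def fat_tree_max_hosts(radix: int, tiers: int) -> int:
    """Host ports available in an idealized full-bisection fat-tree.

    Assumes identical switches with `radix` ports and `tiers` switch levels;
    the usual closed form is 2 * (radix / 2) ** tiers.
    """
    if radix < 2 or radix % 2 != 0 or tiers < 1:
        raise ValueError("need an even radix >= 2 and at least one tier")
    return 2 * (radix // 2) ** tiers


def tiers_needed(radix: int, hosts: int) -> int:
    """Smallest number of tiers that can attach `hosts` end points."""
    tiers = 1
    while fat_tree_max_hosts(radix, tiers) < hosts:
        tiers += 1
    return tiers


def worst_case_switch_hops(tiers: int) -> int:
    """Switch chips crossed going up to the top tier and back down."""
    return 2 * tiers - 1


for tiers in (1, 2, 3):
    print(
        f"{tiers} tier(s): up to {fat_tree_max_hosts(36, tiers)} hosts, "
        f"{worst_case_switch_hops(tiers)} switch hop(s) worst case"
    )
```

With 36-port chips this gives 36, 648 and 11664 hosts for one, two and three tiers, so under these assumptions even a cluster of several thousand nodes crosses at most five switch chips.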

really?  I'd be interested in hearing from real people who've actually
used it (not marketing, thanks).  I don't really understand how ScaleMP
can do the required coherency in units smaller than a page, which means
that "non-embarrassing" programs will surely notice...


Mark Hahn Fri, 27 Jan 2012 15:13:01 -0800

With the QDR generation, QLogic developed its own IB switch chip, and uses it in the 12000 line of switches.

-Tom



Tom Elken Fri, 27 Jan 2012 15:25:19 -0800

That's a strange argument.

What does Intel want?  Something to make them more money.

In the past that's meant integrating functionality into their CPU or
support chipsets: sata, usb, memory controller,
pci-e controller, and GigE.  The cost in transistors and die
area seems very relevant to Intel's interests.

Anyone have an estimate on how much latency a direct connect to QPI
would save vs pci-e?

What do motherboard manufacturers want?  Something to make them
more money.

So that's mostly marketing/reputation, pricing, and whatever they can do
to differentiate themselves.  If buying a $150 IB chip lets them charge
$400 more then it's a win, assuming they spend less than $250 of R&D to
add it to the motherboard.  I doubt the difference in transistors or a
few watts would be a big deal either way.

FUD = Fear, Uncertainty, and Doubt.  Doesn't sound like FUD to me.
More like a cheap attack on Greg, I think we (the mailing list) can do
better.

I've personally compared several generations of Myrinet and Infinipath
to allegedly faster Mellanox adapters.  Mellanox hasn't won yet, but I run
benchmarks to find the best solution and it might well be Mellanox next
time.  It would be irresponsible to recommend that a cluster provider
just pick mellanox FDR over Qlogic QDR just because of the spec sheet.
Of course recommending Qlogic over Mellanox without quantifying real
world performance would be just as irresponsible.

Maybe we could have a few less attacks, complaining and hand waving and
more useful information?  IMO Greg never came across as a commercial
(which beowulf list isn't an appropriate place for), but does regularly
contribute useful info.  Arguing market share as proof of performance
superiority is just silly.

  There is some added latency due to the new 64/66 encoding, but overall
  latency is lower than QDR. MPI is below 1us.

I googled for additional information, looked around the Mellanox
website, and couldn't find anything.  Is the number above relevant to
HPC folks running clusters?  Does it involve a switch?   If it is not
realistic, are there any realistic numbers available?
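
As background for the encoding remark in the quote, here is a small sketch of what the line code does to usable bandwidth. It says nothing about latency. The per-lane signalling rates (10 Gb/s with 8b/10b for QDR, 14.0625 Gb/s with 64b/66b for FDR) are taken from the commonly published InfiniBand rate tables, not from this thread, and a 4x link is assumed.

```python
# (per-lane signalling rate in Gb/s, payload bits, coded bits)
# Values are assumptions from the commonly published InfiniBand rate tables.
LINK_GENERATIONS = {
    "QDR": (10.0, 8, 10),      # 8b/10b: 20% of the raw bits are overhead
    "FDR": (14.0625, 64, 66),  # 64b/66b: about 3% overhead
}


def effective_gbps(lane_gbps: float, payload_bits: int, coded_bits: int,
                   lanes: int = 4) -> float:
    """Data rate left after line coding, before any protocol headers."""
    return lane_gbps * lanes * payload_bits / coded_bits


qdr = effective_gbps(*LINK_GENERATIONS["QDR"])
fdr = effective_gbps(*LINK_GENERATIONS["FDR"])
print(f"QDR 4x: {qdr:.1f} Gb/s of data")
print(f"FDR 4x: {fdr:.1f} Gb/s of data")
print(f"FDR / QDR: {fdr / qdr:.2f}x")
```

Under these assumptions a 4x QDR link carries 32 Gb/s of data and a 4x FDR link about 54.5 Gb/s, roughly 1.7x, so the gain is less than double and part of it comes from the cheaper encoding.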


Bill Broadley Fri, 27 Jan 2012 18:11:11 -0800

That makes sense.

I am looking at these things from a "best of all possible cases"
scenario.  So when someone comes at me with new "best of all possible
cases" numbers, I can compare.  Sadly this seems to be the state of many
OEM/integrators/manufacturers.

In storage, we see small disk form factor SSDs marketed generally, with
statements like 50k IOPs, and 500 MB/s.  Though they neglect to mention
several specific issues with these, such as writing all zeros, or that the
75k IOPs are sequential IOPs you get from taking the 600 MB/s interface,
dividing by 8k byte operations on a sequential read.  Actually do a real
random read and write and you get very ... very different results.
Especially with non-zero (real) data.
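
The arithmetic behind that 75k figure can be written out as a sketch. It assumes decimal units (1 MB = 1,000,000 bytes, 8k = 8,000 bytes); with 8192-byte operations the ceiling comes out a little lower. The function name is made up for illustration.

```python
def sequential_iops_ceiling(interface_mb_per_s: float, op_size_bytes: int) -> float:
    """Upper bound on operations per second if the interface is saturated.

    This is pure bandwidth divided by operation size; it ignores seek,
    controller and flash-translation overheads, which is why it only
    resembles a sequential best case and not random I/O.
    """
    return interface_mb_per_s * 1_000_000 / op_size_bytes


print(sequential_iops_ceiling(600, 8_000))  # decimal 8k operations
print(sequential_iops_ceiling(600, 8_192))  # binary 8 KiB operations
```

The first call gives 75000.0, which matches the marketing number: it is the interface speed restated in different units, not a measured random-I/O rate.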

I think that's the point though, that moving that performance "knee" down
to lower latency involves (potentially) significant cost, for a modest
return ... in terms of real performance benefit to a code.

Thanks for the pointer on the computation.  If we are 1000x off the
floor, we can probably come up with a way to do better. 10x, probably
it's much harder than we think and not necessarily worth the effort.



Joe Landman Fri, 27 Jan 2012 18:25:03 -0800

~ 0.2us. Remember that the first 2 generations of InfiniPath were both
SDR: one for HyperTransport and one for PCIe. The difference was 0.3us
back then; PathScale + QLogic did some heroic things since to shorten
the pipeline stages & up the clock rate.

-- greg
(and if anyone needs a reminder, I no longer have any financial
involvement with QLogic or Intel.)


Greg Lindahl Fri, 27 Jan 2012 21:30:16 -0800

The point I've been trying to make for the past 8 years is that one of
the two chip families you're looking at doesn't degrade as much as the
other from the "best of all possible cases" to a real cluster running
a real code.

And if you knew that one family of SSDs had a wildly different ratio
of peak alleged perf to real application performance, would you ignore
that? I suspect not.

-- greg


Greg Lindahl Fri, 27 Jan 2012 21:34:28 -0800

It is not an argument, it is stating a fact. If someone claims that a product provides 10x better performance, best fit etc., and from the other side it has very little attraction, something does not make sense.

Intel explained their move in their PR. They see lots of growth in HPC, definitely in the Exascale, and they see InfiniBand as a key to deliver the right solution. They also mention InfiniBand adoption in other markets, so a good validation for InfiniBand as a leading solution for any server and storage connectivity.

<snip>

I never saw any genuine testing from PathScale and then QLogic comparing their stuff to Mellanox, and you are more than welcome to try and prove me wrong. The argument in this email thread is no more than a re-cap of QLogic's latest marketing campaign and yes, it is no more than FUD. Cheap attacks are not my game, so please....

Going into a bit more of a technical discussion... QLogic's way of networking is doing everything in the CPU, and Mellanox's way is to implement it all in the hardware (we all know that). The second option is a superset, therefore the worst case can be even performance. I encourage you to contact me directly for any application benchmarking you do, and I will be happy to provide you the feedback on what you need in order to get the best out of the Mellanox products. That can be QDR vs QDR as well, no need to go to FDR - I am open for the competition any time...

I am not sure about that... quick search in past emails can show amazing things...
I believe most of us are in agreement here. Less FUD, more facts.

It is with a switch

-Gilad



Gilad Shainer Sat, 28 Jan 2012 10:23:07 -0800

you are mistaken.  you ask a pointed question - do not construe it
as a statement of fact.  if you wanted to state a fact, you might say:
"multiple OEMs decided to use Mellanox and none have used Qlogic".

by stating this, you are implying that Mellanox is superior in some way,
though another perfectly adequate explanation could be that Qlogic
didn't offer their chips to OEMs, or did so at a higher price.  (in fact,
the latter would suggest the possibility that Qlogic chips are actually
worth more.)  note my use of subjunctive here.

in reality, Mellanox is the easy choice - widely known and used,
the default.  OEMs are fond of making easy choices: more comfortable
to a lazy customer, possibly lower customer support costs, etc.

this says nothing about whether an easy choice is a superior solution
to the customer (that is, in performance, price, etc).

I saw no 10x performance claim here.  there was some casual mention
of a situation where Qlogic QDR performs similar to Mellanox FDR.

besides Lustre, where do you see IB used for storage?

this is a dishonest statement: you know that QLogic isn't actually trying
to do *everything* in the CPU.

this is also dishonest: making the adapter more intelligent clearly
introduces some tradeoffs, so it's _not_ a superset.  unless you are
claiming that within every Mellanox adapter is _literally_ the same
functionality, at the same performance, as is in a Qlogic adapter.

"facts" in this context (as opposed to FUD, arm-waving, etc) must be
dispassionate and quantifiable.  not hyperbole and suggestive rhetoric.

out of curiosity, has anyone set up a head-to-head comparison
(two or more identical machines, both with a Qlogic and a Mellanox card of
the same vintage)?

regards, mark hahn.


Mark Hahn Sat, 28 Jan 2012 13:29:46 -0800

Mark, i stumbled upon the same problem a few months ago. When i
googled for 4x infiniband you can find something;
when moving up to QDR it becomes more sporadic.
Not to mention that the interesting test is where the cards are bad:
latency.
If you find anything, usually it's manufacturer-side statements
without a clear test setup, usually doing 0-byte tests.

This is exactly why i intend to write a benchmark.

What i personally believe about FDR on pci-e 3.0, with its considerably
higher claimed bandwidth than QDR on pci-e 2.0, is not important.

What i do believe is that one must measure objectively.

That's why i'm posting for a while now that as soon as the cluster  
works here i'm gonna
write a benchmark to measure latencies moving up the read length  
slowly so that it more and more gets a bandwidth game and simply  
present the
graph for the interested readers.
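
A size sweep of that kind could look roughly like the sketch below. It is not the benchmark described above: it bounces messages over a local socket pair between two threads of one process, purely to show the shape of the measurement loop. A real version would run between nodes over MPI or RDMA verbs and, as noted further down, would keep every core busy. All function names here are made up for illustration.

```python
import socket
import threading
import time


def recv_exact(sock: socket.socket, nbytes: int) -> bytes:
    """Read exactly nbytes; a single recv() may return fewer."""
    buf = bytearray()
    while len(buf) < nbytes:
        chunk = sock.recv(nbytes - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo_peer(sock: socket.socket, sizes, iterations: int) -> None:
    """Bounce every message straight back to the sender."""
    for size in sizes:
        for _ in range(iterations):
            sock.sendall(recv_exact(sock, size))


def pingpong_sweep(sizes, iterations: int = 200) -> dict:
    """Return {message size in bytes: mean half round trip in microseconds}."""
    near, far = socket.socketpair()
    peer = threading.Thread(target=echo_peer, args=(far, sizes, iterations))
    peer.start()
    results = {}
    for size in sizes:
        payload = b"x" * size
        start = time.perf_counter()
        for _ in range(iterations):
            near.sendall(payload)
            recv_exact(near, size)
        elapsed = time.perf_counter() - start
        results[size] = elapsed / iterations / 2 * 1e6
    peer.join()
    near.close()
    far.close()
    return results


if __name__ == "__main__":
    # Small sizes are latency-bound; large ones become a bandwidth game.
    for size, half_rtt_us in pingpong_sweep([1, 64, 1024, 16384, 262144]).items():
        print(f"{size:>7} bytes  {half_rtt_us:10.2f} us half-rtt")
```

Plotting half-rtt against message size shows where the curve stops being latency-bound and starts tracking bandwidth, which is the graph the paragraph above is after.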

We're not interested in theoretic tests of 1 core busy that is  
measuring a latency of another core at the other side busy.

A test really requires all cores busy and hammering onto the network  
card.

In the end everything is always a measure of bandwidth of course, but
even then there is a lack of scientists online who tested QDR objectively,
no matter *what manufacturer*. Such tests really are in short
supply, and some of them either tested just 1 tiny thing or a
theoretic thing,
or just lacked all realism when i read the rest of the article.

All in all, after some days of googling,

I found 1 tester who toyed with something using the same switch (good
idea), but the graphs presenting the results are tough to interpret,
and he was basically interested in something other than what's fast now
for the network cards.

Running the same oldie tests, whereas all manufacturers have way  
faster alternatives now, such as RDMA reads, is just not interesting.

To be continued in some months...


Vincent Diepeveen Sat, 28 Jan 2012 16:12:12 -0800

You probably meant to say "I think differently" and not "you are mistaken".... Making this mailing list a little more polite will benefit us all.


OEMs don't place devices on the motherboard just because they can, or because it is cheaper. They do so because they believe it will benefit their users, hence they will sell more. I can assure you that silicon was offered from both companies, and it wasn't an issue of price. From this point you can make any conclusion that you wish to.

<snip>

Protocols: iSER (iSCSI), NFSoRDMA, SRP, GPFS, SMB and others
OEMs: DDN, Xyratex, Netapp, EMC, Oracle, SGI, HP, IBM and others.

You are right, you do need a HW translation from PCIe to IB. But I am sure you know where the majority of the transport, error handling etc is being done....

It is not dishonest. In general offloading is a superset. You can choose to implement just offloading or to leave room for CPU control as well. There will always be parts that are better to be in HW, and if you have flexibility for the rest it is a superset.

Maybe we read different emails.



Gilad Shainer Sat, 28 Jan 2012 21:04:52 -0800

as far as I can tell, this paper mainly says "a coalescing stack delivers
benchmark results showing a lot higher bandwidth and message rate than a
non-coalescing stack."  the comment on figure 8:

     To some extent, the environment variables mentioned before
     contribute to this outstanding result

which is remarkably droll.  I'm not sure how well coalescing works for real
applications.


Mark Hahn Mon, 30 Jan 2012 07:07:15 -0800

First, I looked at the paper and it includes latency and bandwidth comparisons as well, not only message rate. It is important for others to know that, and not to dismiss it. Second, both companies have options for message coalescing. You can choose to use it or not - I saw apps that got a benefit from it, and saw applications that did not. Without coalescing Mellanox provides around 30M messages per second.

-Gilad.


Gilad Shainer Mon, 30 Jan 2012 11:23:37 -0800

Note also that many of the benchmarks in this analysis weren't run
using MPI -- if I remember correctly, the ib_* commands mentioned use
InfiniBand verbs directly, which means they aren't accelerated on
InfiniPath.

-- greg


Greg Lindahl Mon, 30 Jan 2012 23:54:21 -0800


