Talk:Single instruction, multiple data
This article is rated Start-class on Wikipedia's content assessment scale.
Other early SIMD machines
Other examples of early SIMD machines were:
- Xplor supercomputer, from Pyxsys, Inc., circa 2001
- Connection Machine, models 1 and 2 (CM-1 and CM-2), from Thinking Machines Corporation, circa 1985
- Zephyr DTC computer from Wavetracer, circa 1991
- Massively Parallel Processor (MPP), from NASA/Goddard Space Flight Center, circa 1983-1991
There were many others from this era as well. At least some of them deserve mention on the SIMD page. — Preceding unsigned comment added by Gwilcox (talk • contribs) 20:35, 13 February 2005 (UTC)
Max. popularity
Max. popularity:
- SIMD is heavily used by the slowest, most complex video codecs: XviD, DivX, H.263+ AVC (the best of 2004 was AVC NeroDigital), ...
The reason is that many users are P2P users, movie pirates, porn viewers, ... and they need to compress a 4.7 GB MPEG-2 DVD down to a 700 MB XviD/DivX/AVC MPEG-4 CD (for slow Internet connections) using the tool VirtualDub. Depending on the codec used, the compression with SIMD typically takes between 2 hours on the fastest PC and 12 hours on the slowest PC. — Preceding unsigned comment added by 62.14.140.198 (talk • contribs) 15:45, 15 March 2005 (UTC)
DSPs not past-tense
DSPs aren't exactly past-tense. There are more of them in use now than there are processors with SIMD units. — Preceding unsigned comment added by 134.129.123.40 (talk • contribs) 23:03, 1 September 2005 (UTC)
- yep. Tensilica sold its billionth Audio DSP over 12 years ago. They were bought by... Cadence? i think? and they supply MAXIM. they also do LTE SIMD DSPs for use in phones. plenty of those around. And TI hasn't stopped selling DSPs either. Lkcl (talk) 05:06, 14 June 2021 (UTC)
Stream processing
Another SIMD/MIMD architecture has emerged in stream processors. I am not sure where in the page this update belongs (maybe a previous editor has an idea). I believe it is quite important that this page (the only one that is not a stub) also mentions the new paradigm. MaxDZ8 09:52, 27 October 2005 (UTC)
SIMD definition
- $1 = $2 + $3
- Single instruction ? Yes: +
- Multiple data ? Yes: $1, $2, $3
There's no definition of what SIMD means. Please write one in the text. --Hdante 11:25, 4 March 2006 (UTC)
bad link
there is a bad link in http://www.teranex.com/support/docs/TeranexParallelProc.pdf [1] — Preceding unsigned comment added by Pizzadeliveryboy (talk • contribs) 12:11, 2 June 2006 (UTC)
Z80
I'm surprised the Z80 isn't mentioned. It was arguably the first mass-market CPU with SIMD instructions, albeit not terribly well implemented ones (the main purpose appeared to be to save memory by allowing block copies to be implemented with one two-byte instruction. The instruction was refetched on every cycle, meaning that it didn't offer significant performance advantages over coding the loops by hand.) 66.32.15.71 13:11, 13 August 2006 (UTC)
- It's not included because that's not really SIMD in the modern sense: it's really operating on just one chunk of data per clock cycle -- in other words it's executing the instruction many times, with one word processed per instruction executed. Most modern engineers use SIMD to refer to an architecture that processes several orthogonal chunks of data in parallel with a single instruction executed. This is a vague concept on an out-of-order CISC processor like the Intel (because what the heck does a "clock" mean anyway when a given instruction can take anywhere between two and twenty clocks?) but a very real one on something like the PlayStation 2's vector unit, where you really can add two groups of four floating point numbers to each other every clock. Collabi 19:12, 14 August 2006 (UTC)
- Also, the Z80 had a single ALU for processing the LDIR and LDDR instructions. The ALU was only involved in the inc/dec-then-repeat part, I suspect. Typical about a SIMD architecture is that there is a processing unit of some kind (the ALU in case of the Z80) in each of the parallel branches. The LDIR and LDDR instructions are microcoded loops, but they are not parallelisation concepts. Rick 11:13, July 6, 2015 (UTC) — Preceding unsigned comment added by 83.161.146.46 (talk)
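The distinction drawn in this thread can be sketched in a few lines of Python. This is only a semantic model, not a cycle-accurate one: `ldir` and `add4` are hypothetical helper names, the block copy is simplified (a zero count copies nothing here), and the Python host still runs the lane loop sequentially. The point is what one instruction covers: one byte per repetition for the block copy, four values for the lane-wise add.

```python
def ldir(memory, src, dst, count):
    """Simplified model of a Z80-style block copy: one byte per repetition."""
    steps = 0
    while count > 0:
        memory[dst] = memory[src]      # a single byte is transferred
        src, dst, count = src + 1, dst + 1, count - 1
        steps += 1                     # the instruction repeats once per byte
    return steps


def add4(a, b):
    """Model of a four-lane add: one operation covers all four lanes."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("this sketch assumes exactly four lanes")
    return [x + y for x, y in zip(a, b)]


memory = bytearray(b"ABCD") + bytearray(4)
steps = ldir(memory, src=0, dst=4, count=4)    # four repetitions for four bytes
lanes = add4([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
```

In this model the copy needs one repetition per byte, while the four-lane add is counted as a single operation regardless of how the host evaluates it.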
Why the speedup is 75% in the graph?
Shouldn't it be 300%? — Preceding unsigned comment added by Berniefu (talk • contribs) 00:52, 10 December 2012 (UTC)
I agree, if it can do four times as many operations in the same time it's a 300% speedup. Also, does anyone else think the font in those images is hideous and inappropriate? Jrmrjnck (talk) 19:54, 27 February 2013 (UTC)
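One possible way to see where both numbers could come from, assuming the graph compares four scalar operations against one four-wide operation that takes the same time as a single scalar one (that assumption is not confirmed by the graph itself). The function name `speedup_figures` is made up for this sketch.

```python
def speedup_figures(scalar_time, simd_time):
    """Return (speed ratio, % of time saved, % throughput gain)."""
    ratio = scalar_time / simd_time
    time_saved_pct = (1 - simd_time / scalar_time) * 100
    throughput_gain_pct = (ratio - 1) * 100
    return ratio, time_saved_pct, throughput_gain_pct


# Four scalar operations take 4 time units; one four-wide operation takes 1.
ratio, time_saved, throughput_gain = speedup_figures(4.0, 1.0)
```

Under that assumption the run time drops by 75% while throughput rises by 300%, so "75%" describes time saved and "300%" describes the speedup.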
YMM vs ZMM
In the Hardware section it says: "Intel's AVX SIMD instructions now process 256 bits of data at once". Isn't that out of date, now that it is 512 bits? — Preceding unsigned comment added by Themarina.m (talk • contribs) 14:44, 13 August 2017 (UTC)
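For reference, a small sketch of what the register width means in terms of lanes, assuming 32-bit elements. The helper name `lanes` is hypothetical; the widths are the usual ones for the x86 XMM (128-bit), YMM (256-bit) and ZMM (512-bit) registers.

```python
def lanes(register_bits, element_bits):
    """Number of equal-sized elements that fit in one register."""
    if register_bits % element_bits:
        raise ValueError("element size must divide the register width")
    return register_bits // element_bits


# Lanes of 32-bit elements per register, by register family.
table = {name: lanes(bits, 32)
         for name, bits in (("XMM", 128), ("YMM", 256), ("ZMM", 512))}
```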
.NET SIMD linked library has been deprecated
In the Software section the Microsoft.Bcl.Simd library is linked, but it has been deprecated in favour of the System.Numerics.Vectors NuGet package. Frabert (talk) 09:44, 12 January 2018 (UTC)
Leaning on parallelism without much thought
Thus, such machines exploit data level parallelism, but not concurrency: there are simultaneous (parallel) computations, but only a single process (instruction) at a given moment.
Why do we have this automatic boner for parallelism? (In my experience, "exploit" is almost always parallelism's preceding verb; bonerville by any other name would smell as sweet.)
What we really need to be saying is that in cases where regimentation is natural (or can be imposed upon the data), SIMD allows fewer instructions to process more data—this being important if performance (in either the time or energy domain or both) is architecturally rate limiting.
There's another meaning of parallelism where you try to avoid creating false dependencies: every operation with its arguments available is free to dispatch. In this regime you don't end up at low resource utilization waiting for a slow boat. That's the notion of parallelism which is closer to the bonerific free lunch. But that glorious notion of parallelism has nothing to do with SIMD at all. It just happens that modest SIMD (registers in the 256-bit range) is compatible with superscalar OOO.
I know I'm not in the majority here, but parallelism is not an elementary explanatory term if you make any effort to think clearly.
The actual trade-off here is that in cases where you can sufficiently regiment the data (which might be painful for the programmers or other computational units within the system), you decrease the instruction dispatch rate per volume of data processed.
Note also that we didn't originally classify fetching AX and then processing the two halves separately as AH and AL as a SIMD fetch operation. Chunking optimizations are also a slightly different concept. You can also feed arithmetic units in parallel with superscalar dispatch (macro-op fusion could also take you there on the internal execution path, even if it's not explicit in the ISA). Chunking is inherent in almost all modern cache design.
The really core idea of SIMD is explicit data regimentation, and the driving motivation for this is to conserve orchestration resources at some level, because orchestration has fundamental costs and limits. — MaxEnt 22:07, 26 April 2018 (UTC)
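The dispatch-rate trade-off described above can be put into numbers with a rough sketch. It counts only the arithmetic instructions for a fixed volume of data and deliberately ignores loads, stores, loop overhead, and the cost of regimenting the data in the first place. The function name `instructions_dispatched` is hypothetical.

```python
def instructions_dispatched(n_elements, lanes):
    """Arithmetic instructions needed to cover n_elements, `lanes` per instruction."""
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    return -(-n_elements // lanes)     # integer ceiling division


# The same 1000 elements processed at different SIMD widths.
counts = {width: instructions_dispatched(1000, width) for width in (1, 4, 8, 16)}
```

The data volume is constant; only the number of instructions that must be fetched, decoded, and dispatched changes.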
Replace diagrams to ones that don't use an Elvish font
The two diagrams in the article for some reason use a "tengwar" font, which makes no sense for an article on CPU instruction handling. Can anyone generate new diagrams that use normal fonts instead of using Lord of the Rings typography? Pomax (talk) 06:11, 28 December 2018 (UTC)
Mention Groq in Commercial applications section?
In conjunction with MaxDZ8's comment regarding stream processing, it seems this page should mention Groq (Groq Technology) as a provider of SIMD/stream processing solutions. Perhaps in the "Commercial applications" section? Cl.taurus (talk) 18:08, 23 April 2021 (UTC)
Page quality is awful (in the summary)
See also https://wiki.riteme.site/wiki/Talk:Vector_processor#deeper_problems_with_all_associated_articles
This page is painfully factually misleading (turns out only to be in the intro, the rest is pretty good)
- SIMD is described as a type of processor (wrong, it's a type of ALU)
- SIMT is described as a type of multi threading by linking to the *software* multithreading page (wrong).
- Vector Processing is described as a class of SIMD (wrong).
- Vector processing is further described as being only single issue execution to pipelines (wrong on two counts)
and that's in just the first few paragraphs. i'm sufficiently shocked i actually stopped reading any further. the entire page will need a complete review, i'll try to make some notes over the next few days / weeks and provide some notes and refs where i can. Lkcl (talk) 05:03, 14 June 2021 (UTC)
ok whew, i couldn't help but continue reading, in trepidation, and actually it's pretty good. the only other thing i could see on a first readthrough is "One of the recent processors to use vector processing was the Cell Processor". which will only be true if it was Predicated SIMD. which is unlikely given it was 2003. there was a drive around this time to use the word "Vector" a lot, and to copy features of Vector Processors. i'll need to read the Cell SPE ISA Manual to confirm it. sigh :) summary: other than the intro, the page is pretty good. Lkcl (talk) 05:16, 14 June 2021 (UTC)
ah maaan, even the diagrams are misleading: they have no reference to the fact that it is simply the ALU which has SIMD, even referring to "SIMD CPU". sigh. a 2nd readthrough there's additional references to "SIMD Processor" as well which is additionally misleading. Lkcl (talk) 07:47, 14 June 2021 (UTC)
gaah this is really tough to find accurate diagrams. the quality of course lectures on SIMD is dreadfully unclear. here, finally, a good one https://d3i71xaburhd42.cloudfront.net/d40bf0b4b8e5cd2f337020ecacf487154c28d4eb/6-Figure4-1.png and another one https://www.google.com/imgres?imgurl=https://ars.els-cdn.com/content/image/1-s2.0-S074373151830813X-gr2.jpg&imgrefurl=https://www.sciencedirect.com/science/article/pii/S074373151830813X&docid=0szvp7E9KPOlpM&tbnid=we4ZQ0R6asH3FM&vet=1&source=sh/x/im
both those images clearly show that SIMD is a feature of the *ALU* i.e.:
- the register comes from the register file (as a single quantity, from a single instruction)
- the register (QTY 1) is split into equal-sized fragments (QTY N)
- each fragment is directed synchronously to either different parts of the same ALU *OR* in some cases to multiple but functionally identical ALUs (this is the part that's dreadfully unclear in online Academic coursework)
- the result fragments (QTY N) are collated synchronously into their corresponding positions into a single result (QTY 1)
- the single result (QTY 1) is sent to the register file.
it's called single instruction multiple data for a reason.
it really is not all that different from scalar processing which is why it is so seductive. dead easy for the hardware engineer, total bitch for programmers. Lkcl (talk) 08:11, 14 June 2021 (UTC)
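The five steps listed above can be modelled with plain integers. This is a minimal sketch, assuming a 32-bit register holding four 8-bit lanes with wrap-around addition; the names `split`, `collate` and `simd_add` are made up here, and real hardware does the split and collate with wiring rather than shifts.

```python
LANES = 4
LANE_BITS = 8
LANE_MASK = (1 << LANE_BITS) - 1


def split(register):
    """Split one register (QTY 1) into equal-sized fragments (QTY N), lowest lane first."""
    return [(register >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def collate(fragments):
    """Put result fragments (QTY N) back into their positions in one result (QTY 1)."""
    result = 0
    for i, fragment in enumerate(fragments):
        result |= (fragment & LANE_MASK) << (i * LANE_BITS)
    return result


def simd_add(reg_a, reg_b):
    """One 'instruction': every lane is added independently, with no carry between lanes."""
    sums = [(x + y) & LANE_MASK for x, y in zip(split(reg_a), split(reg_b))]
    return collate(sums)


result = simd_add(0x01FF10F0, 0x01010120)              # lane-wise add
scalar = (0x01FF10F0 + 0x01010120) & 0xFFFFFFFF        # ordinary 32-bit add, for contrast
```

The two results differ only because carries stop at the lane boundaries, which is the "not all that different from scalar processing" point: the same adder with the carry chain cut.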
excellent, found one at last that's not paywalled. https://csdl-images.computer.org/trans/td/2018/01/figures/mcall2-2746081.gif https://www.computer.org/csdl/journal/td/2018/01/08017591/13rRUwbs1Sl argh no, actually even that is miscategorised! it's SIMT! or, could be confused for SIMT, because each ALU has an associated register file. why is this so hard for people to get right and also clear?? gah! :) Lkcl (talk) 10:15, 14 June 2021 (UTC)
- In the 1966 paper where Flynn introduces the term "SIMD" (along with "SISD", "MISD", and "MIMD"), he gives as examples of SIMD processors SOLOMON and ILLIAC IV,[1] saying that
There are n universal execution units each with its own access to operand storage. The single instruction stream acts simultaneously on the n operands without using confluence techniques. Increased performance is gained strictly by using more units. Communication between units is restricted to a predetermined neighborhood pattern and must also proceed in a universal, uniform fashion [Fig. 7(a)].
- In a later 1972 paper, he describes three types of SIMD processors - array processors (like SOLOMON and ILLIAC IV), pipelined processors (not exactly the same as what's now called pipelining; see the diagram), and associative processors (where the units doing the processing for a given instruction are selected by pattern-matching).[2]
- This isn't just an ALU, this is a system, with multiple elements, each with its own ALU and, at least in the case of SOLOMON, its own memory, all acting under the direction of a single control unit fetching an instruction and executing it by getting all processor units to perform the same operation.
- Modern machines referred to as SIMD are typically different - the multiple elements are just parts of the ALU, operating on multiple data items in a single word of some length (possibly larger than the "native" word length of the machine). That's not the only type of SIMD in existence, however. Guy Harris (talk) 10:53, 14 June 2021 (UTC)
- yes, true / agreed. however none of this is mentioned / referenced in the article. i'm actually going to be implementing SIMD that's not pipelined at all, but is multiple Finite State Machines, with a post-termination synchronisation phase for example.
- SOLOMON sounds very similar to the Aspex ASP i worked with back in 2004: although the company called it Massive wide SIMD (QTY 4096 2-bit ALUs), the fact that each of those 4096 ALUs had a 256 byte independent CAM plus another 128 bits of DRAM made it more like the SIMT of today.
- bottom line is, both this page and the vector processor one are a bit of a mess as far as categorisation is concerned, and need a concerted effort and some clear diagrams. Duncan's Taxonomy is clearer than Flynn's but even that missed SIMT. Lkcl (talk) 12:06, 14 June 2021 (UTC)
- also the pattern-matching sounds remarkably similar to what is now called "Predicated SIMD" which is really useful and should be referenced, good find Lkcl (talk) 15:57, 15 June 2021 (UTC)
- hey guy yeah those diagrams in fig 5 are really clear. antiquated but clear. (A) is SIMT. (B) is standard modern SIMD if you upgrade it slightly and add Register on the right rather than Memory. (C) is predicated SIMD. the question is, why the hell are these missing from the Flynn Taxonomy page and associated Category/Template?? Lkcl (talk) 20:26, 15 June 2021 (UTC)
- ahh maaaan, i took a look at Flynn's Taxonomy on SIMD, it makes a false claim that NVIDIA's SIMT is novel and that Flynn's Type (1) SIMD is not SIMT. https://wiki.riteme.site/wiki/Flynn%27s_taxonomy#Single_instruction_stream,_multiple_data_streams_(SIMD) the more investigation the more alarming this is getting. Lkcl (talk) 20:36, 15 June 2021 (UTC)
- guy i've started adding first to Flynn's taxonomy, the refs here are invaluable, thank you Lkcl (talk) 21:57, 16 June 2021 (UTC)
- ok i am reasonably happy with Flynn's taxonomy and the next target will be the SIMD page itself. i am pushing my luck by doing so much editing so will do a minimum of corrections here. Lkcl (talk) 18:00, 17 June 2021 (UTC)
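The comparison made in this thread between pattern-matched selection of units (the associative processors of the 1972 paper) and what is now called predicated SIMD can be illustrated with a mask. This is a sketch of the general idea only, not a model of Flynn's design or of any particular ISA; `predicated_add` is a hypothetical name.

```python
def predicated_add(dest, a, b, mask):
    """Lane-wise a + b where the mask bit is set; other lanes keep their old value."""
    return [x + y if m else d for d, x, y, m in zip(dest, a, b, mask)]


values = [5, -3, 8, -1]
mask = [v < 0 for v in values]       # "pattern match": select the negative elements
result = predicated_add(values, values, [100] * 4, mask)
```

A single instruction is still issued to all lanes; the mask decides which lanes act on it.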
References
- ^ Flynn, Michael J. (December 1966). "Very high-speed computing systems" (PDF). Proceedings of the IEEE. 54 (12): 1901–1909. doi:10.1109/PROC.1966.5273.
- ^ Flynn, Michael J. (September 1972). "Some Computer Organizations and Their Effectiveness" (PDF). IEEE Transactions on Computers. C-21 (9): 948–960. doi:10.1109/TC.1972.5009071.
notes on avx512
section for collating research on AVX512 SIMD problems
- https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html#512-bit-integer-simd-avx-512
- https://blog.cloudflare.com/on-the-dangers-of-intels-frequency-scaling/
- https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-use-these-new-instructions/
- not avx512 but 256 bit, describes evidence of a "warmup" feature in the top 128 bits https://www.agner.org/optimize/blog/read.php?i=628
Requested move 21 January 2022
- The following is a closed discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review after discussing it on the closer's talk page. No further edits should be made to this discussion.
The result of the move request was: Moved, with the redirect from the acronym remaining in this instance. (And apologies for the mix up with the SISD move earlier, that is now fixed). — Amakuru (talk) 16:26, 28 January 2022 (UTC)
SIMD → Single instruction, multiple data – Propose to move the article title to the full name, as it immediately becomes more descriptive and easily recognizable for users somewhat familiar with computer architecture, who will recognize the words "instruction" and "data". However, this may conflict with WP:TITLEFORMAT. The same change has been proposed for all four classes (SISD, SIMD, MISD, MIMD). Sauer202 (talk) 13:24, 21 January 2022 (UTC)
- Weak support could be a DAB with Scottish Index of Multiple Deprivation, the 1st and most Google results are for the Scottish meaning though that's probably biased due to my location (England). However this does get many more views (8,870) compared with 217[2] for the Scottish meaning. Crouch, Swale (talk) 10:57, 23 January 2022 (UTC)
- Support For consistency, as I see that MIMD has been moved. But I suggest leaving SIMD as a redirect to this article because of the four classes (SISD, SIMD, MISD, MIMD) SIMD is the acronym that is still widely used (for SIMD instructions etc). JonH (talk) 15:38, 28 January 2022 (UTC)
- For "SIMD" "instruction" I get 1,010,000 Google hits, and for "SIMD" "scotland" I get 171,000. JonH (talk) 16:04, 28 January 2022 (UTC)
- Comment Currently SISD has been moved to Single instruction, multiple data. I assume that is a mistake that will be corrected. JonH (talk) 15:54, 28 January 2022 (UTC)