From anton Tue Oct 5 20:23:52 1999
X-newsreader: xrn 9.02
Sender: anton@a0.complang.tuwien.ac.at (Anton Ertl)
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Subject: Re: Q: Merced a flop or not?
Path: a0.complang.tuwien.ac.at!anton
Newsgroups: comp.arch
Distribution:
Followup-To:
References: <37F32306.45496655@klagges.com> <7t0da1$55u@dfw-ixnews16.ix.netcom.com>
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Keywords:

In article , rstacpoo@utas.edu.au (Stackers) writes:
>Serious question. Has anyone done a coprehensive review of the IA64
>manual? Is it on the web? Any info appreciated.

Here's my review: It's written well enough (but then I am pretty
familiar with the techniques they are referring to). The only
shortcoming I found when reading was that the special registers are
explained in one section without saying much about their use, and
later, when the features are explained, the registers are only
referred to by their (pretty cryptic) names, and I was left to wonder
what they are talking about.

If you want a review of the IA64 (user level, the published manual
does not explain system-level stuff), here's my take (there are
probably also earlier postings in this vein):

It's basically a RISC with lots of special features:

flag registers (Power/PPC also has this, but somewhat differently)

predicated execution (ARM, HPPA)

parallel-and/or

128 integer registers (AMD29K had 192)

128 FP registers

integer register stack with automatic saving and restoring (more
flexible than SPARC's register windows, but less flexible than
AMD29K's mechanism)

architectural support for moving loads speculatively up above branches
(I wonder if they read our paper
http://www.complang.tuwien.ac.at/papers/ertl-krall94cc.ps.Z, which
proposes a similar mechanism)

support for run-time memory disambiguation (moving loads speculatively
up above stores)

branch registers for indirect jumps (Power/PPC has two, IA64 8)

architectural support for counted loops (Power/PPC also has this)

rotating register files for register renaming in loops

the combination of loop support, rotating register files, and
predication allows compact prologues and epilogues for
modulo-scheduled loops

the architecture has 41-bit instructions, with three instructions per
128-bit bundle; five bits per bundle are used for additional decoding
info (in particular, program-specified instruction-group
boundaries). Bundles have significance as jump targets, but little
significance for instruction grouping (it is possible to specify a
group boundary within a bundle).

... (probably some features I forgot)

The overall picture I get is: Most of the features they incorporate
look good today by themselves, but the combination is quite complex
and somehow does not feel well-rounded.
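
To make the bundle arithmetic above concrete (3 x 41 instruction bits
plus 5 bits of decoding info = 128 bits), here is a minimal Python
sketch that packs and unpacks such a bundle. The field order (the
five decoding bits in the low end, followed by slots 0 to 2) and the
names pack_bundle and unpack_bundle are assumptions made for
illustration, not something taken from the text above.

```python
# Sketch of a 128-bit bundle: three 41-bit instruction slots plus
# 5 bits of decoding info ("template").
# ASSUMPTION: template sits in the low 5 bits, then slot 0, 1, 2.

SLOT_BITS = 41
TEMPLATE_BITS = 5
SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1


def pack_bundle(template, slots):
    """Combine a 5-bit template and three 41-bit slots into one integer."""
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instructions")
    if template & ~TEMPLATE_MASK:
        raise ValueError("template does not fit in 5 bits")
    bundle = template
    for i, slot in enumerate(slots):
        if slot & ~SLOT_MASK:
            raise ValueError("instruction does not fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit integer into (template, (slot0, slot1, slot2))."""
    template = bundle & TEMPLATE_MASK
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(3)
    )
    return template, slots


if __name__ == "__main__":
    b = pack_bundle(0x10, (1, 2, 3))
    print(hex(b), unpack_bundle(b))
```

The sketch only shows that the fields tile the 128 bits exactly; it
says nothing about what the individual template values mean.
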
And I wonder how good these ideas will look in the future; e.g., there
is a technique for run-time disambiguation
(http://domino.watson.ibm.com/library/CyberDig.nsf/a3807c5b4823c53f85256561006324be/12a089effaf3a918852565930072a0db?OpenDocument)
that does not need architectural support (but it needs more loads). I
am especially sceptical about the modulo scheduling stuff; modulo
scheduling works great for simple loops (including loops that have
been made simple by if-conversion), but hopefully we will see some
technique that can schedule more complex control structures in the
future, and the architectural support for modulo scheduling is
probably not flexible enough to be useful there.

While the compiler techniques for exploiting the special features of
IA64 exist, putting them all in one compiler will make the compiler
quite unwieldy and probably pretty specialized for the IA64. I.e.,
compilers for not-so-popular languages (nowadays anything but C, C++,
FORTRAN, Java) will probably not make use of these features, because
1) it's a lot of work, 2) it's not retargetable, and 3) these features
probably don't fit the typical usage of the language. Retargetable
compilers for popular languages will also suffer from 1) and 2).

The architecture is much less VLIW than I expected; the grouping
information is the only thing that looks remotely like VLIW. I wonder
why they bothered with the grouping information and the bundling
restrictions at all. It may save a cycle upon a branch mispredict; but
why not generate this information in hardware on loading the code into
the instruction cache?

- anton
-- 
M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html