| Lanai, the mystery CPU architecture in LLVM. 
 
 Disclaimer: I have had access to some confidential information about some of the matters discussed on this page. However, everything written here is derived from publicly available sources, and references to these sources are also provided.
 
 
 https://q3k.org/lanai.html
 
 Some of my recent long-term projects revolve around a little known  CPU architecture called 'Lanai'. Unsurprisingly, very few people have  heard of it, and even their Googling skills don't come in handy. This  page is a short summary of what I know, and should serve as a reference  for future questions.
 
 Myricom & the origins of Lanai

 Myricom is a hardware company founded in 1994. One of their early products was a networking interface card family and protocol, Myrinet. I don't know much about it, other than it did some funky stuff with wormhole routing.
 
 As part of their network interface card design, they introduced data  plane programmability with the help of a small RISC core they named  LANai. It originally ran at 33MHz, the speed of the PCI bus on which the  cards were operating. These cores were quite well documented on the  Myricom website, seemingly with the end-user programmability being a  selling point of their devices.
 
 It's worth noting that multiple versions of LANai/Lanai have been released. The last publicly documented version on the old Myricom website is Lanai3/4. Apart from the documentation, sources for a gcc/binutils fork exist to this day on Myricom's GitHub.
 
 At some point, however, Myricom stopped publicly documenting the  programmability of their network cards, but documentation/SDK was still  available on request. Some papers and research websites actually contain  tutorials on how to get running with the newest versions of the SDK at  the time, and even document the differences between the last documented  Lanai3/4 version and newer releases of the architecture/core.
 
 This closing down of the Lanai core documentation by Myricom didn't mean they stopped using it in their subsequent cards. The core made its way into their Ethernet offerings (after Myrinet basically died), like their 10GbE network cards. You can easily find these 10G cards on eBay, and they even have the word 'Lanai' written on their main ASIC package. Even more interestingly, Lanai binaries are shipped with Linux firmware packages, and can be chucked straight into a Lanai disassembler (e.g. the Myricom binutils fork's objdump).
 
 Technical summary of Lanai3/4

 - 32 registers, most of them general purpose, with special treatment for R0 (all zeroes), R1 (all ones), R2 (the program counter), R3 (status register), and some registers allocated for mode/context switching.
 - 4-stage RISC-style pipeline: Calculate Address, Fetch, Compute, Memory
 - Delay slot based pipeline hazard resolution
 - No multiplication, no division. It's meant to route packets, not crunch numbers.
 - The world's best instruction mnemonic: PUNT, to switch between user and system contexts.

 Here's a sample of Lanai assembly:

     000000f8 <main>:
           f8: 92 93 ff fc   st      %fp, [--%sp]
           fc: 02 90 00 08   add     %sp, 0x8, %fp
          100: 22 10 00 08   sub     %sp, 0x8, %sp
          104: 51 80 00 00   or      %r0, 0x0, %r3
          108: 04 81 40 01   mov     0x40010000, %r9
          10c: 54 a4 08 0c   or      %r9, 0x80c, %r9
          110: 06 01 11 11   mov     0x11110000, %r12
          114: 56 30 11 11   or      %r12, 0x1111, %r12
          118: 96 26 ff f4   st      %r12, -12[%r9]
          11c: 96 26 ff f8   st      %r12, -8[%r9]
          120: 86 26 13 f8   ld      5112[%r9], %r12

     00000124 <.LBB3_1>:
          124: 46 8d 00 00   and     %r3, 0xffff, %r13
          128: 96 a4 00 00   st      %r13, 0[%r9]
          12c: 01 8c 00 01   add     %r3, 0x1, %r3
          130: e0 00 01 24   bt      0x124 <.LBB3_1>
          134: 96 24 00 00   st      %r12, 0[%r9]

 The `add`/`sub`/`or` instructions have their destination on the right-hand side. `st` and `ld` are memory store and load instructions respectively. Note the lack of a 32-bit immediate load (instead, a `mov` and an `or` instruction are used in tandem). That `mov` instruction isn't real, either - it's a pseudo-instruction for an `add 0, 0x40010000, %r9`. Also note the branch delay slot at address 134 (this instruction gets executed even if the branch at 130 is taken).
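To make the two quirks in that listing a bit more concrete, here is a rough Python sketch. It is a toy model written for this page, not taken from any Lanai documentation or toolchain: the names `split_hi_lo`, `materialize`, `run_with_delay_slot` and `LOOP` are made up, instructions are indexed by position rather than byte address, and it assumes a single delay slot after an always-taken `bt`, as in the sample above.

```python
MASK32 = 0xFFFFFFFF


def split_hi_lo(value):
    """Split a 32-bit constant into the operands of a mov/or pair.

    Assumption: the `mov` carries the upper 16 bits (lower 16 zeroed)
    and the following `or` supplies the lower 16 bits, as in the listing.
    """
    value &= MASK32
    hi = value & 0xFFFF0000
    lo = value & 0x0000FFFF
    return hi, lo


def materialize(hi, lo):
    """Model `mov hi, %rN` followed by `or %rN, lo, %rN`."""
    reg = hi & MASK32      # mov: load the upper half
    reg |= lo & 0xFFFF     # or: merge in the lower half
    return reg


# The loop at .LBB3_1, one entry per instruction (index 0 is address 0x124).
LOOP = [
    ("and", None),  # 0x124
    ("st", None),   # 0x128
    ("add", None),  # 0x12c
    ("bt", 0),      # 0x130: branch back to index 0
    ("st", None),   # 0x134: delay slot
]


def run_with_delay_slot(program, max_steps):
    """Return the order in which instruction indices execute.

    A taken branch does not redirect fetch immediately: the instruction
    right after it (the delay slot) runs first, then control transfers.
    """
    trace = []
    pc = 0
    pending = None  # branch target waiting for the delay slot to finish
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op, arg = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        if pending is not None:
            # We just executed a delay slot, so the branch takes effect now.
            next_pc = pending
            pending = None
        if op == "bt":
            pending = arg
        pc = next_pc
    return trace


if __name__ == "__main__":
    hi, lo = split_hi_lo(0x4001080C)
    print(hex(hi), hex(lo))              # 0x40010000 0x80c
    print(run_with_delay_slot(LOOP, 7))  # [0, 1, 2, 3, 4, 0, 1]
```

The trace shows index 4 (the `st` at 134) running after the branch at index 3 and before control returns to index 0, which is the behaviour described above.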
 The ISA is quite boring, and in my opinion that's a good thing. It makes core implementations easy and fast, and it generally feels like one of the RISC-iest cores I've dealt with. The only truly interesting thing about it is its dual-context execution system, but that unfortunately becomes irrelevant at some point, as we'll see later.
 
 Google & the Lanai team

 In the early 2010s, things weren't going great at Myricom. Due to financial and leadership difficulties, some of their products got canceled, and in 2013 core Myricom engineers were bought out by Google, with the Lanai intellectual property rights transferred along with them. The company still limps on, seemingly targeting the network security and fintech markets, and even continuing to market their networking gear as programmable, but Lanai is nowhere to be seen in their new designs.
 
 So what has Google done with the Lanai engineers and technology? The only thing we know is that  in 2016 Google implemented and upstreamed a Lanai target in LLVM, and that  it was to be used internally at Google. What is it used for? Only Google knows, and Google isn't saying.
 
 The LLVM backend targets Lanai11. This is quite a few numbers higher than the last publicly documented Lanai3/4, and there are quite a few differences between them:

 - No more dual-context operation, no more PUNT instruction. The compiler/programmer can now make use of nearly all registers from r4 to r31.
 - No more dual-ALU (R-R-R) instructions. This was obviously slow, and was probably a combinatorial bottleneck in newer microarchitectural implementations.
 - Slightly different delay slot semantics, pointing at a new microarchitecture (likely having stepped away from a classic RISC pipeline into something more modern).
 - New additional instruction format and set of accompanying instructions: SPLS (special part-word load/store), SLI (special load immediate), and Special Instruction (containing amongst others popcount, of course).

 Lanai Necromancy

 As you can tell by this page, this architecture intrigued me. The fact that it's an LLVM target shipped with nearly every LLVM distribution while no-one has access to hardware which runs the emitted code is just so spicy. Apart from writing this page, I have a few other Lanai-related projects, and I'd like to introduce them here:

 - I'm porting Rust to Lanai11. I have a working prototype, which required submitting some patches to upstream LLVM to deal with IR emitted by rustc. This has been upstreamed. My rustc patches are pending on...
 - I'm implementing LLD support for Lanai. Google (in the LLVM mailing list posts) mentions they use a binutils ld, forked off from the Myricom binutils fork. I've instead opted to implement an LLD backend for Lanai, which currently only supports the simplest relocations. I haven't yet submitted a public LLVM change request for this, but this is on my shortlist of things to do. I have to first talk to the LLVM/Google folks on the maintenance plan for this.
 - I've implemented a simple Lanai11 core in Bluespec, as part of my qfc monorepo. 3-stage pipeline (merged addr/fetch stages), in-order. It's my first bit of serious Bluespec code, so it's not very good. I plan on implementing a better core at some point.
 - I've implemented a small Lanai-based microcontroller, qf105, which is due to be manufactured in 130nm as part of the OpenMPW5 shuttle. Which is, notably, sponsored by Google :).

 If you're interested in following or joining these efforts, hop on to ##q3k on libera.chat.
 In addition to my effort piecing together information about Lanai  and making use of it for my own needs, the TrueBit project also  used it as a base for their smart contract system (in which they implemented a Lanai interpreter in Solidity).
 
 Documentation

 Useful resources, in no particular order:
 
 
 |