It's barely gaining adoption though. The lack of buzz is a chicken and egg issue for Mojo. I fiddled shortly with it (mainly to get it working some of my pythong scripts), and it was suprisingly easy. It'll shoot up one day for sure if Latner doesn't give up early on it.
Isn't the compiler still closed source? I and many other ML devs have no interest in a closed-source compiler. We have enough proprietary things from NVIDIA.
Use-cases like this are why Mojo isn't used in production, ever. What does Nvidia gain from switching to a proprietary frontend for a compiler backend they're already using? It's a legal headache.
Second-rate libraries like OpenCL had industry buy-in because they were open. They went through standards committees and cooperated with the rest of the industry (even Nvidia) to hear-out everyone's needs. Lattner gave up on appealing to that crowd the moment he told Khronos to pound sand. Nobody should be wondering why Apple or Nvidia won't touch Mojo with a thirty-nine and a half foot pole.
Kernels now written in Mojo were all in hand written in MLIR like in this repo. They made a full language because that's not scalable, a sane language is totally worth it. Nvidia will probably end up buying them in a few years.
Pipeline: JAX -> XLA -> CUDA Tile -> ThunderKittens/ThunderMittens/HipKittens source code, then compile like normal for each GPU.
You'll need a bespoke scheduling IR to drive the CUDA Tile IR for output to ThunderKittens/ThunderMittens/HipKittens, since the main differences between them are size/scheduling related.
Wrap the final step in an optimizer loop to find the best sizes/schedules; profit.
(I have not looked at the CUDA Tile design specifically, but if it isn't suitable, designing an IR specifically to do this is still a good idea IMO.)
I did it in three. I selected it in your comment, and then had to hit "more" to get to the menu to ask Google about it, which brought me to https://www.google.com/search?q=MLIR which says: MLIR is an open-source compiler infrastructure project developed as a sub-project of the LLVM project. Hopefully
Get better at computers and stop needing to be spoon-fed information, people!
In this day and age, asking questions about what something is is a minefield of “just ask AI” and “You should know this”. Let’s stop putting down people who ask questions and root out those that have shitty answers.
I get why it feels frustrating when someone snaps "just google it." Nobody likes feeling dumb. That said, there’s a meaningful difference between asking a genuine question and demanding that every discussion be padded to accommodate readers who won’t even type four letters into a search bar. Expecting complete spoon-feeding in technical threads isn’t curiosity; it’s a refusal to engage. Learning requires participation.
You cannot explain everything to everyone all the time. Besides, this is not even a paper.
Sometimes you are not the target audience and have to put some words into Google.
From Wikipedia: The name "Multi-Level Intermediate Representation" reflects the system’s ability to model computations at various abstraction levels and progressively lower them toward machine code.
And yet you didn’t tell us what it stands for, just what it is. The person you’re responding to was specifically talking about finding out what it stands for
Will be interesting to see if Nvidia and other have any interest & energy getting this used by others, if there actually is an ecosystem forming around it.
Google leading XLA & IREE, with awesome intermediate representations, used by lots of hardware platforms, and backing really excellent Jax & Pytorch implementations, having tools for layout & optinization folks can share: they really build an amazing community.
There's still so much room for planning/scheduling, so much hardware we have yet to target. RISC-V has really interesting vector instructions, for example, and it seems like there's so much exploration / work to do to better leverage that.
Nvidia has partners everywhere now. Nvlink is used by Intel, AWS Tritanium, others. Yesterday the Groq exclusive license that Nvidia paid to give to Groq?! Seeing how and when CUDA Tiles emerges: will be interesting. Moving from fabric partnerships, up up up the stack.
For NVidia it suffices this is a Python JIT allowing programming CUDA compute kernels directly in Python instead of C++, yet another way how Intel and AMD, alongside Khronos APIs, lag behind in great developer experiences for GPU compute programming.
Ah, and Nsight debugging also supports Python CUDA Tiles debugging.
Second-rate libraries like OpenCL had industry buy-in because they were open. They went through standards committees and cooperated with the rest of the industry (even Nvidia) to hear-out everyone's needs. Lattner gave up on appealing to that crowd the moment he told Khronos to pound sand. Nobody should be wondering why Apple or Nvidia won't touch Mojo with a thirty-nine and a half foot pole.
Julia, Python GPU JITs work great on Windows, and many people only get Windows systems as default at work.
You'll need a bespoke scheduling IR to drive the CUDA Tile IR for output to ThunderKittens/ThunderMittens/HipKittens, since the main differences between them are size/scheduling related.
Wrap the final step in an optimizer loop to find the best sizes/schedules; profit.
(I have not looked at the CUDA Tile design specifically, but if it isn't suitable, designing an IR specifically to do this is still a good idea IMO.)
I lost count at five or six. Define your acronyms on first use, people.
Get better at computers and stop needing to be spoon-fed information, people!
Google leading XLA & IREE, with awesome intermediate representations, used by lots of hardware platforms, and backing really excellent Jax & Pytorch implementations, having tools for layout & optinization folks can share: they really build an amazing community.
There's still so much room for planning/scheduling, so much hardware we have yet to target. RISC-V has really interesting vector instructions, for example, and it seems like there's so much exploration / work to do to better leverage that.
Nvidia has partners everywhere now. Nvlink is used by Intel, AWS Tritanium, others. Yesterday the Groq exclusive license that Nvidia paid to give to Groq?! Seeing how and when CUDA Tiles emerges: will be interesting. Moving from fabric partnerships, up up up the stack.
Ah, and Nsight debugging also supports Python CUDA Tiles debugging.
https://developer.nvidia.com/blog/simplify-gpu-programming-w...
this is nicely illustrated by this recent article:
https://news.ycombinator.com/item?id=46366998