I will admit I was initially surprised Matt was not already familiar with this behavior, given his reputation. I remember discovering it while playing with LLVM intermediate representation 10 years ago in college. I would never have considered myself very knowledgeable about modern compilers, and I have never done any serious performance work. In that case it had reduced a recursion to a simple multiplication, which completely surprised me. The fact that Matt did not know this makes me think this pass may only work on relatively trivial problems that he would never have written in the first place, and therefore never have witnessed the optimization.
If you don't rely on IDE features or completion plugins in an editor like vim, it can be easier to navigate tightly coupled complexity if it is all in one file. You can't scan it or jump to the right spot as easily as with smaller files, but in vim, searching for the exact symbol under the cursor is a single-character shortcut (the * command), and that only works if the symbol is in the current buffer. This type of development works best for academic-style code with a small number (usually one or two) of experts who are familiar with the implementation, but in that context it's remarkably effective. Not great for merge conflicts in frequently updated code, though.
Part of the issue is that it suggests the code had a spaghettified growth; it is neither sufficient nor necessary, but lacking external constraints (like an entire library developed as a single C header), it suggests that the code organisation is not great.
I'm once again surprised at GCC being slower than clang. I would have thought that GCC, with its roughly 20-year head start, would make faster code. And yet, occasionally I look into the assembly and go "what are you doing?", and the same flags and source fed into clang are better optimized, or use better instructions, or whatever. One time it was bit extraction using shifts. Clang did it in 2 steps: shift left, shift right. GCC did it in 3, I think: maybe it shifted right first, or did a logical shift instead of an arithmetic one and then sign-extended. Point is, it was just slower.
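For reference, the two-instruction version is the classic signed-bitfield extraction: shift the field up so its top bit lands in the sign bit, then arithmetic-shift back down. A minimal sketch in C (the function name and parameters are mine, not from the case I saw):

    #include <stdint.h>

    /* Extract bits [lo, lo+len) of x, sign-extended. Shifting left puts
       the field's top bit in the sign bit; the arithmetic shift right
       then extends it. Relies on >> of a negative signed value being an
       arithmetic shift, which is implementation-defined in C but holds
       on both gcc and clang. */
    int32_t extract_signed(uint32_t x, unsigned lo, unsigned len) {
        return (int32_t)(x << (32u - lo - len)) >> (32u - len);
    }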
Compiler know-how, and the resources available during compilation, made very significant progress between the gcc era and the LLVM/clang era.
gcc was and is an incredible achievement, but it is traditionally considered difficult to implement many modern compiler techniques in it. It's at least unpleasant, let's put it this way.
Graph coloring is NP-hard, so it would be very difficult to replace it with an O(1) algorithm.
If you mean graph coloring restricted to planar graphs, yes, it can always be done with at most 4 colors. But it could still be fewer, so the answer is not always the same.
(I know it was probably not a very serious comment but I just wanted to infodump about graph theory.)
I'm actually surprised that gcc doesn't do this! If there's one thing compilers do well, it's pattern-matching on code and replacing it with more efficient equivalents; just try pasting things from Hacker's Delight and watch them get canonicalised to the equivalent, fastest machine code.
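One easy example to try (results may vary with target and compiler version, but this one has been stable for years and is quick to verify on godbolt): the shift-based rotate idiom from Hacker's Delight, which both compilers canonicalise to a single rotate instruction on x86:

    #include <stdint.h>

    /* Rotate-left written portably with shifts; the masking avoids
       undefined behaviour when k is 0 or a multiple of 32. gcc and
       clang both pattern-match this to a single rol instruction. */
    uint32_t rotl32(uint32_t x, unsigned k) {
        return (x << (k & 31)) | (x >> ((32 - k) & 31));
    }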
This particular case isn't really due to pattern matching -- it's a result of a generic optimization that evaluates the exit value of an add recurrence using binomial coefficients (even if the recurrence is non-affine). This means it will work even if the contents of the loop get more exotic (e.g. if you perform the sum over x * x * x * x * x instead of x).
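Concretely, something like this (the x * x * x * x * x case mentioned above) still compiles to straight-line code with clang at -O2, which is easy to confirm on godbolt:

    #include <stdint.h>

    /* Sum of fifth powers. SCEV models `sum` as a polynomial add
       recurrence and evaluates its value at loop exit directly, so
       the emitted code contains no loop at all. */
    uint64_t sum_fifth_powers(uint64_t n) {
        uint64_t sum = 0;
        for (uint64_t x = 0; x < n; ++x)
            sum += x * x * x * x * x;
        return sum;
    }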
Doing something like that with a pattern match is obvious, but also useless, as it would catch very limited cases. The example presented is known to have a closed form (it's believed Gauss even discovered it at six years old). I'm sure this optimization catches many other things, so it is not trivial at all.
10 years is not a lot; it's almost "yesterday". Things that are 10 years old can still surprise experts in the field. With 30+ years of experience, I still find relatively new things that are maybe 15 years old.
In topics like compiler optimization, it's not like there are many books that describe these kinds of algorithms.
Those are just basic and essential optimizations, nothing too surprising here.
The sum of integers is actually a question I ask developers in interviews (works well from juniors to seniors), with the extra problem of what happens if we were to use floating-point instead of integers.
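For reference, the integer version looks roughly like this (a sketch; the exact wording of the exercise is mine):

    /* clang at -O2 removes the loop and computes something equivalent
       to n * (n + 1) / 2. Follow-up question: what changes if `sum`
       and `i` were floats? */
    int sum_to(int n) {
        int sum = 0;
        for (int i = 1; i <= n; ++i)
            sum += i;
        return sum;
    }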
To those who don't know about compiler optimisation, the replacement with a closed form is rather surprising, I'd say, especially when someone with Matt Godbolt's experience, of all people, says it is surprising.
Also, this series is targeted towards more of a beginner audience for compilers, so it's likely to be surprising to that audience, even if not to you.
Gauss supposedly did it when he was 7. The hardest part for the compiler is figuring out that you have a loop that computes that sum and does nothing else important.
It's something we saw in high school; I would expect anyone with a CS degree to recognize this optimization.
I barely know anything about compiler optimization, so I have no clue whether a compiler applying this optimization is surprising or something trivial.
“basic and essential” are interesting ways to describe the field of compiler optimization research.
Are you suggesting that the discovery and implementation of SCEV in LLVM is basic and essential? Or that summing integers in a range is basic and essential?
Since GCC is lacking such an essential optimization, you should consider having one of your junior interviewees contribute this basic optimization to mainline.
I'm curious what exactly you ask here. I consider myself to be a decent engineer (for practical purposes), but without a CS degree, and I likely would not have passed that question.
I know compilers can do some crazy optimizations, but I wouldn't have guessed they'd transform something from O(n) to O(1). Having said that, I still don't feel this has much relevance to my actual job for the most part. Such performance knowledge seems to be abstracted away from actual programming by database systems, or managed offerings like Spark and Snowflake, such that unless you intend to work on those systems this knowledge isn't that useful (being aware that these optimizations happen can be, though, for sure).
He thinks it makes him look clever, or more likely subtly wants people to think "wow, this guy thinks something is obvious when Matt Godbolt found it surprising".
This kind of question is entirely useless in an interview. It's just a random bit of trivia that a potential hire either happens to have come across or happens to remember from math class.
Whether they get the question exactly right and can pinpoint the specific compiler passes or algebraic properties responsible for reductions like this is totally irrelevant, and not what you're actually looking for or asking about. It's a very good jumping-off point for a conversation about optimization, and for testing whether they're the type of developer who has ever looked at the assembly produced in their hot path.
Anyone who dumbly suggests that loops in source code will always result in loops in assembly doesn’t have a clue. Anyone who throws their hands up and says, “I have no idea, but I wonder if there’s some loop invariant or algebraic trick that can be used to optimize this, let’s think about it out loud for a bit” has taken a compiler class and gets full marks. Anyone who says, “I dunno, let’s see what godbolt does and look through the llvm-opt pane” gets an explicit, “hire this one” in the feedback to the hiring manager.
It's less about what they know and more about whether they can find out.
So in other words, it isn't "basic and essential optimizations" that you would expect even a junior engineer to know (as your comment implies), but a mechanism to trigger a conversation to see how they think about problems. In fact, it sounds like something you wouldn't expect them to know.
To answer the second part of the question: there is no closed-form solution. Since floating-point math is not associative, there's no O(1) optimization that can be applied that preserves the exact output of the O(n) loop.
Technically there is a closed-form solution as long as the answer is less than 2^24 for a float32 or 2^53 for a float64, since below those limits all integers can be represented exactly by a floating-point number, and integer addition carried out in floating point is exact as long as the results stay below those caps. I doubt a compiler would catch that one, but it technically could do the optimisation and get the exact same bits in the answer. If the result were initialised to a non-integer number, this would of course no longer hold.
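A quick sanity check of that claim, with values chosen so every partial sum stays an exact integer below 2^24:

    #include <stdio.h>

    int main(void) {
        /* Sum 1..2000 in float: the total, 2001000, is well below
           2^24, so every partial sum is exactly representable and
           the loop agrees with the closed form bit-for-bit. */
        float loop_sum = 0.0f;
        int n = 2000;
        for (int i = 1; i <= n; ++i)
            loop_sum += (float)i;
        float closed = (float)n * (n + 1) / 2.0f;
        printf("%s\n", loop_sum == closed ? "equal" : "differ"); /* equal */
        return 0;
    }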
What type of positions are you interviewing for? Software development is a big tent and I don't think this would be pertinent in a web dev interview, for example.
I'm pretty sure making an algorithm that converts loops to closed forms (I'm sure it detects much more than just a summation) is a little bit complicated.
Maybe you have much more experience than Mr Godbolt in compilers.
Nothing is surprising once you know the answer. It takes some mental gymnastics to put yourself in someone else's shoes from before they discovered it, and thus to see it as less than "basic".
https://github.com/llvm/llvm-project/blob/release/21.x/llvm/...
https://github.com/llvm/llvm-project/blob/release/21.x/llvm/...
https://en.wikipedia.org/wiki/Wilf%E2%80%93Zeilberger_pair
This kind of optimization, complete loop removal and computing the final value for simple math loops, is at least 10 years old.
Seems like the author is both surprised and delighted with an optimization they learned of today. Surely you’ve been in the same situation before.
https://github.com/llvm/llvm-project/blob/release/21.x/llvm/...
For you, they are essential.
You and the juniors you hire must have deeper knowledge than him.
I spend a lot of time looking at generated assembly and there are some more impressive ones.
It would be great if you shared it with the world like Matt does instead of being smug about it.