Such a good read. I actually went back through it the other day to steal the idea of searching for the least common byte, to speed up my search tool https://github.com/boyter/cs. Coupled with the SIMD upper/lower search technique from fzf, it cut the wall clock runtime by a third.
There was this post from Cursor https://cursor.com/blog/fast-regex-search today about building an index for agents due to them hitting a limit on ripgrep, but I’m not sure what codebase they are hitting that warrants it. Especially since they would have to be at 100-200 GB to be getting to 15s of runtime. Unless it’s all matches, that is.
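For anyone wondering what the least common byte trick looks like, here is a minimal sketch in Rust. It is not the code from ripgrep, cs, or fzf: the `rank` function is a made-up stand-in for a tuned byte frequency table, and `rarest_index` and `find` are hypothetical names. The idea is to scan the haystack only for the needle byte you expect to be rarest, and run the full comparison only where that byte turns up.

```rust
/// Made-up rarity ranking: a lower value means "expected to be rarer".
/// This is an assumption for illustration, not a measured frequency table.
fn rank(b: u8) -> u8 {
    match b {
        b' ' | b'e' | b't' | b'a' | b'o' | b'i' | b'n' => 255,
        b'a'..=b'z' => 200,
        b'A'..=b'Z' | b'0'..=b'9' => 150,
        _ => 50,
    }
}

/// Index of the needle byte guessed to be rarest (first one wins on ties).
fn rarest_index(needle: &[u8]) -> usize {
    let mut best = 0;
    for (i, &b) in needle.iter().enumerate() {
        if rank(b) < rank(needle[best]) {
            best = i;
        }
    }
    best
}

/// Find the first occurrence of `needle` in `haystack`.
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    if needle.len() > haystack.len() {
        return None;
    }
    let r = rarest_index(needle);
    let rare = needle[r];
    // Last valid start offset for a match, and the last haystack position
    // where the rare byte of a match could sit.
    let last_start = haystack.len() - needle.len();
    let last_rare = last_start + r;

    let mut pos = r;
    while pos <= last_rare {
        // Cheap scan for the rare byte only.
        let off = haystack[pos..=last_rare].iter().position(|&b| b == rare)?;
        let start = pos + off - r;
        // Full comparison only at candidate positions.
        if &haystack[start..start + needle.len()] == needle {
            return Some(start);
        }
        pos += off + 1;
    }
    None
}
```

With a needle like `x =` the scan looks for `=` rather than `x` or the space, so most of the haystack is skipped without a full comparison. The speedup in practice depends on handing the single-byte scan to something vectorised, which this sketch does not do.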
Is IRIX experiencing a hobbyist revival or something? This is the second IRIX reference I’ve seen on here in the past two days, and there was a submission a day or two ago (c.f. a Voodoo video card?) as well. I haven’t personally encountered IRIX in the wild since a company I worked at in 2003. I suppose SGI has always had a cool factor but it’s unusual seeing it come up in a cluster of mentions like this.
I was using ripgrep once and it had a bug that led me down a terrifying rabbit hole - I can't recall what it was but it involved not being able to find text that absolutely should have been there.
Eventually I was considering rebuilding the machine completely but for some reason after a very long time digging deep into the rabbit hole I tried plain old grep and there was the data exactly where it should have been.
So it's such a vague story but it was a while back - I don't remember the specifics but I sure recall the panic.
Sometimes I forget that some of the config files I have for CI in a project are under a dot directory, and therefore ignored by rg by default. So I have to repeat the search giving the path to that config file's subdirectory if I want to see the results under it (or use some extra flags for rg to not ignore dot directories other than .git).
Was the file in a .gitignore by any chance? I've got my home folder in git to keep track of dot/config files and that always catches me out. I really dislike it defaulting to ignoring files that are ignored by git.
I had that happen too recently… Basically rg x would show nothing but grep -r x showed the lines for any x. Tried multiple times with different x, then I kept using grep -r at that time. After a few days, I started using rg again and it worked fine but now I tend to use grep -r occasionally too to make sure.
Next time that happens try looking at the paths, adding a pair of -u, or running with --debug: by default rg will ignore files which are hidden (dotfiles) or excluded by ignore files (.gitignore, .ignore, …).
The fun part is that it is pretty easy to “rewrite” ripgrep in Rust, because burntsushi wrote it as a ton of crates which you can reuse. So you can reuse these to build your own, with blackjack and hookers.
> > IMO, as long as the time differences remain small, I'm totally okay with ripgrep being slower by default on smaller corpora if it means being a lot faster by default on bigger corpora.
Note that this is the author of ripgrep replying to a third party commenter asking whether rg isn’t already lightweight, and comparing the two under various possible definitions of “lightweight”.
How would you capitalise it? RipGrep? RIPGrep? You’d need to pick a side and lose the pun. (And of course grep itself would need to be GReP if we took it all the way)
It seems to me that `rg` is the number one most important part that enables LLMs to be smart agents in a codebase. Who would have thought that a code search tool would enable AGI?
It’s fast even on a 300 MHz Octane.
I still use it, but I've never fully trusted it since then; I double check.
See https://github.com/BurntSushi/ripgrep/blob/master/GUIDE.md#a... for the details.
I think ripgrep will not search UTF-16 files by default. I had some such issue once at least.
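A quick sketch of why UTF-16 can bite any byte-oriented search, not just rg. This is plain Rust with the standard library only, and the helper names (`to_utf16le_with_bom`, `contains_bytes`, `decode_utf16le`) are made up for illustration. As far as I recall, rg only transcodes UTF-16 automatically when the file starts with a byte-order mark, and otherwise needs `-E`/`--encoding`, so treat that part as my recollection rather than gospel.

```rust
/// Encode text as UTF-16LE bytes with a leading BOM (FF FE).
fn to_utf16le_with_bom(text: &str) -> Vec<u8> {
    let mut out = vec![0xFF, 0xFE];
    for unit in text.encode_utf16() {
        out.extend_from_slice(&unit.to_le_bytes());
    }
    out
}

/// Naive byte-level substring check, standing in for a raw byte search.
fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// Decode UTF-16LE bytes into a String.
/// Returns None if the BOM is missing or the data is not valid UTF-16.
fn decode_utf16le(bytes: &[u8]) -> Option<String> {
    if !bytes.starts_with(&[0xFF, 0xFE]) {
        return None;
    }
    let body = &bytes[2..];
    if body.len() % 2 != 0 {
        return None;
    }
    let units: Vec<u16> = body
        .chunks_exact(2)
        .map(|c| u16::from_le_bytes([c[0], c[1]]))
        .collect();
    String::from_utf16(&units).ok()
}
```

In the encoded bytes every ASCII character is followed by a zero byte, so searching the raw bytes for `needle` finds nothing, while searching the decoded text does. Without a BOM there is nothing to tell a tool that the file is UTF-16 at all, and all those zero bytes make it look like binary data.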
https://reddit.com/r/rust/comments/1fvzfnb/gg_a_fast_more_li...
Also something-something about dependencies (a Rust staple): https://www.reddit.com/r/rust/comments/1fvzfnb/gg_a_fast_mor...
I don’t understand when people typeset some name in verbatim, lowercase, but then have another name for the actual command. That’s confusing to me.
Programmers are too enamored with lower-case names. Why not Ripgrep? Then I can surmise that there might not be some program ripgrep(1) (there might be a shorter version), since using capital letters is not traditional for CLI programs.
Look at Stacked Git:
https://stacked-git.github.io/
> Stacked Git, StGit for short, is an application for managing Git commits as a stack of patches.
> ... The `stg` command line tool ...
Now, I’ve been puzzled in the past when inputting `stgit` doesn’t work. But here they call it StGit for short and the actual command is typeset in verbatim (stg(1) would have also worked).
You may be able to download ripgrep, and execute it (!), but god forbid you create an alias in your shell in a persistent manner.