I wanted to do this so badly back around 2009. I archived my MSN Messenger chats, my texts, and everything else, figuring that sometime in the future I'd be able to analyze them and maybe even train a chatbot to imitate me. But over the years I lost hard drives and cell phones and accidentally deleted stuff, so it never happened. Very cool that this guy was able to do it.
I used to have saving turned on for my MSN Messenger chats back around 2001-2004. I didn't lose them. 10-15 years later or so I had a look through them and the cringe was so powerful that I deleted them all anyway.
I have my MSN logs, along with a lot of old stuff, on an encrypted hard drive with a password I have forgotten. I am waiting for divine enlightenment to remember the password, or for quantum computers to finally come.
Weirdly glad to hear I am not the only one effectively having lost them.
I do some similar charting with Telegram data dumps, which you can still get from the "Telegram Lite" desktop app even though they removed the export functionality from the main app.
For removing noise you might want to look into TF-IDF instead of the manual method described in the post, which I didn't understand. It basically treats words that are common across the whole corpus as noise, and words that appear much more often within a specific chat than in the whole dataset as interesting.
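A minimal sketch of that idea, using only the standard library. It assumes each chat is treated as one "document"; the function names and the `chats` dict layout are made up for illustration and are not from the post.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes; a deliberately crude tokenizer.
    return re.findall(r"[a-z']+", text.lower())


def tfidf_per_chat(chats):
    """chats: dict of chat name -> list of message strings (assumed layout).

    Returns dict of chat name -> {word: tf-idf score}.
    """
    counts = {
        name: Counter(w for msg in msgs for w in tokenize(msg))
        for name, msgs in chats.items()
    }
    n_chats = len(counts)

    # Document frequency: in how many chats does each word appear at all?
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())

    scores = {}
    for name, word_counts in counts.items():
        total = sum(word_counts.values()) or 1
        scores[name] = {
            # tf = share of this chat's words; idf = log(chats / chats containing word).
            # A word that appears in every chat gets idf = log(1) = 0, i.e. "noise".
            w: (n / total) * math.log(n_chats / doc_freq[w])
            for w, n in word_counts.items()
        }
    return scores


def top_words(scores, chat, k=5):
    # Highest-scoring words for one chat.
    return sorted(scores[chat].items(), key=lambda kv: kv[1], reverse=True)[:k]
```

With only a handful of chats the idf term is coarse, so this works better the more conversations you feed it. Libraries like scikit-learn ship a tuned version of the same thing if you don't want to roll your own.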
You can also do some fun stuff by finding phrases that are used asymmetrically, e.g. more by one person in the convo than the other, or more in some periods than others.
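One way to sketch the per-person asymmetry is a smoothed log-ratio of phrase frequencies between two senders. Everything here is an assumption for illustration: the `(sender, text)` tuple format, the function names, the bigram default, and the add-one smoothing.

```python
import math
import re
from collections import Counter


def ngrams(text, n=2):
    # Word n-grams from a crude lowercase tokenization.
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def asymmetric_phrases(messages, a, b, n=2, min_count=2):
    """messages: list of (sender, text) tuples (assumed format).

    Returns (phrase, log_ratio) pairs sorted from "mostly a" to "mostly b".
    """
    counts = {a: Counter(), b: Counter()}
    for sender, text in messages:
        if sender in counts:
            counts[sender].update(ngrams(text, n))

    total_a = sum(counts[a].values())
    total_b = sum(counts[b].values())
    vocab = set(counts[a]) | set(counts[b])
    v = len(vocab)

    ranked = []
    for phrase in vocab:
        ca, cb = counts[a][phrase], counts[b][phrase]
        if ca + cb < min_count:
            continue  # skip rare phrases, their ratios are mostly noise
        # Add-one smoothing so a phrase used by only one person
        # does not cause a division by zero.
        rate_a = (ca + 1) / (total_a + v)
        rate_b = (cb + 1) / (total_b + v)
        ranked.append((phrase, math.log(rate_a / rate_b)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

Positive scores lean toward the first sender, negative toward the second. For the "over time" variant you could pass the same person's messages from two different periods as the two "senders".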
> 15 close friends, 50 regular contacts, 150 active acquaintances
To me, meaningful is just the 15, and 15 close friends is a lot nowadays. Of all the people I know, there's probably one person who gets close to this, and they have a bit of an unstable personality. I don't think OP's numbers are off, but I'm not sure what they mean by active. Is it just online chat, or is it grabbing a beer on a Friday? To me it's mostly the second. I find it highly subjective.
I don't understand how some people write hundreds of text/chat messages per day. Outside of work, almost 100% of my communication is talking to people, mostly face to face. I write or receive a handful of texts or chats per week, maybe a dozen per month.
I find text messages impersonal, and it also takes longer to communicate clearly what we need. There is so much lost. Even chats and emails for work are at risk of creating misunderstandings, especially because English is not the native language of most of my coworkers. All of this adds up to pretty low-quality communication.
I do wanna do the same, but at times I fear becoming too aware from the insights of the analysis. I fear my opinions of certain people, and my faith in them, might change.
I mostly text on Signal with disappearing messages so I wouldn't be able to do this. Most people are fine with disappearing messages at 4 weeks, but a few people like to keep their chats forever.
There's a tool for extracting chat history from Signal Desktop. You could build a plaintext and attachment archive with it if it runs regularly on your PC and appends the chats that are new since the last run.
I'd be pretty angry if I found out someone I chatted to on Signal was running a service to work around my message expiry choice and archive my messages. And breaking that trust just to run it through an LLM?
I also use the Note to Self which is built into Signal and appears just like any other conversation. I use that for temporary stuff like addresses and keep it clean.
Wordclouds per person are also fun!
I don't think there is a way to recover that... right?