The Real Shooby is a Twitter bot that mimics my messaging style when I talk to my close friends. It analyzes over 150,000 messages that I have sent to my friends and uses the RiTa.js library to generate Markov chain sentences. Through the Twitter API, it can talk to any user who messages it. Most of the time on this project was spent working with large data sets, and I definitely grew more comfortable working with them as a result.
I use Google Hangouts as my main messaging application when talking with my closest friends. Google provides a service called Takeout that lets any user download their data, including their entire Hangouts history in JSON format. My Hangouts JSON file ended up being approximately 550 MB (2 million lines), which was much more than I expected.
The JSON file was a bit opaque, and because it was so big, it was difficult for me to analyze it well enough to parse it myself. So I decided to find a tool to parse the JSON for me. The one I ended up using was https://hangoutparser.jay2k1.com/. The problem with this tool is that it only accepts JSON files of 500 MB or less, which meant that I needed to split my JSON into smaller files. This took a bit of time since I needed to understand the JSON's data structure in order not to break the file.
After the tool parsed the JSON, it let me download a CSV file for each conversation. The ones that interested me were the conversations with the most messages, which included those with my close friends and my ex-girlfriends. These CSV files were not perfect. Many things disrupted the structure, such as carriage returns, URLs, and the weird emoticons that we liked to use. So I had to clean up these files as well, which meant losing a little bit of information.
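As a rough illustration of this kind of cleanup, here is a minimal sketch of a per-message normalizer. The function name `cleanMessage` and the exact rules are assumptions made for illustration, not the code used for the bot. It only handles line breaks and URLs; the emoticons would need their own rules, depending on which ones broke the CSV structure.

```javascript
// Hypothetical helper: normalizes one raw message so it fits on a single CSV line.
function cleanMessage(raw) {
  return raw
    .replace(/\r\n?|\n/g, " ")       // turn carriage returns and line breaks into spaces
    .replace(/https?:\/\/\S+/g, "")  // drop URLs (this is where some information is lost)
    .replace(/\s+/g, " ")            // collapse the whitespace left behind
    .trim();
}
```

Dropping URLs outright is the simplest option. Replacing them with a placeholder token would keep the sentence shape intact, at the cost of the placeholder showing up in generated text.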
I then used the csv-parse library to parse these CSVs. Because this project focuses on my own messaging style, I only had to extract the messages that I sent in these conversations. After compiling all of my messages, I noticed that my messaging style does not lend itself to being analyzed as complete thoughts. I like to message rapidly
in a sort of
stream of consciousness,
so I also needed a way to determine complete thoughts. I ended up joining each message to the previous one whenever I sent it within 5 seconds of my last message. This was done by taking the difference between the timestamps of consecutive messages.
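The 5-second rule can be sketched roughly as follows. The row shape (`sender`, `timestamp`, `text`) and the function name `mergeBursts` are assumptions for illustration, since the actual CSV columns depend on the parser's output. Rows are expected in chronological order.

```javascript
// Assumed row shape: { sender: string, timestamp: string | number, text: string }
const MAX_GAP_MS = 5000; // messages sent within 5 seconds count as one thought

// Hypothetical helper: keeps only messages from `me` and joins rapid-fire
// messages into single strings.
function mergeBursts(rows, me) {
  const thoughts = [];
  let lastTime = null; // time of my previous message, in milliseconds

  for (const row of rows) {
    if (row.sender !== me) continue; // only my own messages matter

    const time = new Date(row.timestamp).getTime();
    if (lastTime !== null && time - lastTime <= MAX_GAP_MS) {
      // Close enough to my previous message: append to the current thought.
      thoughts[thoughts.length - 1] += " " + row.text;
    } else {
      // Too long a pause (or the first message): start a new thought.
      thoughts.push(row.text);
    }
    lastTime = time;
  }
  return thoughts;
}
```

Note that the gap is measured from the previous message, not from the start of the burst, so a long run of quick messages becomes one thought even if it spans more than 5 seconds in total.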
After I was satisfied with the final text file, I used the RiTa library to analyze the text with Markov chains and generate sentences. The results were both bizarre and familiar.
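To show the idea behind Markov chain generation, here is a small word-level chain in plain JavaScript. This is not RiTa's API or its implementation, just a simplified stand-in. The names `buildChain` and `generate`, the default order of 2, and the injectable `random` parameter are all assumptions for illustration.

```javascript
// Builds a table mapping each run of `order` words to the words seen after it.
function buildChain(sentences, order = 2) {
  const chain = new Map();
  const starts = []; // opening word runs, used to begin generated sentences

  for (const sentence of sentences) {
    const words = sentence.split(/\s+/).filter(Boolean);
    if (words.length <= order) continue; // too short to learn a transition from
    starts.push(words.slice(0, order));
    for (let i = 0; i + order < words.length; i++) {
      const key = words.slice(i, i + order).join(" ");
      if (!chain.has(key)) chain.set(key, []);
      chain.get(key).push(words[i + order]);
    }
  }
  return { chain, starts, order };
}

// Picks one item; `random` must return a number in [0, 1).
function pick(items, random) {
  return items[Math.floor(random() * items.length)];
}

// Walks the chain from a random opening until it hits a dead end or maxWords.
function generate(model, maxWords = 20, random = Math.random) {
  if (model.starts.length === 0) return "";
  const words = [...pick(model.starts, random)];
  while (words.length < maxWords) {
    const key = words.slice(-model.order).join(" ");
    const next = model.chain.get(key);
    if (!next) break; // no recorded continuation for this word run
    words.push(pick(next, random));
  }
  return words.join(" ");
}
```

Because followers are stored with repetition, words that appeared more often after a given run are proportionally more likely to be picked, which is what makes the output sound like the source text.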
My main code, Bot.js, runs on a DigitalOcean server and is kept running with the forever library.
For now, every time someone DMs the bot, it generates a sentence from the Markov chain and replies to the sender. The bot also follows back any user who follows it, which lets that user easily DM the bot after following.
I am definitely going to continue working on this bot. I would like to have the bot recognize keywords in received messages and reply in a way that makes it more "conversational".
Bonus: here is a conversation my ShoobyBot had with Liarbot (a bot that tweets anything you DM it)