My inner bot

The Real Shooby is a Twitter bot that mimics the way I message my close friends. It analyzes over 150,000 messages I have sent to my friends and uses the RiTa.js library to generate sentences with Markov chains. Through the Twitter API, it can reply to any user who messages it. Most of my time on this project went into working with large data sets, and I grew much more comfortable with them as a result.

I use Google Hangouts as my main messaging application when talking with my closest friends. Google provides a service called Takeout that lets any user download their data, including their entire Hangouts history in JSON format. My Hangouts JSON file ended up being approximately 550 MB (about 2 million lines), much larger than I expected.

The JSON structure was fairly opaque, and because the file was so large, it was hard for me to analyze it well enough to write my own parser. So I decided to find a tool to parse the JSON for me. The one I ended up using was this one. The problem with this tool is that it only accepts JSON files of 500 MB or less, which meant I had to split my JSON into smaller files. This took a while, since I needed to understand the JSON's data structure in order to split it without breaking the file.
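Splitting the export safely means cutting it only between whole records, never at an arbitrary byte offset. The sketch below is one way to do that in Node.js, assuming the conversations have already been loaded into an array. `chunkBySize` and `maxBytes` are hypothetical names, and the exact Takeout structure the array comes from (and whether the parsing tool expects each piece wrapped in the original top-level object) is an assumption, not something stated above.

```javascript
// Hypothetical helper: group records into chunks whose serialized JSON
// array stays at or under maxBytes. Cutting only between whole records
// means every chunk is still valid JSON.
function chunkBySize(records, maxBytes) {
  const chunks = [];
  let current = [];
  let currentBytes = 2; // the enclosing "[" and "]"

  for (const record of records) {
    const recordBytes = Buffer.byteLength(JSON.stringify(record), "utf8");
    // Start a new chunk if this record (plus its separating comma) would overflow.
    // A single record larger than maxBytes still gets a chunk of its own.
    if (current.length > 0 && currentBytes + 1 + recordBytes > maxBytes) {
      chunks.push(current);
      current = [];
      currentBytes = 2;
    }
    currentBytes += (current.length > 0 ? 1 : 0) + recordBytes;
    current.push(record);
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Each chunk could then be written to its own file with `fs.writeFileSync`. For a file as large as the full 550 MB export, reading it into a single string and calling `JSON.parse` may exceed V8's maximum string length, so a streaming JSON parser would likely be needed to get the records out in the first place.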

After the tool parsed the JSON, it let me download a CSV file for each conversation. The ones that interested me were the conversations with the most messages, which were with my close friends and my ex-girlfriends. These CSV files were not perfect: carriage returns, URLs, and the weird emoticons we liked to use all disrupted their structure. So I had to clean up these files as well, which cost me a little information.
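The cleanup step can be sketched as a per-message function. Beyond carriage returns, URLs, and emoticons, exactly what broke the CSVs isn't spelled out, so the rules below are assumptions: line breaks become spaces, URLs are dropped, and Unicode emoji (matched with the `Extended_Pictographic` property, which requires a regex with the `u` flag) are removed. `cleanMessage` is a hypothetical name. Dropping URLs and emoji is the kind of small information loss described above.

```javascript
// Hypothetical cleanup for one message before it goes into the corpus.
function cleanMessage(text) {
  return text
    .replace(/\r\n|\r|\n/g, " ") // line breaks would split a CSV row
    .replace(/https?:\/\/\S+/g, "") // drop URLs
    .replace(/[\p{Extended_Pictographic}\u{FE0F}\u{200D}]/gu, "") // drop emoji, variation selectors, and joiners
    .replace(/\s+/g, " ") // collapse leftover whitespace
    .trim();
}
```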

I then used the csv-parse library to parse these CSVs. Because this project focuses on my own messaging style, I only needed to extract the messages I sent from these conversations. After compiling all of my messages, I noticed that my messaging style doesn't lend itself to being analyzed as complete thoughts. I like to message rapidly

in a sort of

stream of consciousness,


of like


So I also needed a way to identify complete thoughts. I ended up appending any message I sent within 5 seconds of my previous message to that message, which I did by computing the difference between the timestamps of consecutive messages.
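Taken together, extracting my own messages and merging rapid-fire bursts might look like the sketch below. It assumes each CSV row has already been parsed into an object (csv-parse can produce objects keyed by header with its `columns: true` option). The column names `sender`, `timestamp`, and `text`, and both helper names, are hypothetical, and treating a gap of exactly 5 seconds as part of the same thought is also an assumption.

```javascript
// Keep only rows I sent, as { timestamp (ms), text }, oldest first.
// Column names are hypothetical; the parser tool's real headers may differ.
function extractMyMessages(rows, myName) {
  return rows
    .filter((row) => row.sender === myName && row.text.trim() !== "")
    .map((row) => ({ timestamp: Date.parse(row.timestamp), text: row.text.trim() }))
    .sort((a, b) => a.timestamp - b.timestamp);
}

// Append each message to the previous one if it was sent within gapMs of it,
// so a burst of short messages becomes a single "complete thought".
function mergeBursts(messages, gapMs = 5000) {
  const thoughts = [];
  let previousTimestamp = null;
  for (const { timestamp, text } of messages) {
    if (previousTimestamp !== null && timestamp - previousTimestamp <= gapMs) {
      thoughts[thoughts.length - 1] += " " + text; // continue the current thought
    } else {
      thoughts.push(text); // long enough pause: start a new thought
    }
    previousTimestamp = timestamp;
  }
  return thoughts;
}
```

For example, messages like "in a sort of", "stream of consciousness,", and "of like" sent a few seconds apart come out of `mergeBursts` as one line of text for the Markov chain.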

After I was satisfied with the final text file, I used the RiTa library to analyze the text with Markov chains and generate sentences. The results were both bizarre and familiar.
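RiTa handles the Markov modeling in the actual bot. To show what that involves, here is a minimal hand-rolled word-level Markov chain. It is a swapped-in illustration, not RiTa's API or implementation, and `buildChain`, `generate`, and the default order of 2 are assumptions. Each key is a run of `order` consecutive words, and its list records every word that followed that run in the corpus, so continuations that occurred more often are proportionally more likely to be picked.

```javascript
// Bare-bones word-level Markov chain: maps each run of `order` words to
// the words observed right after it. Not RiTa's implementation.
function buildChain(sentences, order = 2) {
  const chain = new Map();
  const starts = [];
  for (const sentence of sentences) {
    const words = sentence.split(/\s+/).filter(Boolean);
    if (words.length <= order) continue; // too short to contribute a transition
    starts.push(words.slice(0, order));
    for (let i = 0; i + order < words.length; i++) {
      const key = words.slice(i, i + order).join(" ");
      if (!chain.has(key)) chain.set(key, []);
      chain.get(key).push(words[i + order]); // duplicates act as frequency weights
    }
  }
  return { chain, starts, order };
}

// Walk the chain from a random start until no continuation exists or maxWords is hit.
function generate({ chain, starts, order }, maxWords = 20, random = Math.random) {
  if (starts.length === 0) return "";
  const pick = (list) => list[Math.floor(random() * list.length)];
  const words = [...pick(starts)];
  while (words.length < maxWords) {
    const next = chain.get(words.slice(-order).join(" "));
    if (!next) break;
    words.push(pick(next));
  }
  return words.join(" ");
}
```

Passing a fixed `random` function makes the output reproducible for testing; in a running bot it would simply be `Math.random`.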

My main script, Bot.js, runs on a DigitalOcean server and is kept running with the forever library.

For now, every time someone DMs the bot, it generates a sentence from the Markov chain and replies to the sender. The bot also follows back any user who follows it, so that the user can easily DM the bot afterward.
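The bot's two behaviors map naturally onto two event handlers. In this sketch `client` is a hypothetical wrapper exposing `sendDirectMessage(userId, text)` and `follow(userId)`, not the real interface of the Twitter library Bot.js uses, and `generateSentence` stands in for the Markov generator. The check against the bot's own ID is an assumption, based on the fact that some Twitter event feeds also report messages the account itself sends, which would otherwise make the bot reply to itself.

```javascript
// Hypothetical handlers; `client` is an assumed wrapper around the Twitter API.
function createHandlers(client, generateSentence, botUserId) {
  return {
    // Reply to every incoming DM with a freshly generated sentence.
    async onDirectMessage({ senderId }) {
      if (senderId === botUserId) return; // ignore the bot's own outgoing DMs
      await client.sendDirectMessage(senderId, generateSentence());
    },
    // Follow back anyone who follows the bot, so they can DM it easily.
    async onFollow({ followerId }) {
      if (followerId === botUserId) return;
      await client.follow(followerId);
    },
  };
}
```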

I am definitely going to continue working on this bot. I would like it to recognize keywords in the messages it receives and reply in a way that makes it more "conversational".

Bonus: here is a conversation my ShoobyBot had with Liarbot (a bot that tweets anything you DM it).


If someone follows the bot, it follows them back and sends them a DM:

If the user responds, the bot takes their message and tweets it publicly.

View the code here.
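The follow-back, greet, and tweet flow can also be written as a pure function that turns each incoming event into a list of actions, leaving the actual Twitter calls to a separate layer. Keeping the decisions free of side effects makes them easy to test without touching the API. The event and action shapes and `actionsFor` are hypothetical, and `GREETING` is a placeholder rather than the bot's actual message.

```javascript
// Placeholder greeting; the real bot's DM text is not known.
const GREETING = "Tell me something.";

// Map one incoming event to the actions the bot should take, as plain data.
// Assumed event shapes: { type: "follow", userId } or { type: "dm", userId, text }.
function actionsFor(event, botUserId) {
  if (event.userId === botUserId) return []; // ignore the bot's own activity
  switch (event.type) {
    case "follow":
      return [
        { type: "follow", userId: event.userId },
        { type: "dm", userId: event.userId, text: GREETING },
      ];
    case "dm": {
      const text = (event.text || "").trim();
      return text ? [{ type: "tweet", text }] : []; // tweet the user's message publicly
    }
    default:
      return [];
  }
}
```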

Node.js Twitterbot

I used a JSON file containing many music genres and appended a genre to the end of a YouTube search URL. This lets users discover genres of music they might not have heard before.
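A minimal sketch of the URL-building step, assuming the genres have been loaded into an array of strings. `https://www.youtube.com/results?search_query=` is YouTube's search results URL, and `encodeURIComponent` keeps multi-word genres, or ones containing `&`, from breaking the query string. Whether the original bot encoded the genre this way isn't stated, and `genreSearchUrl`, `randomGenreTweet`, and the tweet format are hypothetical.

```javascript
// Build a YouTube search URL for one genre, escaping spaces and "&".
function genreSearchUrl(genre) {
  return "https://www.youtube.com/results?search_query=" + encodeURIComponent(genre);
}

// Pick a random genre and format a tweet (format is an assumption).
function randomGenreTweet(genres, random = Math.random) {
  const genre = genres[Math.floor(random() * genres.length)];
  return `${genre}: ${genreSearchUrl(genre)}`;
}
```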

Cheap Twitterbots

The first Twitterbot assignment was to create five bots with the Cheap Bots Done Quick tool. Here are the five bots I created. Each description links to the JSON used to create the bot.

Combines one line from Shakespeare's sonnets and one line from Eminem's Lose Yourself.
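Cheap Bots Done Quick bots are defined as Tracery grammars: the `origin` rule is expanded, and each `#symbol#` is replaced by a random choice from that symbol's list. The minimal expander below mimics that core behavior for illustration. It is not Tracery itself and skips Tracery's modifiers such as `.capitalize`. The grammar is a hypothetical reconstruction of the Shakespeare and Eminem bot: the two sonnet lines are real openings (Sonnets 18 and 130), while the Eminem entries are placeholders rather than lyrics.

```javascript
// Minimal Tracery-style expander: replaces each #symbol# with a random
// expansion of that symbol, recursively. No modifiers or actions.
function expand(grammar, rule = "#origin#", random = Math.random, depth = 0) {
  if (depth > 20) return rule; // guard against runaway recursion
  return rule.replace(/#(\w+)#/g, (match, symbol) => {
    const options = grammar[symbol];
    if (!options || options.length === 0) return match; // unknown symbol: leave as-is
    const choice = options[Math.floor(random() * options.length)];
    return expand(grammar, choice, random, depth + 1);
  });
}

// Hypothetical grammar; Eminem entries are placeholders, not actual lyrics.
const sonnetGrammar = {
  origin: ["#sonnet#\n#eminem#"],
  sonnet: ["Shall I compare thee to a summer's day?", "My mistress' eyes are nothing like the sun;"],
  eminem: ["[Lose Yourself line 1]", "[Lose Yourself line 2]"],
};
```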

Almost all of the data I used came from Darius Kazemi's Corpora Project, except for the Eminem lines. Interestingly, I didn't come up with any of these bots until after I had gone through all of the different categories of data in the Corpora Project. I would browse through these categories and keep the ones with potential in mind. Tracery was fun to use, but I wasn't sure how to use JSON values nested within objects (for example, something like this). Knowing how would open up many more possibilities for using pre-existing JSON files.
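One possible workaround for nested JSON is to flatten it into the flat `symbol: [strings]` shape Tracery expects before pasting it into a grammar. This sketch joins nested keys with `_` rather than `.`, since Tracery reads a `.` inside `#...#` as a modifier. `flattenForTracery` is a hypothetical helper, and skipping non-string arrays and plain string fields (such as a `description` field) is a simplification.

```javascript
// Flatten nested JSON into Tracery's flat symbol -> array-of-strings shape.
// Nested keys are joined with "_" because "." means a modifier in Tracery.
function flattenForTracery(value, prefix = "", rules = {}) {
  if (Array.isArray(value)) {
    if (prefix && value.length > 0 && value.every((v) => typeof v === "string")) {
      rules[prefix] = value;
    }
    return rules; // arrays of objects or numbers are skipped in this sketch
  }
  if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      flattenForTracery(child, prefix ? `${prefix}_${key}` : key, rules);
    }
  }
  return rules; // plain strings are not turned into rules
}
```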

I was surprised at how poetic some of these bots could be, even when I had no intention of creating something of that nature. My favorite is probably still the JuliaChilds one since it was the one I refreshed the most to see what bizarre thing the bot would create.