
Overview

Kai edited this page Jul 27, 2022 · 5 revisions

The GroupMe export and parsing processes can be complicated and unintuitive at times, so I will try my best to clear that up here! Let's get into it.

The first step is to obtain a folder with all of the chat information. That can be done with the instructions here. Once the export is prepared and downloaded, you're ready to begin!

NOTE: Do NOT rename any files inside of the export folder! The parser relies on finding the files with the default names; changing them will cause the parser to fail.

The file structure looks like this:

[] gallery
   - A variety of files
   - manifest.json
[] likes
   - everyone.json
   - for_me.json
   - mine.json
conversation.json
message.json
poll.json
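
Since the parser depends on the default names, it can help to check a folder before parsing it. The following is a minimal sketch, assuming every entry in the listing above is present in a complete export (I have not confirmed that, for example, poll.json exists for chats with no polls). The function name `missing_entries` is made up for illustration.

```python
from pathlib import Path

# Default names copied from the listing above; nothing else is assumed.
EXPECTED_ENTRIES = [
    "gallery/manifest.json",
    "likes/everyone.json",
    "likes/for_me.json",
    "likes/mine.json",
    "conversation.json",
    "message.json",
    "poll.json",
]


def missing_entries(export_dir):
    """Return the expected relative paths that are absent from export_dir."""
    root = Path(export_dir)
    return [rel for rel in EXPECTED_ENTRIES if not (root / rel).is_file()]
```

An empty list from `missing_entries("path/to/export")` means all of the default files were found.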

- Gallery (folder)

Gallery is identical to the eponymous section in GroupMe itself. It contains all of the images, videos, and files sent in the chat. The items are titled with the following format:

  1. An 18-digit number followed by an underscore

Images:

  1. Dimensions of the image (e.g. 200x200) followed by a period
  2. Hash of the image

Videos:

  1. Hash of the video (much smaller than the image hash)
  2. Dimensions and resolution of the video (e.g. 1920x1080r90)

Files:

  1. File title

I am not sure what the 18-digit number represents. My first thought was a UNIX timestamp (epoch time), because all of them begin with 16. Plugging the full number into a converter gave me a date in 1970. On a whim, I deleted some of the trailing digits: with only the first 16 digits, the timestamp is correct, which is consistent with epoch time in microseconds. I have no idea what the last two digits mean.

I haven't tried to research the hashing algorithm for the images; it didn't seem useful to know. But if you figure it out, let me know and I'll put it on here.

These filenames are identical to the ones found in the image attachment links (just ignore the beginning of the URL), so it would be pretty straightforward to reference them using the JSON data. It's probably better to use the local gallery files, as requesting each of the image links would slow the host machine noticeably.
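
The timestamp observation can be sketched in Python. This assumes the first 16 digits are microseconds since the UNIX epoch (my reading of why 16 digits works, not something GroupMe documents) and simply drops the last two digits. The function name and the sample filename are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def gallery_timestamp(filename):
    """Read the first 16 digits of a gallery filename as microseconds since the epoch."""
    prefix = filename.split("_", 1)[0]
    if len(prefix) != 18 or not prefix.isdigit():
        raise ValueError(f"unexpected gallery filename prefix: {prefix!r}")
    # Assumption: digits 1-16 are microseconds; the meaning of digits 17-18 is unknown.
    return EPOCH + timedelta(microseconds=int(prefix[:16]))


# Made-up filename in the image format described above, not a real export entry.
print(gallery_timestamp("160000000000000012_200x200.abcdef"))
```

The result is a timezone-aware UTC datetime, so gallery items can be sorted or compared against message times.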

The last item in the Gallery folder is manifest.json. This is a list of all attachments in the format:

{
    type: (string),
    url: (string),
    ip_address: (string)
}

For attachments that are not images or videos, the url will be an empty string (two quotes, not None or null).
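
As a rough sketch of working with the manifest, the snippet below counts attachments per type and collects the entries with an empty url. It assumes manifest.json parses to a list of objects with the keys shown above. The `type` values in the sample are placeholders, not confirmed GroupMe values, and `summarize_manifest` is a made-up name.

```python
from collections import Counter


def summarize_manifest(entries):
    """Count attachments per type and collect the entries whose url is empty."""
    counts = Counter(entry.get("type", "") for entry in entries)
    # The url is an empty string, not None, for non-image/video attachments.
    without_url = [entry for entry in entries if entry.get("url", "") == ""]
    return counts, without_url


# Made-up entries for illustration only.
sample = [
    {"type": "image", "url": "https://example.invalid/a", "ip_address": ""},
    {"type": "file", "url": "", "ip_address": ""},
]
counts, without_url = summarize_manifest(sample)
```

With a real export, `entries` would come from `json.load` on gallery/manifest.json.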


- Likes (folder)

everyone.json

This is a collection of message JSONs for which the organizing principle remains unknown. I can't find anything that would cause a message to be in this JSON. They aren't the most-liked messages, there's no common person liking the messages, they don't all have attachments. No idea.

for_me.json

These are all the messages that you sent that were liked by at least one person.

mine.json

The opposite of for_me.json: these are all the messages that you have liked.


Now here are the 3 most important files:

- conversation.json

This JSON contains the metadata of the chat. It holds a variety of information, from the chat name, type, settings, image, and number of images sent to the like icon. Most importantly, it contains a member list, with the user_ids and nicknames of every user that has ever been in the chat.
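
The member list is handy for turning user ids into readable names. Here is a minimal sketch, assuming the list lives under a key named `members` and each member has `user_id` and `nickname` fields. Those key names are my guess from the description above, so check them against your own conversation.json.

```python
def nickname_lookup(conversation):
    """Map each user_id to its nickname, using the (assumed) "members" list."""
    members = conversation.get("members", [])
    return {member["user_id"]: member["nickname"] for member in members}


# Made-up data for illustration only.
sample_conversation = {
    "name": "Example chat",
    "members": [
        {"user_id": "1", "nickname": "Alice"},
        {"user_id": "2", "nickname": "Bob"},
    ],
}
names = nickname_lookup(sample_conversation)
```

With a real export, `conversation` would come from `json.load` on conversation.json.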


- message.json

This is the most important file. It contains the entire message history of the chat at the time of export. The information contained is extensive and useful. This is the standard message format:

{
    "attachments": [],
    "avatar_url": (string),
    "created_at": (epoch time string),
    "favorited_by": [],
    "group_id": (string),
    "id": (message id string),
    "name": (string),
    "sender_id": (user id string),
    "sender_type": (string),
    "source_guid": (string),
    "system": (boolean),
    "text": (string),
    "user_id": (string),
    ["event"]: {}, # optional
    "platform": (string)
}
}

This information contains virtually everything you need to recreate the GroupMe conversation (although having the other files can save time and complexity).
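
To give a feel for working with this format, here is a small sketch that converts `created_at` to a datetime and totals the likes each sender received. It assumes message.json parses to a list of message objects, that `created_at` is in seconds, and that `favorited_by` holds one entry per like. The function names and sample messages are made up.

```python
from collections import Counter
from datetime import datetime, timezone


def sent_at(message):
    """Convert the epoch-time "created_at" field (assumed seconds) to a UTC datetime."""
    return datetime.fromtimestamp(int(message["created_at"]), tz=timezone.utc)


def likes_received(messages):
    """Total the likes per sender_id, skipping system messages."""
    totals = Counter()
    for message in messages:
        if message.get("system"):
            continue
        totals[message["sender_id"]] += len(message.get("favorited_by", []))
    return totals


# Made-up messages that only carry the fields used above.
sample_messages = [
    {"created_at": "1600000000", "sender_id": "1", "system": False, "favorited_by": ["2", "3"]},
    {"created_at": "1600000060", "sender_id": "2", "system": False, "favorited_by": []},
    {"created_at": "1600000120", "sender_id": "system", "system": True, "favorited_by": []},
]
totals = likes_received(sample_messages)
```

With a real export, `messages` would come from `json.load` on message.json.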


- poll.json

This file contains information on every poll from the chat. It's convenient if you only want poll information, and it has a little more data on each poll than message.json does. If that's not your goal, the poll Events are quite effective on their own.

Two very important sections of the message format are the Attachments and Events. I spent hours researching these sections from the context of my own exported conversation, and the results are contained in the following wiki pages:

Attachments Meme

Events Meme
