Releases: caleb531/imessage-conversation-analyzer

v3.1.0

29 Jan 23:52
716f51e

New Features

  • Moved date filtering from Pandas to SQL, which significantly improves memory usage and query performance for large conversations (#1; thanks @rzhade3!)

Fixes

  • The new date-filtering implementation fixes a subtle bug in the previous date-filtering logic, where --to-date was actually inclusive of the first nanosecond of the specified day, even though the documentation explicitly says the given --to-date date/time should be exclusive
  • The connections to the underlying messages/contacts databases are now closed as soon as ICA is finished reading from them, rather than when the Python process terminates; this should improve memory efficiency over the course of running an ICA analyzer (although I have no empirical data to support this statement 😅)
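
As a rough illustration of the exclusive --to-date behavior described above, the following sketch filters rows in SQL using a half-open interval (from-date inclusive, to-date exclusive). The table, column names, sample rows, and the filter_by_date helper are all assumptions made for this example; this is not ICA's actual query or the real iMessage schema.

```python
import sqlite3

# Hypothetical stand-in table; the real iMessage schema differs
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE messages (text TEXT, sent_at TEXT)")
con.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [
        ("before range", "2024-12-31 23:59:59"),
        ("in range", "2025-01-15 12:00:00"),
        ("midnight of to-date", "2025-02-01 00:00:00"),
    ],
)


def filter_by_date(con, from_date, to_date):
    """Return message texts where from_date <= sent_at < to_date."""
    rows = con.execute(
        "SELECT text FROM messages "
        "WHERE sent_at >= ? AND sent_at < ? ORDER BY sent_at",
        (from_date, to_date),
    )
    return [text for (text,) in rows]


# The row stamped exactly at midnight of the to-date is excluded
filtered = filter_by_date(con, "2025-01-01 00:00:00", "2025-02-01 00:00:00")
print(filtered)
con.close()
```

Because the comparison happens inside the query, rows outside the range never need to be loaded into memory, which is the general idea behind moving the filter from Pandas to SQL.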

Housekeeping

  • Cleaned up complexity in certain tests

v3.0.0

13 Jan 22:00
02c046b

ICA v3 is a massive new release with group chat support, the ability to run SQL queries, and a host of other welcome features/fixes.

See the Migration Guide at the end for details.

Group Chat Support 🎉

Support for group chats is finally here! You can now analyze conversations with three or more people by passing the --contact / -c flag multiple times on the CLI (one for each participant other than yourself).

ica message_totals -c 'Jane Fernbrook' -c 'Thomas Riverstone'

Similarly, when writing a custom analyzer, the ica.get_dataframes() method's contact_name parameter has been replaced with a contacts parameter representing a list of contact identifiers (name, phone number, or email).

cli_args = ica.get_cli_parser().parse_args(namespace=ica.TypedCLIArguments())
dfs = ica.get_dataframes(
    contacts=cli_args.contacts,
    timezone=cli_args.timezone,
    from_date=cli_args.from_date,
    to_date=cli_args.to_date,
    from_people=cli_args.from_people,
)

Per-Participant Metrics

For the message_totals, totals_by_day, and count_phrases analyzers, the generic "From Them" / "By Them" columns and metrics have been replaced with specific counts for each participant in the conversation. This is especially useful for group chats, allowing you to see exactly who sent what.

Breaking Change

If you were programmatically relying on the messages_from_them or reactions_from_them keys in the message_totals output, or the From Them column in other analyzers, you will need to update your code to use the specific participant names (e.g. messages_from_Jane Doe).

Transcript Sender Column

The boolean Is From Me column in the transcript analyzer has been replaced with a new Sender column. This column displays the name of the person who sent the message, which provides much better context for group conversations.

If two senders share the same first name, then the full names will be displayed for those participants. If two senders also share the same first and last names, then the phone number or email of each participant will be displayed as a fallback.
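
The fallback rules above can be sketched roughly as follows. The display_names function and the tuple layout are hypothetical and exist only for illustration; ICA's internal implementation may differ.

```python
from collections import Counter


def display_names(participants):
    """Map each participant to the shortest unambiguous label.

    Each participant is a (first_name, last_name, address) tuple,
    where address is a phone number or email.
    """
    first_counts = Counter(first for first, _, _ in participants)
    full_counts = Counter((first, last) for first, last, _ in participants)
    labels = []
    for first, last, address in participants:
        if first_counts[first] == 1:
            # First name is unique, so it is enough on its own
            labels.append(first)
        elif full_counts[(first, last)] == 1:
            # First name collides, but the full name is unique
            labels.append(f"{first} {last}")
        else:
            # Full name collides too, so fall back to the address
            labels.append(address)
    return labels


labels = display_names([
    ("Alex", "Stone", "alex@example.com"),
    ("Jane", "Fernbrook", "jane.fernbrook@example.com"),
    ("Jane", "Doe", "jane.doe@example.com"),
    ("Thomas", "Riverstone", "+12123456789"),
    ("Thomas", "Riverstone", "thomas@example.com"),
])
print(labels)
```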

Emoji Breakdown by Participant

The most_frequent_emojis analyzer now provides a breakdown of emoji usage counts for each participant in the conversation, in addition to the total count. This allows you to see which emojis are most favored by specific people in the chat.

Support for Contact Lookup by Phone Number / Email Address

Previously, the --contact-name / -c CLI options only accepted the full name of the contact whose conversation you wanted to find. ICA v3 now also supports looking up the contact via phone number or email address. For example, all of the following are valid:

ica -c '212-345-6789' message_totals
ica -c '+1 (212) 345-6789' message_totals
ica -c '12123456789' message_totals
ica -c 'jane.fernbrook@example.com' message_totals

Please note that even when specifying a phone number or email, ICA aggregates statistics across all conversations for the specified contacts, even when different addresses are used.
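
One way differently formatted phone numbers could be treated as the same contact is to normalize them before comparing. The sketch below is only an assumption about how such matching might work; normalize_phone and its default country code are hypothetical and not part of ICA's API.

```python
import re


def normalize_phone(raw, default_country_code="1"):
    """Reduce a phone number to digits, prefixing a country code if absent."""
    digits = re.sub(r"\D", "", raw)
    # Assumption: a 10-digit number is a national number without a country code
    if len(digits) == 10:
        digits = default_country_code + digits
    return digits


variants = ["212-345-6789", "+1 (212) 345-6789", "12123456789"]
normalized = {normalize_phone(v) for v in variants}
print(normalized)
```

All three spellings collapse to a single value here, which is why they can identify the same contact.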

API Change

As part of this change, the --contact-name CLI option has been renamed to --contact (the shorthand is still -c). Similarly, the contact_name parameter of the ica.get_dataframes function has been renamed to contacts.

Old API:

cli_args = ica.get_cli_parser().parse_args(namespace=ica.TypedCLIArguments())
dfs = ica.get_dataframes(
    contact_name=cli_args.contact_name,
    timezone=cli_args.timezone,
    from_date=cli_args.from_date,
    to_date=cli_args.to_date,
    from_people=cli_args.from_people
)

New API:

cli_args = ica.get_cli_parser().parse_args(namespace=ica.TypedCLIArguments())
dfs = ica.get_dataframes(
    # cli_args.contacts is correct because it
    # is a list of all --contact/-c values
    # specified on the CLI
    contacts=cli_args.contacts,
    timezone=cli_args.timezone,
    from_date=cli_args.from_date,
    to_date=cli_args.to_date,
    from_people=cli_args.from_people
)

New Python API Functions for SQL

For developers using the Python API, we've added two new powerful functions to the ica module that allow you to query your conversation data using SQL. This is powered by an in-memory SQLite database that is automatically populated with the available iMessage dataframes.

  • get_sql_connection(dfs): A context manager which creates a temporary in-memory SQLite database from your ICA dataframes, allowing you to operate on them with the ica.execute_sql_query() function (documented below)
  • execute_sql_query(query, con): Executes a SQL query against the connection provided by get_sql_connection; returns a pandas dataframe with the results

import ica

# Retrieve conversation data
dfs = ica.get_dataframes(contacts=["Jane Doe"])

# Run SQL queries against the data
with ica.get_sql_connection(dfs) as con:
    results = ica.execute_sql_query(
        "SELECT * FROM messages WHERE is_from_me = 1",
        con
    )
    print(results)
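
To give a feel for the temporary in-memory database idea, here is a standard-library sketch of a context manager that builds a throwaway SQLite database and discards it on exit. The in_memory_connection name and the shape of its tables argument are assumptions for illustration; ICA's get_sql_connection works with pandas dataframes, and its internals are not shown here.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def in_memory_connection(tables):
    """Yield a SQLite connection to a throwaway in-memory database.

    `tables` maps a table name to a (column_names, rows) pair.
    Names are interpolated directly, so pass only trusted identifiers.
    """
    con = sqlite3.connect(":memory:")
    try:
        for name, (columns, rows) in tables.items():
            column_list = ", ".join(columns)
            placeholders = ", ".join("?" for _ in columns)
            con.execute(f"CREATE TABLE {name} ({column_list})")
            con.executemany(
                f"INSERT INTO {name} VALUES ({placeholders})", rows
            )
        yield con
    finally:
        # The database lives only in memory, so closing discards it
        con.close()


sample = {
    "messages": (
        ["text", "is_from_me"],
        [("hello", 1), ("hi there", 0), ("how are you?", 1)],
    )
}
with in_memory_connection(sample) as con:
    sent = con.execute(
        "SELECT text FROM messages WHERE is_from_me = 1 ORDER BY rowid"
    ).fetchall()
print(sent)
```

Since queries only ever touch the copy, nothing run through this connection can modify the original data source.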

New from_sql Analyzer

You can now execute arbitrary SQL queries against your conversation data using the new from_sql analyzer! This is perfect for ad-hoc analysis or exploring the data without writing a full Python script.

For security, any queries sent to this analyzer are run on a temporary, in-memory copy of the database.

ica from_sql "SELECT is_from_me, COUNT(*) FROM messages WHERE is_reaction = 0 GROUP BY is_from_me" -c 'Thomas Riverstone'

See the README for the full database schema and available tables.

JSON Output Support

In addition to CSV, Excel, and Markdown, ICA v3 now supports outputting analyzer results as JSON. This makes it easier to integrate ICA into larger pipelines or process the data with tools like jq.

You can use this format by passing json to the --format / -f flag:

ica message_totals -c 'Thomas Riverstone' --format json

Customizable Output Labels

The ica.output_results function now accepts a prettified_label_overrides parameter. This allows you to provide a dict that maps specific raw labels (column names or index values) to their desired display names in the final output. This is particularly useful for acronyms or specific terms that the default prettifier might not handle correctly (e.g. ensuring "gifs" becomes "GIFs" instead of "Gifs", or ensuring that participant names are cased correctly).

ica.output_results(
    df,
    prettified_label_overrides={
        "youtube_videos": "YouTube Videos",
        "gifs": "GIFs"
    }
)
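
Conceptually, an override mapping of this kind takes priority over a default prettifier. The sketch below assumes a simple titleizing fallback; prettify_label is a hypothetical helper, not ICA's actual function.

```python
def prettify_label(label, overrides=None):
    """Return the override for a raw label, else a titleized fallback."""
    overrides = overrides or {}
    if label in overrides:
        return overrides[label]
    # Fallback: replace underscores and capitalize each word
    return label.replace("_", " ").title()


overrides = {"youtube_videos": "YouTube Videos", "gifs": "GIFs"}
raw_labels = ["gifs", "youtube_videos", "audio_messages"]
pretty = [prettify_label(raw, overrides) for raw in raw_labels]
default_gifs = prettify_label("gifs")
print(pretty, default_gifs)
```

Labels without an override still go through the fallback, so only the exceptions need to be listed.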

Breaking Change

The prettify_index parameter has been removed from ica.output_results. The new prettified_label_overrides feature provides a more flexible way to control output formatting, rendering the boolean parameter obsolete.

Improved Emoji Search Performance

The most_frequent_emojis algorithm has been rewritten to use the third-party emoji package. This makes the algorithm an order of magnitude faster, improves support for standardized emojis, and even matches emojis more accurately within your messages.

Removed Deprecated Methods

The following deprecated methods have been removed:

ica.get_cli_args()

The ica.get_cli_args() method was used for retrieving the arguments from calling ica on the command line; however, it proved to be too brittle when custom arguments needed to be added.

Instead, we advise using the ica.get_cli_parser() method, which is used very similarly, but allows for more flexibility. The usage is almost exactly the same:

cli_args = ica.get_cli_parser().parse_args()

ica.assign_lambda() and ica.pipe_lambda()

These functions were technically never officially documented, but they were introduced as helpers because of type-inferencing limitations in mypy. Fortunately, current versions of mypy no longer exhibit these issues (neither do modern type checkers like ty). Therefore, because the underlying behaviors can no longer be observed, these functions have been removed (which is a win because their absence makes for cleaner ICA analyzer code).

To remove them, you can simply unwrap your lambda functions, going from this:

df.messages.pipe(ica.pipe_lambda(lambda df: df['text']))

To this:

df.messages.pipe(lambda df: df['text'])

Fixes

  1. Corrected some flipped logic in the count_phrases analyzer where the given phrases would always be interpreted as regular expressions (even if --use-regex / -r was not supplied)
  2. Fixed a bug in message_totals where Days Missed was not constrained to the specified --to-date
  3. Fixed an error in message_totals that would occur when the user-specified filters produced an empty result set
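
To make the first fix concrete, the difference between literal and regex matching can be sketched with the standard re module. The count_phrase function below is a hypothetical illustration, not the analyzer's real code; it escapes the phrase unless regex mode is explicitly requested.

```python
import re


def count_phrase(messages, phrase, use_regex=False, case_sensitive=False):
    """Count occurrences of a phrase across a list of message texts."""
    # Escape the phrase so that characters like "." match literally
    pattern = phrase if use_regex else re.escape(phrase)
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = re.compile(pattern, flags)
    return sum(len(compiled.findall(text)) for text in messages)


messages = ["Is it a.b or axb?", "A.B it is"]
literal_count = count_phrase(messages, "a.b")
regex_count = count_phrase(messages, "a.b", use_regex=True)
exact_case_count = count_phrase(messages, "a.b", case_sensitive=True)
print(literal_count, regex_count, exact_case_count)
```

In literal mode the "." matches only a period, while in regex mode it also matches the "x" in "axb", which is the kind of surprise the flipped logic would have caused.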

Other Improvements

  1. Number values for the default table output (i.e. when --format / -f is omitted) are now displayed using thousands separators; this is locale-aware, so you get values like 1,234,567 for English, and 1.234.567 for locales that use a period as the separator
  2. A new --version flag has been added to the ica CLI command that allows you to check the current version of the package (example output: ica 3.0.0)
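
For reference, Python's standard library offers both fixed and locale-aware digit grouping. The sketch below shows the two approaches; it is not necessarily how ICA formats its table output, and the locale-aware result depends entirely on the environment it runs in.

```python
import locale

count = 1234567

# Locale-independent grouping using the "," format specifier
fixed = f"{count:,}"

# Locale-aware grouping; the separator depends on the active locale
try:
    locale.setlocale(locale.LC_ALL, "")  # adopt the user's environment locale
except locale.Error:
    pass  # keep the default "C" locale, which applies no grouping
localized = locale.format_string("%d", count, grouping=True)

print(fixed)
print(localized)
```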

Housekeeping

  1. Switched from mypy to ty for more performant and robust type-checking across the entire codebase

Migration Guide

Here's an overview of how to update your code from v2 to support v3:

Upgrade Package

Updating ICA to v3 is easy from the Terminal:

pip3 install --upgrade imessage-conversation-analyzer

You can also upgrade via uv if you have it installed globally:

uv tool upgrade imessage-conversation-analyzer

Contact Specification

Support f...

v2.9.0

08 Nov 03:58
1d99390

ICA v2.9 is a release focused on type safety and under-the-hood improvements.

Type Safety for CLI Arguments

Previously, the library's CLI arguments parsed via ica.get_cli_parser() would return a Namespace object which lacked proper type information for the built-in ICA parameters. This would allow typos within custom analyzer code to go completely undetected, even by type checkers like mypy.

To solve this, ICA v2.9 adds a TypedCLIArguments class which contains the type information for the core ica CLI arguments. So instead of:

def main() -> None:
    cli_args = ica.get_cli_parser().parse_args()
    ...

You would write:

def main() -> None:
    cli_args = ica.get_cli_parser().parse_args(
        namespace=ica.TypedCLIArguments()
    )
    ...

This class can also be subclassed if you are writing a custom analyzer which adds additional CLI arguments.

class AnalyzerCLIArguments(ica.TypedCLIArguments):
    new_param: str

def main() -> None:
    cli_args = ica.get_cli_parser().parse_args(
        namespace=AnalyzerCLIArguments()
    )
    ...

Note: Remember to instantiate your TypedCLIArguments object, since the namespace parameter expects a class instance, not a class itself.
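
The same pattern can be tried out with nothing but the standard argparse module. In the sketch below, BaseArguments is a hypothetical stand-in for ica.TypedCLIArguments and the parser is built by hand rather than obtained from ica.get_cli_parser(), so treat it as an illustration of the mechanism only.

```python
import argparse


class BaseArguments(argparse.Namespace):
    """Hypothetical stand-in for ica.TypedCLIArguments."""

    contacts: list[str]


class AnalyzerArguments(BaseArguments):
    """Adds the type of a custom analyzer-specific argument."""

    new_param: str


# Hand-built parser standing in for ica.get_cli_parser()
parser = argparse.ArgumentParser()
parser.add_argument("--contact", "-c", action="append", dest="contacts")
parser.add_argument("--new-param", default="fallback")

# argparse sets the parsed values as attributes on the instance passed in
cli_args = parser.parse_args(
    ["-c", "Jane Fernbrook", "--new-param", "value"],
    namespace=AnalyzerArguments(),
)
print(cli_args.contacts, cli_args.new_param)
```

Because parse_args returns the same object it was given, a type checker can infer the annotated attribute types from the class of that instance.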

Encouraged, but Optional

It should also be noted that just as Python's type system is optional, this new ICA feature is completely optional. You are never required to specify a custom namespace or add type annotations to your custom analyzer. That said, we still highly encourage you to do so if you use a type checker like mypy or ty.

Deprecated assign_lambda / pipe_lambda

The assign_lambda and pipe_lambda functions, which are technically undocumented but are used by the built-in analyzer examples, have been deprecated. They were originally added to fill in a type safety gap for mypy and pandas, but recent versions of these packages no longer exhibit this gap.

Housekeeping

  1. Migrated the entire test suite from nose2 to pytest
  2. Added missing docstrings to all functions and classes
  3. Adopted pathlib to represent paths ubiquitously across the codebase (for internal consistency and predictable behavior)
  4. Upgraded uv to 0.9.x
  5. Updated all dependencies to latest versions

v2.8.0

02 Jul 03:09
cc572e3

  • Added regular expression support and case sensitivity support to the count_phrases analyzer
    • See the README for details

v2.7.0

05 Mar 16:54
7cbed3c

  • Added a new --result-count option to the most_frequent_emojis analyzer
  • Various cleanup to code and documentation

v2.6.0

02 Feb 00:34
0e1ee3c

New Features

  • You can now filter any analyzer by date and participant
    • These are available in the CLI via new --from-date, --to-date, and --from-person flags
    • These are also available in the Python API via new input parameters to ica.get_dataframes(): from_date, to_date, and from_person
    • See the README for details on how to use these new filters
  • Added support for iOS 18's emoji-based reactions that allow for reacting with any arbitrary emoji
    • This new support is mainly reflected in the Reactions metrics for the message_totals analyzer

Fixes

  • Fixed some incorrect logic for how YouTube, Spotify, and Apple Music links were counted within the attachment_totals analyzer

Housekeeping

  • Added missing documentation for the count_phrases analyzer to the README
  • Other organizational tweaks and improvements to the README

v2.5.0

03 Jan 18:55
bbe5dbe

New Features

  • Added a new (built-in) count_phrases analyzer which allows you to count the number of case-insensitive occurrences of any arbitrary strings across all messages in a conversation (excluding reactions)
    • e.g. ica count_phrases -c 'Jane Fernbrook' 'i love you'
  • Added a new prettify_index parameter to the ica.output_results function; if you specify it with a value of False, it will disable the default behavior of titleizing index values (see the new count_phrases analyzer for an example)

Deprecations

  • The get_cli_args() function has been deprecated in favor of the new get_cli_parser() method
    • The get_cli_parser() function gives you access to the underlying argparse.ArgumentParser instance, allowing you to add new CLI arguments specific to your analyzer
    • To migrate, replace ica.get_cli_args() with ica.get_cli_parser().parse_args() across your project files

Under-the-Hood Improvements

  • Upgraded all dependencies to their latest versions
  • The CLI now throws an ImportError if a module spec cannot be created (this is unlikely, though)
  • The __main__ entry point module is now fully tested, increasing the code coverage for the library

v2.4.0

27 Dec 21:17
71fe886

  • Upgraded dependencies to latest versions
    • EDIT: the dependency upgrade actually never got merged; this will be fixed in the next release

v2.3.0

06 Mar 23:09
be7ecf3

  • Added a count for audio messages to the attachment_totals analyzer
  • The exposed attachments dataframe has been updated to include columns for:
    • The filename of the attachment, if applicable
    • The ID of the associated message
  • The messages dataframe has been updated to include a column for the ID of the message

v2.2.0

22 Feb 04:48
75445ca

  • Rewrote the most_frequent_emojis analyzer to be substantially faster and more accurate
    • The time complexity of the algorithm has been reduced from O(n^2) to O(n), resulting in significant speedups (e.g. 10s to 3s, or 4s to 2s)
    • The new algorithm also handles combined emojis correctly (e.g. 👨‍💻, which is a combination of 👨 and 💻, is now counted correctly)
  • Small refactoring improvements to clean up the codebase