Releases: caleb531/imessage-conversation-analyzer
v3.1.0
New Features
- Moved date filtering from Pandas to SQL, which significantly improves memory usage and query performance for large conversations (#1; thanks @rzhade3!)
Fixes
- The new date-filtering implementation fixes a subtle bug in the previous date-filtering logic where --to-date was actually inclusive of the first nanosecond of the specified day, even though the documentation explicitly says the given --to-date date/time should be exclusive
- The connections to the underlying messages/contacts databases are now closed as soon as ICA is finished reading from them, rather than when the Python process terminates; this should improve memory efficiency over the course of running an ICA analyzer (although I have no empirical data to support this statement 😅)
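To illustrate what an exclusive upper bound looks like when the filtering happens in SQL, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column name, and timestamps are hypothetical and are not ICA's actual schema or query; the point is only the half-open range `>= from` and `< to`.

```python
import sqlite3

# Hypothetical table and column names, for illustration only
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, sent_at TEXT)")
con.executemany(
    "INSERT INTO message (sent_at) VALUES (?)",
    [
        ("2024-01-01 09:00:00",),
        ("2024-01-31 23:59:59",),
        ("2024-02-01 00:00:00",),  # falls exactly on the upper bound
    ],
)


def messages_between(con, from_date, to_date):
    """Return timestamps in the half-open range [from_date, to_date)."""
    rows = con.execute(
        "SELECT sent_at FROM message "
        "WHERE sent_at >= ? AND sent_at < ? ORDER BY sent_at",
        (from_date, to_date),
    )
    return [row[0] for row in rows]


filtered = messages_between(con, "2024-01-01 00:00:00", "2024-02-01 00:00:00")
con.close()
print(filtered)
```

Because the comparison against the upper bound is strict, a message stamped at exactly the to-date is left out, which matches the documented exclusive behavior.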
Housekeeping
- Cleaned up complexity in certain tests
v3.0.0
ICA v3 is a massive new release with group chat support, the ability to run SQL queries, and a host of other welcome features/fixes.
See the Migration Guide at the end for details.
Group Chat Support 🎉
Support for group chats is finally here! You can now analyze conversations with three or more people by passing the --contact / -c flag multiple times on the CLI (one for each participant other than yourself).
ica message_totals -c 'Jane Fernbrook' -c 'Thomas Riverstone'
Similarly, when writing a custom analyzer, the ica.get_dataframes() method's contact_name parameter has been replaced with a contacts parameter representing a list of contact identifiers (name, phone number, or email).
cli_args = ica.get_cli_parser().parse_args(namespace=ica.TypedCLIArguments())
dfs = ica.get_dataframes(
    contacts=cli_args.contacts,
    timezone=cli_args.timezone,
    from_date=cli_args.from_date,
    to_date=cli_args.to_date,
    from_people=cli_args.from_people,
)
Per-Participant Metrics
For the message_totals, totals_by_day, and count_phrases analyzers, the generic "From Them" / "By Them" columns and metrics have been replaced with specific counts for each participant in the conversation. This is especially useful for group chats, allowing you to see exactly who sent what.
Breaking Change
If you were programmatically relying on the messages_from_them or reactions_from_them keys in the message_totals output, or the From Them column in other analyzers, you will need to update your code to use the specific participant names (e.g. messages_from_Jane Doe).
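If your code previously read a single messages_from_them value, one way to adapt is to collect every key that carries the per-participant prefix. The sketch below assumes the output is available as a plain dict; the helper name counts_by_participant and the sample numbers are hypothetical, and only the messages_from_<name> key pattern comes from the notes above.

```python
def counts_by_participant(totals, prefix="messages_from_"):
    """Map each participant name to the value stored under prefix + name."""
    return {
        key[len(prefix):]: value
        for key, value in totals.items()
        if key.startswith(prefix)
    }


# Made-up numbers, shaped like the per-participant keys described above
totals = {
    "messages_from_Jane Doe": 120,
    "messages_from_Thomas Riverstone": 95,
    "reactions_from_Jane Doe": 14,
}

message_counts = counts_by_participant(totals)
print(message_counts)
```

Passing prefix="reactions_from_" would collect the reaction counts in the same way.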
Transcript Sender Column
The boolean Is From Me column in the transcript analyzer has been replaced with a new Sender column. This column displays the name of the person who sent the message, which provides much better context for group conversations.
If two senders share the same first name, then the full names will be displayed for those participants. If two senders also share the same first+last names, then the phone number or email of each participant will be displayed as fallbacks.
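The fallback rules above can be sketched as a small function. This is not ICA's implementation; the function name, the tuple layout, and the assumption that an unambiguous sender is shown by first name only are all illustrative.

```python
from collections import Counter


def display_names(participants):
    """Pick a display name for each (first, last, address) tuple.

    Assumed rules: first name when unique, full name when first names
    collide, and the phone/email address when full names also collide.
    """
    first_counts = Counter(first for first, _, _ in participants)
    full_counts = Counter((first, last) for first, last, _ in participants)
    names = []
    for first, last, address in participants:
        if full_counts[(first, last)] > 1:
            names.append(address)
        elif first_counts[first] > 1:
            names.append(f"{first} {last}")
        else:
            names.append(first)
    return names


group = [
    ("Jane", "Fernbrook", "jane.fernbrook@example.com"),
    ("Jane", "Doe", "+12123456789"),
    ("Thomas", "Riverstone", "thomas@example.com"),
]
print(display_names(group))
```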
Emoji Breakdown by Participant
The most_frequent_emojis analyzer now provides a breakdown of emoji usage counts for each participant in the conversation, in addition to the total count. This allows you to see which emojis are most favored by specific people in the chat.
Support for Contact Lookup by Phone Number / Email Address
Previously, the --contact-name / -c CLI options only accepted the full name of the contact whose conversation you wanted to find. ICA v3 now also supports looking up the contact via phone number or email address. For example, all of the following are valid:
ica -c '212-345-6789' message_totals
ica -c '+1 (212) 345-6789' message_totals
ica -c '12123456789' message_totals
ica -c 'jane.fernbrook@example.com' message_totals
Please note that even when specifying a phone number or email, ICA aggregates statistics across all conversations for the specified contacts, even when different addresses are used.
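For readers wondering how differently formatted phone numbers can refer to the same contact, here is a rough sketch of one possible normalization step. The function normalize_phone and the default country code of 1 are assumptions made for this example; ICA's actual matching logic may differ.

```python
import re


def normalize_phone(raw, default_country_code="1"):
    """Reduce a phone number to digits, adding a country code if missing."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:
        # Assumption: a bare 10-digit number belongs to the default country
        digits = default_country_code + digits
    return digits


variants = ["212-345-6789", "+1 (212) 345-6789", "12123456789"]
normalized = {normalize_phone(variant) for variant in variants}
print(normalized)
```

All three spellings from the example above collapse to a single value, so a lookup keyed on the normalized form would find the same contact each time.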
API Change
As part of this change, the --contact-name CLI option has been renamed to --contact (the shorthand is still -c). Similarly, the contact_name parameter of the ica.get_dataframes function has been renamed to contacts.
Old API:
cli_args = ica.get_cli_parser().parse_args(namespace=ica.TypedCLIArguments())
dfs = ica.get_dataframes(
    contact_name=cli_args.contact_name,
    timezone=cli_args.timezone,
    from_date=cli_args.from_date,
    to_date=cli_args.to_date,
    from_people=cli_args.from_people
)
New API:
cli_args = ica.get_cli_parser().parse_args(namespace=ica.TypedCLIArguments())
dfs = ica.get_dataframes(
    # cli_args.contacts is correct because it
    # is a list of all --contact/-c values
    # specified on the CLI
    contacts=cli_args.contacts,
    timezone=cli_args.timezone,
    from_date=cli_args.from_date,
    to_date=cli_args.to_date,
    from_people=cli_args.from_people
)
New Python API Functions for SQL
For developers using the Python API, we've added two new powerful functions to the ica module that allow you to query your conversation data using SQL. This is powered by an in-memory SQLite database that is automatically populated with the available iMessage dataframes.
- get_sql_connection(dfs): A context manager which creates a temporary in-memory SQLite database from your ICA dataframes, allowing you to operate on them with the ica.execute_sql_query() function (documented below)
- execute_sql_query(query, con): Executes a SQL query against the connection provided by get_sql_connection; returns a pandas dataframe with the results
import ica
# Retrieve conversation data
dfs = ica.get_dataframes(contacts=["Jane Doe"])
# Run SQL queries against the data
with ica.get_sql_connection(dfs) as con:
    results = ica.execute_sql_query(
        "SELECT * FROM messages WHERE is_from_me = 1",
        con
    )
    print(results)
New from_sql Analyzer
You can now execute arbitrary SQL queries against your conversation data using the new from_sql analyzer! This is perfect for ad-hoc analysis or exploring the data without writing a full Python script.
For security, any queries sent to this analyzer are run on a temporary, in-memory copy of the database.
ica from_sql "SELECT is_from_me, COUNT(*) FROM messages WHERE is_reaction = 0 GROUP BY is_from_me" -c 'Thomas Riverstone'
See the README for the full database schema and available tables.
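To try the example query without touching any real data, the following sketch builds a throwaway in-memory SQLite database with Python's built-in sqlite3 module. The three-column messages table and its rows are invented for this illustration (only the is_from_me and is_reaction column names come from the query above); consult the README for the real schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified stand-in for the messages table, with made-up rows
con.execute(
    "CREATE TABLE messages (text TEXT, is_from_me INTEGER, is_reaction INTEGER)"
)
con.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        ("Hello", 1, 0),
        ("Hi there", 0, 0),
        ("Loved 'Hello'", 0, 1),  # a reaction, excluded by the query
        ("Lunch?", 1, 0),
    ],
)

query = (
    "SELECT is_from_me, COUNT(*) FROM messages "
    "WHERE is_reaction = 0 GROUP BY is_from_me"
)
counts = dict(con.execute(query).fetchall())
con.close()
print(counts)
```

Since the database lives only in memory, it disappears when the connection closes, which is the same property that makes a temporary copy safe to run arbitrary queries against.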
JSON Output Support
In addition to CSV, Excel, and Markdown, ICA v3 now supports outputting analyzer results as JSON. This makes it easier to integrate ICA into larger pipelines or process the data with tools like jq.
You can use this format by passing json to the --format / -f flag:
ica message_totals -c 'Thomas Riverstone' --format json
Customizable Output Labels
The ica.output_results function now accepts a prettified_label_overrides parameter. This allows you to provide a dict that maps specific raw labels (column names or index values) to their desired display names in the final output. This is particularly useful for acronyms or specific terms that the default prettifier might not handle correctly (e.g. ensuring "gifs" becomes "GIFs" instead of "Gifs", or ensuring that participant names are cased correctly).
ica.output_results(
    df,
    prettified_label_overrides={
        "youtube_videos": "YouTube Videos",
        "gifs": "GIFs"
    }
)
Breaking Change
The prettify_index parameter has been removed from ica.output_results. The new prettified_label_overrides feature provides a more flexible way to control output formatting, rendering the boolean parameter obsolete.
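As a rough picture of how label overrides can sit on top of a default prettifier, consider the sketch below. The prettify_label function is hypothetical and simply title-cases labels; it is not the code behind ica.output_results.

```python
def prettify_label(label, overrides=None):
    """Return the override for a raw label, else a title-cased fallback."""
    overrides = overrides or {}
    if label in overrides:
        return overrides[label]
    # Fallback: replace underscores and title-case each word
    return label.replace("_", " ").title()


overrides = {"youtube_videos": "YouTube Videos", "gifs": "GIFs"}
raw_labels = ["gifs", "youtube_videos", "audio_messages"]
pretty_labels = [prettify_label(label, overrides) for label in raw_labels]
print(pretty_labels)
```

Without the overrides, the fallback alone would turn "gifs" into "Gifs", which is the kind of result the overrides are meant to correct.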
Improved Emoji Search Performance
The most_frequent_emojis algorithm has been rewritten to use the third-party emoji package. This makes the algorithm an order of magnitude faster, improves support for standardized emojis, and even matches emojis more accurately within your messages.
Removed Deprecated Methods
The following deprecated methods have been removed:
ica.get_cli_args()
The ica.get_cli_args() method was used for retrieving the arguments from calling ica on the command line; however, it proved to be too brittle when custom arguments needed to be added.
Instead, we advise using the ica.get_cli_parser() method, which is used very similarly, but allows for more flexibility. The usage is almost exactly the same:
cli_args = ica.get_cli_parser().parse_args()
ica.assign_lambda() and ica.pipe_lambda()
These functions were technically never officially documented, but they were introduced as helpers because of type-inferencing limitations in mypy. Fortunately, current versions of mypy no longer exhibit these issues (neither do modern type checkers like ty). Therefore, because the underlying behaviors can no longer be observed, these functions have been removed (which is a win because their absence makes for cleaner ICA analyzer code).
To remove them, you can simply unwrap your lambda functions, going from this:
df.messages.pipe(ica.pipe_lambda(lambda df: df['text']))
To this:
df.messages.pipe(lambda df: df['text'])
Fixes
- Corrected some flipped logic in the count_phrases analyzer where the given phrases would always be interpreted as regular expressions (even if --use-regex / -r was not supplied)
- Fixed a bug in message_totals where Days Missed was not constrained to the specified --to-date
- Fixed an error in message_totals that would occur when the user-specified filters produced an empty result set
Other Improvements
- Number values for the default table output (i.e. when --format / -f is omitted) are now displayed using thousands separators; this is locale aware, so you get values like 1,234,567 for English, and 1.234.567 for some other locales
- A new --version flag has been added to the ica CLI command that allows you to check the current version of the package (example output: ica 3.0.0)
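The thousands-separator formatting mentioned above can be pictured with plain Python string formatting. The format_count helper below is hypothetical and takes the separator as an explicit argument; it does not consult the system locale the way ICA is described as doing.

```python
def format_count(value, thousands_sep=","):
    """Format an integer with a configurable thousands separator."""
    # The "," format spec always groups with commas; swap in the separator
    return f"{value:,}".replace(",", thousands_sep)


print(format_count(1234567))
print(format_count(1234567, thousands_sep="."))
```

For truly locale-aware output, Python's standard locale module offers locale.format_string("%d", value, grouping=True), whose result depends on the locale currently in effect.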
Housekeeping
- Switched from mypy to ty for more performant and robust type-checking across the entire codebase
Migration Guide
Here's an overview of how to update your code from v2 to support v3:
Upgrade Package
Updating ICA to v3 is easy from the Terminal:
pip3 install --upgrade imessage-conversation-analyzer
You can also upgrade via uv if you have it installed globally:
uv tool upgrade imessage-conversation-analyzer
Contact Specification
Support f...
v2.9.0
ICA v2.9 is a release focused on type safety and under-the-hood improvements.
Type Safety for CLI Arguments
Previously, the library's CLI arguments parsed via ica.get_cli_parser() would return a Namespace object which lacked proper type information for the built-in ICA parameters. This would allow typos within custom analyzer code to go completely undetected, even by type checkers like mypy.
To solve this, ICA v2.9 adds a TypedCLIArguments class which contains the type information for the core ica CLI arguments. So instead of:
def main() -> None:
    cli_args = ica.get_cli_parser().parse_args()
    ...
You would write:
def main() -> None:
    cli_args = ica.get_cli_parser().parse_args(
        namespace=ica.TypedCLIArguments()
    )
    ...
This class can also be subclassed if you are writing a custom analyzer which adds additional CLI arguments.
class AnalyzerCLIArguments(ica.TypedCLIArguments):
    new_param: str

def main() -> None:
    cli_args = ica.get_cli_parser().parse_args(
        namespace=AnalyzerCLIArguments()
    )
    ...
Note: Remember to instantiate your TypedCLIArguments object, since the namespace parameter expects a class instance, not a class itself.
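The same pattern can be tried with nothing but the standard library, which may help show why an instance is needed. In this sketch, TypedArgs is a hypothetical stand-in for ica.TypedCLIArguments, and the parser is a plain argparse.ArgumentParser rather than the one returned by ica.get_cli_parser().

```python
import argparse
from typing import Optional


class TypedArgs(argparse.Namespace):
    """Hypothetical stand-in that declares attribute types for a checker."""

    timezone: Optional[str]
    new_param: str


parser = argparse.ArgumentParser()
parser.add_argument("--timezone", default=None)
parser.add_argument("--new-param", default="")

# parse_args() sets the parsed values as attributes on the instance it is given
cli_args = parser.parse_args(
    ["--timezone", "UTC", "--new-param", "example"],
    namespace=TypedArgs(),
)
print(cli_args.timezone, cli_args.new_param)
```

Because the returned object is the TypedArgs instance itself, a type checker can flag a misspelled attribute such as cli_args.timezon instead of letting it pass silently.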
Encouraged, but Optional
It should also be noted that, just as Python's type system is optional, this new ICA feature is completely optional. You are never required to specify a custom namespace or add type annotations to your custom analyzer. That said, we still highly encourage you to do so if you use a type checker like mypy or ty.
Deprecated assign_lambda / pipe_lambda
The assign_lambda and pipe_lambda functions, which are technically undocumented but are used by the built-in analyzer examples, have been deprecated. They were originally added to fill in a type safety gap for mypy and pandas, but recent versions of these packages no longer exhibit this gap.
Housekeeping
- Migrated the entire test suite from nose2 to pytest
- Added missing docstrings to all functions and classes
- Adopted pathlib to represent paths ubiquitously across the codebase (for internal consistency and predictable behavior)
- Upgraded uv to 0.9.x
- Updated all dependencies to latest versions
v2.8.0
v2.7.0
v2.6.0
New Features
- You can now filter any analyzer by date and participant
  - These are available in the CLI via new --from-date, --to-date, and --from-person flags
  - These are also available in the Python API via new input parameters to ica.get_dataframes(): from_date, to_date, and from_person
  - See the README for details on how to use these new filters
- Added support for iOS 18's emoji-based reactions that allow for reacting with any arbitrary emoji
  - This new support is mainly reflected in the Reactions metrics for the message_totals analyzer
Fixes
- Fixed some incorrect logic for how YouTube, Spotify, and Apple Music links were counted within the attachment_totals analyzer
Housekeeping
- Added missing documentation for the count_phrases analyzer to the README
- Other organizational tweaks and improvements to the README
v2.5.0
New Features
- Added a new (built-in) count_phrases analyzer which allows you to count the number of case-insensitive occurrences of any arbitrary strings across all messages in a conversation (excluding reactions)
  - e.g. ica count_phrases -c 'Jane Fernbrook' 'i love you'
- Added a new prettify_index parameter to the ica.output_results function; if you specify it with a value of False, it will disable the default behavior of titleizing index values (see the new count_phrases analyzer for an example)
Deprecations
- The get_cli_args() function has been deprecated in favor of the new get_cli_parser() method
  - The get_cli_parser() function gives you access to the underlying argparse.ArgumentParser instance, allowing you to add new CLI arguments specific to your analyzer
  - To migrate, replace ica.get_cli_args() with ica.get_cli_parser().parse_args() across your project files
Under-the-Hood Improvements
- Upgraded all dependencies to their latest versions
- The CLI now throws an ImportError if a module spec cannot be created (this is unlikely, though)
- The __main__ entry point module is now fully tested, increasing the code coverage for the library
v2.4.0
v2.3.0
- Added a count for audio messages to the attachment_totals analyzer
- The exposed attachments dataframe has been updated to include columns for:
  - The filename of the attachment, if applicable
  - The ID of the associated message
- The messages dataframe has been updated to include a column for the ID of the message
v2.2.0
- Rewrote the most_frequent_emojis analyzer to be substantially faster and more accurate
  - The time complexity of the algorithm has been reduced from O(n^2) to O(n), resulting in significant speedups (e.g. 10s to 3s, or 4s to 2s)
  - The new algorithm also handles combined emojis correctly (e.g. 👨‍💻, which is a combination of 👨 and 💻, is now counted correctly)
- Small refactoring improvements to clean up the codebase
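For context on the combined emojis mentioned in the v2.2.0 notes, the short sketch below shows how the "man technologist" emoji is built from three Unicode code points joined by a zero-width joiner (ZWJ). It illustrates the underlying encoding only and is not taken from the most_frequent_emojis analyzer.

```python
ZWJ = "\u200d"  # zero-width joiner
man = "\U0001F468"
laptop = "\U0001F4BB"

# The combined emoji is the sequence: man, ZWJ, laptop
technologist = man + ZWJ + laptop

code_points = [f"U+{ord(char):04X}" for char in technologist]
print(len(technologist), code_points)
```

A counter that walks a message one code point at a time would see a man and a laptop here rather than one combined emoji, which is why such sequences have to be matched as a unit.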