Singer SDK Version
0.53.5
Is this a regression?
- Yes
Python Version
NA
Bug scope
Targets (data type handling, batching, SQL object generation, etc.)
Operating System
No response
Description
Three methods in SQLConnector create a new sa.inspect(self._engine) on every call. Since SQLAlchemy's reflection cache (info_cache) is an instance-level dictionary on Inspector, each new instance starts with an empty cache and the previous one is garbage collected — completely defeating dialect-level caching optimizations.
Affected methods
- table_exists()
- schema_exists()
- get_table_columns() (most critical: called frequently during data loading)
For contrast, discover_catalog_entries() already correctly creates a single Inspector and reuses it throughout the method.
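The two patterns can be sketched against an in-memory SQLite engine (function names here are illustrative, not the SDK's actual code):

```python
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")

def table_exists_throwaway(table_name: str) -> bool:
    # Pattern in the affected methods: sa.inspect() builds a brand-new
    # Inspector, with a brand-new empty info_cache, on every call.
    return sa.inspect(engine).has_table(table_name)

def table_exists_reused(inspector, table_name: str) -> bool:
    # Pattern in discover_catalog_entries(): the caller creates one
    # Inspector and passes it around, so reflection results accumulate
    # in that single Inspector's info_cache.
    return inspector.has_table(table_name)
```

With the reused variant, the caller holds the Inspector for the whole operation; with the throwaway variant, nothing outlives the call, so nothing can be cached.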
How SQLAlchemy's reflection cache works
SQLAlchemy stores reflection results in Inspector.info_cache, a plain dictionary. The @reflection.cache decorator on dialect methods (e.g., Snowflake's _get_schema_columns()) uses this dict to avoid redundant queries. New Inspector = new empty dict = all cached results lost.
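The mechanics can be modeled in a few lines of plain Python (the names below are illustrative stand-ins, not SQLAlchemy's real internals):

```python
query_count = 0

def get_columns_cached(schema, table, info_cache):
    # Stand-in for a dialect method wrapped in @reflection.cache: the
    # result is memoized in the Inspector's info_cache dict under a key
    # derived from the method and its arguments.
    global query_count
    key = ("get_columns", schema, table)
    if key not in info_cache:
        query_count += 1  # simulated round trip to the database
        info_cache[key] = [{"name": "id"}, {"name": "value"}]
    return info_cache[key]

# One long-lived Inspector: the second lookup hits the cache.
shared_cache = {}
get_columns_cached("public", "users", shared_cache)
get_columns_cached("public", "users", shared_cache)
# query_count == 1

# A throwaway Inspector per call: each fresh empty dict forces a new query.
get_columns_cached("public", "users", {})
get_columns_cached("public", "users", {})
# query_count == 3
```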
Real-world impact (Snowflake)
The Snowflake SQLAlchemy dialect optimizes by querying all columns in a schema at once (SELECT ... FROM information_schema.columns WHERE table_schema = :schema) and caching the result. Subsequent get_columns() calls for other tables in the same schema should return from cache with zero queries.
Because the SDK creates a throwaway Inspector each time, this optimization is completely negated.
This affects any SQL dialect that uses @reflection.cache for optimization, not just Snowflake.
Suggested fix
Cache the Inspector instance at the connector level (e.g., a lazy _inspector property) so the info_cache persists across calls. A clear_cache() escape hatch can be provided for edge cases where schema changes during a sync.
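A minimal sketch of that shape, assuming SQLAlchemy 1.4+ (class and attribute names here are suggestions, not the SDK's actual API):

```python
from typing import Optional

import sqlalchemy as sa
from sqlalchemy.engine import Engine
from sqlalchemy.engine.reflection import Inspector

class SQLConnectorSketch:
    def __init__(self, engine: Engine) -> None:
        self._engine = engine
        self._cached_inspector: Optional[Inspector] = None

    @property
    def _inspector(self) -> Inspector:
        # Lazily create one Inspector and keep it, so its info_cache
        # persists across table_exists()/get_table_columns() calls.
        if self._cached_inspector is None:
            self._cached_inspector = sa.inspect(self._engine)
        return self._cached_inspector

    def clear_cache(self) -> None:
        # Escape hatch: drop the Inspector (and its info_cache) when
        # the schema is known to have changed mid-sync.
        self._cached_inspector = None

    def table_exists(self, table_name: str) -> bool:
        return self._inspector.has_table(table_name)
```

Every method that currently calls sa.inspect(self._engine) would instead read self._inspector, so repeated reflection within one sync reuses the same cache.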
Link to Slack/Linen
No response