[FEA] Consider a normalized string type in PQ for case insensitive col selection #21864

@mhaseeb123

Description

Consider defining and using a NormalizedString class to improve semantics of case-insensitive column matching/selection in the Parquet reader.

Here's an example of what it could look like:

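// Hash the normalized form so that names which normalize to the same
// string land in the same bucket.
struct NormalizedHash {
    // Opts in to heterogeneous lookup in C++20 unordered containers.
    using is_transparent = void;

    size_t operator()(std::string_view s) const {
        // Convert explicitly, matching how NormalizedEq calls normalize.
        return std::hash<std::string>{}(normalize(std::string(s)));
    }
};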

struct NormalizedEq {
    using is_transparent = void;

    bool operator()(std::string_view a, std::string_view b) const {
        return normalize(std::string(a)) == normalize(std::string(b));
    }
};

using NormalizedNamesSet = std::unordered_set<std::string, NormalizedHash, NormalizedEq>;

You could then use this set type for all of these kinds of operations. It seems like you're repeating the pattern of calling normalize fairly often in this PR.

Originally posted by @vyasr in #21700 (comment)
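
For illustration only, here is a self-contained sketch of how the suggestion above might be exercised. The `normalize` shown here is a hypothetical stand-in that lowercases ASCII characters; the actual function in the PR is not shown in this issue and may behave differently. The `is_selected` helper is also an assumption made for this example, not an existing API.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical stand-in for the PR's normalize(): ASCII lowercasing only.
inline std::string normalize(std::string_view s) {
    std::string out(s);
    std::transform(out.begin(), out.end(), out.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return out;
}

struct NormalizedHash {
    using is_transparent = void;

    std::size_t operator()(std::string_view s) const {
        return std::hash<std::string>{}(normalize(s));
    }
};

struct NormalizedEq {
    using is_transparent = void;

    bool operator()(std::string_view a, std::string_view b) const {
        return normalize(a) == normalize(b);
    }
};

using NormalizedNamesSet = std::unordered_set<std::string, NormalizedHash, NormalizedEq>;

// Hypothetical helper: true if `name` matches a selected column name
// after normalization. Call sites no longer call normalize() themselves.
inline bool is_selected(NormalizedNamesSet const& selected, std::string_view name) {
    return selected.find(std::string(name)) != selected.end();
}
```

With this in place, inserting "UserId" and then "USERID" leaves a single element in the set, and `is_selected(selected, "userid")` returns true. The normalization lives in the hash and equality functors, so it is applied in one place rather than at every lookup.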
