Coercion from `Dictionary(UInt8, LargeUtf8)` to `Utf8View`

### Describe the bug

I have a query predicate `left = right` where `left` is a column with type `Utf8View` and `right` is a column with type `Dictionary(UInt8, LargeUtf8)`.  The SQL planner returns `Err` for this query because it cannot find a coercion path for these two types. However, a coercion pathway exists: modifying the query to use `left = arrow_cast(right, 'LargeUtf8')` allows it to succeed.

### To Reproduce

I am unable to attach a `.rs` file for some reason, so below is the contents of my reproducer which can very simply be compiled with datafusion as the only dependency.

```
#[cfg(test)]
mod tests {
    use std::sync::Arc;

    use datafusion::catalog::MemTable;
    use datafusion::common::Result as DataFusionResult;
    use datafusion::common::arrow::array::{
        DictionaryArray, Int64Array, LargeStringArray, StringViewArray, UInt8Array, UInt64Array,
    };
    use datafusion::common::arrow::datatypes::{DataType, Field, Schema};
    use datafusion::common::arrow::record_batch::RecordBatch;
    use datafusion::prelude::SessionContext;

    const STATES: [&str; 50] = [
        "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA", "HI", "ID", "IL", "IN", "IA",
        "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
        "NM", "MY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX", "UT", "VT",
        "VA", "WA", "WV", "WI", "WY",
    ];

    async fn create_table_with_text() -> DataFusionResult<MemTable> {
        let states_schema = Arc::new(Schema::new(vec![
            Field::new("rank", DataType::UInt64, false),
            Field::new("state", DataType::Utf8View, false),
        ]));

        let (ranks, states) = STATES
            .iter()
            .enumerate()
            .collect::<(Vec<usize>, Vec<&str>)>();

        let states_batch = RecordBatch::try_new(
            Arc::clone(&states_schema),
            vec![
                Arc::new(ranks.iter().map(|r| *r as u64).collect::<UInt64Array>()),
                Arc::new(states.iter().map(Some).collect::<StringViewArray>()),
            ],
        )?;

        MemTable::try_new(states_schema, vec![vec![states_batch]])
    }

    async fn create_table_with_dictionary() -> DataFusionResult<MemTable> {
        let gub_schema = Arc::new(Schema::new(vec![
            Field::new("row_id", DataType::UInt64, false),
            Field::new(
                "state",
                DataType::Dictionary(DataType::UInt8.into(), DataType::LargeUtf8.into()),
                true,
            ),
        ]));

        let (ids, state_keys) = (1..=1000)
            .map(|i| (i as u64, ((i + 10 * i) % STATES.len()) as u8))
            .collect::<(Vec<u64>, Vec<u8>)>();
        let state_values = STATES.iter().map(Some).collect::<LargeStringArray>();

        let batch = RecordBatch::try_new(
            Arc::clone(&gub_schema),
            vec![
                Arc::new(UInt64Array::from(ids)),
                Arc::new(DictionaryArray::new(
                    UInt8Array::from(state_keys),
                    Arc::new(state_values),
                )),
            ],
        )?;

        MemTable::try_new(gub_schema, vec![vec![batch]])
    }   
        
    /// Test evaluating equality of the `states.states` string column against the `gub.states` dictionary column.
    ///
    /// We run the same query twice. First we provide the coercion pathway ourselves using the
    /// `arrow_cast` function to demonstrate that there is a straightforward coercion.
    /// Then we evaluate the query without the explicit coercion to demonstrate that the SQL
    /// planner does not find this coercion.
    #[tokio::test]
    async fn join() -> DataFusionResult<()> {
        let df: SessionContext = Default::default();
        df.register_table("states", Arc::new(create_table_with_text().await?))?;
        df.register_table("gub", Arc::new(create_table_with_dictionary().await?))?;

        let expect_batch = vec![RecordBatch::try_new(
            Arc::new(Schema::new(vec![Field::new(
                "count(*)",
                DataType::Int64,
                false,
            )])),
            vec![Arc::new(Int64Array::from(vec![1000]))],
        )?];

        let query_cast = format!(
            "SELECT count(*) FROM gub JOIN states ON states.rank = ((gub.row_id + 10 * gub.row_id) % {}) WHERE states.state = arrow_cast(gub.state, 'Utf8')",
            STATES.len()
        );
        let frame_cast = df.sql(&query_cast).await.expect("query_cast error");
        let batch_cast = frame_cast
            .collect()
            .await
            .expect("query_cast evaluation error");
        assert_eq!(batch_cast, expect_batch);

        let query_nocast = format!(
            "SELECT count(*) FROM gub JOIN states ON states.rank = ((gub.row_id + 10 * gub.row_id) % {}) WHERE states.state = gub.state",
            STATES.len()
        );
        let frame_nocast = df.sql(&query_nocast).await.expect("query_nocast error");
        let batch_nocast = frame_nocast
            .collect()
            .await
            .expect("query_nocast evaluation error");
        assert_eq!(batch_nocast, expect_batch);

        Ok(())
    }
}
```

### Expected behavior

The SQL planner should find and deploy the coercion pathway on its own - the author of the query should not have to deploy the `arrow_cast` function.

### Additional context

Arguably this is a feature request but I am filing it as a bug because I have experienced it as such.

The reproducer above is adapted from a test I have written for my application which creates the same data but using SQL rather than the `MemTable`.  That is, we have `CREATE TABLE states (rank INT, state TEXT)` and then the other table uses a custom table provider which results in the dictionary type.

In DataFusion 44, the SQL created the `state` column with Arrow data type Utf8 or LargeUtf8.
In DataFusion 50, the SQL created the `state` column with Arrow data type Utf8View.

This has broken my test because there previously was a coercion pathway for comparing the `state` column against my Dictionary type column, but now there is no such coercion pathway.

This looks related to #12511.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Coercion from `Dictionary(UInt8, LargeUtf8)` to `Utf8View` #19290

Describe the bug

To Reproduce

Expected behavior

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Coercion from Dictionary(UInt8, LargeUtf8) to Utf8View #19290

Description

Describe the bug

To Reproduce

Expected behavior

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Coercion from `Dictionary(UInt8, LargeUtf8)` to `Utf8View` #19290