
mojibake in GCP Cloud Storage Bucket documents #320

@yangm2


In the GCP Cloud Storage Bucket that was indexed into the Data Store referenced by the RAG app, there are encoding issues that appear to be affecting the consistency and accuracy of TFA.

Here's a snippet from the Cloud Storage Bucket ...

      90.394 Termination of tenancy for failure to pay rent. The landlord may terminate the rental agreement for nonpayment of rent and take possession as provided in ORS 105.100 to 105.168, as follows:

      (1) When the tenancy is a week-to-week tenancy, by delivering to the tenant at least 72 hours’ written notice of nonpayment and the landlord’s intention to terminate the rental agreement if the rent is not paid within that period. The landlord shall give this notice no sooner than on the fifth day of the rental period, including the first day the rent is due.

      (2) For all tenancies other than week-to-week tenancies, by delivering to the tenant:

      (a) At least 10 days’ written notice of nonpayment and the landlord’s intention to terminate the rental agreement if the rent is not paid within that period. The landlord shall give this notice no sooner than on the eighth day of the rental period, including the first day the rent is due; or

Note the mis-encoded apostrophes in "(a) At least 10 days’ written notice of nonpayment and the landlord’s intention". This kind of character garbling is apparently known as mojibake.
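The issue does not say how the bucket copy got corrupted. A common cause is UTF-8 bytes being decoded as Windows-1252 (cp1252). Assuming that is what happened here, the Python sketch below shows how the right single quotation mark (’, U+2019) turns into the three characters ’ and how the damage can be reversed. The helper names `garble` and `repair` are hypothetical.

```python
# Assumption: the bucket copy was produced by decoding UTF-8 bytes as
# Windows-1252 (cp1252). The issue does not confirm the cause.

def garble(text: str) -> str:
    """Simulate the suspected bug: UTF-8 bytes misread as cp1252 (hypothetical helper)."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Undo the mis-decoding: recover the original bytes via cp1252, then decode as UTF-8."""
    return text.encode("cp1252").decode("utf-8")


clean = "At least 10 days\u2019 written notice"  # \u2019 is the curly apostrophe
garbled = garble(clean)
print(garbled)                   # At least 10 days’ written notice
print(repair(garbled) == clean)  # True
```

The round trip only works when every byte involved is defined in cp1252. A right double quotation mark (”) is encoded in UTF-8 as E2 80 9D, and Python's cp1252 codec rejects byte 0x9D, so a real cleanup pass might need a dedicated tool such as the ftfy library.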

Compare that to the clean file checked into the repository ...

90.394 Termination of tenancy for failure to pay rent. The landlord may terminate the rental agreement for nonpayment of rent and take possession as provided in ORS 105.100 to 105.168, as follows:
(1) When the tenancy is a week-to-week tenancy, by delivering to the tenant at least 72 hours’ written notice of nonpayment and the landlord’s intention to terminate the rental agreement if the rent is not paid within that period. The landlord shall give this notice no sooner than on the fifth day of the rental period, including the first day the rent is due.
(2) For all tenancies other than week-to-week tenancies, by delivering to the tenant:
(a) At least 10 days’ written notice of nonpayment and the landlord’s intention to terminate the rental agreement if the rent is not paid within that period. The landlord shall give this notice no sooner than on the eighth day of the rental period, including the first day the rent is due; or
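To pin down exactly which characters differ between the bucket copy and the repository copy, a line-level diff may be enough. The sketch below uses Python's standard `difflib`; the helper name `encoding_diff` is hypothetical, and it assumes both copies are already available as decoded strings (fetching them from the bucket and the repo is left out). It strips indentation and drops blank lines first, so that only character-level differences such as mis-encoded apostrophes remain.

```python
import difflib


def encoding_diff(bucket_text: str, repo_text: str) -> list[str]:
    """Return unified-diff lines showing where the bucket copy departs from the repo copy.

    Hypothetical helper: both arguments are assumed to be already-decoded strings.
    """
    # Normalize layout: strip indentation and drop blank lines so that only
    # character-level differences (e.g. mis-encoded apostrophes) show up.
    repo_lines = [ln.strip() for ln in repo_text.splitlines() if ln.strip()]
    bucket_lines = [ln.strip() for ln in bucket_text.splitlines() if ln.strip()]
    return list(
        difflib.unified_diff(
            repo_lines, bucket_lines, fromfile="repo", tofile="bucket", lineterm=""
        )
    )


# Example: one clean line versus the same line with a mis-decoded apostrophe.
repo = "(a) At least 10 days\u2019 written notice"
bucket = "      (a) At least 10 days\u00e2\u20ac\u2122 written notice\n"
for line in encoding_diff(bucket, repo):
    print(line)
```

Lines prefixed with `-` come from the repository copy and lines prefixed with `+` from the bucket copy; an empty result means the two copies agree apart from layout.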

The mojibake in the Data Store probably leads to some combination of:

  1. The chat agent receives garbled text and
    1. treats the passage as low-confidence or corrupted, hedging rather than applying the rule firmly, or
    2. lowers the relevance score of results that contain mojibake.
  2. The RAG does not properly return relevant results.
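One way to keep corrupted copies out of the Data Store would be to scan documents before they are indexed. The sketch below is a heuristic under the same cp1252 assumption: it searches for character sequences that rarely occur in clean English legal text but are typical of mis-decoded UTF-8. The marker set and the `find_mojibake` helper are hypothetical, and the pattern will also flag legitimate text containing à or Â, so hits are candidates for review rather than proof of corruption.

```python
import re

# Hypothetical marker set (an assumption, not exhaustive) for UTF-8 text
# mis-decoded as cp1252:
#   \u00e2\u20ac = "â€", the prefix of mis-decoded curly quotes and dashes (bytes E2 80 ..)
#   \u00c3       = "Ã",  a typical prefix of mis-decoded accented letters
#   \u00c2       = "Â",  a typical prefix of mis-decoded non-breaking spaces
SUSPECT = re.compile("\u00e2\u20ac|\u00c3|\u00c2")


def find_mojibake(text: str, context: int = 20) -> list[str]:
    """Return a short excerpt around each suspect sequence found in `text`."""
    hits = []
    for m in SUSPECT.finditer(text):
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        hits.append(text[start:end])
    return hits


print(find_mojibake("10 days\u2019 written notice"))             # clean text, no hits
print(find_mojibake("10 days\u00e2\u20ac\u2122 written notice"))  # one excerpt flagged
```

Because the scan only needs the decoded text, it could run on files exported from the bucket just as well as on the repository copies.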

Metadata


Assignees

No one assigned

Labels

bug (Something isn't working), infrastructure (Pull requests related to infrastructure and underlying workflows)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
