-
Notifications
You must be signed in to change notification settings - Fork 346
Description
Issue
Hello, I noticed that many of the unit tests in the spring-cloud-gcp-data-spanner module that verify SQL query statements could potentially be unreliable or nondeterministic due to the column ordering in the SQL queries. Specifically, the issue may arise from the storage of the column names in the unordered HashSet in SpannerPersistentEntityImpl, which technically results in nondeterministic iteration when converting to the SQL string (even though it's probably stable currently due to the underlying JVM).
The way I detected these tests was via the Nondex tool, which flags Java tests that are potentially nondeterministic due to underlying assumptions regarding the Java API. For unordered collections like HashSet, Nondex will shuffle the items every run (to simulate nondeterminism) and see if the tests still pass. To see the Nondex output for the module, you can run:
mvn -pl spring-cloud-gcp-data-spanner edu.illinois:nondex-maven-plugin:2.1.7:nondex
There are currently 15 unit tests that fail this Nondex check due to SQL column ordering (there are others that also fail but due to other reasons). These are the 15 tests:
com.google.cloud.spring.data.spanner.core.SpannerTemplateTests.findKeySetTestEager
com.google.cloud.spring.data.spanner.core.SpannerTemplateTests.readAllTestEager
com.google.cloud.spring.data.spanner.repository.query.SpannerQueryLookupStrategyTests.getChildrenRowsQueryTest
com.google.cloud.spring.data.spanner.repository.query.SpannerQueryLookupStrategyTests.getColumnsStringForSelectMultipleTest
com.google.cloud.spring.data.spanner.repository.query.SpannerQueryLookupStrategyTests.getColumnsStringForSelectTest
com.google.cloud.spring.data.spanner.repository.query.SqlSpannerQueryTests.compoundNameConventionTest
com.google.cloud.spring.data.spanner.repository.query.SqlSpannerQueryTests.noPageableParamQueryTest
com.google.cloud.spring.data.spanner.repository.query.SqlSpannerQueryTests.pageableParamQueryTest
com.google.cloud.spring.data.spanner.repository.query.SqlSpannerQueryTests.sortAndPageableQueryTest
com.google.cloud.spring.data.spanner.repository.query.SqlSpannerQueryTests.sortParamQueryTest
com.google.cloud.spring.data.spanner.repository.query.SpannerStatementQueryTests.compoundNameConventionCountTest
com.google.cloud.spring.data.spanner.repository.query.SpannerStatementQueryTests.compoundNameConventionTest
com.google.cloud.spring.data.spanner.repository.query.SpannerStatementQueryTests.pageableNotLastParameterTest
com.google.cloud.spring.data.spanner.repository.query.SpannerStatementQueryTests.pageableTest
com.google.cloud.spring.data.spanner.repository.query.SpannerStatementQueryTests.sortTest
Even if the tests are stable now, it's not guaranteed that they will pass in the future if the underlying environment changes for any reason. Therefore, it might be beneficial to address this.
Potential Fix
A potential fix I propose is to make the column names deterministic by sorting them alphabetically during the SQL query creation, specifically in the getColumnStringsForSelect() method in SpannerStatementQueryExecutor. Something like:
OLD
final String sql = String.join(", ", spannerPersistentEntity.columns());
NEW
ArrayList<String> sortedColumns = new ArrayList<>(spannerPersistentEntity.columns());
Collections.sort(sortedColumns);
final String sql = String.join(", ", sortedColumns);
Then, for the 15 tests, the expected SQL string would be changed to have columns ordered alphabetically. I'm assuming that column name ordering in the SQL statements is not important, due to the use of the unordered HashSet in the existing implementation; apologies if I've misunderstood or missed something. If it is necessary to preserve a certain order (ie the ones present currently in expected strings of the unit tests), or if there is a better way to address this issue, please let me know. Thank you!