parsec: provide suggestions in IllegalItemError error message #7283
Scott-Owen-James wants to merge 7 commits into cylc:master
Conversation
```python
filtered_keys: list[tuple[str, float]] = []
key_counter = Counter(key)
for possible_key in possible_keys:
    possible_key_counter = Counter(possible_key)

    # simple ratio for whole key
    similarity = key_counter & possible_key_counter
    ratio = (similarity.total() * 2) / (len(key) + len(possible_key))

    # possibly more accurate ratios for individual words
    parsed_possible_key = possible_key.split(" ")
    for possible_word in parsed_possible_key:
        possible_word_counter = Counter(possible_word)
        similarity = key_counter & possible_word_counter
        word_ratio = (similarity.total() * 2)
        word_ratio = word_ratio / (len(key) + len(possible_word_counter))
        if word_ratio > ratio:
            ratio = word_ratio
    filtered_keys.append((possible_key, ratio))

filtered_keys.sort(key=lambda x: x[1], reverse=True)
if filtered_keys[0][1] < 0.2:
    return []

final_keys = [filtered_keys[0]]
if len(filtered_keys) > 1 and filtered_keys[1][1] > final_keys[0][1] * .9:
    final_keys.append(filtered_keys[1])

return [
    final_keys[i][0]
    for i in range(0, len(final_keys))
]
```
Not sure I can easily follow this logic. Is it looking for any keys or words within keys which closely match the length of the given key, or words within the given key?
I have no idea if that is robust to the types of keys that will be seen here. I would be tempted to look up some fuzzy matching algorithms to see if there is anything simple that could match by letter rather than length?
Search ranking algorithms can be made fairly simple with some regex.
It works by finding the letters shared between the entered key and the possible key, totalling them, then dividing by the total number of letters. The second part, the per-word ratios, covers the case where a person only wrote part of a possible key, e.g. a key of 'execution' would find 'execution polling intervals' and 'execution retry delays'. It does match by letter, but I don't know about using regex.
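For anyone who wants to poke at the scoring outside of parsec, here's a minimal standalone sketch of the idea (the function name is invented for illustration, and the per-word ratio here divides by the word's length, whereas the PR's code divides by the Counter's length, i.e. the number of unique characters):

```python
from collections import Counter

def similarity_ratio(key: str, possible_key: str) -> float:
    """Sketch: score shared letters between the entered key and a candidate."""
    key_counter = Counter(key)
    # whole-key ratio: total shared letters over combined length
    shared = key_counter & Counter(possible_key)
    ratio = (sum(shared.values()) * 2) / (len(key) + len(possible_key))
    # per-word ratios rescue partial keys like 'execution'
    for word in possible_key.split(" "):
        shared = key_counter & Counter(word)
        ratio = max(ratio, (sum(shared.values()) * 2) / (len(key) + len(word)))
    return ratio
```

With this sketch, 'execution' scores 1.0 against 'execution retry delays' because the first word matches exactly.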
Proper fuzzy search would be great, but probably quite hard to implement.
Because this is a one-off use, we don't really want to add an external dependency to do this, so unless there's something in the Python standard library, I'm happy with something crude which does the job.
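For what it's worth, there is something in the Python standard library: `difflib.get_close_matches` (a wrapper around `SequenceMatcher`) takes a cutoff and a maximum number of suggestions. Its main limitation for this use case is that it compares whole strings, so a partial key like 'execution' scores poorly against long multi-word keys (key names below are just examples):

```python
import difflib

possible_keys = [
    "execution polling intervals",
    "execution retry delays",
    "submission polling intervals",
]

# cutoff plays the same role as the 0.2 threshold; n caps the suggestions
matches = difflib.get_close_matches(
    "execution poling intervals", possible_keys, n=2, cutoff=0.6
)
```

Here the typo'd key matches 'execution polling intervals' first, but 'execution' on its own falls below the cutoff against every one of these keys.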
I think I follow this now. A docstring showing it step by step would be good, since it's non-trivial to follow, at least for me.
I believe what it's doing is breaking down all the possible keys and the given key into character counts, so something like 'foo' becomes {'f': 1, 'o': 2}, then intersecting the given key against each possible key:

```python
similarity = key_counter & possible_key_counter
```

This leaves just the counts of each char present in both; it then ranks candidates by total common character count and checks that the similarity of the top-ranked result is over a threshold.
I guess this is a type of fuzzy matching, I've not seen it done like this before but it seems like it should work, especially given the relatively constrained number of possible keys.
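The intersection step described above, as a tiny self-contained example:

```python
from collections import Counter

key_counter = Counter("foo")        # Counter({'o': 2, 'f': 1})
candidate = Counter("food")         # Counter({'o': 2, 'f': 1, 'd': 1})
common = key_counter & candidate    # min count of each char present in both
# total common character count is the ranking signal
total = sum(common.values())
```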
Docstring added to clarify.
```python
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
from cylc.flow.parsec.util import filter_keys
import pytest
```
This could use a docstring too
Co-authored-by: Oliver Sanders <[email protected]>
Not sure if it's intended, but this includes keys from global.cylc. This would be fine, except it seemed to give a different response than expected. The illegal items I have tested in flow.cylc give pretty good suggestions; however, for global.cylc neither suggestion was even close (note: I have updated the error text for testing).

Two things there: 1) 'template variables' will be valid soon, so it's not a perfect example, but 2) there are fewer allowed keys for global.cylc, so it has less to pick from. When given a user key in global.cylc which is closer to a match, it does find and suggest that match.

So I think some playing with the threshold value is needed so we don't suggest completely different keys, unless it is a design choice that we would rather recommend something than nothing?
Despite the threshold needing some tweaking to exclude unrelated suggestions, it is quite impressive, and it is far more helpful that it only suggests keys for the section the illegal key is found in. So the same illegal key in different sections of the document produces different suggestions.

This turns it from just a bit of a spell checker into something pretty powerful, imo.
```python
    filtered_keys.append((possible_key, ratio))

filtered_keys.sort(key=lambda x: x[1], reverse=True)
if filtered_keys[0][1] < 0.2:
```
I think this threshold value needs some tweaking, as it allows a key to be suggested even if only a few letters are the same:

```
IllegalItemError: [install]template variable - template variable is not a valid configuration, did you mean max depth (score: 0.46)
```
Yeah, you could probably tweak it to get higher or lower accuracy. The reason I went with 20% is that I was aiming to set a very low bar: while testing, some matches only scored that high but were still valid, such as typos in short words, or someone entering a different word from a multi-word key.
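To illustrate how low a still-valid score can sit (key names chosen for the example; using only the whole-key ratio, without the per-word rescue):

```python
from collections import Counter

def whole_key_ratio(key: str, possible_key: str) -> float:
    """Whole-key letter-overlap ratio, as in the first step of the PR."""
    shared = Counter(key) & Counter(possible_key)
    return (sum(shared.values()) * 2) / (len(key) + len(possible_key))

# a single word from a long multi-word key only just clears the 20% bar
ratio = whole_key_ratio("mail", "task event mail interval")
```

This comes out around 0.29, so a higher threshold would start discarding matches like this one (although the per-word pass would rescue this particular case).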
```python
if len(filtered_keys) > 1 and filtered_keys[1][1] > final_keys[0][1] * .9:
    final_keys.append(filtered_keys[1])
```
What's the reason for checking whether the second recommended key is close to the first, rather than just using the same threshold value for both? This works fine, but seems unnecessary.
This is to avoid returning one very accurate key and then a second very inaccurate one, e.g. where the best match is 90% and the second best is 40%. It's mainly there to allow multiple options to be returned without accidentally returning 17.
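A toy illustration of the relative cut-off (key names and scores invented for the example):

```python
scored = [
    ("execution retry delays", 0.90),
    ("execution polling intervals", 0.85),
    ("runtime", 0.40),
]
best_score = scored[0][1]
# a runner-up is only suggested if it is within 10% of the best score
suggestions = [key for key, score in scored if score > best_score * 0.9]
```

With an absolute threshold instead, the 0.40 runner-up would be suggested alongside a 0.90 best match, which is the situation this check avoids (the real code additionally caps the list at two entries).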
It should give the relevant suggestions for the file being worked on (i.e. it should work for both flow.cylc and global.cylc). The example you gave is a config being added in #7223. Was this the result of a mixed checkout or rebase of some form?
When someone incorrectly sets an item, the error message now returns a list of likely intended keys, to save users having to dig through the docs to find out what the correct format is.
closes #4662
Check List
- I have read CONTRIBUTING.md and added my name as a Code Contributor.
- Applied any dependency changes to setup.cfg (and conda-environment.yml if present).
- Targets the relevant ?.?.x branch.