Skip to content

parsec: provide suggestions in IllegalItemError error message#7283

Open
Scott-Owen-James wants to merge 7 commits intocylc:masterfrom
Scott-Owen-James:4662
Open

parsec: provide suggestions in IllegalItemError error message#7283
Scott-Owen-James wants to merge 7 commits intocylc:masterfrom
Scott-Owen-James:4662

Conversation

@Scott-Owen-James
Copy link
Copy Markdown
Contributor

@Scott-Owen-James Scott-Owen-James commented Apr 23, 2026

When someone has an error returns the likely list in error messages when they are incorrectly set to save users having to dig through the docs to find out what the correct format is.
closes #4662

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • Changelog entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@Scott-Owen-James Scott-Owen-James marked this pull request as ready for review April 24, 2026 14:24
@oliver-sanders oliver-sanders changed the title 4662 parsec: provide suggestions in IllegalItemError error message Apr 24, 2026
@oliver-sanders oliver-sanders added this to the 8.7.0 milestone Apr 24, 2026
@oliver-sanders
Copy link
Copy Markdown
Member

codecov/patchSuccessful in 1s — 100.00% of diff hit (target 97.00%)

💯

Comment thread cylc/flow/parsec/util.py
Comment on lines +430 to +461
filtered_keys: list[tuple[str, float]] = []
key_counter = Counter(key)
for possible_key in possible_keys:
possible_key_counter = Counter(possible_key)

# simple ratio for whole key
similarity = key_counter & possible_key_counter
ratio = (similarity.total() * 2) / (len(key) + len(possible_key))

# possible more accurate ratios for individual words
parsed_possible_key = possible_key.split(" ")
for possible_word in parsed_possible_key:
possible_word_counter = Counter(possible_word)
similarity = key_counter & possible_word_counter
word_ratio = (similarity.total() * 2)
word_ratio = word_ratio / (len(key) + len(possible_word_counter))
if word_ratio > ratio:
ratio = word_ratio
filtered_keys.append((possible_key, ratio))

filtered_keys.sort(key=lambda x: x[1], reverse=True)
if filtered_keys[0][1] < 0.2:
return []

final_keys = [filtered_keys[0]]
if len(filtered_keys) > 1 and filtered_keys[1][1] > final_keys[0][1] * .9:
final_keys.append(filtered_keys[1])

return [
final_keys[i][0]
for i in range(0, len(final_keys))
]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I can easily follow this logic. Is it looking for any keys or words within keys which closely match the length of the given key, or words within the given key?
I have no idea if that is robust to the types of keys that will be seen here. I would be tempted to look up some fuzzy matching algorithms to see if there is anything simple that could match by letter rather than length?
Search ranking algorithms can be made fairly simple with some regex.

Copy link
Copy Markdown
Contributor Author

@Scott-Owen-James Scott-Owen-James Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works finding the letters shared between the entered key and the possible key, totalling them, then dividing them by the total number of letters. The second part about more accurate ratios for individual words is for the case that a person only wrote part of a possible key i.e., a key of 'execution' would find 'execution polling intervals', 'execution retry delays'. It does match by letter but I don't know about using regex.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proper fuzzy search would be great, but probably quite hard to implement.

Because this is a one-off use, we don't really want to add an external dependency to do this, so unless there's something in the Python standard library, I'm happy with something crude which does the job.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I follow this now. A docstring showing step by step might be good since it's non trivial to follow, at least for me.

I believe what it's doing is breaking down all possible keys and the given key into lists of contained chars, so something like 'foo' becomes {'f': 1, 'o': 2} then intersects the given key against the possible keys:
similarity = key_counter & possible_key_counter
Leaving just the numbers of each char that are present in both, then ranks them by total common character count and checks the similarity of the top ranked result is over a threshold.

I guess this is a type of fuzzy matching, I've not seen it done like this before but it seems like it should work, especially given the relatively constrained number of possible keys.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring added to clarify.

Comment thread tests/integration/validate/test_filter.py
Comment thread tests/unit/test_filter.py
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
from cylc.flow.parsec.util import filter_keys
import pytest
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use a docstring too

Comment thread cylc/flow/parsec/util.py
Comment thread cylc/flow/parsec/util.py
Comment thread cylc/flow/parsec/util.py
Comment on lines +430 to +461
filtered_keys: list[tuple[str, float]] = []
key_counter = Counter(key)
for possible_key in possible_keys:
possible_key_counter = Counter(possible_key)

# simple ratio for whole key
similarity = key_counter & possible_key_counter
ratio = (similarity.total() * 2) / (len(key) + len(possible_key))

# possible more accurate ratios for individual words
parsed_possible_key = possible_key.split(" ")
for possible_word in parsed_possible_key:
possible_word_counter = Counter(possible_word)
similarity = key_counter & possible_word_counter
word_ratio = (similarity.total() * 2)
word_ratio = word_ratio / (len(key) + len(possible_word_counter))
if word_ratio > ratio:
ratio = word_ratio
filtered_keys.append((possible_key, ratio))

filtered_keys.sort(key=lambda x: x[1], reverse=True)
if filtered_keys[0][1] < 0.2:
return []

final_keys = [filtered_keys[0]]
if len(filtered_keys) > 1 and filtered_keys[1][1] > final_keys[0][1] * .9:
final_keys.append(filtered_keys[1])

return [
final_keys[i][0]
for i in range(0, len(final_keys))
]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proper fuzzy search would be great, but probably quite hard to implement.

Because this is a one-off use, we don't really want to add an external dependency to do this, so unless there's something in the Python standard library, I'm happy with something crude which does the job.

@samuel-denton
Copy link
Copy Markdown
Contributor

Not sure if it's intended, but this includes keys from global.cylc. This would be fine, except it seemed to give a different response than expected. The illegal items I have tested in flow.cylc give pretty good suggestions, however global.cylc gave me this, where neither suggestion is even close (note I have updated the error text for testing):

IllegalItemError: [install]template variables - template variables is not a valid configuration, did you mean max depth (score: 0.44), symlink dirs (score: 0.40)

Two things there, 1). 'template variables' will be valid soon so not a perfect example but 2). there are less allowed keys for global.cylc so it has less to pick from.

When given a user key in global.cylc which is closer to a match, it does find and suggest that match:

IllegalItemError: [install]max depthh - max depthh is not a valid configuration, did you mean max depth (score: 0.95)

So, I think some playing with the threshold value is needed so we don't suggest completely different keys, unless that is a design choice that we would rather recommend something than nothing?

@samuel-denton
Copy link
Copy Markdown
Contributor

Despite the threshold needing some tweaking for excluding unrelated suggestions, it is quite impressive and far more helpful that it only suggests keys for the section the illegal key is found in. So the same illegal key in different sections of the document produce different suggestions:

IllegalItemError: runtiming - (runtiming is not a valid configuration, did you mean runtime (score: 0.75))

IllegalItemError: [runtime][foo]runtiming - (runtiming is not a valid configuration, did you mean environment filter (score: 0.71), environment (score: 0.71))

This turns it from just a bit of a spell checker to something pretty powerful imo.

Comment thread cylc/flow/parsec/util.py
filtered_keys.append((possible_key, ratio))

filtered_keys.sort(key=lambda x: x[1], reverse=True)
if filtered_keys[0][1] < 0.2:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this threshold value needs some tweaking as it allows a key to be suggested if even a few letters are the same.

IllegalItemError: [install]template variable - template variable is not a valid configuration, did you mean max depth (score: 0.46)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you could probably tweak it to get higher or lower accuracy, the reason I went with 20% is because I was aiming for some to clear a very low bar, while I was testing some of them did only reach that high but were valid, such as typos in short words, or someone entering a different word in a multi word key.

Comment thread cylc/flow/parsec/util.py
Comment on lines +466 to +467
if len(filtered_keys) > 1 and filtered_keys[1][1] > final_keys[0][1] * .9:
final_keys.append(filtered_keys[1])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats the reason for seeing if the second recommended key is close to the first, rather than just using the same threshold value for both? This works fine but seems unnecessary.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to avoid returning one very accurate key, and then a second very innaccurate i.e. the best match is 90% and the second best was 40%, it's mainly there to allow multiple options to be returned without accidentally returning 17.

@oliver-sanders
Copy link
Copy Markdown
Member

oliver-sanders commented Apr 27, 2026

Not sure if it's intended, but this includes keys from global.cylc. This would be fine, except it seemed to give a different response than expected. The illegal items I have tested in flow.cylc give pretty good suggestions, however global.cylc gave me this, where neither suggestion is even close (note I have updated the error text for testing):

@samuel-denton

It should give the relevant suggestions for file being worked on (i.e, should work for flow.cylc and global.cylc).

The example you gave:

    IllegalItemError: [install]template variables - template variables is not a valid configuration, did you mean max depth (score: 0.44), symlink dirs (score: 0.40)

Is a config being added in #7223

Was this the result of a mixed-checkout or rebase of some form?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cfgspec: display valid options in errors where present

3 participants