
Introduce an initial LLM Usage Policy#2318

Open
sirosen wants to merge 3 commits into jazzband:main from sirosen:initial-llm-usage-policy

Conversation

@sirosen
Member

@sirosen sirosen commented Jan 31, 2026

Driven by our prior discussion, this lays out an initial policy which is meant to be simple to understand.

After consideration, and in particular looking at the current pip contribution policy1, I have taken us back to the original two "columns" I suggested for our policy: "Disclosure" and "Ownership".

The policy is stated as meant for "LLM Generated Contributions". Although during earlier discussion I suggested that we avoid singling out these tools, on review (especially with some recent PRs), I am not sure that is wise. I would like it to be very clear to LLM users that we have some additional standards for them -- which I view as offsetting the ease with which they can spam projects and do harm.

The policy states that it is "to protect our maintainers as well as our contributors"; hopefully this is a clear hint that the maintainers even need some level of protection, and will help new contributors understand why we have a policy.

Echoing some prior discussion about "Don't let AI speak for you" / "Don't let AI think for you", there's a line included that draws a distinction between "typing" and "thinking".

To give us a clear out, in case we have truly problematic github users show up, the policy calls out "extreme cases" as spam/slop.

Finally, the policy itself links back to the original discussion as an open invitation for anyone who wants to advocate for us refining this policy.

Footnotes

  1. It's very short. See: https://pip.pypa.io/en/stable/development/contributing/

    While contributors may use whatever tools they like when developing a pull request, it is the contributor’s responsibility to ensure that submitted code meets the project requirements, and that they understand the submitted code well enough to respond to review comments.

    In particular, we will mark LLM-generated slop as spam without additional discussion.

Member

@webknjaz webknjaz left a comment


Have you seen https://github.com/chaoss/wg-ai-alignment/tree/main/moderation#readme? It's got the links from my gist and some more. It's a centralized effort worth watching periodically.

I've also found Sebastián's framing interesting: https://bsky.app/profile/tiangolo.com/post/3mc6mjosfa22s / https://fastapi.tiangolo.com/contributing/#automated-code-and-ai. It's a bit less formal.

This one is quirky: https://curl.se/.well-known/security.txt


Do you think we could add some informal tone to the policy?

CONTRIBUTING.md Outdated

Although contributors are free to use whatever tools they like, `pip-tools` has a policy regarding LLM contributions to protect our maintainers as well as our contributors.

1. **Disclosure**: contributors should indicate when they have use an LLM to generate a part of their work.
Member


Suggested change
1. **Disclosure**: contributors should indicate when they have use an LLM to generate a part of their work.
1. **Disclosure**: contributors should indicate when they have used an LLM to generate a part of their work.

Member


Also: should or must?


My thinking on LLMs as a tool is evolving. I am doing my best to mentally bucket them with other productivity enhancing tools such as an IDE or a language server. I understand LLMs are vastly different, but at the end of the day it is still a tool that requires human thought, direction, judgment, and taste to guide toward successful outcomes. It may even be a new class of tool in our career field that requires more training before use, similar to operating dangerous heavy machinery.

A secondary concern is something I am working on personally, which is that I find myself negatively biased towards code under review if it was LLM generated. That is both good and bad. Good because it causes me to be more vigilant and careful in the review. Bad in that I am looking down on the code as "lesser" and tend to be more critical.

If I use an LLM to rubber-duck some code or make a rough prototype, then I hammer it into its final shape and only 10-20% the original LLM-generated code remains, I have enough sweat equity in it to call it "my code" and would not want to taint it with the "LLM" label.

If code is majority LLM-generated but carefully reviewed and understood by the author, that seems worthy of full disclosure.

Wholly LLM-generated code that is not understood by the submitter is not worth considering.

There are degrees to this problem. At this point it is probably ok to require any LLM use to be disclosed but we may want to reevaluate that in the future.

Member Author


I 99% agree. If the submitter reworks it (line by line or even less, but in significant measure), the code is probably going to be fine. If they didn't work on it at all after it got vibe-coded for them, it's a waste product and should be discarded.

A secondary concern is something I am working on personally, which is that I find myself negatively biased towards code under review if it was LLM generated. That is both good and bad. Good because it causes me to be more vigilant and careful in the review. Bad in that I am looking down on the code as "lesser" and tend to be more critical.

I've experienced this as well, but not only with LLM-generated code. I've also experienced it a bit with junior engineers -- or a lot working with people who have a track record of sloppiness.

When I review code from DeveloperA, I know that she always writes solid unit tests, defines clear internal APIs, and thinks particularly hard about performance. So when I review her code, I mostly scan for anything unusual and read the unit tests to help me understand subtle behaviors.

When I review code from DeveloperB, I often see important untested cases or bugs which indicate that the code never ran. I read every line using the tiny Python interpreter in my brain (🫠) because my trust has been broken.

In this framing, what kind of developer is an LLM? Is it one that has earned your trust, and we can mostly ignore and skim the details? Or is it one that has broken your trust, that needs to always be watched for logical errors?

A lot of ink has been spilled over the ways in which LLMs trip up our heuristics for quality. But I hold that it behaves much more like an inexperienced or sloppy developer.

So yeah, I read every line of LLM-generated code much more closely. And I don't consider that something we should work on changing in ourselves. We should read it more closely. "Looking down on it", would be a problem. But not trusting the code is the correct choice.



I appreciate that your wording uses "should" but I suggest lightening that further to "are encouraged to" - if a disclosure statement is going to exist at all. If such a statement exists, please ask that contributors include the specifics about which model version via what framework when doing so. It needs to be sincerely seen as optional due to the discrimination people may face, and if it is included without that level of detail it doesn't convey any information.

Requiring disclosure, even implicitly, leads to discrimination. That drives contributors away.

We cannot pass a meaningful judgement on a contribution on such non-technical attributes such as "an LLM was used". It is a meaningless indicator. Like gender, skin color, or emacs user.

Generalizing on the term "LLM" doesn't make any sense either. Because it depends significantly on what specific model version was used within what framework, what tooling it was given, what skills around using those were in place, and how the user worked with it to produce the result.

Phrases like "it behaves much more like an inexperienced or sloppy developer" are a perfect example of this in action: That fails to define what "it" is and yet is stereotyping based on tooling having "LLM" in the name. The capabilities of foundation models increase exponentially every 2-3 months. Any generalized impression formed is obsolete by the time you think you've got a handle on it.

Policies should focus more on the desired outcomes and respect for maintainers time.

Member Author


I appreciate that your wording uses "should" but I suggest lightening that further to "are encouraged to" - if a disclosure statement is going to exist at all.

Very much worth noting that based on other feedback, I switched from "should" to "must", which is even stronger.

However, I agree that disclosure in itself is not particularly useful. The issue is really a form of dishonesty, in which a contribution is presented as though there is a human author, with whom I am expected to engage as a maintainer, but there is effectively no human in the loop and the interaction is completely automated. (I would count "clueless user feeds all of my replies to their LLM" as having no human in the loop as well.)

Requiring disclosure, even implicitly, leads to discrimination. That drives contributors away.

On the other hand, non-transparent use of these tools to thoughtlessly submit issues and change requests to open source projects is bad for maintainers. I understand what you're saying but I also want to ask newcomers to be thoughtful and to be open about the degree to which they understand the changes they are submitting. It's a fine line to tread.

We cannot pass a meaningful judgement on a contribution on such non-technical attributes such as "an LLM was used". It is a meaningless indicator. Like gender, skin color, or emacs user.

I don't agree that it is a meaningless indicator or non-technical.

In fact, I think we would struggle to define what it means for an attribute to be "non-technical". A person's "non-technical" attributes may have bearing on the code they write -- the abstractions, metaphors, and references which are natural to them are influenced by their background. e.g., the tool name black is not a universally understood reference. This is a much broader and deeper topic than we should try to unpack in this thread.

Consider this basic matrix:

|                  | Experienced Dev | Inexperienced Dev |
|------------------|-----------------|-------------------|
| Using an LLM     | A               | B                 |
| Not Using an LLM | C               | D                 |

A and C are more alike than B and D. I don't think that's even open to debate? If it is, then we're very far apart on this, and I'm not sure how we should proceed in discussion.

Phrases like "it behaves much more like an inexperienced or sloppy developer" are a perfect example of this in action: That fails to define what "it" is and yet is stereotyping based on tooling having "LLM" in the name. The capabilities of foundation models increase exponentially every 2-3 months. Any generalized impression formed is obsolete by the time you think you've got a handle on it.

One of the problems with even discussing the LLM tools is that there's a very wide range in quality and the pace of development has been extremely rapid.

LLMs as a broad category of tools can produce cosmetically professional outputs which contain trivial, silly mistakes. This is what I mean by "sloppy" (in the traditional sense of that word...), and I don't think we should shame developers -- including self-shaming -- for looking more closely and critically at the output of these tools than we look at changes where there is only a human author.
I think the capacity of these systems to make mistakes which are out of step with the overall quality of their outputs is well understood at this point.

If we can't find a way to talk about these tools as an aggregate category -- in the same way that we're able to talk about "editor macros" without being specific to vim, emacs, etc. -- then I can't even think through how to write a policy. Surely there are some properties which are common to all of these tools? e.g., They make it possible to produce very large amounts of text and large numbers of pull requests with relatively little human effort.

Policies should focus more on the desired outcomes and respect for maintainers time.

And here we 100% agree.


I really wish I could write a policy which didn't even mention LLMs. Just a general "tool assisted pull requests". If you want to write a libcst refactor and apply it to the codebase, then tell me that that's what you did to produce the PR. Likewise if you ask an LLM to scan for "good first issue" and fix a bug (without ever looking at the output), tell me. And if you're ashamed to tell me what you're doing, then maybe don't do that. This presupposes that you won't be ashamed to say "I had claude write the tests and then I wrote the fix" (or whatever), which only seems to be true of a subset of developers. So such a policy has some pretty significant problems.


I didn't get to doing another draft of the policy this past weekend, but I'll be working on it. And this thread will definitely factor into my thinking. I'm going to look again at the CPython docs for this to try to see how it threads the needle here, as I think there's no disclosure ask in that policy.



However, I agree that disclosure in itself is not particularly useful. The issue is really a form of dishonesty, in which a contribution is presented as though there is a human author, with whom I am expected to engage as a maintainer, but there is effectively no human in the loop and the interaction is completely automated.

The text I was replying to (earlier draft, sorry, I need to go read the latest but will wait for your next update) did not say "fully automated, no human in the loop". The text I wrote the reply in response to said "when they have use an LLM to generate a part of their work", which is hugely different from the openclaw-style automated, non-genuine, unattended-bot scenario that wastes maintainer time and that maintainers rightfully want less of.

What this appears to be wanting is to know if someone is experienced and if they've meaningfully reviewed and understand the change proposal themselves. Instead what is being effectively demanded is "did you use an LLM?" allowing one to draw conclusions (often wrong) from that rather than asking the real questions.

I really wish I could write a policy which didn't even mention LLMs.

In the end this is what I think we need. It's not about the LLM. It's all about maintainer time, burnout, and asymmetry in the face of automation that lacks respect for maintainers.

Thanks for understanding! The future is indeed weird.

Member


Great insights! Thanks Sam, Stephen and Greg for opinions while I've been away. I like the idea of getting the contributors to feel responsible for spending the maintainer time and post-processing any tool output, effectively taking responsibility for it.

Among other thoughts swarming in my head, I think we should make an effort to make it clear that "good first issues" aren't supposed to be pasted into automatic tools; their purpose, and the reason for their existence (and for remaining unsolved), is educating humans and attracting contributors who'd be involved for longer. And so we might want to ask people about their motivation for contributing. One experiment I'm running in @aio-libs is asking people about the maintenance burden; here's an example of such a PR template filled out: aio-libs/aiohttp#12136. I've noticed, though, that lately the LLM users ignore the PR templates and paste whatever their tool generated w/o even reading what was asked — I'd explore auto-closing such PRs that are missing the template and asking to fill them out properly when they re-open.


I also asked an ML friend (not an open-sourcer) to look into this PR and here's a summary of our chats:

I find myself negatively biased towards code under review if it was LLM generated

"This is true, because it's unclear what the 'slop ratio' is ('think for you' / 'type for you'); the requirement to understand every line is good because rules are pointless unless enforced, and 'slop ratio' might be a metric. The checkboxes may work as self-filtration. Motivation matters — the effort invested to understand the conventions might make sense for somebody who's going to keep contributing, but a one-time thingy might just end up in a fork; and it doesn't make sense for the maintainers to spend time explaining the conventions either."

"It'd be good if the conventions / standards were written down".


So this all got me thinking that maybe we should additionally focus on making sure we don't encourage / reward unmotivated reputation harvesting w/o a human in the loop.

CONTRIBUTING.md Outdated
In extreme cases, slop PRs may be closed as spam.
:::

_The [rationale and discussion](https://github.com/jazzband/pip-tools/discussions/2278) behind this policy is open and public. Feel free to raise new issues or make suggestions there._
Member


Let's make the link detached like others.

@webknjaz
Member

webknjaz commented Feb 2, 2026

I feel like (1) we might not want to use word "policy" but rather something along the lines of "expectations", plus maybe (2) it'd be good to have guidelines for reviewers. And have some explanatory wording of "this is how we'll perceive submissions/interactions and here's why".

Here's some more on what I like in other policy examples:


On a related note, I like CPython's triage process explanations (https://devguide.python.org/triage/triaging/ / https://devguide.python.org/triage/triage-team/) and think that it's a good source to take into account and steer people towards in terms of showing that it's important to get familiar with the community and the code base. Something similar to what you @sirosen wanted to write last year regarding contributing to CPython.

@sirosen
Member Author

sirosen commented Feb 2, 2026

I haven't had time in the past couple of days to circle back and apply changes (and it's late in my local time today), but I wanted to drop a quick note.

I'm 👍 on all of the small changes suggested, but I want to review some of these other contrib docs before making more edits. Not all are familiar to me. e.g., I just read Sebastián's policy for FastAPI projects, and I like it. It strikes a really good balance of brevity and explanation.

My tone was more formal, but I'll reconsider that. I'll need to try a few different versions of this out to see what works best. Possibly I'll post some samples of different possible text when I work through it, but if I really like a result I might just update the PR.

@webknjaz
Member

webknjaz commented Feb 2, 2026

Oh, and I forgot one more thing: I think we should explicitly call out that things marked as a "good first issue" are best solved w/o LLMs, explaining that they are likely to be a good learning experience and generated submissions will probably harm this process.

@sirosen sirosen modified the milestones: 7.5.3, 7.5.4 Feb 6, 2026
@webknjaz
Member

@sirosen so over on the pytest discord server, @0cjs shared this:

  1. I like the disclosure policy. In theory, it doesn't matter if the contributor used an LLM or not. In practice, if this gives them pause when they have to disclose this and check their work better, that's probably a good thing.
  2. I think perhaps telling folks you expect them to be able to justify every character changed might help; even if you don't actually ask them to, that they should be able to should make them think about what they're changing.
  3. Possibly related to #2: make the commit sequence tell a story? I won't explain that in detail here unless asked; I suspect you get it.
  4. Actually, probably subsuming #2 and #3: design your commits for review.

And looking at that Claude summary, I'm feeling now that perhaps some faster way to figure out if a submission isn't living up to #4 above is a good place to start. I might, for example, start with the output of

git log --oneline --no-decorate --reverse \
    main@{u}..origin/dev/submitter/2025-02-15/some-stuff

And say, "Sorry, I'm not seeing from this the overall story of what's being changed. Rewrite this so that we get a decent overview from that, and then we can move on with more detailed review." (This is basically just the first step of the review procedure I describe here.)
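The "design your commits for review" triage above could even be roughed out as a script. This is only an illustrative sketch of the idea, not anything agreed in this thread: the subject-line heuristics and the revision range are made up, and it reuses the same `git log` flags as the invocation quoted above.

```python
import subprocess

# Entirely made-up heuristic for illustration: subjects that are very
# short or consist only of throwaway words probably don't "tell a story".
THROWAWAY = {"fix", "fixes", "wip", "update", "updates", "changes", "stuff"}


def looks_uninformative(subject: str) -> bool:
    """Return True if a commit subject line seems to carry no story."""
    words = subject.strip().lower().split()
    return len(words) < 2 or all(w.strip(".!") in THROWAWAY for w in words)


def flag_commits(rev_range: str) -> list[str]:
    """Return the uninformative commit subjects in a revision range."""
    out = subprocess.run(
        ["git", "log", "--oneline", "--no-decorate", "--reverse",
         "--format=%s", rev_range],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if looks_uninformative(line)]


if __name__ == "__main__":
    # Hypothetical range; a reviewer would substitute the submitter's branch.
    for subject in flag_commits("main..HEAD"):
        print("needs a better story:", subject)
```

A flagged list like this would only be a prompt for the "rewrite this so we get a decent overview" reply, not an automatic rejection.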

@webknjaz
Member

@sirosen so this is just crazy, was just shared in one of the discords I'm on: matplotlib/matplotlib#31132 (comment) / https://sethmlarson.dev/automated-public-shaming-of-open-source-maintainers 🤯

h/t @savannahostrowski and @sethmlarson

@webknjaz
Member

Another opinion from private spaces:

One thing I think we as maintainers of open projects have to do is to alert our management that if we have to deal with AI slop bug reports or patches then this will make us less productive, and automating responses to that will not help. If we have to face actual abuse or attacks from AI agents in blocking or removing their AI slop PRs, then that will impact our mental health.


@samdoran samdoran left a comment


Adding some of my thoughts. I think it's great that this is being discussed.


CONTRIBUTING.md Outdated
Although contributors are free to use whatever tools they like, `pip-tools` has a policy regarding LLM contributions to protect our maintainers as well as our contributors.

1. **Disclosure**: contributors should indicate when they have use an LLM to generate a part of their work.
You may also be able to help maintainers understand what they are looking at by sharing how you used a tool, e.g. what prompts you used.


I think this is very good guidance, almost worthy of inclusion in an issue template. The reason being that the prompt actually reveals the thinking behind the code. The code from the LLM may or may not reflect the author's original intention because it is an imperfect tool. But having a better understanding of the thinking behind code is always the most beneficial thing that comes out of code review.

Member


Yeah, I actually started suggesting asking for the LLM name and the prompt in issue templates in ansible and aio-libs some time last year but didn't get any support as people thought this might be alienating. I'd still like that, though.

Member Author


We should definitely think about changing the PR and issue templates, and I agree that some good form for "what tools did you use?" could be the path forward.

But for now I want to focus on just the contrib doc piece, and just setting down the rules we can agree upon! 🫶

Member


We should definitely think about changing the PR and issue templates

Also automate closing those that remove the template/form. Looking at a bunch of the last PRs at https://github.com/aio-libs/aiohttp/pulls?q=is%3Apr+is%3Aopen+"%23%23+Problem"+"%23%23+Solution"+"%23%23+Changes"+, they just stopped filling out the template: https://github.com/aio-libs/aiohttp/blob/master/.github/PULL_REQUEST_TEMPLATE.md
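Such auto-closing could start from a trivial check like the sketch below. The section headings here are hypothetical (decoded from the search query above, not verified against the actual template), and the wiring into a bot or webhook is deliberately left out:

```python
# Hypothetical required headings; a real check would load them from the
# repository's actual PULL_REQUEST_TEMPLATE.md rather than hard-coding.
REQUIRED_SECTIONS = ("## Problem", "## Solution", "## Changes")


def missing_template_sections(pr_body: str,
                              required=REQUIRED_SECTIONS) -> list[str]:
    """Return the template headings absent from a PR description."""
    return [heading for heading in required if heading not in pr_body]


def should_auto_close(pr_body: str) -> bool:
    """A PR that dropped every template section is a candidate for
    auto-closing with a request to fill the template out properly."""
    return len(missing_template_sections(pr_body)) == len(REQUIRED_SECTIONS)
```

Keeping the check this conservative (only fire when *every* section is gone) would avoid punishing humans who merely reordered or trimmed the template.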

@webknjaz
Member

Looks like GH is noticing the slop storm even more now: https://github.blog/open-source/maintainers/welcome-to-the-eternal-september-of-open-source-heres-what-we-plan-to-do-for-maintainers/

Member Author

@sirosen sirosen left a comment


Thanks much for the reviews and feedback! I've just posted a new version of the document here.

I could lean in harder on making the text structured (bulleted lists, tables, etc), but I just didn't like it when I tried. So I'm sharing, as I think I secretly knew I would, only one new revision, as an update to the PR.

I added another section in this draft, for the good first issue label. Intentionally, it's not nested under the LLM guidelines, but is the next-sibling after it in the docs.



Switch to a more casual tone and expand out the disclosure and ownership
sections into paragraphs, rather than a bulleted list.

A new note is added, inspired by the FastAPI contrib doc and others, to
suggest that contributors think about what they are adding *beyond* just
prompting an LLM.

A new section is added on the "good first issue" label to explain that it
should not be fed into LLMs.

The policy discussion link is moved to be a detached link.

A new doc fragment links to the contrib one.
@sirosen sirosen force-pushed the initial-llm-usage-policy branch from fe3eeb0 to cddc504 on February 14, 2026 at 21:06
@0cjs

0cjs commented Feb 19, 2026

Here's the TLDR of what I want to see in a policy; justification of it is below.

  1. Any change you have made includes any refactorings reasonable to decrease code size and complexity. If your change touches code that should be refactored and doesn't, we will not accept it until you add those refactorings.
  2. You understand and can justify every line of code in your change. Look at every line you add or change and, if you don't know why it's added or changed, remove it. (If your change doesn't work after that, now you can figure out why it's necessary.)

My current consulting job is essentially, "Help rescue a company that's written hundreds of thousands of lines of code with an LLM so that they can continue operating." Perhaps ironically, I'm mainly a Haskell and Python guy, but they're using JS, so I'm relying heavily on that LLM myself to help me with coding.

The biggest problem I've found is simply that LLMs produce exponential code increases: they produce "working well enough" code, albeit too much of it, and every fix for any problem adds more code. (That of course adds more, though often more subtle, problems, requiring more code to fix, and you can easily see where that goes.) TLDR: They never refactor. (If that did not make you sit up and scream, read on; if it did, you already know what's coming.) I am not the only one to point this out.

There are claims that just now, finally (literally days after the release of Opus 4.6), LLMs are able to write code to the level where good programmers will no longer be needed. (This is not the first time this claim has been made in the last year or two, or the last decade, or even in the last century.) But I'm not buying it, not only because of all this past experience but because I just upgraded the Claude I've been using for a while to that exact same model, with all the extras turned on, and I can, as an experienced programmer, easily point out where it's failing. Let me give an example that came up within literally an hour of the upgrade. (This is simple code, but not simplified: it's actual production code that I am using every day and maintaining for the rest of my time at this company.)

Today (again with Opus 4.6 and "extra thinking" turned on) I asked it to stop a whole load of spew from a `tofu plan` command, which was producing dozens of bold (in my terminal) `aws_instance.foo: Refreshing state... [id=i-123456789abcdef]` lines. It suggested updating the script to add `2>&1 | grep -v ...` after that command (you can fill in the general idea for my elision `...`; it's not important). This is, to an experienced programmer, clearly stupid because, along with being an imperfect fix for the old problem, it introduces two new problems. A simple prompt of "isn't there a command line option to suppress those messages" immediately made it come back with adding just `-concise` to the command line, which is what it should have done in the first place.

(For those who do not understand the huge difference between the first and second solutions, in terms of long-term maintainability: the first combines stderr with stdout, which means that callers of this script can no longer suppress stdout for other reasons without making error messages disappear, and the `grep` is a guess at what we should match that may randomly suppress or allow other messages. These may seem small and unimportant, but they are exactly the sort of subtle "oh, could that ever be a problem, really?" things that will come back to bite you in the ass in the future. Yes, only one in a hundred of them, but see below.)
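To make the stream-separation point concrete, here's a minimal sketch. The `fake_plan` function below is a hypothetical stand-in for `tofu plan` (so the snippet runs anywhere): real results go to stdout, progress noise goes to stderr, just as in the real tool.

```shell
# Hypothetical stand-in for `tofu plan`: results on stdout,
# "Refreshing state..." progress noise on stderr.
fake_plan() {
  echo "aws_instance.foo: Refreshing state... [id=i-123456789abcdef]" >&2
  echo "Plan: 1 to add, 0 to change, 0 to destroy."
}

# Fragile fix: merging stderr into stdout and grepping the noise away.
# Callers can no longer separate errors from results, and the grep
# pattern may silently eat unrelated messages.
fragile=$(fake_plan 2>&1 | grep -v 'Refreshing state')

# Cleaner fix: have the tool itself suppress the noise (with the real
# tool, `tofu plan -concise`), leaving stdout/stderr separation intact.
# Here we simulate that by simply not polluting stderr with noise we
# have to filter; the result stream is untouched.
clean=$(fake_plan 2>/dev/null)

echo "$fragile"
echo "$clean"
```

Both approaches print the same plan summary here, but only the second keeps stderr available to callers for actual errors.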

Those of us who build large systems well know that everybody says every single week, "oh, little things like this are no big deal," and two years later you're crushed under thousands of "little things" like this that have hemmed in the directions in which you can progress (removing only 1 degree here, 0.5 degrees there) such that there's no direction in which you can move any more. This is exactly what I'm dealing with right now: things like the literally several thousand `catch` clauses that issue a `500 Internal Server Error` HTTP response, which I'm sure do only about a dozen different things, total; but refactoring several thousand of those down to a dozen instances is a lot of work. (Assuming it's a dozen; who knows. Had they clarified their intentions at the time and refactored to that, the problem wouldn't exist.)

My feeling is that it really doesn't matter much at all if LLMs are ten times better at reading and dealing with large code bases than I am: if they're expanding the code bases such that you need someone a thousand times better than I am, they'll be just as lost as me, and sink pretty much as quickly into "every time I fix a bug I add a new bug."

Dijkstra explained all this decades ago: after a certain (very early) point programming is not a problem of being able to generate code: it's a problem of being able to simplify your intentions so you can control the complexity. And no amount of "I can create an AI to handle more complex stuff" is going to compete against, "if I do random shit I can increase complexity faster than the entire universe can handle."

All of which is just long-winded justification for my suggestion above: in another form, "How did you make our program/project less complex, or at least no more complex when adding the additional functionality?"

@webknjaz
Member

@sirosen I still haven't reviewed your last changes, but have you seen this injection chaoss/wg-ai-alignment#20 (comment)?


@0cjs love the simplification framing!

@sirosen
Member Author

sirosen commented Feb 19, 2026

I hadn't seen that one. I like that it goes at the very end of the file; basically lets a human reader stop at that point.

I agree, pushing people for brevity and proper refactoring is probably a good inclusion.

However, I want to avoid "thrashing" by changing the PR content too quickly (which makes review difficult). I'm giving it a while for any feedback on the current content before making a next draft. Currently, my plan is:

  • wait until the weekend for feedback
  • update to include an end-of-doc set of instructions for "autonomous agents" (per that citation)
  • update to include a note/section about brevity and prioritizing maintainability
  • post an update comment indicating that I've made these changes
