Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 716: Normalization of Project Names in Metadata and Filenames #3171

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

dstufft
Copy link
Member

@dstufft dstufft commented Jun 12, 2023

Basic requirements (all PEP Types)

  • Read and followed PEP 1 & PEP 12
  • File created from the latest PEP template
  • PEP has next available number, & set in filename (pep-NNNN.rst), PR title (PEP 123: <Title of PEP>) and PEP header
  • Title clearly, accurately and concisely describes the content in 79 characters or less
  • Core dev/PEP editor listed as Author or Sponsor, and formally confirmed their approval
  • Author, Status (Draft), Type and Created headers filled out correctly
  • PEP-Delegate, Topic, Requires and Replaces headers completed if appropriate
  • Required sections included
    • Abstract (first section)
    • Copyright (last section; exact wording from template required)
  • Code is well-formatted (PEP 7/PEP 8) and is in code blocks, with the right lexer names if non-Python
  • PEP builds with no warnings, pre-commit checks pass and content displays as intended in the rendered HTML
  • Authors/sponsor added to .github/CODEOWNERS for the PEP

Standards Track requirements

  • PEP topic discussed in a suitable venue with general agreement that a PEP is appropriate
  • Suggested sections included (unless not applicable)
    • Motivation
    • Rationale
    • Specification
    • Backwards Compatibility
    • Security Implications
    • How to Teach This
    • Reference Implementation
    • Rejected Ideas
    • Open Issues
  • Python-Version set to valid (pre-beta) future Python version, if relevant
  • Any project stated in the PEP as supporting/endorsing/benefiting from the PEP formally confirmed such
  • Right before or after initial merging, PEP discussion thread created and linked to in Discussions-To and Post-History

📚 Documentation preview 📚: https://pep-previews--3171.org.readthedocs.build/pep-0716/

@dstufft dstufft requested a review from a team as a code owner June 12, 2023 02:00
@dstufft
Copy link
Member Author

dstufft commented Jun 12, 2023

Y'all are gonna get sick of me soon, but here's another one :)

@hugovk
Copy link
Member

hugovk commented Jun 12, 2023

Y'all are gonna get sick of me soon, but here's another one :)

We'll just reserve the rest of the 7xx series for you ;)

Copy link
Member

@hugovk hugovk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some typos via https://github.com/python/peps/pull/3171/checks#step:5:93, you can also run make spellcheck locally.

pep-0716.rst Outdated Show resolved Hide resolved
pep-0716.rst Outdated Show resolved Hide resolved
pep-0716.rst Outdated Show resolved Hide resolved
pep-0716.rst Outdated Show resolved Hide resolved
pep-0716.rst Outdated Show resolved Hide resolved
pep-0716.rst Outdated Show resolved Hide resolved
pep-0716.rst Outdated Show resolved Hide resolved
pep-0716.rst Outdated
optional spelling of ``_``, which would normalize to ``-``.
3. :pep:`503` was accepted, which specified a normalization of the project name
*when* querying the Simple API for a project.
4. The spec for ``.dist-info`` required normalization ofthe name (using the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
4. The spec for ``.dist-info`` required normalization ofthe name (using the
4. The spec for ``.dist-info`` required normalization ofthe name (using the

that are not normalized, and thus instructs tools to expect ``.dist-info``
directories with unnormalized values, but that all tools must write normalized
values going forward.
5. It was `noted <https://discuss.python.org/t/5605>`__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
5. It was `noted <https://discuss.python.org/t/5605>`__
5. It was `noted <https://discuss.python.org/t/5605>`__

pep-0716.rst Outdated Show resolved Hide resolved
pep-0716.rst Outdated Show resolved Hide resolved
pep-0716.rst Outdated Show resolved Hide resolved
pep-0716.rst Outdated Show resolved Hide resolved
pep-0716.rst Outdated Show resolved Hide resolved
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
@dstufft
Copy link
Member Author

dstufft commented Jun 12, 2023

I've committed most of the suggestions.. I think the remaining ones are just adding line breaks between entries in the numbered list? If y'all think that makes it clearer I'll add them, but the suggestions only did some of them, so I think it makes sense to do either some of them or none of them? Otherwise it feels weird to me to have half of them with extra line breaks, and half not.

WDYT?

@CAM-Gerlach CAM-Gerlach self-requested a review June 13, 2023 00:41
@CAM-Gerlach CAM-Gerlach added the new-pep A new draft PEP submitted for initial review label Jun 13, 2023
Copy link
Member

@CAM-Gerlach CAM-Gerlach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Standard reminder: You can directly apply all the suggestions you want in one go by going to Files changed -> Clicking Add to batch on each suggestion -> When done, clicking Commit

Top-level comments:

  • Might be consistent to mention that RFC 2119 terms are used here too, like you do in PEP 717
  • In the spec, missing a clear definition or at least a reference to what the actual "normalization" being mandated or prohibited is (i.e. :ref:packaging:name-normalization`).
  • Seems like it might be at least worth mentioning whether there will be security implications here, since normalization or lack thereof can have security-relevant properties in terms of dependency confusion, etc.
  • Per the above discussion, adding spaces between list items for lists that have multi-paragraph items is both clearer and is syntactically required in some cases (or the rendered lists do not have the correct spacing, as you can see in the PEP preview), so I suggest doing so in a commit (at least previously, whitespace around GitHub suggestions tends to get stripped when applying them, so I would not try to apply Hugo's suggestions and instead just do this locally.

The argument for changing filename normalization rules once again and mandating filenames not be normalized seems rather weak to me relative to the benefits, but I'll save that for the discussion thread, of course.

Title: Normalization of Project Names in Metadata and Filenames
Author: Donald Stufft <donald@stufft.io>
PEP-Delegate: Paul Moore <p.f.moore@gmail.com>
Discussions-To:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Standard reminder to update Discussions-To and Post-History with the PEP discussion thread created just after merging this

Abstract
========

This PEP standardizes on where and when project names should and should not be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This PEP standardizes on where and when project names should and should not be
This PEP standardizes where and when project names should and should not be

Elide unnecessary word

Motivation
==========

Historically there was effectively little to no requirements on the valid values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Historically there was effectively little to no requirements on the valid values
Historically, there was effectively little to no restriction on the valid values

Fix grammar errors (agreement)


Over the intervening years, various PEPs have ratcheted down various pieces of
metadata such as version numbers (:pep:`440`), filenames in bdists (:pep:`427`),
names in the Simple API (:pep:`503`), and filenames in sdists (:pep:`625`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
names in the Simple API (:pep:`503`), and filenames in sdists (:pep:`625`).
names in the Simple Repository API (:pep:`503`), and filenames in sdists (:pep:`625`).

Specify the actual name of the API, and be clear at a glance what API it actually is

names in the Simple API (:pep:`503`), and filenames in sdists (:pep:`625`).

Unfortunately, a complex interaction between these various standards *and*
changes made to the specifications without an associated PEP, have created a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
changes made to the specifications without an associated PEP, have created a
changes made to the specifications without an associated PEP have created a

Stray comma

Comment on lines +259 to +260
* Tools that are currently emitting the names in the simple API (outside of the URL
itself) as normalized, which is either allowed or required by the spec
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Tools that are currently emitting the names in the simple API (outside of the URL
itself) as normalized, which is either allowed or required by the spec
* Tools that are currently emitting normalized project names in the Simple Repository API
(outside of the URL itself), which is either allowed or required by the spec
  • Clarify hard to parse and potentially-confusing awkward phrasing
  • Use the same full, title-cased form of the Simple Repo API name as used elsewhere, for clarity and consistency


* Tools that are currently emitting the names in the simple API (outside of the URL
itself) as normalized, which is either allowed or required by the spec
currently are immediately not longer complaint and must be updated to emit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
currently are immediately not longer complaint and must be updated to emit
currently are immediately no longer complaint and must be updated to emit

Fix typo

Require Normalization Everywhere
--------------------------------

One other possible idea is to simply require normalization everywhere, however
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
One other possible idea is to simply require normalization everywhere, however
One other possible idea is to simply require normalization everywhere; however,

Fix grammar error (comma splice)

Require Normalization in Filenames
----------------------------------

Filenames sit in a weird place, in most cases they are produced by software
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Filenames sit in a weird place, in most cases they are produced by software
Filenames sit in a weird place; in most cases they are produced by software

Fix another comma splice grammar error


Although they are often a software-to-software identifier, they are also used by
humans when sharing and manually downloading the software. They appear in places
like the PyPI UI, GitHub Releases, downstream Linux repositories, etc. In some
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Downstream Linux repositories? Huh? Linux distros as a rule don't expose to users files straight off PyPI, they repack them into their own archive format following their own (nearly always normalized) naming convention. Furthermore, AFAIK most if not nearly all have naming and normalization policies similar or substantially stricter than PEP 503's for the canonical user-facing name of their packages, not just the file names. Therefore, I'm rather confused how this is relevant here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the wheel filename was explicitly designed to be for machine consumption, so saying it's not a software-to-software identifier is misrepresenting the intention of the design. (Thanks to compatibility tags, no-one would say wheel filenames are for human consumption 😉)

IMO, this whole argument is the weakest point of the whole PEP, and the overall proposal would be changed very little if it was changed to require normalisation of the project name in wheel and sdist filenames, but was left otherwise unchanged. If that isn't actually the case, then this is the section that should explain what the actual problem with normalising is. That would be a far better rejection reason than the current (frankly, largely subjective) argument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, entirely agreed—I'd briefly mentioned it in my top-level review, but particularly as a PEP editor decided to save my own detailed commentary on this for the PEP discussion thread. However, as it ended up getting discussed at some length on Barry's thread that kicked off this PEP, it sounds like Donald mentioned there that he'd be revising this accordingly (hopefully to require normalization, which seemed to be the general consensus, rather than just allow it).

@hugovk
Copy link
Member

hugovk commented Jun 13, 2023

I've committed most of the suggestions.. I think the remaining ones are just adding line breaks between entries in the numbered list? If y'all think that makes it clearer I'll add them, but the suggestions only did some of them, so I think it makes sense to do either some of them or none of them? Otherwise it feels weird to me to have half of them with extra line breaks, and half not.

WDYT?

Yeah, currently only some of them have breaks so looks uneven:

screenshot

image

https://pep-previews--3171.org.readthedocs.build/pep-0716/#motivation

My suggestions were to add the missing ones between 3-4 and 5-6, so they're all consistent. Either that, or remove them from the others. (It's a big list so I favour extra space.)

Rationale
=========

This PEP follows two guiding principals:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This PEP follows two guiding principals:
This PEP follows two guiding principles:


Tools that read an arbitrary ``.dist-info`` directory **MUST** be prepared to
accept unnormalized values, however tools that work only on *new* ``.dist-info``
directories **SHOULD** validate that all values are normalized.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not grammar, but rather a point of semantics (which may need to be raised on Discourse, but if I don't mention it now, I'll forget 🙂). I'd prefer this to be MAY. We should never insist (even mildly) on validation from consumers. Suggesting it is fine.

respectively).

The ``name`` field **MUST** be non-normalized, with the exception that any ``-``
**MUST** be escaped to be ``_``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to object to this on Discourse... At a minimum, there should be an explanation here of why you propose this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I have an update to this PEP that changes the proposal, this was my first cut, and I didn't get a chance to publish it before the direction of the conversation changed. So hold off till I get an update written for this :)


Tools that accept an arbitrary distribution **MUST** be prepared to accept both
non-normalized and normalized filenames. However, tools that only work on *new*
distributions **SHOULD** validate that the distribution filenames are not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, MAY rather than SHOULD IMO.

Simple Repository API
---------------------

The project name, when returned in the "index" URL (e.g. ``/simple/``)
Copy link
Member

@pfmoore pfmoore Jun 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should clarify where exactly, as I had to check PEP 503 to understand it. Also "in the index URL" is confusing.

Suggested change
The project name, when returned in the "index" URL (e.g. ``/simple/``)
The project name, when used in the *text* of the anchor tag in the page served at the root URL (e.g. ``/simple/``)

The project name when used in the URL (e.g. ``/simple/$project/``) **MUST** be
normalized.

The project name, when used on the Project detail page
Copy link
Member

@pfmoore pfmoore Jun 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the HTML version of the index, the project name is never used (except in any additional content not covered by the spec). So this requirement can be omitted - unless you want to cover the JSON form as defined in PEP 691, in which case

Suggested change
The project name, when used on the Project detail page
The project name, when referenced in the ``"name"`` field of the JSON format project detail page

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Fixed mistaken single backticks in the above comment)

* Tools that are currently emitting the names in the simple API (outside of the URL
itself) as normalized, which is either allowed or required by the spec
currently are immediately not longer complaint and must be updated to emit
non-normalized names.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PEP should probably list which tools this is known to affect. After reading the PEP to here, I'm sufficiently overwhelmed as to be unsure whether this is referring to PyPI or Artifactory, or some other index software.

Copy link
Member

@CAM-Gerlach CAM-Gerlach Jun 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My impression here was that this specific bit referred to neither, and rather referred to installers using the normalized rather than the unnormalized name in user-facing UI text—but clearly, given there is evident confusion here it would be good to clarify appropriately.

@JelleZijlstra
Copy link
Member

Fixed the merge conflict, but there's a lot of comments above to address.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new-pep A new draft PEP submitted for initial review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants