Define EBNF in specification #363
Comments
@dawud and I made an initial stab at an EBNF .. it is not 100% correct but raised some interesting observations:
and here is the obligatory Python program generated from the grammar. It seems that for this EBNF to be useful we would need a detailed grammar for each type, enforcing required qualifiers ... I also wonder whether some of the current pURL rules (e.g. not supporting UTF-8, or the vagaries of IRI) could result in a valid pURL that is an invalid URL. Maybe the way to go is to define something basic and consider higher precision if a vnext is ever considered?
It looks like this parser fails when it reaches a I don't know if this will work. The PURL parse algorithm parses from both ends of the string, so if you encounter a A really gross example is
I think anchore/packageurl-go, giterlizzi/perl-URI-PackageURL, maennchen/purl, package-url/packageurl-go, package-url/packageurl-java, package-url/packageurl-js, package-url/packageurl-python and sonatype/package-url-java (all 8 failing implementations tested) fail because these PURL parsers are based on existing URI/URL parsers, and PURL uses an incompatible parsing algorithm.
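To make the "parses from both ends" point concrete, here is a minimal Python sketch of that parsing order: subpath and qualifiers are cut from the right, scheme and type from the left, then version and name from the right again. It is an illustration only. The function name `parse_purl_sketch` and the helper `_rsplit_once` are hypothetical, and type-specific normalization and validation are left out.

```python
from urllib.parse import unquote


def _rsplit_once(text, sep):
    """Split once from the right; the tail is None when sep is absent."""
    head, found, tail = text.rpartition(sep)
    return (head, tail) if found else (text, None)


def parse_purl_sketch(purl):
    # Right end first: subpath (after the last '#'), then qualifiers (after the last '?').
    remainder, subpath = _rsplit_once(purl, "#")
    remainder, qualifiers = _rsplit_once(remainder, "?")

    # Left end next: scheme (before the first ':'), then type (before the first '/').
    scheme, found, remainder = remainder.partition(":")
    if not found or scheme.lower() != "pkg":
        raise ValueError("missing 'pkg' scheme")
    pkg_type, found, remainder = remainder.strip("/").partition("/")
    if not found or not remainder:
        raise ValueError("missing name")

    # Right end again: version (after the last '@'), then name (after the last '/').
    remainder, version = _rsplit_once(remainder, "@")
    head, tail = _rsplit_once(remainder, "/")
    namespace, name = (head, tail) if tail is not None else (None, head)

    # Percent-decode each component only after it has been isolated.
    quals = {}
    for pair in (qualifiers or "").split("&"):
        key, _, value = pair.partition("=")
        if key and value:
            quals[key.lower()] = unquote(value)
    segments = [unquote(s) for s in (subpath or "").split("/") if s not in ("", ".", "..")]
    if namespace is not None:
        namespace = "/".join(unquote(s) for s in namespace.split("/") if s) or None

    return {
        "type": pkg_type.lower(),
        "namespace": namespace,
        "name": unquote(name),
        "version": unquote(version) if version is not None else None,
        "qualifiers": quals,
        "subpath": "/".join(segments) or None,
    }
```

In this sketch, `pkg:generic/a#b#c` yields the name `a#b` and the subpath `c`, whereas a generic URL parser would typically treat everything after the first `#` as the fragment. That difference is the kind of mismatch described above.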
ah right, ya I have not completed it fully thx!
useful (pathological) test cases ;)
once we fully implement the EBNF we will know ;) but as previously mentioned, I think we will need to drill down and generate rules for each type.
updated EBNF and test parse program which can be run as follows (enclose pURL in {})
which emits the following parse tree (in xml)
still not quite right but a little better ... To generate test parser:
Note - the invocation escapes the ? and = chars with \
When asking for the error message, there's a bug in the generated code where

    while f > 0:
        if (f & 1) != 0 and 0 <= j and j < len(test.TOKEN):
            tokenSet.append(test.TOKEN[j])
            size += 1
        j += 1
        f >>= 1

With this adapter, the code can be loaded into purl-survey and tested using the test suite: https://gist.github.com/matt-phylum/60037cad76af18a7359650b8f319ca36

As expected given the current state, it fails most tests. There might be some bugs in parts of the adapter that are currently unreachable (eg the subpath might have an extra # character on the front, but subpaths don't work at all so I don't know).
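For readers unfamiliar with the loop quoted in this comment, the following standalone sketch shows what it appears to do: walk the set bits of a mask and collect the matching token names, skipping any index that falls outside the token table. The names `expected_tokens`, `first_index` and the contents of `TOKEN` are hypothetical stand-ins, and whether the bounds check is the actual fix for the generated code is an assumption.

```python
# Hypothetical token table; the real generated parser defines its own.
TOKEN = ["(0)", "EOF", "'#'", "'?'"]


def expected_tokens(mask, first_index=0, tokens=TOKEN):
    """Collect the token names whose bits are set in mask.

    Bit k of mask corresponds to tokens[first_index + k]. Indexes outside
    the table are skipped instead of raising IndexError.
    """
    names = []
    j = first_index
    f = mask
    while f > 0:
        if (f & 1) != 0 and 0 <= j < len(tokens):
            names.append(tokens[j])
        j += 1   # move on to the next token index
        f >>= 1  # and to the next bit of the mask
    return names
```

With the bounds check in place, a mask bit that points past the end of the table is ignored rather than aborting the error-message lookup.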
thx for trying it out - great idea re integrating with purl-survey ... I am afraid it's going to take me a few iterations - when I think it is ready for proper testing I will raise a PR.
The canonical PURL for
Parse / Decode:
NOTE: |
The canonical PURL is
All complete implementations tested parse the canonical PURL except maennchen/purl and package-url/packageurl-ruby which both underdecode. URI-PackageURL parses the non-canonical PURL incorrectly.
- Improved parsing of non-canonical PURL (package-url/purl-spec#363)
- Improved "URI::VersionRange->constraint_contains"
- Updated "maven" repository URL
- FIX typo in documentation
- Synced "test-suite-data.json" from "package-url/purl-spec"
Right!
I improved the URI-PackageURL parser and added this non-canonical PURL in the tests. |
@JimFuller-RedHat thanks to you, TIL about @GuntherRademacher 's https://github.com/GuntherRademacher/rex-parser-generator which looks like a freaking awesome piece of tool! |
It would be handy to have an EBNF definition of the pURL grammar ... motivated by discussions here #296