New manpage format #1921
base: gh-pages
I just triggered a pair of workflow runs to update the manual pages and to update the translated manual pages, fetched the result and rendered it locally. Here are two examples:
Personally, I cannot spot any difference, apart from the version number (because this here PR branch is based on v2.46.2 while the updated manual pages include v2.47.0) and the incorrect Even looking at the HTML of the synopses (taking the French version, so that there is a known difference), I only see this:
diff --git a/before b/after
index 1a87d1348..6185fb72b 100644
--- a/before
+++ b/after
@@ -1,5 +1,5 @@
<pre class="content"><em>git config list</em> [<option-de-fichier>] [<option-d-affichage>] [--includes]
-<em>git config get</em> [<option-de-fichier>] [<option-d-affichage>] [--includes] [--all] [--regexp=<regexp>] [--value=<valeur>] [--fixed-value] [--default=<default>] <nom>
+<em>git config get</em> [<option-de-fichier>] [<option-d-affichage>] [--includes] [--all] [--regexp] [--value=<valeur>] [--fixed-value] [--default=<default>] <nom>
<em>git config set</em> [<option-de-fichier>] [--type=<type>] [--all] [--value=<valeur>] [--fixed-value] <nom> <valeur>
<em>git config unset</em> [<option-de-fichier>] [--all] [--value=<valeur>] [--fixed-value] <nom> <valeur>
<em>git config rename-section</em> [<option-de-fichier>] <ancien-name> <nouveau-name>
@jnavila what am I missing?
The manpage of git-config has not been converted yet. After importing, here is the result: I'm not satisfied with the styles, particularly when dealing with inline formats. You can test it yourself locally and tell me what you think.
The new style makes the code spans lighter and more integrated into the text. The new style also makes the code spans more readable and less intrusive.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
This commit adds an upcoming manpage format to the AsciiDoc backend. The new format changes are:
* The synopsis is now a section with a dedicated style. This "synopsis" style makes it possible to automatically format the keywords as monospaced and <placeholders> as italic.
* The backticks are now used to format synopsis-like syntax in inline elements.
All the manpages are processed with this format. It may upset the formatting of older manpages, making it inconsistent within a page, but this is a mild side effect, as the formatting was not really consistent before.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
3cb6e5e to 9bd765a
@dscho I updated the CSS, so it is ready for review.
@jnavila I've added a few questions. Thanks for this contribution.
def process parent, reader, attrs
  outlines = reader.lines.map do |l|
    l.gsub(/(\.\.\.?)([^\]$.])/, '`\1`\2')
I think a line of comment wouldn't hurt with these regexes. Maybe best with an example:
- l.gsub(/(\.\.\.?)([^\]$.])/, '`\1`\2')
+ l.gsub(/(\.\.\.?)([^\]$.])/, '`\1`\2') # wrap ellipsis in backticks: ...something => `...`something
I think the intended use is for [...<more>]? Should we include the [ and ] in the regex?
This line tries to distinguish between three dots in different contexts, where they have different meanings and require different formatting.
First, there is the form <commit1>...<commit2>, used to describe a range of commits, where the three dots are a "keyword" understood by git and must be formatted as code.
Then there are the forms used in the grammar to express repetition, such as "<path> ...", optionally with square brackets, as in "[<path>...]"; these usually appear at the end of the command line. These three dots must not be formatted as code, but left as is.
This line matches the former case and forces the corresponding format. I'll add a comment along the lines of yours.
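To make the two cases concrete, here is a small standalone sketch that applies only the ellipsis regex quoted above. The constant and the helper name `mark_range_dots` are made up for this illustration and are not part of the PR.

```ruby
# The ellipsis regex from the diff, in isolation.
# Two or three dots are wrapped in backticks only when the next
# character is not "]", "$" or another dot.
ELLIPSIS_RE = /(\.\.\.?)([^\]$.])/

# Hypothetical helper, for illustration only.
def mark_range_dots(line)
  line.gsub(ELLIPSIS_RE, '`\1`\2')
end

puts mark_range_dots('<commit1>...<commit2>') # <commit1>`...`<commit2>
puts mark_range_dots('<commit1>..<commit2>')  # <commit1>`..`<commit2>
puts mark_range_dots('[<path>...]')           # [<path>...] (unchanged)
```

The range forms are followed by "<", so they match; the repetition form is followed by "]", so it is left alone.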
def process parent, reader, attrs
  outlines = reader.lines.map do |l|
    l.gsub(/(\.\.\.?)([^\]$.])/, '`\1`\2')
      .gsub(%r{([\[\] |()>]|^)([-a-zA-Z0-9:+=~@,/_^\$]+)}, '\1{empty}`\2`{empty}')
To be honest, I don't know what this one is for.
This is the line that matches all the words that are neither placeholders nor grammar signs, and formats them as code. These words (in the general sense here) are keywords (option names, enum strings, two- or three-dot notation, etc.).
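As a rough illustration, the keyword regex from the diff can be run on its own against a short synopsis fragment. The helper name `mark_keywords` is hypothetical. `{empty}` is the AsciiDoc attribute that expands to nothing, presumably used here to keep the backticks separate from the neighbouring characters.

```ruby
# The keyword regex from the diff, in isolation.
# Group 1: a grammar sign (or start of line) that precedes the word.
# Group 2: the word itself, made of keyword characters only.
KEYWORD_RE = %r{([\[\] |()>]|^)([-a-zA-Z0-9:+=~@,/_^\$]+)}

# Hypothetical helper, for illustration only.
def mark_keywords(line)
  line.gsub(KEYWORD_RE, '\1{empty}`\2`{empty}')
end

puts mark_keywords('[--all] <name>')
# [{empty}`--all`{empty}] <name>
```

The option `--all` follows "[" and is wrapped; `name` follows "<", which is not in the first group, so the placeholder is left for the next regex.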
outlines = reader.lines.map do |l|
  l.gsub(/(\.\.\.?)([^\]$.])/, '`\1`\2')
    .gsub(%r{([\[\] |()>]|^)([-a-zA-Z0-9:+=~@,/_^\$]+)}, '\1{empty}`\2`{empty}')
    .gsub(/(<([[:word:]]|[-0-9.])+>)/, '__\\1__')
I had to dig deep to find what [[:word:]] does, but it seems to be a Ruby non-POSIX bracket expression: https://docs.ruby-lang.org/en/master/Regexp.html#class-Regexp-label-POSIX+Bracket+Expressions. Personally I'm not a fan, what's the advantage over \w?
Also why are the inner brackets round brackets?
I just wonder if we can simplify to:
- .gsub(/(<([[:word:]]|[-0-9.])+>)/, '__\\1__')
+ .gsub(/(<[^>]+>)/, '__\\1__')
And one more question, why the double backslash in the replacement string?
\w only matches ASCII, but here we are going to process internationalized text (because placeholders are translated), and this processing requires the special form with double brackets. I'm not an expert in Ruby regexes; this is the form I have found to work well with the translations.
As for using a more generic regex (treating everything between angle brackets as the placeholder's name): placeholder names are not supposed to contain spaces, and that restriction is what keeps the regex from matching something like:
$ git foo < in-file > out-file
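Both points can be checked with a short standalone sketch. The constant names and the sample placeholder `<dépôt>` are chosen for this illustration only; the two regexes are the ones quoted in the suggestion above.

```ruby
# Regex from the PR: placeholder names made of Unicode word
# characters, hyphens, digits and dots, with no spaces allowed.
STRICT_RE  = /(<([[:word:]]|[-0-9.])+>)/
# Simplified regex from the review suggestion.
GENERIC_RE = /(<[^>]+>)/

translated  = '<dépôt>'                      # sample translated placeholder
redirection = 'git foo < in-file > out-file'

# In Ruby, \w is ASCII-only, so the accented name does not match.
puts translated.match?(/<\w+>/)              # false
puts translated.gsub(STRICT_RE, '__\1__')    # __<dépôt>__

# The strict form ignores the shell redirection, the generic one does not.
puts redirection.gsub(STRICT_RE, '__\1__')   # git foo < in-file > out-file
puts redirection.gsub(GENERIC_RE, '__\1__')  # git foo __< in-file >__ out-file
```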
if node.type == :monospaced
  node.text.gsub(/(\.\.\.?)([^\]$.])/, '<code>\1</code>\2')
    .gsub(%r{([\[\s|()>.]|^|\]|>)(\.?([-a-zA-Z0-9:+=~@,/_^\$]+\.{0,2})+)}, '\1<code>\2</code>')
    .gsub(/(<([[:word:]]|[-0-9.])+>)/, '<em>\1</em>')
So we more or less need to repeat the regexes here?
That's unfortunate, but the two sets of regexes are very much alike, except that this one processes the text after some pre-processing steps, and the transformations need to be in their final form (with tags and escaped characters).
I looked into factoring them out, but that makes the code even messier than it already is.
I am really uneasy with this large number of hard-to-understand regular expressions. Not only does this make it easy for bugs to hide, it also inadvertently opens the door to DoS attacks. Here is an example where something like this has had a really high impact.
It would probably make much more sense to implement a StringScanner-based parser that is much easier to reason about and whose performance is well understood, e.g. following this tutorial.
These regexes are basically the same ones that I already pushed to git/git. They are applied during the conversion phase, not live, and only on the quoted strings of text, after the initial AsciiDoc parsing has been performed. Anyway, as they are cryptic, I can try to convert the code to a parser, but I doubt this will be much clearer.
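For comparison, a StringScanner-based version might look roughly like the sketch below. This is not code from the PR: the constants and `format_synopsis_line` are hypothetical names, and the sketch is deliberately simplified. It emits no `{empty}` guards and does not try to reproduce every edge case of the regex chain. It only shows the shape of a single left-to-right pass where each token type is tried in turn.

```ruby
require 'strscan'

# Token patterns, tried in this order at the current scan position.
PLACEHOLDER_TOKEN = /<(?:[[:word:]]|[-0-9.])+>/   # <name>, possibly translated
RANGE_DOTS_TOKEN  = /\.\.\.?(?=[^\]$.])/          # ".." or "..." used as a keyword
KEYWORD_TOKEN     = %r{[-a-zA-Z0-9:+=~@,/_^$]+}   # options, subcommands, etc.

# Hypothetical helper: formats one synopsis line as AsciiDoc markup.
def format_synopsis_line(line)
  scanner = StringScanner.new(line)
  out = []
  until scanner.eos?
    if (tok = scanner.scan(PLACEHOLDER_TOKEN))
      out << "__#{tok}__"        # placeholders become italic
    elsif (tok = scanner.scan(RANGE_DOTS_TOKEN))
      out << "`#{tok}`"          # range dots become code
    elsif (tok = scanner.scan(KEYWORD_TOKEN))
      out << "`#{tok}`"          # keywords become code
    else
      out << scanner.getch       # grammar signs and spaces pass through
    end
  end
  out.join
end

puts format_synopsis_line('git log <rev1>...<rev2> [<path>...]')
# `git` `log` __<rev1>__`...`__<rev2>__ [__<path>__...]
```

Each character is consumed exactly once, so the order of the branches, rather than the interaction between successive substitutions, decides how a token is classified.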
Changes
This PR changes the way the AsciiDoc source of the manpages is processed, by adding the "synopsis" paragraph style and reworking the backtick format.
Context
The style change has been pushed to master and will be applied to git-clone and git-init in the next version.