New manpage format #1921

Contributor

@jnavila jnavila commented Nov 17, 2024

Changes

This PR changes the way the AsciiDoc source of the manpages is processed, by adding the "synopsis" paragraph style and reworking the backtick format.

Context

The style change has been pushed to master and will be applied to git-clone and git-init in the next version.

@jnavila jnavila changed the base branch from gh-pages to main November 17, 2024 21:57
@dscho
Member

dscho commented Nov 18, 2024

I just triggered a pair of workflow runs to update the manual pages and to update the translated manual pages, fetched the result and rendered it locally. Here are two examples:

language | before | after
English  | image  | image
French   | image  | image

Personally, I cannot spot any difference, apart from the version number (because this here PR branch is based on v2.46.2 while the updated manual pages include v2.47.0) and the incorrect =<regexp> on the "before" side of the French version (fixed on the "after" side).

Even looking at the HTML of the synopses (taking the French version, so that there is a known difference), I only see this:

diff --git a/before b/after
index 1a87d1348..6185fb72b 100644
--- a/before
+++ b/after
@@ -1,5 +1,5 @@
 <pre class="content"><em>git config list</em> [&lt;option-de-fichier&gt;] [&lt;option-d-affichage&gt;] [--includes]
-<em>git config get</em> [&lt;option-de-fichier&gt;] [&lt;option-d-affichage&gt;] [--includes] [--all] [--regexp=&lt;regexp&gt;] [--value=&lt;valeur&gt;] [--fixed-value] [--default=&lt;default&gt;] &lt;nom&gt;
+<em>git config get</em> [&lt;option-de-fichier&gt;] [&lt;option-d-affichage&gt;] [--includes] [--all] [--regexp] [--value=&lt;valeur&gt;] [--fixed-value] [--default=&lt;default&gt;] &lt;nom&gt;
 <em>git config set</em> [&lt;option-de-fichier&gt;] [--type=&lt;type&gt;] [--all] [--value=&lt;valeur&gt;] [--fixed-value] &lt;nom&gt; &lt;valeur&gt;
 <em>git config unset</em> [&lt;option-de-fichier&gt;] [--all] [--value=&lt;valeur&gt;] [--fixed-value] &lt;nom&gt; &lt;valeur&gt;
 <em>git config rename-section</em> [&lt;option-de-fichier&gt;] &lt;ancien-name&gt; &lt;nouveau-name&gt;

@jnavila what am I missing?

@dscho dscho changed the base branch from main to gh-pages November 18, 2024 19:01
@jnavila jnavila marked this pull request as draft November 18, 2024 21:36
@jnavila
Contributor Author

jnavila commented Nov 18, 2024

The manpage of git-config has not been converted yet.
I pushed a branch "test-refactor" on git-html-l10n, where I hand-edited fr/git-add.txt.

After importing, here is the result:

image

I'm not satisfied with the styles, particularly when dealing with inline formats:

image

You can test it yourself locally and tell me what you think.

The new style makes the code spans lighter, more readable, and
better integrated into the text.

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
This commit adds an upcoming manpage format to the AsciiDoc
backend. The changes in the new format are:

 * The synopsis is now a section with a dedicated style. This
   "synopsis" style automatically formats keywords as
   monospaced and <placeholders> as italic.
 * Backticks are now used to format synopsis-like syntax in inline
   elements.

All the manpages are processed with this format. It may upset the
formatting of older manpages, making it inconsistent within a page,
but this is a mild side effect, as the formatting was not really
consistent before.

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
@jnavila jnavila marked this pull request as ready for review November 30, 2024 16:19
@jnavila
Contributor Author

jnavila commented Dec 22, 2024

@dscho I updated the CSS, so it is ready for review.

Contributor

@To1ne To1ne left a comment

@jnavila I've added a few questions. Thanks for this contribution.


def process parent, reader, attrs
  outlines = reader.lines.map do |l|
    l.gsub(/(\.\.\.?)([^\]$.])/, '`\1`\2')
Contributor

I think a line of comment wouldn't hurt with these regexes. Maybe best with an example:

Suggested change
-l.gsub(/(\.\.\.?)([^\]$.])/, '`\1`\2')
+l.gsub(/(\.\.\.?)([^\]$.])/, '`\1`\2') # wrap ellipsis in backticks: ...something => `...`something

I think the intended use is for [...<more>]? Should we include the [ and ] in the regex?

Contributor Author

@jnavila jnavila Dec 31, 2024

This line is trying to differentiate the three dots in different contexts, where they have a different meaning and require different formatting.

First there is the form <commit1>...<commit2> when describing a range of commits, where the three dots are a "keyword" understood by git and must be formatted as code.

Then there are the forms used in the grammar to express repetition, such as "<path> ...", optionally inside square brackets as in "[<path>...]", which usually appear at the end of the command line. These three dots must not be formatted as code, but left as is.

This line matches the former case and forces the corresponding format. I'll add a comment along the same lines as yours.
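
To make the two cases concrete, here is a minimal sketch that runs the regex from the hunk above on a few sample lines. The helper name `format_ellipsis` and the sample inputs are made up for illustration; only the pattern and the replacement string come from the PR.

```ruby
# Hypothetical helper wrapping the first gsub from the PR (the name is not in the PR).
def format_ellipsis(line)
  # Two or three dots followed by a character other than "]", "$" or "."
  # are treated as a range operator and wrapped in backticks.
  line.gsub(/(\.\.\.?)([^\]$.])/, '`\1`\2')
end

puts format_ellipsis('<commit1>...<commit2>') # range operator: dots become code
puts format_ellipsis('<commit1>..<commit2>')  # the two-dot form is handled the same way
puts format_ellipsis('[<path>...]')           # repetition inside brackets: unchanged
puts format_ellipsis('<path>...')             # repetition at end of line: unchanged
```

One thing worth noting: inside a bracket expression `$` is a literal character, so the end-of-line case is left alone only because no character follows the dots. A repetition followed by a space, such as `<path>... <other>`, would still be wrapped.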

def process parent, reader, attrs
  outlines = reader.lines.map do |l|
    l.gsub(/(\.\.\.?)([^\]$.])/, '`\1`\2')
     .gsub(%r{([\[\] |()>]|^)([-a-zA-Z0-9:+=~@,/_^\$]+)}, '\1{empty}`\2`{empty}')
Contributor

To be honest, I don't know what this one is for.

Contributor Author

This is the line that matches all the words that are neither placeholders nor grammar signs, and formats them as code. These words (in the general sense here) are keywords (option names, enum strings, two- or three-dot notation, etc.).
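
As an illustration, the sketch below applies that second substitution on its own. The helper name `format_keywords` and the sample lines are assumptions; the pattern and replacement are copied from the hunk. `{empty}` is a built-in AsciiDoc attribute that expands to nothing; presumably it is there so that the backticks are recognized regardless of the neighbouring characters.

```ruby
# Hypothetical helper wrapping the second gsub from the PR (the name is not in the PR).
def format_keywords(line)
  # A run of keyword characters is wrapped in backticks when it starts the line
  # or follows one of: "[", "]", space, "|", "(", ")", ">".
  # Text directly after "<" is not touched, so placeholders are left alone.
  line.gsub(%r{([\[\] |()>]|^)([-a-zA-Z0-9:+=~@,/_^\$]+)}, '\1{empty}`\2`{empty}')
end

puts format_keywords('[--all]')
puts format_keywords('git add [--verbose] <pathspec>')
```

In the second sample, `git`, `add` and `--verbose` are wrapped, while `<pathspec>` is left for the placeholder substitution that follows.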

outlines = reader.lines.map do |l|
  l.gsub(/(\.\.\.?)([^\]$.])/, '`\1`\2')
   .gsub(%r{([\[\] |()>]|^)([-a-zA-Z0-9:+=~@,/_^\$]+)}, '\1{empty}`\2`{empty}')
   .gsub(/(<([[:word:]]|[-0-9.])+>)/, '__\\1__')
Contributor

I had to dig deep to find what [[:word:]] does, but it seems to be a Ruby non-POSIX bracket expression: https://docs.ruby-lang.org/en/master/Regexp.html#class-Regexp-label-POSIX+Bracket+Expressions. Personally I'm not a fan, what's the advantage over \w?

Also why are the inner brackets round brackets?

I just wonder if we can simplify to:

Suggested change
-.gsub(/(<([[:word:]]|[-0-9.])+>)/, '__\\1__')
+.gsub(/(<[^>]+>)/, '__\\1__')

And one more question, why the double backslash in the replacement string?

Contributor Author

`\w` only matches ASCII, but here we are going to process internationalized text (because placeholders are translated), and this requires the special form with double brackets. I'm not an expert in Ruby regexes; this is the form I have found to work well with the translations.

As for using a more generic regex (treating everything between angle brackets as the placeholder's name): placeholder names are not supposed to contain spaces, and the stricter pattern relies on that, which is exactly what we want when the text contains something like:

 $ git foo < in-file > out-file
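
The points raised in this thread can be checked with a short sketch. The constant names, the `\w`-based comparison pattern and the French placeholder are made up for illustration; the strict and loose patterns are the two discussed above.

```ruby
# Pattern from the PR and the simplification proposed in the review.
STRICT_PLACEHOLDER = /(<([[:word:]]|[-0-9.])+>)/
LOOSE_PLACEHOLDER  = /(<[^>]+>)/
# ASCII-only variant using \w, written here only for comparison (not in the PR).
ASCII_PLACEHOLDER  = /(<[\w.\-]+>)/

translated = "<chemin-d-acc\u00e8s>" # made-up translated placeholder containing "è"
redirect   = '$ git foo < in-file > out-file'

puts translated.gsub(STRICT_PLACEHOLDER, '__\1__') # wrapped: [[:word:]] covers "è"
puts translated.gsub(ASCII_PLACEHOLDER, '__\1__')  # unchanged: \w does not match "è"
puts redirect.gsub(STRICT_PLACEHOLDER, '__\1__')   # unchanged: spaces are not allowed
puts redirect.gsub(LOOSE_PLACEHOLDER, '__\1__')    # wraps "< in-file >" by mistake

# In a single-quoted Ruby string, '\\1' and '\1' denote the same characters,
# so the double backslash in the replacement is harmless but not required.
puts '__\\1__' == '__\1__'
```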

if node.type == :monospaced
  node.text.gsub(/(\.\.\.?)([^\]$.])/, '<code>\1</code>\2')
           .gsub(%r{([\[\s|()>.]|^|\]|&gt;)(\.?([-a-zA-Z0-9:+=~@,/_^\$]+\.{0,2})+)}, '\1<code>\2</code>')
           .gsub(/(&lt;([[:word:]]|[-0-9.])+&gt;)/, '<em>\1</em>')
Contributor

So we more or less need to repeat the regexes here?

Contributor Author

@jnavila jnavila Dec 31, 2024

That's unfortunate, but yes: the two sets of regexes are very similar, except that this one processes the text after some pre-processing steps, and the transformations need to be in final result form (with tags and escaped characters).

I looked into factoring them out, but it would make the code messier than it already is.
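
A small sketch of why the source-level pattern cannot simply be reused. It assumes that the pre-processing includes HTML escaping of angle brackets, which the `&lt;` and `&gt;` in the hunk suggest; the constant names and sample strings are made up, while both patterns are copied from the hunks in this review.

```ruby
# Placeholder patterns from the two hunks: one for AsciiDoc source, one for escaped HTML.
ASCIIDOC_PLACEHOLDER = /(<([[:word:]]|[-0-9.])+>)/
HTML_PLACEHOLDER     = /(&lt;([[:word:]]|[-0-9.])+&gt;)/

source  = '<path>'       # text as written in the AsciiDoc source
escaped = '&lt;path&gt;' # the same text once angle brackets are escaped (assumption)

puts source.gsub(ASCIIDOC_PLACEHOLDER, '__\1__')  # AsciiDoc emphasis markup
puts escaped.gsub(HTML_PLACEHOLDER, '<em>\1</em>') # final HTML tags
puts escaped.gsub(ASCIIDOC_PLACEHOLDER, '__\1__') # no match: there is no literal "<"
```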

Member

@dscho dscho left a comment

I am really uneasy with this large amount of hard-to-understand regular expressions. Not only does this make it easy for bugs to hide, it also inadvertently opens the door to DoS attacks. Here is an example where something like this has had a really high impact.

It would probably make much more sense to implement a StringScanner-based parser that is much easier to reason about and whose performance is well-understood, e.g. following this tutorial.
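
For comparison, a scanner-based version could look roughly like the sketch below. It is not taken from the PR or from the tutorial mentioned above: the method name `format_synopsis_line` and the token rules are assumptions, it emits plain backticks without `{empty}`, and it deliberately leaves out the two- and three-dot range handling.

```ruby
require 'strscan'

# Hypothetical StringScanner-based formatter for one synopsis line (simplified sketch).
def format_synopsis_line(line)
  scanner = StringScanner.new(line)
  out = []
  until scanner.eos?
    if (placeholder = scanner.scan(/<[[:word:].\-]+>/))
      # <placeholder> without spaces: AsciiDoc emphasis
      out << "__#{placeholder}__"
    elsif (keyword = scanner.scan(%r{[-a-zA-Z0-9:+=~@,/_^\$]+}))
      # run of keyword characters: AsciiDoc monospace
      out << "`#{keyword}`"
    else
      # grammar signs, spaces, dots: copied through one character at a time
      out << scanner.getch
    end
  end
  out.join
end

puts format_synopsis_line('git add [--verbose] <pathspec>...')
```

Every branch consumes at least one character, so the loop always advances, and each token rule can be read and tested in isolation.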

@jnavila
Contributor Author

jnavila commented Jan 4, 2025

These regexes are basically the same ones that I already pushed to git/git. They are applied during the conversion phase, not live, and only on the quoted strings of text, after the initial AsciiDoc parsing has been performed.

Anyway, as they are cryptic, I can try to convert the code to a parser, but I doubt this will be a lot clearer.
