Stream: Google Season of Docs

Topic: BRL-CAD's Documentation Infrastructure


view this post on Zulip Dashamir Hoxha (Apr 13 2021 at 17:00):

I am interested about your GSoD'21 project proposal. I found about it on the season-of-docs repo. It is a well defined project and I am sure it will be accepted.

Most of the documentation projects have an implicit requirement that you get familiar with the software itself, since you are expected to improve the content of the documentation. In my opinion this is the most difficult part of the project. But this project only seeks to improve the infrastructure of the documentation, and that's why I find it easier than the others.

Of course it needs a lot of work and it may have its own difficulties, but I am familiar with documentation systems, conversion tools, etc. I am sure that I will find some way to automate the conversion of the existing Docbook XML docs to AsciiDoc. Even if there are no existing tools that are appropriate for this task, I can develop my own conversion tools; Docbook XML is easy to parse and process by a program and I have some experience with doing this (many years ago). Before being a technical writer, I am a computer engineer, software developer and hacker, so I am sure that I can find a way.

You can find some links to my previous documentation work on this page: https://gitlab.com/-/snippets/1861138
Please let me know if you have any questions, or would like to know more details.

view this post on Zulip Sean (Apr 13 2021 at 19:01):

Hi @Dashamir Hoxha and thank you for your optimism. Hope it gets accepted too!

You are quite correct -- if you were working on improving or even re-arranging BRL-CAD's documentation, it would help to have some basic familiarity and we have tutorials to help (literally anyone at any skill level) with that. You still might want to check them out for better understand, but it's not strictly required for GSoD.

Automated conversion is almost certainly going to be essential, but we know it won't cover all existing docs, so some manual conversion will likely still be required.

view this post on Zulip Sean (Apr 16 2021 at 19:30):

Relevant FOSDEM presentations:
https://fosdem.org/2021/schedule/event/ttdpostgresdocbook/

view this post on Zulip Sean (Apr 16 2021 at 19:30):

https://fosdem.org/2021/schedule/event/ttdasciidocantora/

view this post on Zulip Sean (Apr 16 2021 at 19:30):

https://fosdem.org/2021/schedule/event/ttddoctoolchain/

view this post on Zulip starseeker (Apr 16 2021 at 19:56):

So... is the intent then to replace the DocBook->html and DocBook->man page tool chains in our current build?

view this post on Zulip starseeker (Apr 16 2021 at 20:02):

To duplicate the current capability to bootstrap the doc building toolchain from only the main compiler, it looks like we'd have to bootstrap ruby and then build asciidoctor (plus whatever deps it may have...)

view this post on Zulip Dashamir Hoxha (Apr 16 2021 at 20:16):

Thanks @Sean , I am going to have a look at these presentations.
This project is the first in my preference list. If you accept me, then I don't have to apply to any other projects. Instead, I can start working on this one (getting familiar, etc.)

view this post on Zulip Sean (Apr 16 2021 at 21:36):

@starseeker multiple intents... one is to seriously give something new a trial for docs, and another is to more towards docs as a submodule, and yet another is to simply invigorate doc infrastructure activity.

view this post on Zulip Sean (Apr 16 2021 at 21:37):

the hope, of course, is to unify with our other docs (e.g., presentations, wiki), find something more easily editable/readable for non-dev contributors, still produce web+pdf+man outputs.

view this post on Zulip Sean (Apr 16 2021 at 22:00):

That was part of the strategic interest in Antora/Asciidoc as it is supported by github, somewhat compatible with markdown, also a simple readable/editable format, and converts to docbook very well. That keeps some processing options open for us if we find, for example, we want or need to keep the existing manpage tooling.

I'm not strongly committed yet to Antora, but after weeks of research, it checked the most requirement boxes. It's a bit of a bikeshed considering mk/rst/adoc doc formats and their associated presentation systems (antora, sphynx, rtd, mkdocs, hugo, docusaurus, etc), but adoc is a really close fit with with our current docbook so I'm hopeful that strike a balance for representing the things we do in docbook while still providing trivial authoring. and it's got a lot of development momentum currently, which doesn't hurt.

view this post on Zulip Sean (Apr 16 2021 at 22:03):

The postgresql folks talking about their 20 years with docbook and how it's been a love+hate relationship riddled with issues hits a little close to home... :)

view this post on Zulip Dashamir Hoxha (Apr 16 2021 at 22:42):

From the presentation, it seems that Antora+AsciiDoc is the right tool for the job.

view this post on Zulip starseeker (Apr 17 2021 at 02:32):

/me looks at ruby and asciidoc, winces - if we go that route, it's probably going to mean requiring an installed Ruby/asciidoc setup. I doubt I'll have time to hammer that into a bootstrapped toolchain build cross platform...

view this post on Zulip starseeker (Apr 17 2021 at 02:41):

@Sean Would it be possible to consider using OpenBSD's mandoc for just the man pages? In some ways they're the most crucial to have bundled with the CAD system proper, and mandoc is almost certainly a lot easier to bootstrap...

view this post on Zulip starseeker (Apr 17 2021 at 02:49):

If we need asciidoc to feed Antora, we could see if mandoc's markdown output and https://github.com/asciidoctor/kramdown-asciidoc could act as an adapter.

view this post on Zulip Sean (Apr 17 2021 at 03:22):

starseeker said:

/me looks at ruby and asciidoc, winces - if we go that route, it's probably going to mean requiring an installed Ruby/asciidoc setup. I doubt I'll have time to hammer that into a bootstrapped toolchain build cross platform...

I wouldn't worry about integration issues until after decent progress is made proving the end-result is worthwhile.

view this post on Zulip Sean (Apr 17 2021 at 03:29):

Looks like there's a javascript port that should work cross-platform without much fuss (e.g., via cscript.exe or wsript.exe on windows), but I think it shouldn't be too hard to detect whatever is required and/or provide it from a devtooling submodule.

view this post on Zulip Sean (Apr 17 2021 at 03:47):

I agree that manpages are critical but I think all the docs need to be reliably generated and integrated at least as well as they are now. Towards that, I'm more inclined to leave the docbook infrastructure for manpages until it's sorted reliably. I don't think we should revert back to man/mdoc/troff files... that'd be quite a regression, atrocious editability.

view this post on Zulip Sean (Apr 17 2021 at 04:44):

Looks like the asciidoc format was recently updated to make it even more concise, and coincidentally aligns more closely with markdown syntax. Previous processor is implemented in python. That's another option, though obviously only viable if we limit ourselves to the previous syntax/features.

view this post on Zulip starseeker (Apr 17 2021 at 12:53):

mdoc is probably less mainstream than docbook, but I wouldn't have thought it would be an edit-ability regression (at least compared to docbook) for that specific purpose... It should at least be much less verbose. I suppose it might be less editable than asciidoc... I should try to translate some man pages into both and see how steep the respective learning curves are. As you say though, only comes up once decent asciidoc progress has been demonstrated.

view this post on Zulip Sean (Apr 19 2021 at 04:30):

I must be missing something... mdoc is the old manpage format with a set of prescribed macros, right? Or did some group name their format same as the old one? You do realize BRL-CAD's manpages used to all be in that format, right?? They were not better than the current setup by any stretch.

Docbook is verbose, but it's declarative and relatively unencoded. You can infer what it represents simply by reading it. That makes editing (and reading source form) straightforward albeit technically tedious. That can't be said of mdoc.

Reading an existing page, there's no way to even guess what ".Nm This is something" or ".Qq yay arcane syntax" is going to do without reading a manual (hah) and/or seeing rendered output. It's not friendly to non-developers; heck I'd argue it's not developer-friendly either. It was fairly broadly abandoned as a source format some 20 years ago for that and many other reasons.

view this post on Zulip Sean (Apr 19 2021 at 04:56):

For what it's worth, the direction I have had in mind for our docs is prioritizing ease of non-technical authorship, web+pdf output, and feasibility for online editing. Note I intentionally prioritize those three things over tooling as the toolchain challenges can be managed. While it does still need to be proven, I do believe Antora can work out well for BRL-CAD with asciidoc as a source format. Asciidoc is less markup than Markdown while retaining the expressiveness of Docbook; so docs remain unencoded and become even simpler text files. Markdown would be a fallback option that Docusaurus appears to excel at, but markdown does not have as compatible or consistent a representation and offline/pdf output becomes much harder. Both are known for being about as good as it gets right now in terms of ease of non-technical editing, both have very good online editing options, both supported by Github too, but then Asciidoc reportedly handling technical documents much better and the structure lends itself to much easier pdf generation too. That's some of the motivation for trailblazing in this direction.

view this post on Zulip Dashamir Hoxha (Apr 19 2021 at 10:43):

Automated conversion is almost certainly going to be essential, but we know it won't cover all existing docs, so some manual conversion will likely still be required.

From the project plan: 2.2 What is not in scope?:

There are extensive docs not in Docbook XML format in our repository, on the website wiki, in other formats, and in other files that are desirable and may also be migrated (e.g., for testing purposes), but they are secondary priority and need not be considered in scope.

Wiki docs are written in markdown format, right? So, I think that it should not be difficult to convert them automatically to AsciiDoc. If there are no better tools, even a bash or sed script might do the job.

In general, if the docs have an editable format, which is later converted to a presentation format (like PDF, HTML, etj.) chances are that they can be converted automatically to AsciiDoc. If they exist only in a presentation format (PDF, HTML), most probably a full automatic conversion is not possible and some manual intervention and editing would be required.

view this post on Zulip starseeker (Apr 19 2021 at 11:56):

@Sean That's fair. (I think OpenBSD's mdoc is a variation on the original man page syntax.) I never directly edited much in the original format for the man pages or OpenBSD's mdoc variation, so I don't have any real sense of how hard it is to work with for that purpose (I do know it's not well suited for general typesetting.) When we made the docbook conversion OpenBSD's mdoc/mandoc didn't exist, so it wouldn't have been a factor in the original decision. My initial thought was the potential simplicity of that toolchain has a lot to recommend it, but if the format it's based on is specifically on the to-avoid list then it won't change the calculus.

The phrase "toolchain challenges can be managed" makes me flinch a little, but if we get significantly better results from the contribution and display side we can give it a go.

view this post on Zulip Sean (Apr 19 2021 at 12:24):

With the old troff format (which is essentially what man pages are), you can describe macros that essentially do prescribed typesetting. There used to be a bunch of competing sets that all did similar things to create what we recognize as the general standard sections of a man page. Looking at the mdoc manual page, it's simply one of those macro sets that they settled on.

view this post on Zulip Sean (Apr 19 2021 at 12:25):

Which is to say that it's nothing "new" per se, they just slapped a name on a convention. It's still troff/man format. Which isn't to say it's good or bad -- there are some nice things about man pages. That was just a ship that literally sailed 2 decades ago... :)

view this post on Zulip Sean (Apr 19 2021 at 12:27):

starseeker said:

The phrase "toolchain challenges can be managed" makes me flinch a little, but if we get significantly better results from the contribution and display side we can give it a go.

That's totally fair. Maybe one of the things for the GSoD project can be to also proof the integration challenges to make sure there's not a show-stopper.

view this post on Zulip starseeker (Apr 19 2021 at 14:58):

/me nods. The potential game changer for me was the mandoc program, which (if I'm understanding correctly) can produce html, man, PDF and Postscript from a single source document while being completely self contained. Assuming it really can, that's even better than our docbook toolchain (we need Apache FOP for PDF output), although it's probably a question how sophisticated its formatting is for things like tables.

Oh well. Maybe I can teach mandoc to read asciidoc man page inputs.

view this post on Zulip Sean (Apr 19 2021 at 15:59):

We really need doc unification too. I mean that's arguably the biggest obstacle we've had with docbook. Setup is such a chore. It's a bit of a bear to convert something that's been formatted in InDesign or even Word.

To me, our current setup is a perfectly sufficient solution for man pages (there's not much value/point to PDF manual pages other than completeness) because it generates good manual pages flawlessly. It's a solution for the tutorials too because they're converted. The pain is new non-man docs, online editing, importing existing docs, etc.

view this post on Zulip Sean (Apr 23 2021 at 15:49):

Well now that's an interesting option. I just successfully compiled asciidoctor.js to a standalone binary executable. There's a javascript compiler that worked surprisingly well.

So it's at least feasible that we could precompile for all concerned environments and pull them from a repo en lieu of requiring the ruby toolchain. The asciidoctor.js tool can also simply be run directly on some platforms (e.g., on Windows).

Push comes to shove, looking at the size and complexity of the code, I think we could transcode asciidoctor from ruby or javascript to C++ in a week too, though that'd be a burden to keep in sync with upstream changes.

view this post on Zulip Dashamir Hoxha (Apr 30 2021 at 07:15):

Dashamir Hoxha said:

I am interested about your GSoD'21 project proposal.

I am familiar with documentation systems, conversion tools, etc. I am sure that I will find some way to automate the conversion of the existing Docbook XML docs to AsciiDoc. Even if there are no existing tools that are appropriate for this task, I can develop my own conversion tools; Docbook XML is easy to parse and process by a program and I have some experience with doing this (many years ago). Before being a technical writer, I am a computer engineer, software developer and hacker, so I am sure that I can find a way.

You can find some links to my previous documentation work on this page: https://gitlab.com/-/snippets/1861138
Please let me know if you have any questions, or would like to know more details.

Is it useful/desirable to have an automatic conversion from AsciiDoc to Docbook as well (after the migration to AsciiDoc is done)? Or you just want to get rid of Docbook and its toolchain?

view this post on Zulip Sean (May 03 2021 at 19:03):

@Dashamir Hoxha I don't think we will know that for a while.

view this post on Zulip Sean (May 03 2021 at 19:04):

I mean, I do believe less complexity is better and -- in general -- eliminating complexity and/or dependencies can be a good thing for maintainability. BUT... the key consideration here is ease of authorship / editing and unification of BRL-CAD's documentation both online and offline.

view this post on Zulip Simos (May 12 2021 at 10:10):

I am interested in applying for BRLCAD.

view this post on Zulip Simos (May 12 2021 at 11:33):

I am Simos Xenitellis, https://blog.simos.info/

  1. My first interaction with DocBook has been this guide I wrote, http://ospkibook.sourceforge.net/docs/OSPKI-2.4.7/ It is in SGML :grimacing:
  2. I've been a fan of DocBook XML since then. Dealing with issues about i18n, trying different GUI editors, using vi in the end.
  3. The conversion to AsciiDoc requires correctness. I'll compare the HTML output of DocBook XML with the corresponding HTML output of AsciiDoc.
  4. The conversion task may require Shell scripting, Python or Perl. I use these. I have a background in system administration.
  5. The tool for the conversion is likely going to be DocBookRx. It is written in Ruby. It says that it's not complete yet, and I'll investigate if any of the forks is more viable (such as the OpenSUSE fork). Btw, OpenSUSE opted to keep DocBook XML but support AsciiDoc as an additional format. New documents are maintained in AsciiDoc and converted to DocBook XML in their case (https://opensuse.github.io/daps/).

Last updated: Jan 10 2025 at 00:48 UTC