packet:xrouter:manpages:parsing
Revision of 2025/04/20 07:20 by m0mzf; the page was removed on 2025/04/22 02:40 by m0mzf.
====== Script to parse Xrouter's MAN and HLP pages ======
<file bash parse-pzt-manhlp.sh>
#!/bin/bash
##################################
# by Jason M0MZF (not a programmer!)
# bash / awk / hammer / nail etc.
# License - MIT. Crack on people.
#
# Script to parse Paula G8PZT's Xrouter MAN / HLP pages into
# "some simple markup language" for the wiki at
# https://
#
# The intention is to parse all MAN / HLP files within the folders and
# write them with appropriate formatting to files which can then be
# pasted directly into the wiki.
#
# This could also be done with groff > HTML > pandoc > ssml but pandoc's
# output format for SSML doesn't fit the wiki, and I
# don't know Lua. Yet. Maybe something like this with a custom output formatter:
# cat ${manpage} | groff -Thtml -P -l -mmandoc 2>/dev/null
# But when all you've got is awk, everything looks like a record / field... ;)
#
##################################
# Instructions (destructions?)
#
# - This script does not take any arguments
# - The only required configuration is to set the following path
#
BASEPATH=/path/to/xrouter-docs   # placeholder: point this at the folder holding the MAN / HLP trees
#
# This folder should contain the two directories of MAN and HLP files
# ("MANFILES" and "HLPFILES" below), and output is written to the folder
# named in "OUTPUTDIR" as concatenated and reformatted files. A manually-created
# index page exists in https:// which provides the
# top-level contents, and the pages linked therein have their contents
# copypasta'd from the files this script outputs.
#
##################################
# Changelog
# 20250418 - Implemented MAN page parsing
# 20250419 - Implemented HLP page parsing
# 20250419 - Tidy up, more awk less bash, remove .MAN / .HLP from outputted headers
##################################
# Globals
DATE=$(date +"%Y%m%d")
MANFILES="${BASEPATH}/MAN"       # directory tree of .MAN files (directory name assumed)
HLPFILES="${BASEPATH}/HELP"      # directory tree of .HLP files (directory name assumed)
OUTPUTDIR="${BASEPATH}/parsed"   # output folder (directory name assumed)

# Handy functions
echoRed () {
    echo -e "\e[31m${1}\e[0m"
}
echoGreen () {
    echo -e "\e[32m${1}\e[0m"
}
checkRoot () {
    if [[ $UID -eq 0 ]]; then
        echoRed "Do not run this script as root!"
        exit 1
    fi
}

# Use awk to:
# strip out comment lines and remove any <CR> from <CR><LF> line endings
# turn the MAN page header into a code block, it contains a revision date
# find every subsequent MAN page header and turn it into a docuwiki header and
#   encapsulate the section which follows it in a code block
# (the final encapsulation is done using "echo" after awk has finished)
awkParseMan='{
    if (NR==1 || NR==2)                 # For the first two lines
    {
        gsub("\r","")                   # remove any <CR>
        if (/^;/ || NF==0) {next}       # skip the subsequent print function for comment or empty lines
        print "<code>" $0               # open a code block around the MAN page header
    }

    if (NR>2)                           # For the rest of the file
    {
        if (/^[A-Z]/)                   # If the line begins with a capital letter it is a header
        {
            starthead="</code>\n===== "
            endhead=" =====\n<code>"
            gsub("\r","")               # remove any <CR>
            print starthead $0 endhead  # and output the line as a docuwiki header
        }
        else                            # else for all other lines
        {
            if (/^;/) {next}            # skip comment lines
            gsub("\r","")               # remove any <CR>
            print $0                    # and output the line
        }
    }
}'
# Use awk to:
# strip out comment lines (this is always line 1, sometimes 2 and 3) and remove any <CR> from <CR><LF> line endings
# insert a start code block in place of the now-empty line 1
# (the final encapsulation is done using "echo" after awk has finished)
awkParseHlp='{
    starthead="<code>"
    gsub("\r","")                   # remove any <CR>
    if (NR==1) {print starthead}    # open a code block where the comment on line 1 was
    if (/^;/ || NF==0) {next}       # skip comment / empty lines
    print $0                        # output the refined line
}'
# Use awk to extract a section name from the directory structure
awkSectionHeader='
BEGIN { FS="/" }
{   # fields are the /-separated path components
    print "====== " $(NF-1) " ======"   # the penultimate field is the section name
}'
# Use awk to extract a name from the filename.extension
awkFileHeader='
BEGIN { FS="." }
{   # because we want the file name from FILENAME.MAN
    print "===== " $1 " ====="          # the first field is the file name
}'

parseFiles () {
    mkdir -p "${OUTPUTDIR}"
    outfile="${OUTPUTDIR}/${1}-${DATE}.txt"     # output file name (naming scheme assumed)
    # Traverse folders, skipping files in base directory
    for folder in "${!1}"/*/                    # ${!1} expands the variable whose name is in $1
    do
        # Get the penultimate field in file path, i.e. the section (folder) name
        section=$(echo "${folder}" | awk -F/ '{ print $(NF-1) }')
        # Format the section name as a docuwiki header
        echo "${folder}" | awk "${awkSectionHeader}" >> "${outfile}"
        # Spit some stuff out to the shell
        echoRed "Parsing section ${section}"
        # Traverse through files
        for file in "${folder}"*
        do
            # Get the last field in file path, i.e. file name
            title=$(echo "${file}" | awk -F/ '{ print $NF }')
            # Format the file name as a docuwiki header
            echo "${title}" | awk "${awkFileHeader}" >> "${outfile}"
            case "$1" in
                # For MAN files, after awk has done its job we need to remove the last line; this
                # last line breaks the following <code> block, so close the block with echo instead
                MANFILES) awk "${awkParseMan}" "${file}" | head -n -1 >> "${outfile}"
                          echo -e "</code>" >> "${outfile}"
                          ;;
                # For HLP files we don't want to remove the last line because that truly is real content
                HLPFILES) awk "${awkParseHlp}" "${file}" >> "${outfile}"
                          echo -e "</code>" >> "${outfile}"
                          ;;
            esac
        done
    done
}

# Let's go!
checkRoot
echoGreen "Parsing MAN pages"
parseFiles MANFILES
echoGreen "Parsing HLP pages"
parseFiles HLPFILES

</file>
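The header transformation can be sanity-checked in isolation. The snippet below is a minimal sketch of the MAN-page rules described above, run against a made-up input file (the file contents and path are hypothetical; this is a simplified stand-in for the script's awk program, not the original):

```shell
# A hypothetical MAN-style input: a header line with a revision date, a
# section header in capitals, body text, and a ";" comment line.
cat > /tmp/sample.man <<'EOF'
ROUTE manual    Revision: 18/4/2025
SYNOPSIS
  route add <destination>
; internal comment, should be dropped
EOF

# Simplified parsing rules: open a <code> block on line 1, turn lines that
# begin with a capital letter into DokuWiki headers (closing and reopening
# the code block around them), and drop ;-comments and blank lines.
awk '
NR==1         { print "<code>" $0; next }
/^;/ || NF==0 { next }
/^[A-Z]/      { print "</code>\n===== " $0 " =====\n<code>"; next }
              { print }
' /tmp/sample.man
echo "</code>"
```

The result is DokuWiki markup with the header text promoted to a `=====` heading and the surrounding content wrapped in `<code>` blocks, ready to paste into a page.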
packet/xrouter/manpages/parsing.1745133653.txt.gz · Last modified: by m0mzf