mdz_helpers.ts
Shared constants and pure helper functions for mdz parsers.
Used by both the single-pass parser (mdz.ts) and the two-phase lexer + parser (mdz_lexer.ts + mdz_token_parser.ts).
Declarations
48 declarations
A_LOWER = 97
A_UPPER = 65
AMPERSAND = 38
APOSTROPHE = 39
ASTERISK = 42
AT = 64
BACKTICK = 96
COLON = 58
COMMA = 44
DOLLAR = 36
EQUALS = 61
EXCLAMATION = 33
extract_single_tag
(nodes: MdzNode[]): MdzElementNode | MdzComponentNode | null
HASH = 35
HR_HYPHEN_COUNT = 3
HTTP_PREFIX_LENGTH = 7
HTTPS_PREFIX_LENGTH = 8
HYPHEN = 45
is_at_absolute_path
(text: string, index: number): boolean
Check whether the given index in text is the start of an absolute path (one that starts with /).
The path must be preceded by whitespace or be at the start of the string.
Rejects // (comments and protocol-relative URLs) and a bare / (a slash on its own).
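The rules above can be sketched as follows. This is a minimal reconstruction from the prose, not the mdz source: treating only space, tab, and newline as whitespace is an assumption (based on the SPACE, TAB, and NEWLINE constants in this module), as is the definition of a "bare" slash as one at the end of the string or followed by whitespace.

```typescript
// Sketch reconstructed from the description; not the actual mdz implementation.
const SLASH = 47; // "/"
const SPACE = 32;
const TAB = 9;
const NEWLINE = 10;

// Assumption: only space, tab, and newline count as whitespace here.
const is_whitespace = (char_code: number): boolean =>
	char_code === SPACE || char_code === TAB || char_code === NEWLINE;

const is_at_absolute_path = (text: string, index: number): boolean => {
	// The path must start with a slash at this position.
	if (text.charCodeAt(index) !== SLASH) return false;
	// It must be at the start of the string or preceded by whitespace.
	if (index > 0 && !is_whitespace(text.charCodeAt(index - 1))) return false;
	// charCodeAt returns NaN past the end of the string.
	const next = text.charCodeAt(index + 1);
	// Reject "//" and a bare "/" (end of string or followed by whitespace; assumed definition).
	if (Number.isNaN(next) || next === SLASH || is_whitespace(next)) return false;
	return true;
};

console.log(is_at_absolute_path("see /docs/intro", 4)); // true
console.log(is_at_absolute_path("a/b", 1)); // false: preceded by a letter
console.log(is_at_absolute_path("//cdn.example", 0)); // false: protocol-relative
```

Requiring a preceding space keeps fractions and ratios such as a/b from being linked as paths.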
is_at_relative_path
(text: string, index: number): boolean
Check whether the given index in text is the start of a relative path (./ or ../).
The path must be preceded by whitespace or be at the start of the string.
Requires at least one path character after the prefix.
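A minimal sketch of the relative-path check, reconstructed from the description rather than taken from mdz. The is_path_char helper is a hypothetical, simplified stand-in for is_valid_path_char, and the whitespace set (space, tab, newline) is an assumption.

```typescript
// Sketch reconstructed from the description; not the actual mdz implementation.
const SPACE = 32;
const TAB = 9;
const NEWLINE = 10;

const is_whitespace = (char_code: number): boolean =>
	char_code === SPACE || char_code === TAB || char_code === NEWLINE;

// Hypothetical stand-in for is_valid_path_char: RFC 3986 pchar plus / ? # and %.
// NaN (past the end of the string) fails every check.
const PATH_PUNCTUATION = "-._~!$&'()*+,;=:@/?#%";
const is_path_char = (char_code: number): boolean =>
	(char_code >= 48 && char_code <= 57) || // 0-9
	(char_code >= 65 && char_code <= 90) || // A-Z
	(char_code >= 97 && char_code <= 122) || // a-z
	PATH_PUNCTUATION.includes(String.fromCharCode(char_code));

const is_at_relative_path = (text: string, index: number): boolean => {
	// Must be at the start of the string or preceded by whitespace.
	if (index > 0 && !is_whitespace(text.charCodeAt(index - 1))) return false;
	let prefix_length: number;
	if (text.startsWith("./", index)) prefix_length = 2;
	else if (text.startsWith("../", index)) prefix_length = 3;
	else return false;
	// Require at least one path character after the prefix.
	return is_path_char(text.charCodeAt(index + prefix_length));
};

console.log(is_at_relative_path("see ../lib/a.ts", 4)); // true
console.log(is_at_relative_path("./", 0)); // false: nothing after the prefix
```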
is_letter
(char_code: number): boolean
Check whether the character code is a letter (A-Z, a-z).
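A short sketch of the letter check using this module's A_UPPER, Z_UPPER, A_LOWER, and Z_LOWER values; the body is an assumed implementation consistent with the description, not copied from mdz.

```typescript
// Sketch consistent with the description; not copied from the mdz source.
const A_UPPER = 65;
const Z_UPPER = 90;
const A_LOWER = 97;
const Z_LOWER = 122;

// Letters are exactly the two contiguous ranges A-Z and a-z.
const is_letter = (char_code: number): boolean =>
	(char_code >= A_UPPER && char_code <= Z_UPPER) ||
	(char_code >= A_LOWER && char_code <= Z_LOWER);

console.log(is_letter("Q".charCodeAt(0))); // true
console.log(is_letter("_".charCodeAt(0))); // false
```

Two ranges are needed because codes 91 to 96 ([ \ ] ^ _ `) sit between Z and a, so a single A_UPPER..Z_LOWER range would wrongly accept them.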
is_tag_name_char
(char_code: number): boolean
Check whether the character code is valid in a tag name (letter, digit, hyphen, or underscore).
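A sketch of the tag-name check built from this module's character-code constants. The read_tag_name helper is hypothetical; it only illustrates how a parser might consume a tag name after "<", and is not part of mdz_helpers.ts.

```typescript
// Sketch consistent with the description; not the actual mdz source.
const A_UPPER = 65;
const Z_UPPER = 90;
const A_LOWER = 97;
const Z_LOWER = 122;
const ZERO = 48;
const NINE = 57;
const HYPHEN = 45;
const UNDERSCORE = 95;

const is_tag_name_char = (char_code: number): boolean =>
	(char_code >= A_UPPER && char_code <= Z_UPPER) ||
	(char_code >= A_LOWER && char_code <= Z_LOWER) ||
	(char_code >= ZERO && char_code <= NINE) ||
	char_code === HYPHEN ||
	char_code === UNDERSCORE;

// Hypothetical helper: read a tag name starting at `start` (e.g. just after "<").
const read_tag_name = (text: string, start: number): string => {
	let end = start;
	while (end < text.length && is_tag_name_char(text.charCodeAt(end))) end++;
	return text.slice(start, end);
};

console.log(read_tag_name("<my-Widget_2 size=3>", 1)); // my-Widget_2
```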
is_valid_path_char
(char_code: number): boolean
Check whether the character code is valid in a URI path per RFC 3986.
Validates against the pchar production plus the path/query/fragment separators.
Valid characters:
- unreserved: A-Z a-z 0-9 - . _ ~
- sub-delims: ! $ & ' ( ) * + , ; =
- path allowed: : @
- separators: / ? #
- percent-encoding: %
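The whitelist above can be expressed as a lookup set plus three alphanumeric ranges. This is a sketch under the assumption that a Set lookup is acceptable; the mdz source may use direct comparisons instead. The scan_path_end helper is hypothetical and shows how the whitelist stops a URL at characters such as ">".

```typescript
// Sketch reconstructed from the character list above; not the actual mdz source.
const ALLOWED_PUNCTUATION =
	"-._~" + // unreserved (besides A-Z a-z 0-9)
	"!$&'()*+,;=" + // sub-delims
	":@" + // additionally allowed in pchar
	"/?#" + // path, query, and fragment separators
	"%"; // percent-encoding introducer

const ALLOWED_CODES = new Set<number>(
	Array.from(ALLOWED_PUNCTUATION, (ch) => ch.charCodeAt(0)),
);

const is_valid_path_char = (char_code: number): boolean =>
	(char_code >= 65 && char_code <= 90) || // A-Z
	(char_code >= 97 && char_code <= 122) || // a-z
	(char_code >= 48 && char_code <= 57) || // 0-9
	ALLOWED_CODES.has(char_code);

// Hypothetical usage: find the index where a URL candidate stops being a valid path.
const scan_path_end = (text: string, start: number): number => {
	let end = start;
	while (end < text.length && is_valid_path_char(text.charCodeAt(end))) end++;
	return end;
};

console.log(scan_path_end("<https://x.org/a_b>", 1)); // 18 (the index of ">")
```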
is_word_char
(char_code: number): boolean
Check whether the character is part of a word, for word-boundary detection.
Used to prevent intraword emphasis with the _ and ~ delimiters.
Formatting delimiters (*, _, ~) are NOT word characters; they are transparent.
Only alphanumeric characters (A-Z, a-z, 0-9) are considered word characters.
This prevents false positives with snake_case identifiers while still allowing
adjacent formatting such as **bold**_italic_.
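A sketch of the word-character test and how it blocks intraword emphasis. The can_open_intraword_delimiter helper is hypothetical and deliberately simplified (it only checks the character before an opener); the real mdz boundary rules may consider more context.

```typescript
// Sketch consistent with the description; not the actual mdz source.
const is_word_char = (char_code: number): boolean =>
	(char_code >= 65 && char_code <= 90) || // A-Z
	(char_code >= 97 && char_code <= 122) || // a-z
	(char_code >= 48 && char_code <= 57); // 0-9

// Hypothetical, simplified rule: an _ or ~ delimiter may open emphasis only if the
// previous character is not a word character. Formatting delimiters such as * are
// not word characters, so they do not block an adjacent opener.
const can_open_intraword_delimiter = (text: string, index: number): boolean =>
	index === 0 || !is_word_char(text.charCodeAt(index - 1));

console.log(can_open_intraword_delimiter("snake_case", 5)); // false
console.log(can_open_intraword_delimiter("**bold**_italic_", 8)); // true
```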
LEFT_ANGLE = 60
LEFT_BRACKET = 91
LEFT_PAREN = 40
MAX_HEADING_LEVEL = 6
MIN_CODEBLOCK_BACKTICKS = 3
NEWLINE = 10
NINE = 57
PERCENT = 37
PERIOD = 46
PLUS = 43
QUESTION = 63
RIGHT_ANGLE = 62
RIGHT_BRACKET = 93
RIGHT_PAREN = 41
SEMICOLON = 59
SLASH = 47
SPACE = 32
TAB = 9
TILDE = 126
trim_trailing_punctuation
(url: string): string
Trim trailing punctuation from a URL or path per RFC 3986 and GFM rules:
- Trims simple trailing punctuation: . , ; : ! ? ]
- Applies balancing logic to ( ) only, since parentheses are valid in path components.
- Invalid characters such as [ ] { } are already stopped by the whitelist, but ] is still trimmed as a fallback.
Optimized to avoid O(n²) string slicing: it tracks the end index and slices once at the end.
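A sketch of the trimming strategy described above: count parentheses once, walk an end index backwards, and slice a single time. Counting every parenthesis in the URL up front follows the GFM autolink rule for unmatched closing parentheses; whether mdz counts exactly this way is an assumption.

```typescript
// Sketch reconstructed from the description; not the actual mdz source.
const LEFT_PAREN = 40;
const RIGHT_PAREN = 41;

// Simple trailing punctuation: . , ; : ! ? ]
const SIMPLE_TRAILING = new Set<number>([46, 44, 59, 58, 33, 63, 93]);

const trim_trailing_punctuation = (url: string): string => {
	// Count parentheses once so each trailing step is O(1).
	let open = 0;
	let close = 0;
	for (let i = 0; i < url.length; i++) {
		const c = url.charCodeAt(i);
		if (c === LEFT_PAREN) open++;
		else if (c === RIGHT_PAREN) close++;
	}
	// Walk the end index backwards instead of re-slicing the string each step.
	let end = url.length;
	while (end > 0) {
		const c = url.charCodeAt(end - 1);
		if (SIMPLE_TRAILING.has(c)) {
			end--;
		} else if (c === RIGHT_PAREN && close > open) {
			// An unmatched ")" is trailing punctuation, not part of the path.
			end--;
			close--;
		} else {
			break;
		}
	}
	return url.slice(0, end); // single slice at the end
};

console.log(trim_trailing_punctuation("https://en.wikipedia.org/wiki/Foo_(bar))"));
// https://en.wikipedia.org/wiki/Foo_(bar)
```

The whole function is O(n): one counting pass, one backward pass, and one slice.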
UNDERSCORE = 95
Z_LOWER = 122
Z_UPPER = 90
ZERO = 48