STXT Documents
1. Introduction
2. Terminology
3. Document Encoding
4. Syntactic Unit: Node
5. Nodes with `:` (container nodes, allow inline value)
6. Nodes with `>>` (text block)
7. Namespaces
8. Indentation and Hierarchy
9. Comments
10. Whitespace normalization
11. Error Rules
12. Conformance
13. File Extension and Media Type
14. Normative Examples
15. Security Considerations
16. Appendix A — Grammar (Informal)
17. Appendix B — Interaction with `@stxt.schema`
18. Appendix B — Interaction with `@stxt.template`
19. End of Document
1. Introduction
This document defines the specification of the STXT (Semantic Text) language.
STXT is a Human-First language: its natural form is designed to be readable, clear, and comfortable for people, while maintaining a precise structure that is easily processed by machines.
STXT is a hierarchical and semantic textual format aimed at:
- Representing documents and data clearly.
- Being extremely simple to read and write.
- Being trivial to parse in any language.
- Allowing both structured content and free text.
- Extending its semantics via `@stxt.schema` or `@stxt.template`.
- Facilitating the creation of parsers while trying to minimize security errors.
This document describes the base syntax of the language.
2. Terminology
The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" are to be interpreted as described in RFC 2119.
3. Document Encoding
An STXT document SHOULD be encoded in UTF-8 without BOM.
A parser:
- SHOULD accept documents that begin with a BOM.
- MAY emit a warning for documents that begin with a BOM.
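As an illustration of the BOM handling described above, the following Python sketch decodes raw bytes as UTF-8, strips a leading BOM if one is present, and emits a warning. The function name `decode_stxt` is hypothetical and not part of this specification, and issuing the warning is optional.

```python
import codecs
import warnings


def decode_stxt(data: bytes) -> str:
    """Decode raw bytes as UTF-8, tolerating a leading BOM (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        # A parser SHOULD accept the BOM and MAY warn about it.
        warnings.warn("STXT document begins with a UTF-8 BOM")
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")
```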
4. Syntactic Unit: Node
Each non-empty line of the document that is neither a comment nor part of a >> block defines a node.
There are two forms of node:
- Inline container node (INLINE text node): `Node name: Inline value`
- Text block node (BLOCK text node): `Node name >>`
The node name cannot be empty. A line with only : or >> is not valid.
Example with INLINE nodes:
Node 1: Inline value
Node 2 without value:
Node 3 with another value: this is the other value
Example with a BLOCK node:
Block node >>
    This is the content of the text block:
    - Leading spaces and line breaks are preserved
    - Right trim is applied
    - Left trim is NOT applied
A node may optionally include a namespace:
Name (namespace.normal):
Name (@namespace.special):
4.1 Node name normalization
The node name is taken from the text between:
- The first character not belonging to the indentation, and
- The first character that belongs to any of:
  - The start of a namespace `(`,
  - The character `:`,
  - The operator `>>`.
On that fragment, apply:
- Removal of leading and trailing spaces and tabs (trim).
- Compaction of spaces into a single one
The result of this normalization is the node name.
A node whose logical name is the empty string ("") is invalid and MUST cause a parse error.
Equivalent examples at the Node name level:
Node name:
Node name: value
Node name : value
Node name (@a.special.namespace):
Node name(a.normal.namespace):
Node name >>
Node name>>
The definition of a node must always include either : (INLINE container node) or >> (BLOCK text node),
always preceded by a non-empty name.
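The extraction rule above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `NodeLine` and `parse_node_line` are hypothetical, and namespace validation (section 7) and the name restrictions of section 4.2 are deliberately left out.

```python
import re
from typing import NamedTuple, Optional


class NodeLine(NamedTuple):
    name: str
    namespace: Optional[str]  # raw text between "(" and ")", or None
    operator: str             # ":" or ">>"
    value: str                # inline value ("" for ">>" nodes)


def parse_node_line(line: str) -> NodeLine:
    """Split one node line into name, namespace, operator and inline value."""
    text = line.lstrip(" \t")  # drop the indentation
    # The name ends at the first "(", ":" or ">>", whichever comes first.
    marks = [p for p in (text.find("("), text.find(":"), text.find(">>")) if p != -1]
    if not marks:
        raise ValueError("node line contains neither ':' nor '>>'")
    end = min(marks)
    # Trim spaces and tabs, then compact runs of spaces into one.
    name = re.sub(r" +", " ", text[:end].strip(" \t"))
    if not name:
        raise ValueError("empty node name")
    rest = text[end:]
    namespace = None
    if rest.startswith("("):
        close = rest.find(")")
        if close == -1:
            raise ValueError("unterminated namespace")
        namespace = rest[1:close]
        rest = rest[close + 1:].lstrip(" \t")
    if rest.startswith(">>"):
        if rest[2:].strip(" \t"):
            raise ValueError("meaningful content after '>>'")
        return NodeLine(name, namespace, ">>", "")
    if rest.startswith(":"):
        return NodeLine(name, namespace, ":", rest[1:].strip(" \t"))
    raise ValueError("expected ':' or '>>' after the name")
```

For example, `parse_node_line("Node name : value")` and `parse_node_line("Node name: value")` both yield the name `Node name` and the value `value`.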
4.2 Node name restrictions
The node name allows only alphanumeric characters, spaces (see 4.1), and the characters `-`, `_`, and `.`.
Diacritics and both uppercase and lowercase letters are allowed.
4.3 Canonical node name
The canonical name is formed from the node name through the following process:
- Unicode decomposition (NFKD)
- Conversion to lowercase
- Removal of diacritics
- Space compaction (not necessary on an already normalized name)
- Replace `[^a-z0-9]` with `-`. Two or more consecutive hyphens are not allowed; they must be compacted into a single one (`-`).
- Remove hyphens (`-`) at the beginning and end if present.
The canonical name will be used to know whether a node has the same name as another. It will also be used internally by all search or check operations, to know whether it is the same element.
Transformation examples:
A namé with äccent: a-name-with-accent
A NAME with äccent: a-name-with-accent
SIZe number 2__ and 3: size-number-2-and-3
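The canonicalization steps of this section can be sketched in Python using the standard `unicodedata` and `re` modules. The function name `canonical_name` is hypothetical; the sketch assumes that "removal of diacritics" means dropping the combining marks produced by NFKD decomposition.

```python
import re
import unicodedata


def canonical_name(name: str) -> str:
    """Compute the canonical form of an already normalized node name."""
    decomposed = unicodedata.normalize("NFKD", name)  # Unicode decomposition
    lowered = decomposed.lower()                       # conversion to lowercase
    # Assumption: diacritics are the combining marks left by NFKD.
    no_marks = "".join(ch for ch in lowered if not unicodedata.combining(ch))
    # Replace every run of other characters with a single hyphen, which also
    # compacts consecutive hyphens.
    hyphenated = re.sub(r"[^a-z0-9]+", "-", no_marks)
    return hyphenated.strip("-")                       # no leading/trailing hyphen
```

With this sketch, `canonical_name("A namé with äccent")` returns `"a-name-with-accent"`.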
4.4 Style rules
The recommended style rules are as follows:
- Separate the name from the definition of a namespace with a single space.
- Separate `:` from the value with a single space.
- `:` goes immediately after the name, or after the namespace if present.
- `>>` has no character after it.
- Separate the node name or the namespace from `>>` with a single space.
- Do not use more than one consecutive space in names.
- Write the namespace without spaces in the definition: `(namespace.def)`.
Examples of correct style:
Name with value: The value
Name without value:
Name with namespace (the.namespace):
Text node >>
5. Nodes with `:` (container nodes, allow inline value)
The form with : defines a node that:
- May have a value (optional).
- May have no value (empty node).
- May have children (nested nodes).
- Its structured content includes:
  - The node line itself.
  - Its descendants with greater indentation.
Examples:
Title: Report
Author: Joan
Node:
Node: Value
Node:
    SubNode 1: 123
    Another subnode: 456
5.1 Value normalization
The (INLINE) value of a node must be normalized with a trim (right and left).
Example:
Name: value 1
Name:    value 1    
# In both cases, the inline value of Name is "value 1", even though in the
# second there are spaces before and after.
Strong normalization applies only to structural identifiers. Values are literals, although a simple normalization is applied: left and right trim.
6. Nodes with `>>` (text block)
The form with >> defines a literal text block.
Valid examples:
Description >>
    Line 1
    Line 2
Section>>
    Accepts the operator without a space
6.1 Formal rules
- The `>>` node line MUST NOT contain meaningful content after `>>`, except optional spaces.
- All lines with indentation strictly greater than that of the `>>` node belong to the textual content of the block.
- Within the block:
  - The parser MUST NOT interpret any line as a structured node, even if it contains `:` or other STXT syntax.
  - The parser MUST NOT interpret lines that begin with `#` as comments; all lines are literal text.
- The block ends when a non-empty line that is not a comment appears whose indentation is less than or equal to the indentation of the `>>` node.
- Empty lines within the block are preserved and MUST NOT close the block, regardless of their indentation.
6.2 Example
Block >>
    Text
    Child: value YES allowed, it is text, it is not parsed
        Another child: YES allowed
    # This is also text
Next Node: value
In this example:
- Everything indented below `Block >>` is literal text.
- `Child: value` and `Another child: YES allowed` are not nodes, but text.
- `Next Node: value` is outside the `>>` block.
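The rules of 6.1 can be sketched in Python as a function that collects the text of one block. This is only an illustration under stated assumptions: the name `collect_block` is hypothetical, lines are assumed to be already split without line terminators, and indentation is assumed to use spaces only (4 per level). A non-comment line indented more than the node but less than one full level is treated as closing the block here, leaving it to the caller to reject.

```python
from typing import List, Tuple


def collect_block(lines: List[str], start: int, node_indent: int) -> Tuple[List[str], int]:
    """Collect the literal text of a >> block.

    `start` is the index of the first line after the >> node line and
    `node_indent` is the number of spaces before that node. Returns the text
    lines and the index of the line that closed the block.
    """
    base = node_indent + 4  # minimum indentation of the block content
    text: List[str] = []
    i = start
    while i < len(lines):
        raw = lines[i]
        stripped = raw.strip(" \t")
        indent = len(raw) - len(raw.lstrip(" "))
        if not stripped:
            text.append("")                        # empty lines never close the block
        elif indent >= base:
            text.append(raw[base:].rstrip(" \t"))  # literal text, right-trimmed
        elif stripped.startswith("#"):
            pass                                   # real comment: skipped, block stays open
        else:
            break                                  # a less-indented line closes the block
        i += 1
    return text, i
```

Applied to the example in 6.2 with 4-space indentation, `collect_block(lines, 1, 0)` returns the four text lines, keeping the extra indentation of `Another child: YES allowed`, together with the index of `Next Node: value`.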
7. Namespaces
A namespace is optional and is specified like this:
Node (com.example.docs):
Another node (another.namespace):
More nodes (@a.special.name):
Rules:
- A namespace MAY start with `@`.
- It MUST use hierarchical format (`a.b.c`), with at least 2 elements (`a.b`).
- It is inherited by child nodes.
- If no namespace is specified, by default it is the empty namespace `""`.
- The empty namespace cannot be specified as `Node name ()`.
- A node’s empty namespace is always overwritten with the parent’s namespace.
- A child node may redefine its namespace by indicating `(another.namespace)`, in which case it uses that namespace instead of the inherited one.
- Each element may contain only characters within the range `[a-z0-9]`. The namespace as a whole is optionally preceded by `@`, to indicate that it is a special namespace.
- A parser MUST lowercase a namespace. Thus it must convert `Name (COM.DEMO.DOCS)` to `Name (com.demo.docs)` internally.
- By style rules a namespace should be written in lowercase.
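The namespace rules above can be sketched in Python as a normalization step plus an inheritance step. The function names are hypothetical, and the sketch assumes that lowercasing happens before the character check, so that `COM.DEMO.DOCS` is accepted and normalized rather than rejected.

```python
import re
from typing import Optional

# Optional "@", then at least two dot-separated elements of [a-z0-9]+.
_NAMESPACE_RE = re.compile(r"@?[a-z0-9]+(\.[a-z0-9]+)+")


def normalize_namespace(raw: str) -> str:
    """Lowercase a namespace and check its hierarchical format."""
    namespace = raw.lower()
    if not _NAMESPACE_RE.fullmatch(namespace):
        raise ValueError(f"invalid namespace: {raw!r}")
    return namespace


def effective_namespace(own: Optional[str], inherited: str) -> str:
    """Return the node's own namespace if declared, else the inherited one."""
    return normalize_namespace(own) if own is not None else inherited
```

Here `own` is the raw text found between the parentheses, or `None` when the node line declares no namespace; the root node would be resolved with `inherited=""`.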
8. Indentation and Hierarchy
Indentation defines the structured hierarchy of the document.
8.1 Allowed Indentation
An STXT document:
- MAY use spaces or tabs for indentation.
- Mixing spaces and tabs on the same line is not recommended. A parser MAY warn in that case. If they are mixed, a run of fewer than 4 spaces is discarded when a tab follows it. Following the **Human First** principle, this rule ensures that a document that looks correct really is correct: it should not be necessary to review lines character by character to know whether something is correct; it should be visible at a glance.
- If it uses spaces:
  - It MUST use multiples of 4 spaces to go up a level.
- If it uses tabs:
  - Each tab represents exactly 1 level.
- All possible ways of going up one level are:
  - 0 Spaces + 1 TAB
  - 1 Space + 1 TAB
  - 2 Spaces + 1 TAB
  - 3 Spaces + 1 TAB
  - 4 Spaces + 0 TAB
- Once a level has been increased, the previous rules apply again. That is, the count is reset at each level increase.
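The counting rules above can be sketched in Python as a single pass over the leading whitespace of a structural line. The function name `indent_level` is hypothetical; the sketch assumes the caller has already skipped empty lines and lines that belong to a `>>` block.

```python
def indent_level(line: str) -> int:
    """Return the indentation level of a structural (non-block) line."""
    level = 0
    spaces = 0  # spaces seen since the last level increase
    for ch in line:
        if ch == " ":
            spaces += 1
            if spaces == 4:      # 4 spaces + 0 tabs: one level
                level += 1
                spaces = 0
        elif ch == "\t":         # 0 to 3 spaces + 1 tab: one level
            level += 1
            spaces = 0
        else:
            break                # first character of the node or comment
    if spaces:
        raise ValueError(f"indentation leaves {spaces} stray space(s) before the content")
    return level
```

For instance, a line indented with 3 spaces and a tab is at level 1, while a line indented with only 3 spaces raises an error.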
8.2 Special indentation examples
In the following examples . is shown to identify a space, and |--> to identify a Tab.
The tab will be shown with the characters missing until reaching the next column, like a text editor.
Example with tabs:
Node level 0: Value level 0
|-->Node level 1:
|-->Another node level 1:
|-->|-->Level 2:
|-->|-->Level 2:
|-->Level 1:
|-->Level 1:
Example with spaces:
Node level 0: Value level 0
....Node level 1:
....Another node level 1:
........Level 2:
........Level 2:
....Level 1:
....Level 1:
Example with a mix of spaces and tabs.
Allowed, though not recommended by style. A parser MAY warn about mixing on the same line. This example has the same indentation as the two previous ones.
Node level 0: Value level 0
.|->Node level 1: 1 Space + 1 TAB: level 1
..|>Another node level 1: 2 Spaces + 1 TAB: level 1
...>..|>Level 2: 3 Spaces + 1 TAB, 2 Spaces + 1 TAB: level 2
|-->....Level 2: 1 TAB, 4 Spaces: level 2
..|>Level 1: 2 Spaces + 1 TAB: level 1
.|->Level 1: 1 Space + 1 TAB: level 1
8.3 Level errors
A parser MUST raise a parse error in the following cases:
- Non-consecutive levels:
Level 0:
....Level 1:
............Level 3: ERROR, you cannot go from level 1 to level 3
- Not reaching a multiple of 4 when using spaces or mixing:
Level 0:
....Level 1:
...Almost level 1: ERROR: 3 spaces (does not reach 4)

Level 0:
....Level 1:
.|->..Almost level 2: ERROR: 1 space + 1 TAB, 2 spaces

Level 0:
....Level 1:
..........More than level 2: ERROR: 4 spaces, 4 spaces, 2 spaces
8.4 Hierarchy
- Indentation MUST increase consecutively (no jumps allowed).
- Child nodes MUST have greater indentation than their parent.
- Indentation within a `>>` block does not affect the structural hierarchy: it is simply text.
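One way to enforce consecutive levels while building the hierarchy is to keep a stack of open nodes, one per level. The Python sketch below assumes that each structural line has already been reduced to a `(level, name)` pair and that the first node of a document must be at level 0; the `Node` class and `build_tree` function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


def build_tree(entries: List[Tuple[int, str]]) -> List[Node]:
    """Build the node hierarchy from (level, name) pairs, rejecting level jumps."""
    roots: List[Node] = []
    stack: List[Node] = []  # stack[i] is the currently open node at level i
    for number, (level, name) in enumerate(entries, start=1):
        if level > len(stack):
            # A child may be at most one level deeper than the previous node.
            raise ValueError(f"entry {number}: level {level} skips a level")
        del stack[level:]   # close nodes at this level and deeper
        node = Node(name)
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
    return roots
```

Going from level 1 directly to level 3, as in the first error case of 8.3, raises a `ValueError` in this sketch.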
9. Comments
Outside >> blocks, a line is a comment if, after its indentation, the first character is #.
Example:
# Root comment
Node:
    # Inner comment
9.1 Comments inside `>>` blocks
Inside a >> block:
- Any line with indentation equal to or greater than the minimum indentation of the block MUST be treated as literal text, even if it starts with `#`.
- A line less indented than the block that is not a comment ends the block.
Example:
# A normal comment (level 0)
Document:
    # Another normal comment
        # This is also a comment! Outside >> block
    Text >>
        # This is text
        Normal line
            # This is also normal text
    # This is a comment
# This is also a comment
        Here the node text continues
9.2 Comment style
- It is recommended that a comment be placed at the same level as the node that follows it, so that it reads as a comment on that node.
- Comments inside a text block are not recommended, since they are visually confusing.
10. Whitespace normalization
This section defines how whitespace must be normalized to ensure that different implementations produce the same logical representation from the same STXT text.
10.1 Inline values (`:`)
When parsing a node with `:`:
- The parser takes all characters from immediately after `:` to the end of the line.
- The inline value MUST be normalized by applying:
  - Removal of leading spaces and tabs (left trim).
  - Removal of trailing spaces and tabs (right trim).
This implies that the following lines are equivalent at the parsing level:
Name: Joan
Name:    Joan
Name: Joan    
Name:    Joan    
In all cases, the logical value of the Name node is "Joan".
If after trim the value is empty, the inline value is considered the empty string ("").
10.2 Lines inside `>>` blocks
For each line that belongs to a >> block:
- The parser determines the content of the line from the text that follows the minimum indentation of the block (i.e., it removes only the block indentation, but preserves any additional indentation as part of the text).
- On that content, the parser MUST remove all trailing spaces and tabs (right trim).
- Empty lines are preserved in all cases, except lines that are real comments, with indentation lower than the block.
Example of line canonicalization:
Block >>
    Hello
        World
Logical representation of the block content:
- Line 1: `"Hello"`
- Line 2: `"    World"` (the 4 additional spaces after the minimum indentation are preserved, trailing spaces are removed)
10.3 Empty lines in `>>` blocks
- Empty lines within the block (intermediate or final) MUST be preserved as empty lines (`""`) in the logical representation of the text.
- Only right trim is applied to each individual line (remove spaces/tabs at end of line, as already done).
- No empty line is removed, neither intermediate nor final.
Example:
Text >>
    Line 1

    Line 2

Logical content of the block:
- Line 1: `"Line 1"`
- Line 2: `""`
- Line 3: `"Line 2"`
- Line 4: `""`
11. Error Rules
A document is invalid if any of these conditions occur:
- Spaces that are not multiples of 4 (when spaces are used for indentation).
- Jumps in indentation levels.
- A `>>` node contains meaningful inline content on the same line as `>>`.
- A node contains neither `:` nor `>>`.
A conforming parser MUST reject the document.
12. Conformance
An STXT implementation is conforming if:
- It implements the syntax described in this document.
- It applies the strict indentation and hierarchy rules.
- It correctly interprets nodes with `:` and `>>` blocks.
- It interprets comments outside `>>` blocks.
- It treats everything inside `>>` blocks as literal text.
- It applies the whitespace normalization rules of section 10.
- It rejects invalid documents according to section 11.
13. File Extension and Media Type
13.1 File Extension
STXT documents SHOULD use the extension: .stxt
13.2 Media Type (MIME)
- Official media type: `text/stxt`
- Compatible alternative: `text/plain`
14. Normative Examples
14.1 Valid document
Document (com.example.docs):
    Author: Joan
    Date: 2025/12/03
    Summary >>
        This is a text block.
        With multiple lines.
    Config:
        Mode: Active
14.2 Block with empty lines
Text>>

    Line 2
Logical content of the block:
""
"Line 2"
14.3 Comments inside and outside blocks
Document:
    Body >>
        # This is text
        More text
    # This is a comment
15. Security Considerations
STXT has been designed with parsing security as a fundamental priority, minimizing the attack surface compared to other structured textual formats.
A conforming STXT parser is inherently resistant to common classes of vulnerabilities:
- Immune to entity expansion attacks (such as "billion laughs" or XXE): the format defines no entities, external references, or inclusion of remote resources.
- Immune to arbitrary code execution: there are no dynamic features, custom tags, loaders, or object deserialization. The only resulting structure is a simple tree of nodes and textual values.
- Immune to injection inside literal blocks: all content inside a `>>` node is treated as literal text without any interpretation, even if it contains `:`, `>>`, `#`, or other STXT syntax.
- Low risk of denial of service: strict consecutive indentation rules and the absence of circular references or anchors limit structural complexity. Implementations SHOULD impose a reasonable limit on nesting depth (recommended: ≤ 100 levels) and total document size.
- Optional external schemas: semantic validation is a separate layer. A basic parser MAY operate without loading external schemas, eliminating risks associated with their resolution.
Consequently, STXT is especially suitable for processing documents from untrusted sources (remote configurations, user input, data exchange) where parser security is critical.
Implementations MUST reject invalid documents according to section 11 and MUST NOT introduce extensions that allow external loading or dynamic evaluation without explicit security measures.
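The recommended limits could be enforced with a small guard such as the Python sketch below. The name `enforce_limits` and the size limit `MAX_BYTES` are assumptions chosen for illustration; only the depth bound of 100 levels comes from the recommendation above.

```python
from typing import Iterable

MAX_DEPTH = 100        # recommended bound on nesting depth
MAX_BYTES = 1_000_000  # hypothetical size limit; choose per deployment


def enforce_limits(data: bytes, levels: Iterable[int]) -> None:
    """Reject documents that exceed the size or nesting limits."""
    if len(data) > MAX_BYTES:
        raise ValueError(f"document larger than {MAX_BYTES} bytes")
    for number, level in enumerate(levels, start=1):
        # Levels are 0-based, so level 99 is the 100th level.
        if level >= MAX_DEPTH:
            raise ValueError(f"line {number}: nesting deeper than {MAX_DEPTH} levels")
```

Checking the size before parsing and the depth while computing each line's level lets a parser fail early instead of building a large tree first.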
16. Appendix A — Grammar (Informal)
Document = { Line }
Line = [Indentation] ( Comment | Node | BlockContinuation | EmptyLine )
Node = Indentation Name [Namespace] ( Inline | BlockStart )
Inline = ":" [Space] [InlineText]
BlockStart = [Space] ">>" [TrailingSpaces]
Namespace = "(" ["@"] Ident { "." Ident } ")"
Ident = [a-z0-9]+ ; lowercase and numbers only according to style and normalization rules
Comment = "#" { any character until end of line }
BlockContinuation = IndentationGreaterThanPreviousBlock { any text } ; literal text
Indentation = Allowed mix of spaces and tabs according to section 8
- Pure spaces: exact multiples of 4 per level
- Pure tabs: 1 tab = 1 level
- Mixed in line: tab wins, spaces <4 are ignored
Name = Normalized text (trim + space compaction) according to section 4.1
Key notes for implementers:
- The parser must process the document line by line, maintaining state of:
  - Current indentation level of the parent node.
  - Base indentation and active `>>` block state (if any).
  - Current inherited namespace.
- Basic parsing flow:
  - Read line and compute its effective indentation (according to section 8 rules).
  - If there is an active `>>` block:
    - If indentation ≥ minimum indentation of the block → add line as literal text (right trim).
    - If indentation ≤ indentation of the `>>` node and line is non-empty and line is not a comment → close block and process as a new node.
  - If there is no active block:
    - Empty line → ignore (does not affect hierarchy).
    - Starts with `#` → comment.
    - Otherwise → new node (normalize name, detect namespace, type `:` or `>>`).
- Namespace inheritance:
  - The root node namespace is empty by default.
  - Each node inherits its parent’s namespace.
  - If a node defines its own namespace in `()`, it replaces the inherited one for it and all its descendants.
- Additional normalization:
  - Node names: according to sections 4.1–4.3.
  - Namespaces: internally converted to lowercase (section 7).
  - Inline values: left and right trim (section 10.1).
  - Block lines: preserve relative indentation + right trim + preserve all empty lines (sections 10.2–10.3).
Comment = Indent "#" Text ; Only outside '>>' blocks
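The basic parsing flow in the notes above can be sketched in Python as a line classifier that tracks only the active `>>` block. It is a simplified illustration, not a complete parser: the names `classify_lines` and `_starts_block` are hypothetical, lines are assumed to be already split without terminators, indentation is assumed to use spaces only (4 per level), and hierarchy, namespace, and name checks are omitted.

```python
from typing import Iterator, List, Optional, Tuple


def _starts_block(text: str) -> bool:
    """True if the first operator on a node line is '>>' rather than ':'."""
    colon, arrow = text.find(":"), text.find(">>")
    return arrow != -1 and (colon == -1 or arrow < colon)


def classify_lines(lines: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield ("node" | "comment" | "text", content) for each meaningful line."""
    block_indent: Optional[int] = None  # indentation of the active >> node
    for raw in lines:
        stripped = raw.strip(" \t")
        indent = len(raw) - len(raw.lstrip(" "))
        if block_indent is not None:
            base = block_indent + 4     # minimum indentation of the block
            if not stripped:
                yield ("text", "")      # empty lines stay in the block
                continue
            if indent >= base:
                yield ("text", raw[base:].rstrip(" \t"))
                continue
            if not stripped.startswith("#"):
                block_indent = None     # a less-indented node closes the block
        if not stripped:
            continue                    # empty line outside a block: ignored
        if stripped.startswith("#"):
            yield ("comment", stripped)
            continue
        yield ("node", stripped)
        if _starts_block(stripped):
            block_indent = indent
```

Run on the example of 14.3 with 4-space indentation, the sketch classifies `# This is text` as text and `# This is a comment` as a comment.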
17. Appendix B — Interaction with `@stxt.schema`
The schema system allows adding semantic validation to STXT documents without modifying the base syntax of the language.
The STXT core does not define how an implementation should react: the behavior belongs exclusively to the schema system (STXT-SCHEMA-SPEC).
A schema is an STXT document whose namespace is: @stxt.schema
and whose goal is to define the structural rules, value types, and cardinalities of nodes belonging to a specific namespace.
The STXT core does not interpret these rules; it only defines how they are expressed and how they are combined via namespaces.
17.1. Associating a schema to a namespace
To associate a schema to the namespace com.example.docs, write a document:
Schema (@stxt.schema): com.example.docs
    Node: Email
        Children:
            Child: From
            Child: To
            Child: Cc
            Child: Bcc
            Child: Title
                Max: 1
            Child: Body Content
                Min: 1
                Max: 1
            Child: Metadata (com.google)
                Max: 1
    Node: From
    Node: To
    Node: Cc
    Node: Bcc
    Node: Title
    Node: Body Content
        Type: TEXT
17.2. Application to STXT documents
A document that declares the same namespace:
Document (com.example.docs):
    Field1: value
    Text: one
    Text: two
can be validated by an implementation that supports STXT schemas:
- Validating the presence of nodes according to `Node` in the schema.
- Validating value types (`TEXT`, `DATE`, `NUMBER`, etc.).
- Validating cardinalities defined in `Child`.
17.3. Core independence
STXT MUST NOT impose semantic rules coming from schemas. The schema system is a separate and optional component that operates on the already-parsed STXT.
It also MAY act as part of the parsing process. In that case it SHOULD be weakly coupled with it. This would allow detecting errors without having to wait until the end of parsing.
18. Appendix B — Interaction with `@stxt.template`
The template system allows adding semantic validation to STXT documents without modifying the base syntax of the language.
The STXT core does not define how an implementation should react: the behavior belongs exclusively to the template system (STXT-TEMPLATE-SPEC).
A template is an STXT document whose namespace is: @stxt.template
and whose goal is to define the structural rules, value types, and cardinalities of nodes belonging to a specific namespace.
The Template system is analogous to schemas, but with a simplified syntax, oriented toward rapid prototypes. Even so, it is a perfectly valid system for all kinds of documents. It could be considered syntactic sugar, since internally it can use the same representation as a schema.
The template system MAY coexist alongside a schema system, since in the end a Template defines the same information as a schema.
18.1. Associating a template to a namespace
To associate a template to the namespace com.example.docs, write a document:
Template (@stxt.template): com.example.docs
    Structure >>
        Email:
            From:
            To:
            Cc:
            Bcc:
            Title: (?)
            Body Content: (1) TEXT
            Metadata (com.google): (?)
Once declared, templates fulfill the same function as schemas. A standard validator SHOULD prioritize a schema over a template.