STXT - Semantic Text
Built for humans. Reliable for machines.

STXT Schemas (@stxt.schema)

1. Introduction
2. Terminology
3. Relationship between STXT and Schema
4. General structure of a Schema
5. One schema per namespace
6. Node Definitions (`Node:`)
7. Children (`Children:`) and cross namespaces
8. Cardinalities
9. Types
10. Normative Examples
11. Schema Errors
12. Conformance
13. Schema of the Schema (`@stxt.schema`)
14. End of Document

1. Introduction

This document specifies the STXT Schema language, a mechanism for validating STXT documents against formal semantic rules.

A schema:

2. Terminology

The keywords "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" are to be interpreted as described in RFC 2119.

Terms such as node, indentation, namespace, inline and block >> keep the meaning defined in STXT-SPEC.

3. Relationship between STXT and Schema

Schema validation happens after STXT parsing:

  1. Parsing into the STXT hierarchical structure.
  2. Resolution of the logical namespace (inheritance).
  3. Application of the corresponding schema.

Optionally, schema validation MAY run during parsing, as long as it remains loosely coupled to the parser. This allows errors to be detected earlier.

4. General structure of a Schema

A schema is a document whose root node is:

Schema (@stxt.schema): <target_namespace>

Example:

Schema (@stxt.schema): com.example.docs
    Description: Schema for example documents
    Node: Document
        Type: GROUP
        Children:
            Child: Metadata (@com.google.html)
                Max: 1
            Child: Autor
            Child: Fecha
                Max: 1
            Child: Content
                Min: 1
                Max: 1
    Node: Autor
    Node: Fecha
        Type: DATE
    Node: Content
        Type: TEXT

5. One schema per namespace

For each logical namespace:

6. Node Definitions (`Node:`)

6.1. Basic form

Node: Node Name
    Description: Node description
    Type: Type
    Children:
        Child: child_name. May include a namespace if it differs from the target namespace
            Min: optional, the minimum number of times the child may appear
            Max: optional, the maximum number of times the child may appear

Rules:

6.2. Values in ENUM types

Node: Node Name
    Description: Node description
    Type: ENUM
    Children:
        Child: child_name. May include a namespace if it differs from the target namespace
            Min: optional, the minimum number of times the child may appear
            Max: optional, the maximum number of times the child may appear
    Values:
        Value: value 1
        Value: value 2
        Value: value 3

Only the ENUM type can specify a Values node, which lists the allowed values as Value nodes. At least one Value MUST exist.

7. Children (`Children:`) and cross namespaces

A node can have a Children entry. If present, it must contain one or more Child nodes describing the allowed children.

A Child can belong to another namespace, in which case the namespace is indicated after the child name. Example:

Node: node name
    Children:
        Child: child name (@child.namespace)
            Min: 0
            Max: 1
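
As an illustration of how a validator might read the value of a Child node, the following Python sketch splits a child reference into name and namespace, falling back to the schema's target namespace when none is given. The function name `parse_child_ref` and the regular expression are assumptions made for this example, based on the `(@namespace)` suffix used in the examples of this document; the exact lexical rules for names and namespaces belong to STXT-SPEC.

```python
import re

# Hypothetical pattern: "<name>" optionally followed by "(@<namespace>)".
# The precise grammar of names and namespaces is defined by STXT-SPEC, not here.
_CHILD_REF = re.compile(
    r"^\s*(?P<name>[^()]+?)\s*(?:\(\s*@(?P<ns>[^()\s]+)\s*\))?\s*$"
)


def parse_child_ref(text: str, target_namespace: str) -> tuple[str, str]:
    """Return (name, namespace) for the value of a Child node."""
    match = _CHILD_REF.match(text)
    if match is None:
        raise ValueError(f"malformed child reference: {text!r}")
    # A child without an explicit namespace belongs to the target namespace.
    return match.group("name"), match.group("ns") or target_namespace
```

For instance, `parse_child_ref("Content", "com.example.docs")` resolves the child in `com.example.docs`, while a reference carrying its own namespace keeps it.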

7.1. Nodes must be explicitly defined in schemas

Every node that appears in Children must have its own definition as Node: in its corresponding schema.

This way we avoid “ghost” children and guarantee that all nodes have defined semantics.
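
The check described above can be sketched in Python as follows. The in-memory layout (a dictionary from namespace to node definitions, each holding a list of `(child name, child namespace)` pairs) and the function name `find_undefined_children` are assumptions made for this example, not part of the specification.

```python
# Hypothetical layout: {namespace: {node name: [(child name, child namespace), ...]}}
Schemas = dict[str, dict[str, list[tuple[str, str]]]]


def find_undefined_children(schemas: Schemas) -> list[str]:
    """List every Child that has no Node definition in its own schema."""
    missing = []
    for namespace, nodes in schemas.items():
        for node_name, children in nodes.items():
            for child_name, child_ns in children:
                # The child must be defined as a Node in the schema of its namespace.
                if child_name not in schemas.get(child_ns, {}):
                    missing.append(
                        f"{namespace}: {node_name} -> {child_name} (@{child_ns})"
                    )
    return missing
```

An empty result means that every child referenced in the loaded schemas has a definition; each entry otherwise names one "ghost" child.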

This implies:

8. Cardinalities

Cardinalities are expressed through the Min and Max nodes of a Child. Both are optional non-negative integers. When present, they indicate the minimum and maximum number of times the child may appear.
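
A minimal Python sketch of this check, assuming that an absent Min means no lower bound and an absent Max means no upper bound. The function name `check_cardinality` and the error wording are hypothetical.

```python
from typing import Optional


def check_cardinality(count: int, min_count: Optional[int] = None,
                      max_count: Optional[int] = None) -> list[str]:
    """Return error messages for a child that appears `count` times."""
    errors = []
    # Assumption: a missing bound (None) imposes no restriction.
    if min_count is not None and count < min_count:
        errors.append(f"expected at least {min_count}, found {count}")
    if max_count is not None and count > max_count:
        errors.append(f"expected at most {max_count}, found {count}")
    return errors
```

With `Min: 1` and `Max: 1`, a child that appears exactly once yields no errors, and a child that is missing yields one.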

Rules:

9. Types

Types define:

  1. The form of the node value (inline, block >>, or none).
  2. Whether the node is compatible with children.
  3. Content validation.

A type is declared in the Node with a Type element. Example:

Node: node name
    Type: NODE_TYPE
    Children:
        Child: a child name

Other considerations:

9.1. Basic structural types

A parser MUST allow these types and MUST validate the structure.

Type    Text forms    Compatible children  Description / Validation
INLINE  INLINE        YES                  Inline text (:). Default type. Can have children.
BLOCK   BLOCK         NO                   Text block (>>) only.
TEXT    INLINE/BLOCK  NO                   Generic text, inline (:) or block (>>). Cannot have children.
GROUP   NONE          YES                  Empty text. Only children allowed. Node container.
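
The table above can be turned into a lookup, as in the following Python sketch. The labels `"inline"`, `"block"` and `"none"` for the form of a node, and the function name `check_structure`, are assumptions made for this example. The sketch also assumes that a node with no text at all is accepted for every type, which the table does not state.

```python
# (allowed text forms, children allowed), transcribed from the table above.
STRUCTURAL_TYPES = {
    "INLINE": ({"inline"}, True),
    "BLOCK": ({"block"}, False),
    "TEXT": ({"inline", "block"}, False),
    "GROUP": (set(), True),
}


def check_structure(node_type: str, form: str, has_children: bool) -> list[str]:
    """`form` is 'inline', 'block' or 'none' (hypothetical labels)."""
    allowed_forms, children_ok = STRUCTURAL_TYPES[node_type]
    errors = []
    # Assumption: an empty node (form 'none') is acceptable for every type.
    if form != "none" and form not in allowed_forms:
        errors.append(f"{node_type} does not allow {form} text")
    if has_children and not children_ok:
        errors.append(f"{node_type} does not allow children")
    return errors
```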

9.2. Basic INLINE content types

A parser MUST allow these types and SHOULD validate the structure.

Type     Text forms  Compatible children  Description / Validation
BOOLEAN  INLINE      YES                  true / false.
NUMBER   INLINE      YES                  JSON-format number.
DATE     INLINE      YES                  YYYY-MM-DD.
ENUM     INLINE      YES                  Only the specified values (see 9.5).
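
As a sketch of how BOOLEAN, NUMBER and DATE values might be checked, the Python functions below apply the formats listed in the table. The function names are hypothetical. The sketch assumes that the value has already been trimmed, that `true` and `false` are matched in lowercase only, and that a DATE must also be a real calendar date; none of these details is fixed by the table itself.

```python
import re
from datetime import date

# Number grammar as defined for JSON: optional minus, integer part without
# leading zeros, optional fraction, optional exponent.
_JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")
_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def is_boolean(text: str) -> bool:
    # Assumption: only the lowercase literals are accepted.
    return text in ("true", "false")


def is_number(text: str) -> bool:
    return _JSON_NUMBER.fullmatch(text) is not None


def is_date(text: str) -> bool:
    match = _DATE.fullmatch(text)
    if match is None:
        return False
    try:
        # Assumption: the date must exist in the calendar (no February 30).
        date(*(int(part) for part in match.groups()))
    except ValueError:
        return False
    return True
```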

9.3. Extended INLINE content types

A parser SHOULD allow these types and SHOULD validate the structure.

Type       Text forms  Compatible children  Description / Validation
INTEGER    INLINE      YES                  Number without decimals (positive and negative).
NATURAL    INLINE      YES                  Numbers greater than or equal to 0 without decimals.
TIME       INLINE      YES                  ISO 8601, hh:mm:ss.
TIMESTAMP  INLINE      YES                  Full ISO 8601.
UUID       INLINE      YES                  UUID.
URL        INLINE      YES                  URL/URI.
EMAIL      INLINE      YES                  EMAIL.
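
Some of these types reduce to a simple pattern match, as the following Python sketch shows for INTEGER, NATURAL, TIME and UUID. The patterns are assumptions: a leading `+` is rejected, seconds are limited to 00-59, and only the hyphenated 8-4-4-4-12 UUID form is accepted. TIMESTAMP, URL and EMAIL are left out because their grammars are broader than a short example can cover. The name `is_valid_extended` is hypothetical.

```python
import re

# Hypothetical patterns; the table above does not fix these details.
EXTENDED_PATTERNS = {
    "INTEGER": re.compile(r"-?[0-9]+"),
    "NATURAL": re.compile(r"[0-9]+"),
    "TIME": re.compile(r"(?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]"),
    "UUID": re.compile(
        r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
        r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
    ),
}


def is_valid_extended(node_type: str, text: str) -> bool:
    """Check a trimmed inline value against the pattern for its type."""
    return EXTENDED_PATTERNS[node_type].fullmatch(text) is not None
```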

9.4. Binary content types (INLINE/BLOCK)

A parser SHOULD allow these types and MAY validate the structure.

Type         Text forms      Compatible children  Description / Validation
HEXADECIMAL  INLINE / BLOCK  NO                   [0-9A-Fa-f]+. Hexadecimal string.
BINARY       INLINE / BLOCK  NO                   [01]+. Binary string.
BASE64       INLINE / BLOCK  NO                   Base64 block.
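
The following Python sketch checks the three binary types. It assumes that whitespace and line breaks inside the value are ignored, which is convenient for block content but is not stated in the table, and that Base64 uses the standard alphabet with padding. The function names are hypothetical.

```python
import base64
import binascii
import re


def _compact(text: str) -> str:
    # Assumption: whitespace (including block line breaks) is not significant.
    return "".join(text.split())


def is_hexadecimal(text: str) -> bool:
    return re.fullmatch(r"[0-9A-Fa-f]+", _compact(text)) is not None


def is_binary(text: str) -> bool:
    return re.fullmatch(r"[01]+", _compact(text)) is not None


def is_base64(text: str) -> bool:
    data = _compact(text)
    if not data:
        return False
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        base64.b64decode(data, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True
```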

9.5. ENUM Type

The ENUM type is special because it enumerates the values allowed for the node. Characteristics:

Example:

Node: Node Name
    Type: ENUM
    Values:
        Value: value 1
        Value: value 2
        Value: value 3

A schema parser MUST check the value of every ENUM node against its allowed values and report an error if the value is not one of them.
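
A minimal Python sketch of this check, assuming an exact, case-sensitive comparison on the trimmed value. The function name `check_enum` and the error wording are hypothetical.

```python
def check_enum(value: str, allowed: list[str]) -> list[str]:
    """Return error messages for the value of an ENUM node; empty means valid."""
    if not allowed:
        # A Values node must contain at least one Value.
        return ["ENUM definition has no allowed values"]
    # Assumption: values are compared exactly, including letter case.
    if value not in allowed:
        return [f"{value!r} is not an allowed value; expected one of {allowed}"]
    return []
```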

10. Normative Examples

10.1. Schema with cross-namespace references

Schema (@stxt.schema): com.example.docs
    Node: Document
        Type: GROUP
        Children:
            Child: Metadata (@com.google.html)
                Max: 1
            Child: Content
                Min: 1
                Max: 1
    Node: Content
        Type: BLOCK

And in com.google.html:

Schema (@stxt.schema): com.google.html
    Node: Metadata
        Type: INLINE

10.2. Valid document

Document (@com.example.docs):
    Metadata (@com.google.html): info
    Content>>
        Line 1
        Line 2

11. Schema Errors

A schema is invalid if:

  1. It defines two Node entries with the same name.
  2. It uses an unknown Type.
  3. It defines Children in a Node whose type does not allow children.
  4. A cardinality is invalid (for example, Min or Max is not a non-negative integer).
  5. Children lists a child whose Node is not defined in its corresponding schema.
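
Conditions 1 to 4 can be checked on a single schema, as in the following Python sketch. The `NodeDef` and `ChildDef` classes are a hypothetical in-memory form of the Node and Child entries, and the type names are those listed in section 9. The rule that Min must not exceed Max is an assumption added for illustration; it is not stated in this section. Condition 5 needs access to the schemas of other namespaces and is left out here.

```python
from dataclasses import dataclass, field
from typing import Optional

# Type names taken from section 9 of this document.
CHILDLESS_TYPES = {"BLOCK", "TEXT", "HEXADECIMAL", "BINARY", "BASE64"}
KNOWN_TYPES = CHILDLESS_TYPES | {
    "INLINE", "GROUP", "BOOLEAN", "NUMBER", "DATE", "ENUM", "INTEGER",
    "NATURAL", "TIME", "TIMESTAMP", "UUID", "URL", "EMAIL",
}


@dataclass
class ChildDef:  # hypothetical in-memory form of a Child entry
    name: str
    min: Optional[int] = None
    max: Optional[int] = None


@dataclass
class NodeDef:  # hypothetical in-memory form of a Node entry
    name: str
    type: str = "INLINE"  # INLINE is the default type
    children: list[ChildDef] = field(default_factory=list)


def schema_errors(nodes: list[NodeDef]) -> list[str]:
    """Check conditions 1 to 4 for the Node entries of one schema."""
    errors: list[str] = []
    seen: set[str] = set()
    for node in nodes:
        if node.name in seen:
            errors.append(f"duplicate Node: {node.name}")
        seen.add(node.name)
        if node.type not in KNOWN_TYPES:
            errors.append(f"unknown Type: {node.type}")
        elif node.children and node.type in CHILDLESS_TYPES:
            errors.append(f"{node.name}: type {node.type} does not allow Children")
        for child in node.children:
            for label, bound in (("Min", child.min), ("Max", child.max)):
                if bound is not None and bound < 0:
                    errors.append(
                        f"{node.name}/{child.name}: {label} must be non-negative"
                    )
            # Assumption, not stated in this section: Min greater than Max is invalid.
            if (child.min is not None and child.max is not None
                    and child.min > child.max):
                errors.append(f"{node.name}/{child.name}: Min exceeds Max")
    return errors
```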

12. Conformance

An implementation is conforming if:

13. Schema of the Schema (`@stxt.schema`)

This section defines the official schema of the schema system itself: the meta-schema that validates all documents in the @stxt.schema namespace.

13.1. Considerations

13.2. Full Meta-Schema

Schema (@stxt.schema): @stxt.schema
    Node: Schema
        Children:
            Child: Description
                Max: 1
            Child: Node
                Min: 1
    Node: Node
        Children:
            Child: Type
                Max: 1
            Child: Children
                Max: 1
            Child: Description
                Max: 1
            Child: Values
                Max: 1
    Node: Children
        Type: GROUP
        Children:
            Child: Child
                Min: 1
    Node: Description
        Type: TEXT
    Node: Child
        Children:
            Child: Min
                Max: 1
            Child: Max
                Max: 1
    Node: Min
        Type: NATURAL
    Node: Max
        Type: NATURAL
    Node: Type
    Node: Values
        Children:
            Child: Value
                Min: 1
    Node: Value

13.3. Quick reading

13.4. Minimal valid example

Schema (@stxt.schema): com.example.docs
    Node: Document

13.5. Complete example

Schema (@stxt.schema): com.example.docs
    Description: Example schema
    Node: Document
        Type: GROUP
        Children:
            Child: Title
                Min: 1
                Max: 1
            Child: Author
            Child: Metadata (@com.google.html)
                Max: 1
    Node: Title
        Type: INLINE
    Node: Author
        Type: INLINE

14. End of Document