.\" Copyright 2011-2023 David Robillard <d@drobilla.net>
.\" SPDX-License-Identifier: ISC
.Dd May 04, 2023
.Dt SERD-PIPE 1
.Os Serd 1.1.1
.Sh NAME
.Nm serd-pipe
.Nd read and write RDF data
.Sh SYNOPSIS
.Nm serd-pipe
.Op Fl CVh
.Op Fl B Ar base
.Op Fl I Ar syntax
.Op Fl O Ar syntax
.Op Fl R Ar root
.Op Fl b Ar bytes
.Op Fl k Ar bytes
.Op Fl l Ar level
.Op Fl o Ar filename
.Op Fl s Ar string
.Op Ar input ...
.Sh DESCRIPTION
.Nm
is a fast command-line utility for streaming RDF data.
It reads one or more files and writes the data again,
possibly in a different form.
.Nm
writes statements immediately as they are read,
so it uses little memory and is suitable for use in pipelines and with huge files.
Typical uses include checking syntax,
converting to another syntax,
merging files,
expanding URIs,
and so on.
By default,
syntaxes are guessed from file extensions where possible,
making use with filenames most convenient.
For example,
most common tasks can be accomplished with simple commands like:
.Dl $ serd-pipe -o pretty.ttl input.nt
.Ar input
operands are processed in command-line order.
If
.Ar input
is
.Ar -
or absent,
.Nm
reads from standard input.
Similarly, output defaults to standard output.
If syntax isn't given and can't be determined from filenames,
then input is read as TriG and output is written as NQuads
(which will function properly with Turtle and NTriples, respectively).
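.Pp
As a minimal sketch of pipeline use
(the filename
.Pa input.ttl
is hypothetical),
the syntaxes can be given explicitly when reading from standard input,
since there is no filename to guess them from:
.Bd -literal -offset indent
# Read Turtle from standard input and write NTriples to standard output
$ serd-pipe -I turtle -O ntriples - < input.ttl
.Ed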
The options are as follows:
.Bl -tag -width 3n
.It Fl B Ar base
Base URI, path, or
.Cm rebase
to use the output path.
This is used to resolve relative URI references in the input.
If the input is a file,
its path is used by default,
so relative paths are written as they are in the input.
The special
.Cm rebase
argument will instead use the output path set by the
.Fl o
option,
so paths are written relative to the output file.
The distinction matters when reading from bundles of files that refer to each other.
For example,
when copying
.Pa in/manifest.ttl
to
.Pa out/manifest.ttl ,
the relative URI reference
.Ql <data.ttl>
will be written as
.Ql <../in/data.ttl>
when using
.Fl B
.Cm rebase .
.It Fl C
Convert literals to canonical form.
Literals with supported XSD datatypes will be parsed and rewritten canonically.
Invalid literals will cause an error.
All numeric datatypes are supported, as well as
.Vt boolean ,
.Vt duration ,
.Vt dateTime ,
.Vt time ,
.Vt hexBinary ,
and
.Vt base64Binary .
.It Fl I Ar syntax
Set an input syntax or option.
May be given multiple times.
The case-insensitive
.Ar syntax
can be
.Cm NQuads ,
.Cm NTriples ,
.Cm TriG ,
.Cm Turtle ,
or an option:
.Bl -tag -width 3n
.It Cm lax
Tolerate invalid input where possible.
Warnings will be printed for syntax errors,
but parsing will attempt to continue.
Note that data may be lost when using this option!
.It Cm variables
Support parsing variable nodes.
Variables can be written in SPARQL style, for example
.Li ?var
or
.Li $var .
.It Cm relative
Read relative URI references exactly without resolving them.
Normally, all relative URIs are expanded against the base URI when reading.
This flag disables that,
so URI references will be passed through exactly as they are in the input.
.It Cm generated
Read seemingly generated blank node labels exactly without adjusting them.
Normally, blank node labels like
.Li b123
are adapted to avoid potential clashes with generated ones.
This flag disables that,
so such labels will be passed through exactly as they are in the input.
Note that this may corrupt the output by merging distinct blank nodes.
.It Cm global
Assume a clean global namespace for blank node labels,
and do not automatically add prefixes.
Normally,
a prefix like
.Li f1
is added to blank node labels when reading multiple files,
to prevent labels in different files from clashing.
This option disables that,
so blank node labels will be passed through without any added prefix.
Note that this may corrupt the output by merging distinct blank nodes.
.It Cm ordered
Generate blank node labels with suffixes left-padded with zeros.
This generates IDs like "_:b0000000123" that sort in numerical order,
which can be useful to preserve statement ordering.
.It Cm decoded
Read URIs with percent-encoded UTF-8 characters decoded.
Normally, percent-encoded octets in URIs are preserved as plain text.
This flag enables interpreting them as UTF-8,
decoding escapes like "%7E" to characters like "~" where possible.
.El
.It Fl O Ar syntax
Set an output syntax or option.
May be given multiple times.
The case-insensitive
.Ar syntax
can be
.Cm empty ,
.Cm NQuads ,
.Cm NTriples ,
.Cm TriG ,
.Cm Turtle ,
or an option:
.Bl -tag -width 3n
.It Cm ascii
Escape all non-ASCII characters.
Normally, text is written in UTF-8.
This flag will escape additional non-printable-ASCII characters in string literals like
.Li \eu00B7
and
.Li \eU0001F600 ,
and in URIs like
.Li %C2%B7
and
.Li %F0%9F%98%80 .
.It Cm escapes
Escape all non-ASCII characters with
.Dq U
escapes.
This works like
.Cm ascii ,
except percent-encoding will not be used in URIs
(matching the format used in the Turtle test suite).
.It Cm contextual
Suppress writing directives that describe the context.
This can be used to suppress the header of
.Li prefix
and
.Li base
directives,
making the output depend on an implied context.
Note that this option may produce incomprehensible output if prefixes change while writing!
.It Cm expanded
Write expanded URIs instead of prefixed names.
.It Cm lax
Tolerate corrupt UTF-8 and write replacements.
.It Cm longhand
Avoid using the
.Ql a
shorthand for
.Ql rdf:type .
.It Cm terse
Write terser output with fewer, longer lines.
.It Cm verbatim
Write URI references exactly as in the input.
.El
.Pp
The
.Cm empty
syntax suppresses the output,
so that only warnings and errors will be printed.
.It Fl R Ar root
Keep relative URIs within a
.Ar root
directory.
This will avoid creating any relative URI references with leading path segments like
.Pa ../
that enter a parent of
.Ar root .
For example,
if a reference to
.Pa /home/you/file.ttl
is written to the file
.Pa /home/me/output.ttl
with
.Fl B Cm rebase ,
then it will be written as
.Li <../you/file.ttl> .
Using
.Fl R Pa /home/me/
would prevent references from
.Dq escaping
like this,
so the above would instead be written as
.Li <file:///home/you/file.ttl> .
This is useful for keeping relative references within some directory.
.It Fl V
Display version information and exit.
.It Fl b Ar bytes
I/O block size.
This is the number of bytes in a file that will be read or written at once.
The default is 4096, which should perform well in most cases.
Note that this only applies to files:
standard input and output are always processed one byte at a time.
.It Fl h
Print the command line options.
.It Fl k Ar bytes
Parser stack size.
Parsing is performed using a pre-allocated stack for performance and security reasons.
By default, the stack is 1 MiB, which should be sufficient for most data.
This can be increased to support unusually structured data and huge literals,
or decreased to reduce overall memory requirements and reduce startup time.
.It Fl l Ar level
Maximum log level, or (equivalently) minimum log priority.
Only messages with at least the priority of this level will be displayed.
.Ar level
is as in
.Xr syslog 2 ,
either a number from
.Cm 0
to
.Cm 7 ,
or a name:
.Cm emerg ,
.Cm alert ,
.Cm crit ,
.Cm err ,
.Cm warn ,
.Cm note ,
.Cm info ,
or
.Cm debug .
.It Fl o Ar filename
Write output to the given
.Ar filename
instead of stdout.
.It Fl s Ar string
Use
.Ar string
as input.
.El
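.Pp
The following sketch combines several of the options above.
All filenames are hypothetical,
and no particular output is implied:
.Bd -literal -offset indent
# Tolerate invalid input where possible and escape non-ASCII output
$ serd-pipe -I lax -O ascii -o clean.ttl messy.ttl

# Convert literals to canonical form and write terse Turtle to stdout
$ serd-pipe -C -O turtle -O terse input.ttl

# Write paths relative to the output file, kept within /home/me/
$ serd-pipe -B rebase -R /home/me/ -o /home/me/output.ttl input.ttl
.Ed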
.Sh ENVIRONMENT
Errors and warnings are printed in color by default if the output is a terminal.
This can be overridden with environment variables:
.Bl -tag -compact -width 14n
.It Ev NO_COLOR
If present (regardless of value), color is disabled.
.It Ev CLICOLOR
If set to 0, color is disabled.
.It Ev CLICOLOR_FORCE
If set to anything other than 0, color is forced on.
.El
.Sh EXIT STATUS
.Nm
exits with a status of 0, or non-zero if an error occurred.
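.Pp
As a sketch of how the exit status might be used in a POSIX shell script
(the filename
.Pa input.ttl
is hypothetical):
.Bd -literal -offset indent
# Check syntax only: discard the output, then branch on the exit status
if serd-pipe -O empty input.ttl; then
  echo "input.ttl: OK"
else
  echo "input.ttl: errors found" >&2
fi
.Ed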
.Sh EXAMPLES
.Bl -tag -width 3n
.It Format a Turtle file to stdout:
.Nm Fl O
.Ar turtle
.Pa input.ttl
.It Print only errors and discard the output:
.Nm Fl O
.Ar empty
.Pa input.ttl
.It Convert an NTriples file to Turtle:
.Nm Fl o
.Ar output.ttl
.Pa input.nt
.It Expand all prefixed names into full URIs:
.Nm Fl O
.Ar expanded
.Fl o
.Ar expanded.ttl
.Pa input.ttl
.It Merge two files:
.Nm Fl o
.Pa merged.ttl
.Pa header.ttl
.Pa body.ttl
.El
.Sh SEE ALSO
.Bl -item -compact
.It
.Xr serd-filter 1
.It
.Xr serd-sort 1
.It
.Lk http://drobilla.net/software/serd/
.It
.Lk http://gitlab.com/drobilla/serd/
.El
.Sh STANDARDS
.Bl -item
.It
.Rs
.%A W3C
.%T RDF 1.1 NQuads
.%D February 2014
.Re
.Lk https://www.w3.org/TR/n-quads/
.It
.Rs
.%A W3C
.%T RDF 1.1 NTriples
.%D February 2014
.Re
.Lk https://www.w3.org/TR/n-triples/
.It
.Rs
.%A W3C
.%T RDF 1.1 TriG
.%D February 2014
.Re
.Lk https://www.w3.org/TR/trig/
.It
.Rs
.%A W3C
.%T RDF 1.1 Turtle
.%D February 2014
.Re
.Lk https://www.w3.org/TR/turtle/
.It
.Rs
.%A Jan Niklas Hasse
.%D April 2015
.Re
.Lk https://bixense.com/clicolors/
.It
.Rs
.%A Joshua Stein
.%D August 2017
.Re
.Lk http://no-color.org/
.El
.Sh AUTHORS
.Nm
is a part of serd, by
.An David Robillard
.Mt d@drobilla.net .