.\" Copyright 2011-2023 David Robillard
.\" SPDX-License-Identifier: ISC
.Dd May 04, 2023
.Dt SERD-PIPE 1
.Os Serd 1.1.1
.Sh NAME
.Nm serd-pipe
.Nd read and write RDF data
.Sh SYNOPSIS
.Nm serd-pipe
.Op Fl Cfhqv
.Op Fl B Ar base
.Op Fl I Ar syntax
.Op Fl O Ar syntax
.Op Fl b Ar bytes
.Op Fl c Ar prefix
.Op Fl k Ar bytes
.Op Fl p Ar prefix
.Op Fl r Ar root
.Op Fl s Ar string
.Op Fl w Ar filename
.Op Ar input ...
.Sh DESCRIPTION
.Nm
is a fast command-line utility for streaming RDF data.
It reads one or more files and writes the data again,
possibly in a different form.
.Pp
.Nm
writes statements immediately as they are read,
so it uses little memory and is suitable for use in pipelines and with huge files.
Typical uses include checking syntax,
converting to another syntax,
pretty-printing,
merging files,
expanding URIs,
and so on.
.Pp
By default,
syntaxes are guessed from file extensions where possible,
making use with filenames most convenient.
For example,
most common tasks can be accomplished with simple commands like:
.Pp
.Dl $ serd-pipe -o pretty.ttl input.nt
.Pp
The
.Ar input
operands are processed in command-line order.
If
.Ar input
is
.Ar -
or absent,
.Nm
reads from standard input.
Similarly, output defaults to standard output.
If syntax isn't given and can't be determined from filenames,
then input is read as TriG and output is written as NQuads
(which will function properly with Turtle and NTriples, respectively).
.Pp
The options are as follows:
.Bl -tag -width 3n
.It Fl B Ar base
Input base URI.
Relative URI references in the input will be resolved against this.
When the input is a file,
the URI of the file is automatically used as the base URI.
This option can be used to override that,
or to provide a base URI for input from stdin or a string.
.It Fl C
Convert literals to canonical form.
Literals with supported XSD datatypes will be parsed and rewritten canonically.
Invalid literals will cause an error.
All numeric datatypes are supported, as well as
.Vt boolean ,
.Vt duration ,
.Vt dateTime ,
.Vt time ,
.Vt hexBinary ,
and
.Vt base64Binary .
.It Fl I Ar syntax
Set an input syntax or option.
May be given multiple times.
The case-insensitive
.Ar syntax
can be
.Cm NQuads ,
.Cm NTriples ,
.Cm TriG ,
.Cm Turtle ,
or an option:
.Bl -tag -width 3n
.It Cm lax
Tolerate invalid input where possible.
Warnings will be printed for syntax errors,
but parsing will attempt to continue.
Note that data may be lost when using this option!
.It Cm variables
Support parsing variable nodes.
Variables can be written in SPARQL style, for example
.Li ?var
or
.Li $var .
.It Cm verbatim
Normally, the reader expands all relative URIs,
and may adjust blank node labels to avoid clashing with generated ones.
This flag disables all of this processing,
so that URI references and blank nodes are passed to the sink exactly as they are in the input.
Note that this does not apply to CURIEs,
since serd deliberately does not have a way to represent CURIE nodes.
A bad namespace prefix is considered a syntax error.
.It Cm generated
Read seemingly generated blank node labels exactly without adjusting them.
Normally, blank node labels like
.Li b123
are adapted to avoid potential clashes with generated ones.
This flag disables that,
so such labels will be passed through exactly as they are in the input.
Note that this may corrupt the output by merging distinct blank nodes.
.El
.It Fl O Ar syntax
Set an output syntax or option.
May be given multiple times.
The case-insensitive
.Ar syntax
can be
.Cm empty ,
.Cm NQuads ,
.Cm NTriples ,
.Cm TriG ,
.Cm Turtle ,
or an option:
.Bl -tag -width 3n
.It Cm ascii
Escape all non-ASCII characters.
.It Cm expanded
Write expanded URIs instead of prefixed names.
.It Cm lax
Tolerate corrupt UTF-8 and write replacements.
.It Cm terse
Write terser output with fewer, longer lines.
.It Cm verbatim
Write URI references exactly as in the input.
.El
.Pp
The
.Cm empty
syntax suppresses the output,
so that only warnings and errors will be printed.
.It Fl b Ar bytes
I/O block size.
This is the number of bytes in a file that will be read or written at once.
The default is 4096, which should perform well in most cases.
Note that this only applies to files;
standard input and output are always processed one byte at a time.
.It Fl c Ar prefix
Chop
.Ar prefix
from matching blank node IDs.
This is typically used to revert the effects of
.Fl p .
For example, with
.Ar prefix
.Dq doc01 ,
the blank node
.Li _:doc01b42
will be emitted as
.Li _:b42 .
.It Fl e
Eat input one character at a time,
rather than a page at a time, which is the default.
This is useful when reading from a pipe,
since output will be generated immediately as input arrives,
rather than waiting until an entire page of input has arrived.
With this option, one less page of memory is used,
but likely with a performance penalty.
.It Fl f
Fast and loose URI mode:
preserve full URIs (without qualifying or making relative),
and pass prefixed names through as-is.
.It Fl h
Print the command line options.
.It Fl k Ar bytes
Parser stack size.
Parsing is performed using a pre-allocated stack for performance and security reasons.
By default, the stack is 1 MiB, which should be sufficient for most data.
This can be increased to support unusually structured data and huge literals,
or decreased to reduce overall memory requirements and startup time.
.It Fl p Ar prefix
Add
.Ar prefix
to blank node IDs.
This can be used to avoid clashes between blank node IDs in input documents.
The effects can be reversed in a later run with
.Fl c .
For example, with
.Ar prefix
.Dq doc01 ,
the blank node
.Li _:b42
will be emitted as
.Li _:doc01b42 .
.It Fl q
Suppress all output except data.
.It Fl r Ar root
Keep relative URIs within a
.Ar root
URI.
This will avoid creating any relative URI references with leading path segments like
.Pa ../
that enter a parent of
.Ar root .
.Pp
For example, if a reference to
.Pa /home/you/file.ttl
is written to the file
.Pa /home/me/output.ttl
using the destination's base URI,
then it could be written as
.Li <../you/file.ttl> .
Setting
.Fl r Li file:///home/me/
would prevent references from
.Dq escaping
like this,
so the above would instead be written as
.Li <file:///home/you/file.ttl> ,
since it can't be expressed relative to the root URI.
.Pp
This is useful for keeping relative references within some directory.
.It Fl s Ar string
Parse
.Ar string
as input.
.It Fl v
Display version information and exit.
.It Fl w Ar filename
Write output to the given
.Ar filename
instead of stdout.
.El
.Sh ENVIRONMENT
Errors and warnings are printed in color by default if the output is a terminal.
This can be overridden with environment variables:
.Pp
.Bl -tag -compact -width 14n
.It Ev NO_COLOR
If present (regardless of value), color is disabled.
.It Ev CLICOLOR
If set to 0, color is disabled.
.It Ev CLICOLOR_FORCE
If set to anything other than 0, color is forced on.
.El
.Sh EXIT STATUS
.Nm
exits with a status of 0, or non-zero if an error occurred.
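.Pp
Because the exit status distinguishes success from failure,
.Nm
can serve as a syntax check in scripts.
The following is a minimal sketch and not part of serd:
the function name
.Li check_rdf
and the
.Ev SERD_PIPE
override variable are assumptions made for illustration.
It relies only on the
.Fl O Cm empty
output syntax and the exit status described in this page.

```shell
# Hypothetical helper (not part of serd): check the syntax of each file.
# SERD_PIPE is an assumed override for the command name, e.g. for testing.
check_rdf() {
    rc=0
    for file in "$@"; do
        # "-O empty" suppresses the output, so only warnings and errors print
        if "${SERD_PIPE:-serd-pipe}" -O empty "$file"; then
            printf '%s: ok\n' "$file"
        else
            printf '%s: failed\n' "$file" >&2
            rc=1
        fi
    done
    # Non-zero if any file was rejected
    return "$rc"
}
```

.Pp
For example,
.Li check_rdf header.ttl body.ttl
reports each file in turn and returns non-zero if any file is rejected.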
.Sh EXAMPLES
.Bl -tag -width 3n
.It Format a Turtle file to stdout:
.Nm Fl O
.Ar turtle
.Pa input.ttl
.It Print only errors and discard the output:
.Nm Fl O
.Ar empty
.Pa input.ttl
.It Convert an NTriples file to Turtle:
.Nm Fl o
.Ar output.ttl
.Pa input.nt
.It Expand all prefixed names into full URIs:
.Nm Fl O
.Ar expanded
.Fl o
.Ar expanded.ttl
.Pa input.ttl
.It Merge two files:
.Nm Fl o
.Pa merged.ttl
.Pa header.ttl
.Pa body.ttl
.El
.Sh SEE ALSO
.Bl -item -compact
.It
.Lk http://drobilla.net/software/serd/
.It
.Lk http://gitlab.com/drobilla/serd/
.El
.Sh STANDARDS
.Bl -item
.It
.Rs
.%A W3C
.%T RDF 1.1 NQuads
.%D February 2014
.Re
.Lk https://www.w3.org/TR/n-quads/
.It
.Rs
.%A W3C
.%T RDF 1.1 NTriples
.%D February 2014
.Re
.Lk https://www.w3.org/TR/n-triples/
.It
.Rs
.%A W3C
.%T RDF 1.1 TriG
.%D February 2014
.Re
.Lk https://www.w3.org/TR/trig/
.It
.Rs
.%A W3C
.%T RDF 1.1 Turtle
.%D February 2014
.Re
.Lk https://www.w3.org/TR/turtle/
.It
.Rs
.%A Jan Niklas Hasse
.%T CLICOLOR
.%D April 2015
.Re
.Lk https://bixense.com/clicolors/
.It
.Rs
.%A Joshua Stein
.%T NO_COLOR
.%D August 2017
.Re
.Lk http://no-color.org/
.El
.Sh AUTHORS
.Nm
is a part of serd, by
.An David Robillard
.Mt d@drobilla.net .