.\" Copyright 2011-2023 David Robillard
.\" SPDX-License-Identifier: ISC
.Dd May 04, 2023
.Dt SERD-PIPE 1
.Os Serd 1.1.1
.Sh NAME
.Nm serd-pipe
.Nd read and write RDF data
.Sh SYNOPSIS
.Nm serd-pipe
.Op Fl CVh
.Op Fl B Ar base
.Op Fl I Ar syntax
.Op Fl O Ar syntax
.Op Fl R Ar root
.Op Fl b Ar bytes
.Op Fl k Ar bytes
.Op Fl l Ar level
.Op Fl o Ar filename
.Op Fl s Ar string
.Op Ar input ...
.Sh DESCRIPTION
.Nm
is a fast command-line utility for streaming RDF data.
It reads one or more files and writes the data again,
possibly in a different form.
.Pp
.Nm
writes statements immediately as they are read,
so it uses little memory and is suitable for use in pipelines and with huge files.
Typical uses include checking syntax,
converting to another syntax,
pretty-printing,
merging files,
expanding URIs,
and so on.
.Pp
By default,
syntaxes are guessed from file extensions where possible,
making use with filenames most convenient.
For example,
most common tasks can be accomplished with simple commands like:
.Pp
.Dl $ serd-pipe -o pretty.ttl input.nt
.Pp
The
.Ar input
operands are processed in command-line order.
If
.Ar input
is
.Ar -
or absent,
.Nm
reads from standard input.
Similarly, output defaults to standard output.
If syntax isn't given and can't be determined from filenames,
then input is read as TriG and output is written as NQuads
(which will function properly with Turtle and NTriples, respectively).
.Pp
The options are as follows:
.Bl -tag -width 3n
.It Fl B Ar base
Base URI, path, or
.Cm rebase
to use the output path.
This is used to resolve relative URI references in the input.
.Pp
If the input is a file,
its path is used by default,
so relative paths are written as they are in the input.
The special
.Cm rebase
argument will instead use the output path set by the
.Fl o
option,
so paths are written relative to the output file.
.Pp
The distinction matters when reading from bundles of files that refer to each other.
For example, when copying
.Pa in/manifest.ttl
to
.Pa out/manifest.ttl ,
the relative URI reference
.Ql <data.ttl>
will be written as
.Ql <../in/data.ttl>
when using
.Fl B
.Cm rebase .
.It Fl C
Convert literals to canonical form.
Literals with supported XSD datatypes will be parsed and rewritten canonically.
Invalid literals will cause an error.
All numeric datatypes are supported, as well as
.Vt boolean ,
.Vt duration ,
.Vt dateTime ,
.Vt time ,
.Vt hexBinary ,
and
.Vt base64Binary .
.It Fl I Ar syntax
Set an input syntax or option.
May be given multiple times.
The case-insensitive
.Ar syntax
can be
.Cm NQuads ,
.Cm NTriples ,
.Cm TriG ,
.Cm Turtle ,
or an option:
.Bl -tag -width 3n
.It Cm lax
Tolerate invalid input where possible.
Warnings will be printed for syntax errors,
but parsing will attempt to continue.
Note that data may be lost when using this option!
.It Cm variables
Support parsing variable nodes.
Variables can be written in SPARQL style, for example
.Li ?var
or
.Li $var .
.It Cm relative
Read relative URI references exactly without resolving them.
Normally, all relative URIs are expanded against the base URI when reading.
This flag disables that,
so URI references will be passed through exactly as they are in the input.
.It Cm generated
Read seemingly generated blank node labels exactly without adjusting them.
Normally, blank node labels like
.Li b123
are adapted to avoid potential clashes with generated ones.
This flag disables that,
so such labels will be passed through exactly as they are in the input.
Note that this may corrupt the output by merging distinct blank nodes.
.It Cm global
Assume a clean global namespace for blank node labels,
and do not automatically add prefixes.
Normally, a prefix like
.Li f1
is added to blank node labels when reading multiple files,
to prevent labels in different files from clashing.
This option disables that,
so blank node labels will be passed through without any added prefix.
Note that this may corrupt the output by merging distinct blank nodes.
.El
.It Fl O Ar syntax
Set an output syntax or option.
May be given multiple times.
The case-insensitive
.Ar syntax
can be
.Cm empty ,
.Cm NQuads ,
.Cm NTriples ,
.Cm TriG ,
.Cm Turtle ,
or an option:
.Bl -tag -width 3n
.It Cm ascii
Escape all non-ASCII characters.
.It Cm expanded
Write expanded URIs instead of prefixed names.
.It Cm lax
Tolerate corrupt UTF-8 and write replacements.
.It Cm longhand
Avoid using the
.Ql a
shorthand for
.Ql rdf:type .
.It Cm terse
Write terser output with fewer, longer lines.
.It Cm verbatim
Write URI references exactly as in the input.
.El
.Pp
The
.Cm empty
syntax suppresses the output,
so that only warnings and errors will be printed.
.It Fl R Ar root
Keep relative URIs within a
.Ar root
URI.
This will avoid creating any relative URI references with leading path segments like
.Pa ../
that enter a parent of
.Ar root .
.Pp
For example, if
.Pa /home/you/file.ttl
is written to the file
.Pa /home/me/output.ttl
using
.Fl B Cm rebase ,
then it will be written as
.Li <../you/file.ttl> .
Setting
.Fl R Pa /home/me/
would prevent references from
.Dq escaping
like this, so the above would instead be written as
.Li .
.Pp
This is useful for keeping relative references within some directory.
.It Fl V
Display version information and exit.
.It Fl b Ar bytes
I/O block size.
This is the number of bytes in a file that will be read or written at once.
The default is 4096, which should perform well in most cases.
Note that this only applies to files;
standard input and output are always processed one byte at a time.
.It Fl h
Print the command line options.
.It Fl k Ar bytes
Parser stack size.
Parsing is performed using a pre-allocated stack for performance and security reasons.
By default, the stack is 1 MiB, which should be sufficient for most data.
This can be increased to support unusually structured data and huge literals,
or decreased to reduce overall memory requirements and reduce startup time.
.It Fl l Ar level
Maximum log level, or (equivalently) minimum log priority.
Only messages with at least the priority of this level will be displayed.
The
.Ar level
is as in
.Xr syslog 2 ,
either a number from
.Cm 0
to
.Cm 7 ,
or
.Cm emerg ,
.Cm alert ,
.Cm crit ,
.Cm err ,
.Cm warn ,
.Cm note ,
.Cm info ,
or
.Cm debug .
.It Fl o Ar filename
Write output to the given
.Ar filename
instead of stdout.
.It Fl s Ar string
Parse
.Ar string
as input.
.El
.Sh ENVIRONMENT
Errors and warnings are printed in color by default if the output is a terminal.
This can be overridden with environment variables:
.Pp
.Bl -tag -compact -width 14n
.It Ev NO_COLOR
If present (regardless of value), color is disabled.
.It Ev CLICOLOR
If set to 0, color is disabled.
.It Ev CLICOLOR_FORCE
If set to anything other than 0, color is forced on.
.El
.Sh EXIT STATUS
.Nm
exits with a status of 0, or non-zero if an error occurred.
.Sh EXAMPLES
.Bl -tag -width 3n
.It Format a Turtle file to stdout:
.Nm Fl O
.Ar turtle
.Pa input.ttl
.It Print only errors and discard the output:
.Nm Fl O
.Ar empty
.Pa input.ttl
.It Convert an NTriples file to Turtle:
.Nm Fl o
.Ar output.ttl
.Pa input.nt
.It Expand all prefixed names into full URIs:
.Nm Fl O
.Ar expanded
.Fl o
.Ar expanded.ttl
.Pa input.ttl
.It Merge two files:
.Nm Fl o
.Pa merged.ttl
.Pa header.ttl
.Pa body.ttl
.El
.Sh SEE ALSO
.Bl -item -compact
.It
.Xr serd-filter 1
.It
.Xr serd-sort 1
.It
.Lk http://drobilla.net/software/serd/
.It
.Lk http://gitlab.com/drobilla/serd/
.El
.Sh STANDARDS
.Bl -item
.It
.Rs
.%A W3C
.%T RDF 1.1 NQuads
.%D February 2014
.Re
.Lk https://www.w3.org/TR/n-quads/
.It
.Rs
.%A W3C
.%D February 2014
.%T RDF 1.1 NTriples
.Re
.Lk https://www.w3.org/TR/n-triples/
.It
.Rs
.%A W3C
.%T RDF 1.1 TriG
.%D February 2014
.Re
.Lk https://www.w3.org/TR/trig/
.It
.Rs
.%A W3C
.%D February 2014
.%T RDF 1.1 Turtle
.Re
.Lk https://www.w3.org/TR/turtle/
.It
.Rs
.%A Jan Niklas Hasse
.%T CLICOLOR
.%D April 2015
.Re
.Lk https://bixense.com/clicolors/
.It
.Rs
.%A Joshua Stein
.%T NO_COLOR
.%D August 2017
.Re
.Lk http://no-color.org/
.El
.Sh AUTHORS
.Nm
is a part of serd, by
.An David Robillard
.Mt d@drobilla.net .