diff options
Diffstat (limited to 'doc/cpp/reading_and_writing.rst')
-rw-r--r-- | doc/cpp/reading_and_writing.rst | 147 |
1 files changed, 147 insertions, 0 deletions
diff --git a/doc/cpp/reading_and_writing.rst b/doc/cpp/reading_and_writing.rst new file mode 100644 index 00000000..893e6f7b --- /dev/null +++ b/doc/cpp/reading_and_writing.rst @@ -0,0 +1,147 @@ +Reading and Writing +=================== + +.. default-domain:: cpp +.. highlight:: cpp +.. namespace:: serd + +Reading and writing documents in a textual syntax is handled by the :struct:`Reader` and :struct:`Writer`, respectively. +Serd is designed around a concept of event streams, +so the reader or writer can be at the beginning or end of a "pipeline" of stream processors. +This allows large documents to be processed quickly in an "online" fashion, +while requiring only a small constant amount of memory. +If you are familiar with XML, +this is roughly analogous to SAX. + +A common setup is to simply connect a reader directly to a writer. +This can be used for things like pretty-printing, +or converting a document from one syntax to another. +This can be done by passing the sink returned by the writer's :func:`~Writer::sink` method to the :class:`~Reader` constructor. + +First though, +an environment needs to be set up in order to write a document. +This defines the base URI and any namespace prefixes, +which are used to resolve any relative URIs or prefixed names by the reader, +and to abbreviate the output by the writer. +In most cases, the base URI should simply be the URI of the file being written. +For example: + +.. literalinclude:: overview.cpp + :start-after: begin env-new + :end-before: end env-new + :dedent: 2 + +Namespace prefixes can also be defined for any vocabularies used: + +.. literalinclude:: overview.cpp + :start-after: begin env-set-prefix + :end-before: end env-set-prefix + :dedent: 2 + +The reader will set any additional prefixes from the document as they are encountered. + +We now have an environment set up for the contents of our document, +but still need to specify where to write it. +This is done by creating an :struct:`OutputStream`, +which is a generic interface that can be set up to write to a file, +a buffer in memory, +or a custom function that can be used to write output anywhere. +In this case, we will write to the file we set up as the base URI: + +.. literalinclude:: overview.cpp + :start-after: begin byte-sink-new + :end-before: end byte-sink-new + :dedent: 2 + +The second argument is the page size in bytes, +so I/O will be performed in chunks for better performance. +The value used here, 4096, is a typical filesystem block size that should perform well on most machines. + +With an environment and byte sink ready, +the writer can now be created: + +.. literalinclude:: overview.cpp + :start-after: begin writer-new + :end-before: end writer-new + :dedent: 2 + +Output is written by feeding statements and other events to the sink returned by the writer's :func:`~Writer::sink` method. +:struct:`Sink` is the generic interface for anything that can consume data streams. +Many objects provide the same interface to do various things with the data, +but in this case we will send data directly to the writer: + +.. literalinclude:: overview.cpp + :start-after: begin reader-new + :end-before: end reader-new + :dedent: 2 + +The third argument of the reader constructor takes a bitwise ``OR`` of :enum:`ReaderFlag` flags that can be used to configure the reader. +In this case no flags are given, +but for example, +passing ``ReaderFlag::lax | ReaderFlag::relative`` would enable lax mode and preserve relative URIs in the input. + +Now that we have a reader that is set up to directly push its output to a writer, +we can finally process the document: + +.. literalinclude:: overview.cpp + :start-after: begin read-document + :end-before: end read-document + :dedent: 2 + +Alternatively, one "chunk" of input can be read at a time with :func:`~Reader::read_chunk`. +A "chunk" is generally one top-level description of a resource, +including any anonymous blank nodes in its description, +but this depends on the syntax and the structure of the document being read. + +The reader pushes events to its sink as input is read, +so in this scenario the data should now have been re-written by the writer +(assuming no error occurred). +To finish and ensure that a complete document has been read and written, +:func:`~Reader::finish` can be called followed by :func:`~Writer::finish`. +However these will be automatically called on destruction if necessary, +so if the reader and writer are no longer required they can simply be destroyed. + +Finally, closing the byte sink will flush and close the output file, +so it is ready to be read again later. +Similar to the reader and writer, +this can be done explicitly by calling its :func:`~OutputStream::close` method, +or implicitly by destroying the byte sink if it is no longer needed: + +.. literalinclude:: overview.cpp + :start-after: begin byte-sink-close + :end-before: end byte-sink-close + :dedent: 2 + +Reading into a Model +-------------------- + +A document can be loaded into a model by setting up a reader that pushes data to a model `inserter` rather than a writer: + +.. literalinclude:: overview.cpp + :start-after: begin inserter-new + :end-before: end inserter-new + :dedent: 2 + +The process of reading the document is the same as above, +only the sink is different: + +.. literalinclude:: overview.cpp + :start-after: begin model-reader-new + :end-before: end model-reader-new + :dedent: 2 + +.. + Writing a Model + --------------- + + A model, or parts of a model, can be written by writing the desired range using its :func:`Range::write` method: + + .. literalinclude:: overview.cpp + :start-after: begin write-range + :end-before: end write-range + :dedent: 2 + + By default, + this writes the range in chunks suited to pretty-printing with anonymous blank nodes (like "[ ... ]" in Turtle or TriG). + The flag :enumerator:`SerialisationFlag::no_inline_objects` can be given to instead write the range in a simple SPO order, + which can be useful in other situations because it is faster and emits statements in strictly increasing order. |