.. testsetup:: *
import serd
========
Overview
========
Serd is a lightweight C library for working with RDF data. This is the
documentation for its Python bindings, which also serves as a gentle
introduction to the basics of RDF.
Serd is designed for high-performance or resource-constrained applications, and
makes it possible to work with very large documents quickly and/or using
minimal memory. In particular, it is dramatically faster than rdflib, though it
is less fully featured and not pure Python.
Nodes
=====
Nodes are the basic building blocks of data. Nodes are essentially strings:
>>> print(serd.uri("http://example.org/something"))
http://example.org/something
>>> print(serd.string("hello"))
hello
>>> print(serd.decimal(1234))
1234.0
>>> len(serd.string("hello"))
5
However, nodes also have a :meth:`~serd.Node.type`, and optionally either a
:meth:`~serd.Node.datatype` or :meth:`~serd.Node.language`.
Representation
--------------
The string content of a node as shown above can be ambiguous. For example, it
is impossible to tell a URI from a string literal using only their string
contents. The :meth:`~serd.Node.to_syntax` method returns a complete
representation of a node, in the Turtle syntax by default:
>>> print(serd.uri("http://example.org/something").to_syntax())
<http://example.org/something>
>>> print(serd.string("hello").to_syntax())
"hello"
>>> print(serd.decimal(1234).to_syntax())
1234.0
Note that the representation of a node in some syntax *may* be the same as the
``str()`` contents which are printed, but this is usually not the case. For
example, as shown above, URIs and strings are quoted differently in Turtle.
A different syntax can be used by specifying one explicitly:
>>> print(serd.decimal(1234).to_syntax(serd.Syntax.NTRIPLES))
"1234.0"^^<http://www.w3.org/2001/XMLSchema#decimal>
An identical node can be recreated from such a string using the
:meth:`~serd.Node.from_syntax` method:
>>> node = serd.decimal(1234)
>>> copy = serd.Node.from_syntax(node.to_syntax()) # Don't actually do this
>>> print(copy)
1234.0
Alternatively, the ``repr()`` builtin will return the Python construction
representation:
>>> repr(serd.decimal(1234))
'serd.typed_literal("1234.0", "http://www.w3.org/2001/XMLSchema#decimal")'
Any node can be round-tripped to and from a string using these methods. That
is, for any node ``n``, both::
serd.Node.from_syntax(n.to_syntax())
and::
eval(repr(n))
produce an equivalent node. Using the ``to_syntax()`` method is generally
recommended, since it uses standard syntax.
Primitives
----------
For convenience, nodes can be constructed from Python primitives by simply
passing a value to the constructor:
>>> repr(serd.Node(True))
'serd.boolean(True)'
>>> repr(serd.Node("hello"))
'serd.string("hello")'
>>> repr(serd.Node(1234))
'serd.typed_literal("1234", "http://www.w3.org/2001/XMLSchema#integer")'
>>> repr(serd.Node(12.34))
'serd.typed_literal("1.234E1", "http://www.w3.org/2001/XMLSchema#double")'
Note that it is not possible to construct every type of node this way, and care
should be taken to not accidentally construct a string literal where a URI is
desired.
Fundamental Constructors
------------------------
As the above examples suggest, several node constructors are just convenience
wrappers for more fundamental ones. All node constructors reduce to one of the
following:
* :func:`serd.plain_literal` - A string with optional language, like
``"hallo"@de`` in Turtle.
* :func:`serd.typed_literal` - A string with optional datatype, like
``"1.2E9"^^xsd:float`` in Turtle.
* :func:`serd.blank` - A blank node, like "b42", which would be ``_:b42`` in
Turtle.
* :func:`serd.curie` - A compact URI, like "eg:name".
* :func:`serd.uri` - A URI, like "http://example.org", which would be
``<http://example.org>`` in Turtle.
Convenience Constructors
------------------------
* :func:`serd.string` - A string literal with no language or datatype.
* :func:`serd.decimal` - An ``xsd:decimal`` like "123.45".
* :func:`serd.double` - An ``xsd:double`` like "1.2345E2".
* :func:`serd.float` - An ``xsd:float`` like "1.2345E2".
* :func:`serd.integer` - An ``xsd:integer`` like "1234567".
* :func:`serd.boolean` - An ``xsd:boolean`` like "true" or "false".
* :func:`serd.blob` - An ``xsd:base64Binary`` like "aGVsbG8=".
* :func:`serd.file_uri` - A file URI like "file:///doc.ttl".
Namespaces
==========
It is common to use many URIs that share a common prefix. The
:class:`~serd.Namespace` utility class can be used to make code more readable
and make mistakes less likely:
>>> eg = serd.Namespace("http://example.org/")
>>> print(eg.thing)
http://example.org/thing
.. testsetup:: *
eg = serd.Namespace("http://example.org/")
Dictionary syntax can also be used:
>>> print(eg["thing"])
http://example.org/thing
For convenience, namespaces also act like strings in many cases:
>>> print(eg)
http://example.org/
>>> print(eg + "stringeyName")
http://example.org/stringeyName
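The attribute, dictionary, and concatenation syntax shown above can be
illustrated with a small pure-Python sketch. It does not use serd and is not
how :class:`~serd.Namespace` is implemented; the ``PrefixSketch`` class is a
hypothetical stand-in that returns plain strings rather than nodes:

```python
class PrefixSketch(str):
    """Hypothetical stand-in for a namespace helper (not part of serd)."""

    def __getattr__(self, name):
        # Only called when normal lookup fails, so str methods still work.
        if name.startswith("__"):
            raise AttributeError(name)
        return str(self) + name

    def __getitem__(self, name):
        # Dictionary syntax builds the same string as attribute syntax.
        return str(self) + name


eg = PrefixSketch("http://example.org/")
thing_attr = eg.thing
thing_item = eg["thing"]
joined = eg + "stringeyName"

print(thing_attr)
print(thing_item)
print(joined)
```

Nothing is stored when a name is looked up: each access simply concatenates
the prefix and the name.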
Note that :class:`~serd.Namespace` is just a simple syntactic convenience: it
does not "remember" names, and there is no corresponding C API.
Statements
==========
A :class:`~serd.Statement` is a tuple of either 3 or 4 nodes: the subject,
predicate, object, and optional graph. Statements declare that a subject has
some property. The predicate identifies the property, and the object is its
value.
A statement is a bit like a very simple machine-readable sentence. The
"subject" and "object" are as in natural language, and the predicate is like
the verb, but more general. For example, we could make a statement in English
about your intrepid author:
drobilla has the first name "David"
We can break this statement into 3 pieces like so:

.. list-table::
   :header-rows: 1

   * - Subject
     - Predicate
     - Object
   * - drobilla
     - has the first name
     - "David"

To make a :class:`~serd.Statement` out of this, we need to define some URIs. In
RDF, the subject and predicate must be *resources* with an identifier (for
example, neither can be a string). Conventionally, predicate names do not
start with "has" or similar words, since that would be redundant in this
context. So, we assume that ``http://example.org/drobilla`` is the URI for
drobilla, and ``http://example.org/firstName`` has been defined somewhere to be
a property with the appropriate meaning, and can make an equivalent
:class:`~serd.Statement`:
>>> print(serd.Statement(eg.drobilla, eg.firstName, serd.string("David")))
<http://example.org/drobilla> <http://example.org/firstName> "David"
If you find this terminology confusing, it may help to think in terms of
dictionaries instead. For example, the above can be thought of as equivalent
to::
drobilla[firstName] = "David"
or::
drobilla.firstName = "David"
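To make the dictionary analogy concrete, here is a plain-Python sketch that
does not use serd at all. The nested dictionary layout and the
``to_statements`` helper are assumptions made purely for illustration:

```python
# Hypothetical data: dictionaries keyed by subject, then by predicate.
EG = "http://example.org/"

resources = {
    EG + "drobilla": {
        EG + "firstName": "David",
    },
}


def to_statements(data):
    """Flatten {subject: {predicate: object}} into (s, p, o) tuples."""
    return [
        (subject, predicate, obj)
        for subject, properties in data.items()
        for predicate, obj in properties.items()
    ]


statements = to_statements(resources)
for statement in statements:
    print(statement)
```

The analogy is loose: a dictionary key holds exactly one value, whereas a
subject may appear in several statements with the same predicate.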
Accessing Fields
----------------
Statement fields can be accessed via named methods or array indexing:
>>> statement = serd.Statement(eg.s, eg.p, eg.o, eg.g)
>>> print(statement.subject())
http://example.org/s
>>> print(statement[serd.Field.SUBJECT])
http://example.org/s
>>> print(statement[0])
http://example.org/s
Graph
-----
The graph field can be used as a context to distinguish otherwise identical
statements. For example, it is often set to the URI of the document that the
statement was loaded from:
>>> print(serd.Statement(eg.s, eg.p, eg.o, serd.uri("file:///doc.ttl")))
<http://example.org/s> <http://example.org/p> <http://example.org/o> <file:///doc.ttl>
The graph field is always accessible, but may be ``None``:
>>> triple = serd.Statement(eg.s, eg.p, eg.o)
>>> print(triple.graph())
None
>>> quad = serd.Statement(eg.s, eg.p, eg.o, eg.g)
>>> print(quad.graph())
http://example.org/g
World
=====
So far, we have only used nodes and statements, which are simple independent
objects. Higher-level facilities in serd require a :class:`~serd.World` which
represents the global library state.
A program typically uses just one world, which can be constructed with no
arguments::
world = serd.World()
.. testsetup:: *
world = serd.World()
Note that the world is not a database: it only manages a small amount of
library state for things like configuration and logging.
All "global" state is handled explicitly via the world. Serd does not contain
any static mutable data, making it suitable for use in modules or plugins. If
multiple worlds *are* used in a single program, they must never be mixed:
objects "inside" one world cannot be used with objects inside another.
Generating Blanks
-----------------
Blank nodes, or simply "blanks", are used for resources that do not have URIs.
Unlike URIs, they are not global identifiers, and only have meaning within
their local context (for example, a document). The world provides a method for
automatically generating unique blank identifiers:
>>> print(repr(world.get_blank()))
serd.blank("b1")
>>> print(repr(world.get_blank()))
serd.blank("b2")
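The idea of generating locally unique labels can be sketched in plain Python
with a counter. This is only an illustration of the concept, not serd's
implementation; the ``BlankCounter`` class and its ``next_label`` method are
hypothetical names:

```python
import itertools


class BlankCounter:
    """Hypothetical generator of locally unique blank node labels."""

    def __init__(self, prefix="b"):
        self._prefix = prefix
        self._numbers = itertools.count(1)  # 1, 2, 3, ...

    def next_label(self):
        # Each call consumes one number, so this counter never repeats a label.
        return f"{self._prefix}{next(self._numbers)}"


counter = BlankCounter()
first = counter.next_label()
second = counter.next_label()
print(first, second)
```

Labels from two separate counters can collide, which mirrors the point that
blank identifiers only have meaning within their local context.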
Model
=====
A :class:`~serd.Model` is an indexed set of statements. A model can be used to
store any set of data, from a few statements (for example, a protocol message),
to an entire document, to a database with millions of statements.
A model can be constructed and statements inserted manually using the
:meth:`~serd.Model.insert` method. Tuple syntax is supported as a shorthand
for creating statements:
>>> model = serd.Model(world)
>>> model.insert((eg.s, eg.p, eg.o1))
>>> model.insert((eg.s, eg.p, eg.o2))
>>> model.insert((eg.t, eg.p, eg.o3))
.. testsetup:: model_manual
import serd
eg = serd.Namespace("http://example.org/")
world = serd.World()
model = serd.Model(world)
model.insert((eg.s, eg.p, eg.o1))
model.insert((eg.s, eg.p, eg.o2))
model.insert((eg.t, eg.p, eg.o3))
Iterating over the model yields every statement:
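>>> for s in model: print(s)
<http://example.org/s> <http://example.org/p> <http://example.org/o1>
<http://example.org/s> <http://example.org/p> <http://example.org/o2>
<http://example.org/t> <http://example.org/p> <http://example.org/o3>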
Familiar Pythonic collection operations work as you would expect:
>>> print(len(model))
3
>>> print((eg.s, eg.p, eg.o4) in model)
False
>>> model += (eg.s, eg.p, eg.o4)
>>> print((eg.s, eg.p, eg.o4) in model)
True
Pattern Matching
----------------
The :meth:`~serd.Model.ask` method can be used to check if a statement is in a
model:
>>> print(model.ask(eg.s, eg.p, eg.o1))
True
>>> print(model.ask(eg.s, eg.p, eg.s))
False
This method is more powerful than the ``in`` operator because it also does
pattern matching. To check for a pattern, use ``None`` as a wildcard:
>>> print(model.ask(eg.s, None, None))
True
>>> print(model.ask(eg.unknown, None, None))
False
The :meth:`~serd.Model.count` method works similarly, but instead returns the
number of statements that match the pattern:
>>> print(model.count(eg.s, None, None))
3
>>> print(model.count(eg.unknown, None, None))
0
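The wildcard semantics can be sketched without serd, using plain tuples of
strings in place of statements. The ``matches``, ``ask``, and ``count``
functions below are hypothetical helpers that mimic the behaviour described
above with a linear scan; they say nothing about how serd searches its
indices:

```python
# Plain (subject, predicate, object) tuples stand in for statements.
statements = [
    ("eg:s", "eg:p", "eg:o1"),
    ("eg:s", "eg:p", "eg:o2"),
    ("eg:s", "eg:p", "eg:o4"),
    ("eg:t", "eg:p", "eg:o3"),
]


def matches(statement, pattern):
    """A field matches if the pattern field is None or equal to it."""
    return all(want is None or want == have
               for have, want in zip(statement, pattern))


def ask(data, s, p, o):
    """Return True if at least one statement matches the pattern."""
    return any(matches(st, (s, p, o)) for st in data)


def count(data, s, p, o):
    """Return the number of statements that match the pattern."""
    return sum(1 for st in data if matches(st, (s, p, o)))


print(ask(statements, "eg:s", "eg:p", "eg:o1"))
print(ask(statements, "eg:unknown", None, None))
print(count(statements, "eg:s", None, None))
```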
Getting Values
--------------
Sometimes you are only interested in a single node, and it is cumbersome to
first search for a statement and then get the node from it. The
:meth:`~serd.Model.get` method provides a more convenient way to do this. To
get a value, specify a triple pattern where exactly one field is ``None``. If
a statement matches, then the node that "fills" the wildcard will be returned:
>>> print(model.get(eg.t, eg.p, None))
http://example.org/o3
If multiple statements match the pattern, then the matching node from an
arbitrary statement is returned. It is an error to specify more than one
wildcard, excluding the graph.
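As a rough illustration of the single-wildcard rule, the following sketch
works on plain tuples rather than a serd model. The ``get`` function here is
a hypothetical helper; raising ``ValueError`` for a bad pattern and returning
``None`` when nothing matches are assumptions, not documented serd behaviour:

```python
statements = [
    ("eg:s", "eg:p", "eg:o1"),
    ("eg:s", "eg:p", "eg:o2"),
    ("eg:t", "eg:p", "eg:o3"),
]


def get(data, s, p, o):
    """Return the node that fills the single None field of the pattern."""
    pattern = (s, p, o)
    wildcards = [i for i, field in enumerate(pattern) if field is None]
    if len(wildcards) != 1:
        # Assumption: the error is reported as a ValueError in this sketch.
        raise ValueError("exactly one field must be None")
    for statement in data:
        if all(w is None or w == h for h, w in zip(statement, pattern)):
            return statement[wildcards[0]]
    return None  # Assumption: no match yields None


value = get(statements, "eg:t", "eg:p", None)
subject = get(statements, None, "eg:p", "eg:o3")
print(value, subject)
```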
Erasing Statements
------------------
>>> model2 = model.copy()
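>>> for s in model2: print(s)
<http://example.org/s> <http://example.org/p> <http://example.org/o1>
<http://example.org/s> <http://example.org/p> <http://example.org/o2>
<http://example.org/s> <http://example.org/p> <http://example.org/o4>
<http://example.org/t> <http://example.org/p> <http://example.org/o3>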
Individual statements can be erased by value, again with tuple syntax supported
for convenience:
>>> model2.erase((eg.s, eg.p, eg.o1))
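>>> for s in model2: print(s)
<http://example.org/s> <http://example.org/p> <http://example.org/o2>
<http://example.org/s> <http://example.org/p> <http://example.org/o4>
<http://example.org/t> <http://example.org/p> <http://example.org/o3>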
Many statements can be erased at once by erasing a range:
>>> model2.erase(model2.range((eg.s, None, None)))
>>> for s in model2: print(s)
<http://example.org/t> <http://example.org/p> <http://example.org/o3>
Saving Documents
----------------
Serd provides simple methods to save an entire model to a file or string, which
are similar to functions in the standard Python ``json`` module.
A model can be saved to a file with the :meth:`~serd.World.dump` method:
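
.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> world.dump(model, "out.ttl")
   >>> print(open("out.ttl", "r").read())
   <http://example.org/s>
       <http://example.org/p> <http://example.org/o1> ,
           <http://example.org/o2> ,
           <http://example.org/o4> .
   <http://example.org/t>
       <http://example.org/p> <http://example.org/o3> .
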
Similarly, a model can be written as a string with the :meth:`serd.World.dumps`
method:
.. doctest::
:options: +ELLIPSIS
>>> print(world.dumps(model))
<http://example.org/s>
...
Loading Documents
-----------------
There are also simple methods to load an entire model, again loosely following
the standard Python ``json`` module.
A model can be loaded from a file with the :meth:`~serd.World.load` method:
>>> model3 = world.load("out.ttl")
>>> print(model3 == model)
True
By default, the syntax type is determined by the file extension, and only
:attr:`serd.ModelFlags.INDEX_SPO` will be set, so only ``(s p ?)`` and ``(s ?
?)`` queries will be fast. See the method documentation for how to control
things more precisely.
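To see why an index ordered by subject, predicate, and object favours some
patterns over others, consider this sketch. It uses a sorted Python list and
``bisect`` as a stand-in for an ordered index; it does not describe serd's
actual data structures, and the helper names are hypothetical:

```python
from bisect import bisect_left

# Statements as plain (s, p, o) tuples, kept sorted in that field order.
spo_index = sorted([
    ("eg:s", "eg:p", "eg:o1"),
    ("eg:s", "eg:p", "eg:o2"),
    ("eg:t", "eg:p", "eg:o3"),
])


def match_prefix(index, prefix):
    """Return statements whose leading fields equal prefix."""
    start = bisect_left(index, prefix)  # binary search for the first candidate
    end = start
    while end < len(index) and index[end][:len(prefix)] == prefix:
        end += 1  # matching statements are adjacent in a sorted index
    return index[start:end]


def match_object(index, obj):
    """An object-only pattern cannot use the ordering, so it scans everything."""
    return [st for st in index if st[2] == obj]


by_subject = match_prefix(spo_index, ("eg:s",))
by_subject_predicate = match_prefix(spo_index, ("eg:t", "eg:p"))
by_object = match_object(spo_index, "eg:o2")
print(by_subject, by_subject_predicate, by_object)
```

Patterns that fix the leading fields can jump straight to a contiguous run of
matches, while any other pattern has to examine every statement.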
Similarly, a model can be loaded from a string with the
:meth:`~serd.World.loads` method:
>>> ttl = "<{}> <{}> <{}> .".format(eg.s, eg.p, eg.o)
>>> model4 = world.loads(ttl)
>>> for s in model4: print(s)
<http://example.org/s> <http://example.org/p> <http://example.org/o>
File Cursor
-----------
When data is loaded from a file into a model with the flag
:data:`~serd.ModelFlags.STORE_CURSORS`, each statement will have a *cursor*
which describes the file name, line, and column where the statement originated.
The cursor points to the start of the object node in the statement:
>>> model5 = world.load("out.ttl", model_flags=serd.ModelFlags.STORE_CURSORS)
>>> for s in model5: print(s.cursor())
out.ttl:2:24
out.ttl:3:2
out.ttl:4:2
out.ttl:7:24
Streaming Data
==============
More advanced input and output can be performed by using the
:class:`~serd.Reader` and :class:`~serd.Writer` classes directly. The Reader
produces an :class:`~serd.Event` stream which describes the content of the
file, and the Writer consumes such a stream and writes syntax.
Reading Files
-------------
The reader reads from a source, which should be a :class:`~serd.FileSource`
to read from a file. Parsed input is sent to a sink, which is
called for each event:

.. testcode::

   def sink(event):
       print(event)

   env = serd.Env()
   reader = serd.Reader(world, serd.Syntax.TURTLE, 0, env, sink, 4096)
   with reader.open(serd.FileSource("out.ttl")) as context:
       context.read_document()

.. testoutput::
:options: +ELLIPSIS
serd.Event.statement(serd.Statement(serd.uri("http://example.org/s"), serd.uri("http://example.org/p"), serd.uri("http://example.org/o1"), serd.Cursor(serd.uri("out.ttl"), 2, 24)))
...
For more advanced use cases that keep track of state, the sink can be a custom
:class:`~serd.Sink` with a call operator:

.. testcode::

   class MySink(serd.Sink):
       def __init__(self):
           super().__init__()
           self.events = []

       def __call__(self, event: serd.Event) -> serd.Status:
           self.events += [event]
           return serd.Status.SUCCESS

   env = serd.Env()
   sink = MySink()
   reader = serd.Reader(world, serd.Syntax.TURTLE, 0, env, sink, 4096)
   with reader.open(serd.FileSource("out.ttl")) as context:
       context.read_document()

   print(sink.events[0])

.. testoutput::
serd.Event.statement(serd.Statement(serd.uri("http://example.org/s"), serd.uri("http://example.org/p"), serd.uri("http://example.org/o1"), serd.Cursor(serd.uri("out.ttl"), 2, 24)))
Reading Strings
---------------
To read from a string, use a :class:`~serd.StringSource` with the reader:

.. testcode::

   ttl = """
   @base <http://drobilla.net/> .
   @prefix eg: <http://example.org/> .
   <sw/serd> eg:name "Serd" .
   """

   def sink(event):
       print(event)

   env = serd.Env()
   reader = serd.Reader(world, serd.Syntax.TURTLE, 0, env, sink, 4096)
   with reader.open(serd.StringSource(ttl)) as context:
       context.read_document()

.. testoutput::
serd.Event.base("http://drobilla.net/")
serd.Event.prefix("eg", "http://example.org/")
serd.Event.statement(serd.Statement(serd.uri("http://drobilla.net/sw/serd"), serd.uri("http://example.org/name"), serd.string("Serd"), serd.Cursor(serd.string("string"), 4, 19)))
Reading into a Model
--------------------
To read new data into an existing model,
send it to the sink returned by :meth:`~serd.Model.inserter`:

.. testcode::

   ttl = """
   @prefix eg: <http://example.org/> .
   eg:newSubject eg:p eg:o .
   """

   env = serd.Env()
   sink = model.inserter(env)
   reader = serd.Reader(world, serd.Syntax.TURTLE, 0, env, sink, 4096)
   with reader.open(serd.StringSource(ttl)) as context:
       context.read_document()

   for s in model: print(s)

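.. testoutput::

   <http://example.org/newSubject> <http://example.org/p> <http://example.org/o>
   <http://example.org/s> <http://example.org/p> <http://example.org/o1>
   <http://example.org/s> <http://example.org/p> <http://example.org/o2>
   <http://example.org/s> <http://example.org/p> <http://example.org/o4>
   <http://example.org/t> <http://example.org/p> <http://example.org/o3>
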
Writing Files
-------------
.. testcode::
env = serd.Env()
byte_sink = serd.FileSink("written.ttl")
writer = serd.Writer(world, serd.Syntax.TURTLE, 0, env, byte_sink)
st = model.all().serialise(writer.sink(), 0)
writer.finish()
byte_sink.close()
print(open("written.ttl", "r").read())
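
.. testoutput::
   :options: +NORMALIZE_WHITESPACE

   <http://example.org/newSubject>
       <http://example.org/p> <http://example.org/o> .
   <http://example.org/s>
       <http://example.org/p> <http://example.org/o1> ,
           <http://example.org/o2> ,
           <http://example.org/o4> .
   <http://example.org/t>
       <http://example.org/p> <http://example.org/o3> .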