diff --git a/doc/source/developer_reference.rst b/doc/source/developer_reference.rst index 88b0a558..e0e64c13 100644 --- a/doc/source/developer_reference.rst +++ b/doc/source/developer_reference.rst @@ -22,3 +22,6 @@ Developer Reference .. include:: developer_reference/revent.rst +----------------- + +.. include:: developer_reference/serialization.rst diff --git a/doc/source/developer_reference/serialization.rst b/doc/source/developer_reference/serialization.rst new file mode 100644 index 00000000..64859e69 --- /dev/null +++ b/doc/source/developer_reference/serialization.rst @@ -0,0 +1,123 @@ +.. _serialization: + +Serialization +============= + +Overview of Serialization +------------------------- + +WA employs a serialization mechanism in order to store some of its internal +structures inside the output directory. Serialization is performed in two +stages: + +1. A serializable object is converted into a POD (Plain Old Data) structure + consisting of primitive Python types, and a few additional types (see + :ref:`wa-pods` below). +2. The POD structure is serialized into a particular format by a generic + parser for that format. Currently, `yaml` and `json` are supported. + +Deserialization works in reverse order -- first the serialized text is parsed +into a POD, which is then converted to the appropriate object. + + +Implementing Serializable Objects +--------------------------------- + +In order to be considered serializable, an object must either be a POD, or it +must implement the ``to_pod()`` method and ``from_pod`` static/class method, +which will perform the conversion to/form pod. + +As an example, below as a (somewhat trimmed) implementation of the ``Event`` +class: + +.. code-block:: python + + class Event(object): + + @staticmethod + def from_pod(pod): + instance = Event(pod['message']) + instance.timestamp = pod['timestamp'] + return instance + + def __init__(self, message): + self.timestamp = datetime.utcnow() + self.message = message + + def to_pod(self): + return dict( + timestamp=self.timestamp, + message=self.message, + ) + + +Serialization API +----------------- + +.. function:: read_pod(source, fmt=None) +.. function:: write_pod(pod, dest, fmt=None) + + These read and write PODs from a file. The format will be inferred, if + possible, from the extension of the file, or it may be specified explicitly + with ``fmt``. ``source`` and ``dest`` can be either strings, in which case + they will be interpreted as paths, or they can be file-like objects. + +.. function:: is_pod(obj) + + Returns ``True`` if ``obj`` is a POD, and ``False`` otherwise. + +.. function:: dump(o, wfh, fmt='json', \*args, \*\*kwargs) +.. function:: load(s, fmt='json', \*args, \*\*kwargs) + + These implment an altenative serialization interface, which matches the + interface exposed by the parsers for the supported formats. + + +.. _wa-pods: + +WA POD Types +------------ + +POD types are types that can be handled by a serializer directly, without a need +for any additional information. These consist of the build-in python types :: + + list + tuple + dict + set + str + unicode + int + float + bool + +...the standard library types :: + + OrderedDict + datetime + +...and the WA-defined types :: + + regex_type + none_type + level + cpu_mask + +Any structure consisting entirely of these types is a POD and can be serialized +and then deserialized without losing information. It is important to note that +only these specific types are considered POD, their subclasses are *not*. + +.. note:: ``dict``\ s get deserialized as ``OrderedDict``\ s. + + +Serialization Formats +--------------------- + +WA utilizes two serialization formats: YAML and JSON. YAML is used for files +intended to be primarily written and/or read by humans; JSON is used for files +intended to be primarily written and/or read by WA and other programs. + +The parsers and serializers for these formats used by WA have been modified to +handle additional types (e.g. regular expressions) that are typically not +supported by the formats. This was done in such a way that the resulting files +are still valid and can be parsed by any parser for that format.