doc: re-write augmentations docs

Rewrite the "Output processors and Instruments" section into "Augmentations" section.
2025-07-25 00:09:45 +01:00 · 2018-05-14 14:34:17 +01:00
parent c6fae6fa55
commit d0368cf176
1 changed files with 152 additions and 173 deletions
--- a/doc/source/how_tos/users/agenda.rst
+++ b/doc/source/how_tos/users/agenda.rst
@@ -504,206 +504,185 @@ turn override global settings.
-Output Processors and Instruments
+Augmentations
----------------------------------
+--------------
 Output Processors
 ^^^^^^^^^^^^^^^^^
 Output processors, as the name suggests, handle the processing of output
 generated from running workload specs. By default, WA enables a couple of basic
 output processors (e.g. one generates a csv file with all scores reported by
 workloads), which you can see in ``~/.workload_automation/config.yaml``. However,
 WA has a number of other, more specialized, output processors (e.g. for
 uploading to databases). You can list available output processors with the
 ``wa list output_processors`` command. If you want to permanently enable an
 output processor, you can add it to your ``config.yaml``. You can also enable an
 output processor for a particular run by specifying it in the ``config`` section
 of the agenda. As the name suggests, the ``config`` section mirrors the structure of
 ``config.yaml``, and anything that can be specified in the latter, can also be
 specified in the former.
 As with workloads, output processors may have parameters that define their
 behaviour. Parameters of output processors are specified a little differently,
 however. Output processor parameter values are listed in the config section,
 namespaced under the name of the output processor.
 For example, suppose we want to be able to easily query the output generated by
 the workload specs we've defined so far. We can use the ``sqlite`` output processor
 to have WA create an sqlite_ database file with the results. By default, this
 file will be generated in WA's output directory (at the same level as
 results.csv); but suppose we want to store the results in the same file for
 every run of the agenda we do. This can be done by specifying an alternative
 database file with the ``database`` parameter of the output processor:
 .. code-block:: yaml
        config:
                augmentations:
                    - sqlite
                sqlite:
                        database: ~/my_wa_results.sqlite
                iterations: 5
        workloads:
                - id: 01_dhry
                  name: dhrystone
                  label: dhrystone_15over6
                  runtime_params:
                        cpu0_governor: performance
                  workload_params:
                        threads: 6
                        mloops: 15
                - id: 02_memc
                  name: memcpy
                - id: 03_cycl
                  name: cyclictest
                  iterations: 10
 A couple of things to observe here:
 - There is no need to repeat the output processors listed in ``config.yaml``. The
  processors listed in the ``augmentations`` entry in the agenda will be used
  *in addition to* those defined in ``config.yaml``.
 - The database file is specified under the ``sqlite`` entry in the config section.
  Note, however, that this entry alone is not enough to enable the output
  processor; it must also be listed in ``augmentations``, otherwise the ``sqlite``
  config entry will be ignored.
 - The database file must be specified as an absolute path; however, it may use
  the user home specifier ``~`` and/or environment variables.
 .. _sqlite: http://www.sqlite.org/
 Augmentations are plugins that augment the execution of workload jobs with
 additional functionality; usually, that takes the form of generating additional
 metrics and/or artifacts, such as traces or logs. There are two types of
 augmentations:
 Instruments
-^^^^^^^^^^^
+        These "instrument" a WA run in order to change its behavior (e.g.
        introducing delays between successive job executions), or collect
        additional measurements (e.g. energy usage). Some instruments may depend
        on particular features being enabled on the target (e.g. cpufreq), or
        on additional hardware (e.g. energy probes).
-WA can enable various "instruments" to be used during workload execution.
+Output processors
-Instruments can be quite diverse in their functionality, but the majority of
+        These post-process metrics and artifacts generated by workloads or
-instruments available in WA today are there to collect additional data (such as
+        instruments, as well as target metadata collected by WA, in order to
-trace) from the device during workload execution. You can view the list of
+        generate additional metrics and/or artifacts (e.g. generating statistics
-available instruments by using ``wa list instruments`` command. As with output
+        or reports). Output processors are also used to export WA output
-processors, a few are enabled by default in the ``config.yaml`` and additional
+        externally (e.g. upload to a database).
 ones may be added in the same place, or specified in the agenda using
 ``augmentations`` entry.
-For example, we can collect power events from trace cmd by using the ``trace-cmd``
+The main practical difference between instruments and output processors is that
-instrument.
+the former rely on an active connection to the target to function, whereas the
 latter only operate on previously collected results and metadata. This means
 that output processors can run "off-line" using the ``wa process`` command.
 Both instruments and output processors are configured in the same way in the
 agenda, which is why they are grouped together into "augmentations".
 Augmentations are enabled by listing them under the ``augmentations`` entry in a
 config file or in the ``config`` section of the agenda.
 .. code-block:: yaml
        config:
-            augmentations:
+                augmentations: [trace-cmd]
                - trace-cmd
                - csv
            trace-cmd:
                    trace_events: ['power*']
            iterations: 5
        workloads:
            - id: 01_dhry
              name: dhrystone
              label: dhrystone_15over6
              runtime_params:
                    cpu0_governor: performance
              workload_params:
                    threads: 6
                    mloops: 15
            - id: 02_memc
              name: memcpy
            - id: 03_cycl
              name: cyclictest
              iterations: 10
-Instruments are not "free" and it is advisable not to have too many enabled at
+The code above illustrates an agenda entry that enables the ``trace-cmd`` instrument.
 once as that might skew results. For example, you don't want to have power
 measurement enabled at the same time as event tracing, as the latter may prevent
 cores from going into idle states and thus affecting the reading collected by
 the former.
-Instruments, like output processors, may be enabled (and disabled -- see below)
+If you have multiple ``augmentations`` entries (e.g. both in your config file
-on per-spec basis. For example, suppose we want to collect /proc/meminfo from the
+and in the agenda), then they will be combined, so that the final set of
-device when we run ``memcpy`` workload, but not for the other two. We can do that using
+augmentations for the run will be their union.
-``sysfs_extractor`` instrument, and we will only enable it for ``memcpy``:
+
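 As a sketch of how this combination works, assume (purely for illustration)
 that your ``config.yaml`` already enables ``csv`` and ``cpufreq``. An agenda
 such as the following would then only need to list the extra augmentation:
 .. code-block:: yaml
        # Assumed contents of ~/.workload_automation/config.yaml (illustrative):
        #   augmentations: [csv, cpufreq]
        config:
                # Only the additional augmentation is listed here.
                augmentations: [trace-cmd]
        workloads:
                - dhrystone
 Under that assumption, the run would use ``csv``, ``cpufreq`` and ``trace-cmd``.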
 .. note:: WA2 did not have augmentations, and instead supported
          "instrumentation" and "result_processors" as distinct configuration
          entries. For compatibility, these entries are still supported in
          WA3; however, they should be considered deprecated, and their
          use is discouraged.
 Configuring augmentations
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 Most augmentations will take parameters that modify their behavior. Parameters
 available for a particular augmentation can be viewed using the ``wa show
 <augmentation name>`` command. This will also show the default values used.
 Values for these parameters can be specified by creating an entry with the
 augmentation's name, and specifying parameter values under it.
 .. code-block:: yaml
        config:
-            augmentations:
+                augmentations: [trace-cmd]
-                - trace-cmd
+                trace-cmd:
-                - csv
+                        events: ['sched*', 'power*', irq]
-            trace-cmd:
+                        buffer_size: 100000
                    trace_events: ['power*']
            iterations: 5
        workloads:
                - id: 01_dhry
                  name: dhrystone
                  label: dhrystone_15over6
                  runtime_params:
                        cpu0_governor: performance
                  workload_params:
                        threads: 6
                        mloops: 15
                - id: 02_memc
                  name: memcpy
                  augmentations: [sysfs_extractor]
                - id: 03_cycl
                  name: cyclictest
                  iterations: 10
-As with ``config`` sections, the ``augmentations`` entry in the spec needs only to
+The code above specifies values for ``events`` and ``buffer_size`` parameters
-list additional instruments and does not need to repeat instruments specified
+for the ``trace-cmd`` instrument, as well as enabling it.
-elsewhere.
+
 You may specify configuration for the same augmentation in multiple locations
 (e.g. your config file and the config section of the agenda). These entries will
 be combined to form the final configuration for the augmentation used during the
 run. If different values for the same parameter are present in multiple entries,
 the ones "more specific" to a particular run will be used (e.g. values in the
 agenda will override those in the config file).
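 A minimal sketch of this merging, assuming a config file that already contains
 a ``trace-cmd`` entry (the values shown are illustrative, not defaults):
 .. code-block:: yaml
        # Assumed entry in the config file (illustrative):
        #   trace-cmd:
        #           events: ['power*']
        #           buffer_size: 50000
        config:
                augmentations: [trace-cmd]
                trace-cmd:
                        # More specific than the config file, so this value
                        # is the one used for the run.
                        buffer_size: 100000
 Here ``events`` would still come from the config file, while ``buffer_size``
 would take the agenda's value of ``100000``.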
 .. note:: Creating an entry for an augmentation alone does not enable it! You
          **must** list it under ``augmentations`` in order for it to be enabled
          for a run. This makes it easier to quickly enable and disable
          augmentations with complex configurations, and also allows defining
          "static" configuration in top-level config, without actually enabling
          the augmentation for all runs.
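 For instance, in the following sketch the ``sqlite`` entry only supplies
 configuration. Because ``sqlite`` is not listed under ``augmentations``, it
 would not run unless it is enabled elsewhere (the database path is reused from
 the earlier example and is illustrative):
 .. code-block:: yaml
        config:
                # Only trace-cmd is enabled by this agenda.
                augmentations: [trace-cmd]
                # Configured, but not enabled here.
                sqlite:
                        database: ~/my_wa_results.sqlite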
 .. note:: At present, it is only possible to enable/disable instruments on a
          per-spec basis. It is *not* possible to provide configuration on a
          per-spec basis in the current version of WA (e.g. in our example, it
          is not possible to specify different ``sysfs_extractor`` paths for
          different workloads). This restriction may be lifted in future
          versions of WA.
 Disabling augmentations
 ^^^^^^^^^^^^^^^^^^^^^^^
-As seen above, plugins specified with ``augmentations`` clauses get added to
+Sometimes, you may wish to disable an augmentation for a particular run, but you
-those already specified previously. Just because an instrument specified in
+want to keep it enabled in general. You *could* modify your config file to
-``config.yaml`` is not listed in the ``config`` section of the agenda, does
+temporarily disable it. However, you must then remember to re-enable it
-not mean it will be disabled. If you do want to disable an instrument, you can
+afterwards. This could be inconvenient and error prone, especially if you're
-always remove/comment it out from ``config.yaml``. However that will be
+running multiple experiments in parallel and only want to disable the
-introducing a permanent configuration change to your environment (one that can
+augmentation for one of them.
 be easily reverted, but may be just as easily forgotten). If you want to
 temporarily disable a output processor or an instrument for a particular run,
 you can do that in your agenda by prepending a tilde (``~``) to its name.
-For example, let's say we want to disable ``cpufreq`` instrument enabled in our
+Instead, you can explicitly disable an augmentation by specifying its name prefixed
-``config.yaml`` (suppose we're going to send results via email and so want to
+with a tilde (``~``) inside ``augmentations``.
 reduce to total size of the output directory):
 .. code-block:: yaml
        config:
-                iterations: 5
+                augmentations: [trace-cmd, ~cpufreq]
-                augmentations:
+
-                    - ~cpufreq
+The code above enables the ``trace-cmd`` instrument and disables the ``cpufreq``
-                    - csv
+instrument (which is enabled in the default config).
-                sysfs_extractor:
+
-                        paths: [/proc/meminfo]
+If you want to start configuration for an experiment from a "blank slate" and
-                csv:
+want to disable all previously-enabled augmentations, without necessarily
-                    use_all_classifiers: True
+knowing what they are, you can use the special ``~~`` entry.
 .. code-block:: yaml
        config:
                augmentations: [~~, trace-cmd, csv]
 The code above disables all augmentations enabled up to that point, and enables
 ``trace-cmd`` and ``csv`` for this run.
 .. note:: The ``~~`` only disables augmentations from previously-processed
          sources. Its ordering in the list does not matter. For example,
          specifying ``augmentations: [trace-cmd, ~~, csv]`` will have exactly
          the same effect as above -- i.e. both trace-cmd *and* csv will be
          enabled.
 Workload-specific augmentation
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 It is possible to enable or disable (but not configure) augmentations at
 workload or section level, as well as in the global config, in which case, the
 augmentations would only be enabled/disabled for that workload/section. If the
 same augmentation is enabled at one level and disabled at another, as with all
 WA configuration, the more specific settings will take precedence over the less
 specific ones (i.e. workloads override sections that, in turn, override global
 config).
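 As a brief sketch, the agenda below enables ``trace-cmd`` globally and then
 disables it for a single workload (the workload names are taken from the
 examples in this document):
 .. code-block:: yaml
        config:
                augmentations: [trace-cmd]
        workloads:
                - name: dhrystone
                - name: memcpy
                  # The workload-level entry overrides the global one.
                  augmentations: [~trace-cmd]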
 Augmentations Example
 ^^^^^^^^^^^^^^^^^^^^^
 .. code-block:: yaml
        config:
                augmentations: [~~, fps]
                trace-cmd:
                        events: ['sched*', 'power*', irq]
                        buffer_size: 100000
                file_poller:
                        files:
                                - /sys/class/thermal/thermal_zone0/temp
        sections:
                 - classifiers:
                         type: energy
                   augmentations: [energy_measurement]
                 - classifiers:
                         type: trace
                   augmentations: [trace-cmd, file_poller]
        workloads:
-                - id: 01_dhry
+                - gmail
-                  name: dhrystone
+                - geekbench
-                  label: dhrystone_15over6
+                - googleplaybooks
-                  runtime_params:
+                - name: dhrystone
-                        cpu0_governor: performance
+                  augmentations: [~fps]
-                  workload_params:
+
-                        threads: 6
+The example above shows an experiment that runs a number of workloads in order
-                        mloops: 15
+to evaluate their thermal impact and energy usage. All previously-configured
-                - id: 02_memc
+augmentations are disabled with ``~~``, so that only configuration specified in
-                  name: memcpy
+this agenda is enabled. Since most of the workloads are "productivity" use cases
-                  augmentations: [sysfs_extractor]
+that do not generate their own metrics, the ``fps`` instrument is enabled to get
-                - id: 03_cycl
+some meaningful performance metrics for them; the only exception is
-                  name: cyclictest
+``dhrystone``, which is a benchmark that reports its own metrics and has no GUI,
-                  iterations: 10
+so the instrument is disabled for it using ``~fps``.
 Each workload will be run in two configurations: once, to collect energy
 measurements, and once to collect thermal data and kernel trace. Trace can give
 insight into why a workload is using more or less energy than expected, but it
 can be relatively intrusive and might impact absolute energy and performance
 metrics, which is why it is collected separately. classifiers_ are used to
 separate metrics from the two configurations in the results.
 Other Configuration
 -------------------