Modify execution so that output processors' process_job_output() (but
not export_job_output()) is wrapped by the JOB_OUTPUT_PROCESSED signals.
This makes sense conceptually, and is more useful in practice, as there
are already WORKLOAD_RESULT_EXTRACTION and WORKLOAD_OUTPUT_UPDATE
signals sent by the job, if it's desirable to run before output
processors, but after the job results have been collected.
As part of resolving a resource, record its MD5 hash in the output
metadata. In order to enable this, resource resolution is now done via
the context, rather than directly via the ResourceResolver (in order not
to introduce a context dependency in the Gui object, context now
emulates the resolver interface).
Metadata is a key-value mapping for arbitrary data, similar to
classifiers. Unlike classifiers, metadata does not directly relate to
the results of the execution, but to the execution itself, and typically
would not be processed by Output Processors in the same way as
classifiers. Metadata can also be a lot more free-form in it's value
structure; while classifier values are simple scalars, metadata values
can be arbitrary POD structures.
If the device is to perform an initial reboot, allow up to `max_retries`
from the run configuration for the reboot to succeed and add logging to
inform the user.
Treat the optional reboot of the device as part of the job itself so that
if the reboot fails it will be retried in accordance to the job
configuration.
Instead of starting and stopping the target from inside the Context,
move the calls to the Runner. This allows for the any initilizaton
errors to be dealt with as part of the job and retried as
appropriate.
Expose the RebootPolicy as an attribute of the ExectionContext and ensure
that that the target is not automatically rebooted if the reboot
policy is set to "Never".
- Have the job manage the indent of each of its stages
- Switch output_processor to using indentcontext
- Move the "Configuring augmentations" log line into Job -- the
rest of the stages are logged there, but this was done by the
Executor for some reason.
- Use indentcontext inside initialize_run to make sure log level
is dedented on error.
- Ensure Executor postamble always runs, event if runner errors.
- Fix format_duration() to handle sub-second timedeltas.
Record UI state if an error occurs during setup, run, and output
processing stages (for other stages, the UI state is unlikely to be
relevant as they typically would not include UI manipulation).
Hitting CTRL-C will abort execution of the current job, but will still
trigger run finalization, and possibly, post-processing and teardown of
the current job. If an exception is raised during this
post-process/teardown, the previous exception state (for the
KeybardInterrupt) will be clobbered. That means that, after the new
exception has been handled, WA would attempt to execute the next job,
rather than go to finalization of the run.
To avoid this, set a flag in the context upon catching KeybardInterrupt,
and check this flag before attempting to execute the next job in the
queue.
Rather than relying on a custom Logger with a context to add events when
they are logged, have the runner register hooks for corresponding
signals that do that.
The issue with the previous approach is that the event ended up
being added too late -- after the current job was already cleared, so
all events were added at run level.
With the new approach, the event will be added when it is first logged,
while there is still a current job being set. There will be no
duplication for Exceptions being re-raised and handled at different
levels because log_error() will ensure that each Exception is logged
only once.
- Keep track of logged exceptions inside log_error itself.
- signal: log the exception, if there is one in the finally clause of
the signal wrapper; this will ensure that the error will be logged
closer to the command that originated.
- entrypoint: use log.log_error for top-level error logging, rather than
the entrypoint logger directly; this will ensure that errors are not
repeated unnecessarily.
- Log CTRL-C message at zeroth indent level to make it easier to see in
the non-verbose output where it occurred.
If a target error occurs, check whether the target is unresponsive. If
it is, attempt to hard reset it if possible, or gracefully terminate
execution if not.
ExecutionContext.end_run() does final updates to the run info in the run
output (final status, run duration, etc). This was previously accessed
via self.output in the context. Typically, this would correctly resolve
to the run output, as there would be no current job. However, in the
event of a crash, current_job would be set, and this would resolve to
the job output itself, resulting in run info not being updated. Use
run_output to avoid this.
Save classifiers at Result as well as Metric level. Reason: when
processing output, one might want to filter complete results, as well as
individual metrics. While it is in theory possible to get the
classifiers for a job by simply extracting the common classifiers
between all metrics, this fails when there are no metrics generated for
a job (note that one might still want to process the output in this
case, e.g. for the artifacts).
Make TargetInfo an attribute of run output, replacing the read/write
methods for the targetfile. Instead, always load it on creation, if
targetfile exists (useful for external scripts), and have a method to
set it after creation (uselful during WA run, where the output is
created before connecting to the target).
Remove wa.framework.plugin.Artifact and associated references. The name
of the class clashes with the class from output and can potentially
cause confusion.
The original intention for this was to be an "expected artifact
descriptor" of sorts that plugins can specify for validation purposes,
but that functionality was never implemented. Given that the framework
has undergone significant changes since this was implemented, it's not
clear that this is the best way to go about the original goal.
Therefore remove this for now.
Ensure that job output is processed even if a workload fails. This is
because output processing includes things like extracting logs, which
we still want to happen on failure.
Job status is now also set correctly when an error occurs during output
processing rather than actual running of the workload. Previously, the
status would be correctly set to PARTIAL in the inner except clause,
but the exception is then re-raised, and the status was "upgraded" to
FAILED in the outer except clause.
This maintains the default behaviour of bailing out immediately if any workload
fails in initialize(), but adds a setting, bail_on_init_failure, to change this
behaviour optionally. This can be useful where WA is being used more as a batch
processor.
Changes to the Status enum introduced by 31a535b5 and a9959550 broke
ran Jobs summary status at the end of the run. This fixes it so that the
total number of jobs and individual status counts are reported
correctly.
Set context for the loggers of the Runner, the workloads and the
installed instruments and processors. Errors/warnings logged by these
entities will be automatically added as events.
- Make sure TargetManager.finalize() actually gets called at the end
of the run.
- Overrule the "diconnect" parameter behavior for gem5 and make sure it
always disconnects. This necessary for stats to be generated properly.
Gem5Platform requires a host output directory as one if it's
instantiation parameters. This is not something we want to expose a
configuration parameter to the user, as for WA, the standard output
directory ought to be used.
Up to this point, WA's target instatiation process assumed that all
parameters came from the user, and there was no way for WA itself to set
them. This commit adds extra_platform_parms argument to
instantiate_target, to remedi this.
extra_platform_parms is then used to set the host output directory for
gem5 appropriately.