Write debug level logs on error – print detailed logging only in exceptional circumstances

Lucas Jellema

TL;DR: Collect debug information in an in-memory buffer that is printed to the output when an error occurs (and quietly discarded in all other cases).

I may have come up with a useful idea. Or something that is not all that useful. Or even something that already exists (and I am an ignorant fool for not knowing about it). If the latter – my apologies.

We had a production issue this morning. Actually, it started yesterday afternoon and we only learned about it this morning. Not good. And while we good resolve the issue relatively easily – a developer remembered the change introduced into the runtime platform that was the likely and as it turned out the actual source of the the problem, we were not happy about the information available to us to perform the analysis. Our system produces some logging at runtime in the production environment, but clearly not an enormously lot of detail. In fact, as the volume handled by the system has increased over the past weeks, we have been increasing the log level (from INFO to WARN) to stem the wave of logging.  The log files gave us but the barest of clues.

Ironically, we have just had a team discussion about logging. To define the guidelines that our logging should abide by. What labelling system for our log messages will we use – to identify the source and meaning of the messages, even in an automated way. And to bring consistency in the log message output format to make automated collection and analysis of the log messages easier with the Elastic Stack, Splunk, Azure Application Monitoring and Oracle Cloud Logging.

We had a serious debate about the log levels we wanted to group our logging into and the necessity to be able at run time to manipulate the log level threshold to for example start analysing in detail in the production environment for specific components by lowering the logging threshold to INFO for example.

However, if the production log level threshold is WARN – then when the incident occurs, there will not be any INFO or DEBUG logging written in the log files. Changing the logging threshold after the incident is ‘de put dempen als het kalf al verdronken is’ if you will pardon me for this Dutch expression. It is too late.

Ideally, so goes my thinking, you can get the logging at debug level from everything immediately preceding a problematic exception in our code. We do not have to set the logging threshold on DEBUG and wade through mountains of irrelevant details but when the *** has hit the fan, the little pile of debug level logging is written to the output – as if in the period just prior to the issue someone with insider knowledge had turned on debug level logging.

Sounds nice, does it not? But how to implement this?

In my current project, our components are Serverless Functions – little Java and Node applications. It really would not be hard at all to add the following mechanism in these applications:

  • when entering a new context – method/function, iteration in a loop – create a debug log message buffer (an array)
  • push messages to the buffer that would be relevant in case of an error
  • in case of a run time exception (at least the ones that are unexpected):
    • write the contents of the buffer to the logging output – regardless of the current log level threshold, at ERROR level
    • handle the exception
  • if no exception occurred, the local buffer is removed as soon as the current context is returned from

Note that we could pass the debug buffer to called functions and use the debug buffer object in way that is very similar to the call stack. Instead of the names of the methods invoked throughout the call history, it contains the log messages. And with each successful closing of a branch in the program execution, all logging associated with that branch can be discarded. And at each level in the program execution, all previous debug logging from higher up levels is available. This probably makes sense if this debug logging needs to be retained across asynchronous steps.

In Node (JS) because of the asynchronous and single thread nature of processing, it may be necessary to define the debug buffer object in the Async Local Storage object that was just introduced in Node v13/14/v12.7. Using lower level async-hooks APIs, you can implement this in earlier versions as well. In Java, when multiple threads are running in parallel, Thread local storage may be the best place for the semi-global debug  buffer object.

A simple example in Node (warning: for same of simplicity, I have not yet used Async Local Storage which means that this example does not work for processing of multiple requests at the same time). In this example, functions calculator and calculateSums prepare their local log buffer, write messages to it and produce their contents in case of an exception. Here is the code that processes a series of calculations by dynamically evaluating strings:

CODE

https://gist.github.com/lucasjellema/5fd5b786fdf1c3f492d0aadde0a74dfb

image

And here is the outcome triggered by the clearly incorrect sum “hgah”:

image

Note that the first call to function calculateSums does not show up in the logging – because no INFO level logs were produced from that function. However, because in the second call to the function, an exception occurred, we get all the debug level output from the function, also from before the actual moment of the exception. I am surprised by the way that 2/0  did not result in an exception.

This demonstrates my point: under normal circumstances, no logging would be produced by this little application – apart from the console.info reports with the results (if the logging level allows for INFO). When an exception occurs, then the debug from the current context and the calling context is available, for instant analysis.

Does this make sense? How could this best be applied?

Another implementation: send logging to a more central object (an append and pop store) with a “savepoint” identifier. After successfully completing a function, release the logging by removing (everything logged after) the savepoint. In case of problems, print all logging for the current context’s savepoint. In code, that would look like:

https://gist.github.com/lucasjellema/d1e782433b3ee20399880b2508819969

image

Resources

Exploring Async Local Storage in Node v14 – https://blog.kuzzle.io/nodejs-14-asynclocalstorage-asynchronous-calls

Node Documentarion on AsyncLocalStorage – https://nodejs.org/api/async_hooks.html#async_hooks_class_asynclocalstorage

Medium Article on Thread Local (emulation) in NodeJS – https://medium.com/@tabu_craig/nodejs-and-thread-local-storage-eb2c1a24881

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Next Post

Introducing OCI Data Integration - New cloud native, serverless service for ETL/ELT and Data Pipelines

Facebook0TwitterLinkedinTL;DR: Oracle offers a new cloud native, serverless service on OCI for data processing and ETL/ELT, called Data Integration. It seems a new incarnation of Oracle Data Integrator or even of Warehouse Builder. It provides data flows that can filter, convert, join and aggregate. It currently only supports Object Storage […]