It was through inspecting some Python code that relied quite heavily on generators that I suddenly realized the beauty of the ES6 concept of generators and the yield keyword. A generator function does not return its result all at once; instead it returns an iterator that can be read from, one value at a time. A for loop can iterate over the results from such a function, and the function does not have to create its entire result set before the for loop can start doing any work. The generator yields results when asked for them, lazily doing the work it was asked to do. And when no more results are required, no more results are produced and yielded.
Note: for those of you who are into PL/SQL and know pipelined table functions, this must sound somewhat familiar.
Let’s take a look at some interesting examples.
The code below contains a for loop over the result of a function call. The for...of syntax (see: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/for...of ) allows iterating over Array, Map, Set, array-like objects such as arguments and NodeList, and user-defined iterables. It also allows iteration over the result of a generator function: a pipelined, step-by-step returned result.
//print the alphabet
for (let ch of alphabet()) {
  console.log(ch)
}
In this case, the function alphabet() is defined like this:
function* alphabet() {
  var n = 0
  while (n < 26) {
    yield String.fromCharCode(97 + n++);
  }
}
Nothing very special. The most eye-catching elements are the * suffix on the function keyword and the yield call in the while loop. Simply put: the asterisk turns the function into a generator function (a function whose result can be iterated over by the caller) and the yield is the action that returns a result, one element out of the potentially many that this function could return (if the caller keeps asking for more results, that is). Note that the generator function will run until the yield and not beyond that call. It will only continue to run when the caller asks for the next value.
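To make the pause-at-yield behaviour visible, the iterator can also be driven by hand with next() instead of for...of. This is a minimal sketch; alphabet() is redeclared so the snippet runs on its own, and the variable names (it, first, second, rest, finished) are only illustrative.

```javascript
function* alphabet() {
  let n = 0
  while (n < 26) {
    yield String.fromCharCode(97 + n++)
  }
}

// Calling the generator function runs none of its body yet; it only creates the iterator.
const it = alphabet()

// Each next() call resumes the body until the next yield and then pauses again.
const first = it.next()   // { value: 'a', done: false }
const second = it.next()  // { value: 'b', done: false }
console.log(first, second)

// Drain the remaining 24 letters; after that the iterator reports that it is done.
const rest = [...it]
const finished = it.next() // { value: undefined, done: true }
console.log(rest.length, finished)
```

The for...of loop does exactly this under the hood: it keeps calling next() and stops once done is true.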
Running this code produces the alphabet. Duh…
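Laziness also means that the consumer decides when to stop. As a small sketch (naturals and the produced counter are hypothetical names, not part of the examples in this article), a generator can even be unbounded, as long as the loop that reads from it breaks at some point:

```javascript
// Counts how many values the generator actually had to produce.
let produced = 0

// Hypothetical unbounded generator: yields 1, 2, 3, ... for as long as it is asked.
function* naturals() {
  let n = 1
  while (true) {
    produced++
    yield n++
  }
}

const squares = []
for (const n of naturals()) {
  if (n > 5) break   // leaving the loop also ends the generator
  squares.push(n * n)
}

console.log(squares)   // [ 1, 4, 9, 16, 25 ]
console.log(produced)  // 6: five values used, plus the one that triggered the break
```

No work is done for values that nobody asks for, which is why the while (true) loop is harmless here.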
The generator does not make any promises regarding the speed with which results are returned. It may be the case that producing the next result takes a while.
The next code fragment adds logging with timestamps and a deliberate delay of 500 ms at the beginning of each iteration of the for loop: right after getting the next value from the iterator (that is, from the generator function), the loop sleeps for half a second. This shows up in the logging as a gap between the yield action and the printing of the corresponding character. It also produces a gap of about 500 ms between successive iterations inside the generator function, because values are retrieved from the generator lazily.
const sleep = (milliseconds) => {
  return new Promise(resolve => setTimeout(resolve, milliseconds))
}

const lg = (msg) => {
  const d = new Date()
  console.log(`${d.getSeconds()}:${d.getMilliseconds()} - ${msg}`)
}

function* alphabet() {
  var n = 0
  while (n < 26) {
    lg(`.. yield ${n}`)
    yield String.fromCharCode(97 + n++);
  }
}

const doSomething = async () => {
  //print the alphabet
  for (let ch of alphabet()) {
    await sleep(500)
    lg(ch)
  }// for of alphabet()
}

doSomething()
Note how the await keyword in the async function doSomething, combined with a sleep function that returns a Promise which setTimeout resolves after a delay, allows me to have a sleep construct in a single-threaded runtime. See for example this article for some background: https://flaviocopes.com/javascript-sleep/
If the action performed by the generator function to produce a next value takes considerable time, it may make sense to start producing the next value (asynchronously) just prior to yielding the current value. That means when the consumer asks for the next value, some (and perhaps all) work already will have been done to produce it.
In the next code snippet, the delay is inside the generator function, which is a very common situation. The generator function may have to read from a database or invoke an external API, and constructing the result may take some time. In this code, the delay is again simply the sleep function. However, there are some very relevant changes compared to the previous code snippet, changes that have only just been enabled in ES2018.
The generator function is now async, and it has to be because it awaits the asynchronous sleep function. As a consequence, the await keyword has been added to the for...of loop (making it for await...of) that retrieves values from the now async generator function. These two changes were not supported until recently. See this article for more background on ES2018 Asynchronous Iteration: https://medium.com/information-and-technology/ecmascript-2018-asynchronous-iteration-ec0b6a3a294a
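A sketch of what such an async generator and its consumer could look like, based on the description above and on the earlier alphabet example; it is an assumption, not necessarily the exact original code. The delay parameter, the letters array and the run variable are additions for illustration, and the delay is shortened to 20 ms to keep a trial run brief (pass 500 to match the previous example).

```javascript
const sleep = (milliseconds) =>
  new Promise(resolve => setTimeout(resolve, milliseconds))

const lg = (msg) => {
  const d = new Date()
  console.log(`${d.getSeconds()}:${d.getMilliseconds()} - ${msg}`)
}

// async function* declares an async generator; it may use await as well as yield.
async function* alphabet(delay = 500) {
  let n = 0
  while (n < 26) {
    await sleep(delay)   // stands in for a slow database read or API call
    lg(`.. yield ${n}`)
    yield String.fromCharCode(97 + n++)
  }
}

const doSomething = async (delay) => {
  const letters = []
  // for await...of waits for each promised value before running the loop body
  for await (const ch of alphabet(delay)) {
    lg(ch)
    letters.push(ch)
  }
  return letters
}

// Assumed short delay for the demo run; the returned promise resolves to all 26 letters.
const run = doSomething(20)
```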
What we can see is that the log entries for a yield and for the corresponding letter now appear at virtually the same time. The time gap is now between the logging of a letter and the next yield action in the generator. This will be the common case, as it is frequently the generators that call out to external sources or otherwise need time to prepare their values. Fortunately, we can now have the generators yield values one by one, and start working on the yielded values long before all of them have been prepared.
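The look-ahead idea mentioned earlier (start producing the next value just before yielding the current one) can be sketched with an async generator as well. Everything here is illustrative: fetchLetter is a hypothetical stand-in for a slow data source, and prefetchedAlphabet, consume and the 50 ms delays are assumptions chosen for the demo.

```javascript
const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms))

// Hypothetical slow producer: stands in for a database read or API call.
const fetchLetter = async (n, delay) => {
  await sleep(delay)
  return String.fromCharCode(97 + n)
}

// Async generator that starts fetching value n+1 before handing out value n.
async function* prefetchedAlphabet(count = 26, delay = 50) {
  if (count <= 0) return
  let pending = fetchLetter(0, delay)   // start the first fetch right away
  for (let n = 0; n < count; n++) {
    const current = await pending
    // Kick off the next fetch now; it proceeds while the consumer handles `current`.
    if (n + 1 < count) pending = fetchLetter(n + 1, delay)
    yield current
  }
}

const consume = async (count, delay) => {
  const start = Date.now()
  const letters = []
  for await (const ch of prefetchedAlphabet(count, delay)) {
    await sleep(delay)   // the consumer's own work takes time too
    letters.push(ch)
  }
  return { letters, elapsed: Date.now() - start }
}

const result = consume(5, 50)
result.then(({ letters, elapsed }) => console.log(letters.join(''), `${elapsed} ms`))
```

Because producing and consuming overlap, the total time should come out well below the sum of all producer and consumer delays. The trade-off is that one value may be fetched that the consumer never asks for.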
Pipelining
An example of pipelining generator functions together is shown here: a text fragment is reduced to individual sentences that are then reduced to words. The occurrences of words are counted. And yes, map and reduce operations on arrays can do a similar job. Here I only use this example to demonstrate the syntax of the generator function and the yield action.
Suppose the work done by the function sentences() is expensive, performance-wise. With the generator approach we can then at least start processing the first results early, rather than waiting for sentences() to prepare its full result set before doing any work on it.
const lg = (msg) => {
  const d = new Date()
  console.log(`${d.getSeconds()}:${d.getMilliseconds()} - ${msg}`)
}

// yields the sentences in a text, one at a time
function* sentences(text) {
  const s = text.match(/[^\.!\?]+[\.!\?]+|[^\.!\?]+$/g);
  for (let sentence of s) yield sentence;
}//sentences

// yields the words in a sentence, one at a time
function* words(sentence) {
  // trim first, so the space after the previous sentence does not produce an empty word
  const w = sentence.trim().split(" ");
  for (let word of w) yield word;
}//words

var txt = "One potato is growing in the field. Two potatoes are peeled by my mother in the field. Three potatoes are swimming in gravy on my plate.";
var dictionary = {}

// count the occurrences of each word, sentence by sentence
for (const sentence of sentences(txt))
  for (const word of words(sentence)) {
    dictionary[word] = dictionary[word] ? dictionary[word] + 1 : 1
  }

lg(JSON.stringify(dictionary))
The output is not very useful in itself: a JSON object that maps each word to the number of times it occurs.
Note: this code would work, and words would be found and added to the dictionary, even if the sentences generator never actually completed but instead kept producing a stream of sentences, like a consumer of a constant stream of Tweets.
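A sketch of that situation, with a hypothetical endlessSentences generator standing in for the never-ending source. The allWords helper and the cut-off after 12 words are assumptions for the demo; a real consumer would stop on some condition of its own.

```javascript
// Hypothetical endless source: cycles through a few fixed sentences forever.
function* endlessSentences() {
  const samples = ['One potato is growing.', 'Two potatoes are peeled.']
  let i = 0
  while (true) yield samples[i++ % samples.length]
}

function* words(sentence) {
  for (const word of sentence.split(' ')) yield word
}

// Flattens the stream of sentences into a stream of words, still lazily.
function* allWords(sentenceStream) {
  for (const sentence of sentenceStream) yield* words(sentence)
}

const dictionary = {}
let seen = 0
for (const word of allWords(endlessSentences())) {
  dictionary[word] = (dictionary[word] || 0) + 1
  if (++seen === 12) break   // the consumer decides when it has seen enough
}

console.log(dictionary)
```

The yield* expression delegates to another generator, which keeps the pipeline lazy from end to end: a sentence is only requested once all words of the previous one have been consumed.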
This article describes how running aggregates (such as moving averages) can be produced on endless streams of data, using asynchronous iterators: https://dev.to/nestedsoftware/asynchronous-generators-and-pipelines-in-javascript--1h62 This is very powerful. And now relatively simple to implement in ES2018 (JavaScript, Node JS applications).
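To give an impression of the idea, here is a minimal sketch of a running average over an asynchronous stream. It is not the code from the linked article: readings, runningAverage and collect are hypothetical names, and a short fixed array stands in for an endless data source.

```javascript
const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms))

// Hypothetical source of readings, arriving one at a time with a small delay.
async function* readings(values, delay = 10) {
  for (const v of values) {
    await sleep(delay)
    yield v
  }
}

// Yields the mean of all values seen so far, once per incoming value.
async function* runningAverage(source) {
  let sum = 0
  let count = 0
  for await (const v of source) {
    sum += v
    count++
    yield sum / count
  }
}

const collect = async () => {
  const out = []
  for await (const avg of runningAverage(readings([2, 4, 6, 8]))) out.push(avg)
  return out
}

const averages = collect()
averages.then(a => console.log(a))   // [ 2, 3, 4, 5 ]
```

Each aggregate becomes available as soon as its input arrives, so the same pipeline would keep working on a source that never ends.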
Resources
ES6 Generators and Async/Await https://medium.com/altcampus/es6-generators-and-async-await-e58e35e6834a
Asynchronous Generators and Pipelines in JavaScript https://dev.to/nestedsoftware/asynchronous-generators-and-pipelines-in-javascript--1h62