Saturday, November 3, 2012

Getting Started with Web Workers

One of the many design goals of the JavaScript language was to keep it single-threaded and, by extension, simple. Though I must admit that, given the idiosyncrasies of the language constructs, it is anything but simple! But what we mean by being “single-threaded” is that there is only one thread of control in JavaScript; yes, sadly, your JavaScript engine can do only one thing at a time.

Now, doesn’t that sound too restrictive to make use of multi-core processors lying idle on your machine? HTML5 promises to change all of that.


JavaScript’s Single Threaded Model

Web Workers live in a restricted world with No DOM access, as DOM is not thread-safe.

One school of thought considers JavaScript’s single-threaded nature as a simplification, but the other dismisses it as a limitation. The latter group has a very good point, especially when modern web applications make heavy use of JavaScript for handling UI events, querying or polling server-side APIs, processing large amounts of data, and manipulating the DOM based on the server’s response.

To be able to do so much in a single thread of control while maintaining a responsive UI is often a daunting task, and it forces developers to resort to hacks and workarounds (such as using setTimeout(), setInterval(), or using XMLHttpRequest and DOM events) to achieve concurrency. However, it’s worth noting that these techniques definitely provide a way to make asynchronous calls, but non-blocking doesn’t necessarily mean concurrent. John Resig explains why you can’t run anything in parallel on his blog.
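
To make that distinction concrete, here is a minimal sketch (the busyWait helper and the events array are names I made up for illustration). A timer callback scheduled with a zero delay still has to wait until the currently running code finishes, because both share the same thread.

```javascript
// Record the order in which things actually happen.
var events = [];

// Ask for a callback "as soon as possible".
setTimeout(function () {
  events.push('timer');
}, 0);

// Keep the single thread busy for a while (hypothetical helper).
function busyWait(ms) {
  var end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: nothing else can run on this thread meanwhile
  }
}

busyWait(50);
events.push('loop done');

// At this point events is ['loop done'].
// 'timer' is appended only after the current script has finished.
console.log(events);
```

The timer was non-blocking to schedule, but its callback never ran alongside the loop; it simply queued up behind it.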

The Limitations

If you have worked with JavaScript for a reasonable amount of time, it is highly probable that you have encountered the following annoying dialog box stating that some script is taking too long to execute. Yes, almost every time your page stops responding, the reason can be attributed to some JavaScript code.

Damn this slow machine !!

Here are some of the reasons why your browser just might hang up its boots while executing your script:

  • Excessive DOM Manipulation: DOM manipulation is perhaps the costliest operation that you can do with JavaScript. Consequently, a large number of DOM manipulation operations makes your script a good candidate for refactoring.
  • Never-ending Loops: It never hurts to scan your code for complex nested loops. These tend to do much more work than what actually is needed. Perhaps you can find a different solution that provides the same functionality.
  • Combining the two: The worst we can do is repeatedly update the DOM within a loop when more elegant solutions, such as using a DocumentFragment, exist.
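
As a rough sketch of the DocumentFragment approach mentioned in the last bullet (the appendItems function and the 'results' element id are my own assumptions, not part of the original example), you can build all the nodes off-screen and touch the live DOM only once:

```javascript
// Build every <li> inside a detached fragment, then attach the fragment
// to the live list in a single operation.
// `doc` is passed in explicitly; in a page you would pass `document`.
function appendItems(doc, list, labels) {
  var fragment = doc.createDocumentFragment();
  labels.forEach(function (label) {
    var item = doc.createElement('li');
    item.textContent = label;
    fragment.appendChild(item); // off-screen, so no layout work is triggered
  });
  list.appendChild(fragment);   // the only update to the live DOM
  return list;
}

// Usage in a page (assumes an element with id "results" exists):
// appendItems(document, document.getElementById('results'), ['2', '3', '5']);
```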

Web Workers to the Rescue

Thanks to HTML5 and Web Workers, you can now spawn a new thread, providing true parallelism. The new worker can run in the background while the main thread processes UI events, even if the worker thread is busy processing a heavy amount of data. For example, a worker could process a large JSON structure to extract valuable information to display in the UI. But enough of my blabbering; let’s see some code in action.

Creating a Worker

Normally, the code pertaining to a web worker resides in a separate JavaScript file. The parent thread creates a new worker by specifying the script file’s URI in the Worker constructor, which asynchronously loads and executes the JavaScript file.

  var primeWorker = new Worker('prime.js');  

Start a Worker

To start a worker, the parent thread posts a message to the worker, like this:

  var current = $('#prime').attr('value');
  primeWorker.postMessage(current);

The parent page can communicate with workers using the postMessage API, which is also used for cross-origin messaging. Apart from sending primitive data types to the worker, the postMessage API also supports passing JSON structures. You cannot, however, pass functions: they cannot be copied across threads, and they may contain references to the underlying DOM.

The parent and worker threads have their own separate space; messages passed to and fro are copied rather than shared.

Behind the scenes, these messages are serialized at the sending end and then de-serialized at the receiving end. For this reason, sending huge amounts of data to the worker is discouraged.
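
Here is a small sketch of what "copied rather than shared" means in practice. The copyMessage function is a hypothetical stand-in that uses a JSON round trip; browsers actually use the structured clone algorithm, and a real postMessage call would typically reject a function with a DataCloneError instead of silently dropping it.

```javascript
// Stand-in for what happens to a message in transit.
// Assumption: a JSON round trip approximates the copy for plain data.
function copyMessage(message) {
  return JSON.parse(JSON.stringify(message));
}

var original = { limit: 100, found: [2, 3, 5], onDone: function () {} };
var received = copyMessage(original);

// Mutating the receiver's copy does not touch the sender's data.
received.found.push(7);

console.log(original.found.length);  // 3
console.log(received.found.length);  // 4
console.log(typeof received.onDone); // "undefined": the function was not copied
```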

The parent thread can also register a callback to listen for any messages that the worker posts back after performing its task. This allows the parent thread to take necessary action (like updating the DOM) after the worker has played its part. Take a look at this code:

  primeWorker.addEventListener('message', function(event){
      console.log('Receiving from Worker: '+event.data);
      $('#prime').html( event.data );
  });

The event object contains two important properties:

  • target: used to identify the worker who sent the message; primarily useful in a multiple worker environment.
  • data: the message posted by the worker back to its parent thread.

The worker itself is contained in prime.js and registers for the message event, which it receives from its parent. It also uses the same postMessage API to communicate with the parent thread.

  self.addEventListener('message', function(event){
      var currPrime = event.data, nextPrime;
      setInterval( function(){
          nextPrime = getNextPrime(currPrime);
          postMessage(nextPrime);
          currPrime = nextPrime;
      }, 500);
  });

In this example, we simply find the next highest prime number and repeatedly post the results back to the parent thread, which in turn updates the UI with the new value. In context of a worker, both self and this refer to the global scope. The worker can either add an event listener for the message event, or it can define the onmessage handler to listen for any messages sent by the parent thread.
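
The worker code above calls getNextPrime() without showing it. One possible implementation, sketched here with simple trial division, could look like this (the isPrime helper and the handling of non-numeric input are my assumptions, not the original author's code):

```javascript
// Trial division: fine for a demo, far too slow for very large numbers.
function isPrime(n) {
  if (n < 2) return false;
  for (var i = 2; i * i <= n; i++) {
    if (n % i === 0) return false;
  }
  return true;
}

// Returns the smallest prime strictly greater than `current`.
// The value may arrive as a string (it was read from a form field),
// so it is converted to a number first.
function getNextPrime(current) {
  var start = Number(current);
  if (!isFinite(start)) start = 1; // assumption: fall back to the first prime
  var candidate = Math.floor(start) + 1;
  while (!isPrime(candidate)) candidate++;
  return candidate;
}

console.log(getNextPrime(7));    // 11
console.log(getNextPrime('13')); // 17
```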

The task of finding the next prime number is obviously not the ideal use-case for a worker, but has been chosen here to demonstrate the concept of passing messages. Later, we do explore possible and practical use-cases where using a Web Worker would really reap benefits.

Terminating Workers

Workers are resource-intensive; they are OS-level threads. Therefore, you do not want to create a large number of worker threads, and you should terminate the web worker after it completes its work. Workers can terminate themselves, like this:

  self.close();  

Or a parent thread can terminate a worker:

  primeWorker.terminate();  
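
For one-shot jobs, a small helper can make sure the worker is always cleaned up. This is only a sketch; runOnce is a name I invented, and it assumes the worker replies with exactly one message per job.

```javascript
// Send one payload to a worker, hand the first reply to `onResult`,
// and terminate the worker so its thread is released.
function runOnce(worker, payload, onResult) {
  worker.addEventListener('message', function handler(event) {
    worker.removeEventListener('message', handler);
    worker.terminate();
    onResult(event.data);
  });
  worker.postMessage(payload);
}

// Usage in a page (prime.js is the worker script from the earlier example):
// runOnce(new Worker('prime.js'), 7, function (result) {
//   console.log('Worker replied: ' + result);
// });
```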

Security and Restrictions

Inside a worker script, we do not have access to many important JavaScript objects like document, window, console, and parent, and, most importantly, we have no access to the DOM. Having no DOM access and not being able to update the page does sound too restrictive, but it’s an important security design decision. Just imagine the havoc it could cause if multiple threads tried to update the same element. Thus, web workers live in a restricted and thread-safe environment.

Having said that, you can still use workers for processing data and returning the result back to the main thread, which can then update the DOM. Although they are denied access to some pretty important JavaScript objects, workers are allowed to use some functions and objects like setTimeout()/clearTimeout(), setInterval()/clearInterval(), navigator, etc. You can also use XMLHttpRequest inside the worker. Note, however, that localStorage is not available to workers.
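
A worker script following this pattern might look roughly like the sketch below. The summarize function and the shape of the records (objects with an amount property) are hypothetical. The importScripts check is a common idiom for detecting a classic worker scope, used here so the same file does nothing when loaded elsewhere.

```javascript
// Pure data crunching: no DOM needed, so it can run inside a worker.
function summarize(records) {
  var total = 0;
  records.forEach(function (record) {
    total += record.amount;
  });
  return { count: records.length, total: total };
}

// Only wire up messaging when actually running as a worker script.
// Assumption: importScripts is a function only inside a classic worker.
if (typeof importScripts === 'function') {
  self.addEventListener('message', function (event) {
    // The main thread receives the summary and updates the DOM itself.
    self.postMessage(summarize(event.data));
  });
}
```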

Same Origin Restrictions

In order to communicate with a server, workers must follow the same-origin policy. For example, a script hosted on http://www.example.com/ cannot access a script on https://www.example.com/. Even though the host names are the same, the same-origin policy states that the protocol must be the same as well. Normally, this is not a problem. It is highly probable that you are writing both the worker and the client and serving them from the same domain, but knowing the restriction is always useful.
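
If you want to check this rule up front, the standard URL object exposes an origin property (scheme, host, and port combined). The isSameOrigin helper below is a hypothetical convenience of mine, not something the worker API provides:

```javascript
// True when `scriptUrl` has the same scheme, host, and port as `pageUrl`.
// Relative script URLs are resolved against the page, as a browser would do.
function isSameOrigin(pageUrl, scriptUrl) {
  return new URL(scriptUrl, pageUrl).origin === new URL(pageUrl).origin;
}

console.log(isSameOrigin('http://www.example.com/', 'https://www.example.com/prime.js')); // false
console.log(isSameOrigin('http://www.example.com/app/', 'prime.js'));                     // true
```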

Local Access Issues with Google Chrome

Google Chrome places restrictions on accessing the workers locally, hence you won’t be able to run these examples on a local setup. If you want to use Chrome, then you must either host these files on some server or use the --allow-file-access-from-files flag when starting Chrome from the command line. For OS X, start chrome as follows:

  $ /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --allow-file-access-from-files  

However, using this flag is not recommended in a production environment. Thus, your best bet is to host these files on a web server and to test your web workers in any supported browser.

Debugging Workers and Error Handling

Not having access to console makes this somewhat non-trivial, but thanks to Chrome Developer Tools, one can debug the worker code as if it were any other JavaScript code.

To handle any errors thrown by web workers, you can listen for the error event, which populates an ErrorEvent object. You can inspect this object to know the detailed cause of the error.

  primeWorker.addEventListener('error', function(error){
      console.log(' Error Caused by worker: '+error.filename
          + ' at line number: '+error.lineno
          + ' Detailed Message: '+error.message);
  });

Multiple Worker Threads

Though it is common to have multiple worker threads dividing the work between themselves, a word of caution is in order. The official spec specifies that these workers are relatively heavy-weight and are expected to be long-lived scripts running in the background. Web workers are not intended to be used in large numbers because of their high start-up performance cost and a high per-instance memory cost.

Brief Intro to Shared Workers

The spec outlines two types of workers: dedicated and shared. So far, we have seen examples of dedicated workers. They are directly linked to their creator script/page in the sense that they have a one-to-one relationship with the script/page that created them. Shared workers, on the other hand, can be shared among all the pages from an origin (i.e., all pages or scripts on the same origin can communicate with a shared worker).

To create a shared worker, simply pass the URL of the script, and optionally a name for the worker, to the SharedWorker constructor.

The major difference in the way shared workers are used is that they are associated with a port to keep track of the parent script accessing them.

The following code snippet creates a shared worker, registers a callback for listening to any messages posted by the worker, and posts a message to the shared worker:

  var sharedWorker = new SharedWorker('findPrime.js');
  sharedWorker.port.onmessage = function(event){
      ...
  };
  sharedWorker.port.postMessage('data you want to send');

Similarly, a worker can listen for the connect event, which is received when a new client tries to connect to the worker and then posts a message to it accordingly.

  onconnect = function(event) {
      // event.source contains the reference to the client's port
      var clientPort = event.source;
      // listen for any messages sent by this client
      clientPort.onmessage = function(event) {
          // event.data contains the message sent by the client
          var data = event.data;
          ....
          // Post data after processing
          clientPort.postMessage('processed data');
      };
  };

Because of their shared nature, you can maintain the same state in different tabs of the same application, as both the pages in different tabs use the same shared worker script to maintain and report the state. For more details on shared workers, I encourage you to read the spec.
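
To illustrate the shared-state idea, here is a sketch of a shared worker script that keeps one counter for every connected tab. The createCounterHub function is a hypothetical name, and I use event.ports[0] to get the client's port, which refers to the same port the earlier example reads from event.source.

```javascript
// Keeps one counter and a list of every connected port (one per tab).
function createCounterHub() {
  var ports = [];
  var count = 0;
  return {
    connect: function (port) {
      ports.push(port);
      // Any message from any tab bumps the counter and is broadcast to all.
      port.onmessage = function () {
        count++;
        ports.forEach(function (p) {
          p.postMessage(count);
        });
      };
      port.postMessage(count); // tell the newcomer the current state
    }
  };
}

// Only wire this up when running inside a shared worker.
if (typeof SharedWorkerGlobalScope !== 'undefined') {
  var hub = createCounterHub();
  self.onconnect = function (event) {
    hub.connect(event.ports[0]);
  };
}
```

Each tab sees the same count because the state lives in the single worker, not in the pages.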


Practical Use-cases

A real-life scenario might be when you’re forced to deal with a synchronous third-party API that forces the main thread to wait for a result before proceeding to the next statement. In such a case, you can delegate this task to a newly spawned worker to leverage the asynchronous capability to your benefit.

Web workers also excel in polling situations where you continuously poll a destination in the background and post a message to the main thread when some new data arrives.
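
A polling worker could be sketched as follows. The createPoller function, the /api/latest URL, and the five-second interval are all assumptions made for illustration. The request is synchronous, which is tolerable here only because it blocks the worker rather than the UI thread.

```javascript
// Returns a function that fetches the latest value each time it is called
// and notifies only when the value differs from the previous one.
function createPoller(fetchLatest, notify) {
  var last;
  var hasLast = false;
  return function poll() {
    var value = fetchLatest();
    if (!hasLast || value !== last) {
      hasLast = true;
      last = value;
      notify(value);
    }
  };
}

// Wiring inside a worker script (skipped anywhere else).
if (typeof importScripts === 'function') {
  var poll = createPoller(
    function () {
      var xhr = new XMLHttpRequest();
      xhr.open('GET', '/api/latest', false); // hypothetical endpoint, synchronous
      xhr.send();
      return xhr.responseText;
    },
    function (value) {
      self.postMessage(value); // only new data reaches the main thread
    }
  );
  setInterval(poll, 5000);
}
```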

You may also need to process a huge amount of data returned by the server. Traditionally, processing a lot of data negatively impacts the application’s responsiveness, thereby making the user experience unacceptable. A more elegant solution would divide the processing work among several workers to process non-overlapping portions of the data.
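
Dividing the data could look something like the sketch below. Both function names are hypothetical, and the worker script at scriptUrl is assumed to reply with exactly one message per chunk. Keep the number of parts small, since each worker is relatively expensive.

```javascript
// Split `items` into at most `parts` non-overlapping, contiguous chunks.
function splitIntoChunks(items, parts) {
  var size = Math.max(1, Math.ceil(items.length / parts));
  var chunks = [];
  for (var i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Browser-only: hand each chunk to its own worker and collect the replies
// in chunk order. Calls `onAllDone` once every worker has answered.
function processInParallel(items, parts, scriptUrl, onAllDone) {
  var chunks = splitIntoChunks(items, parts);
  var results = new Array(chunks.length);
  var pending = chunks.length;
  if (pending === 0) {
    onAllDone(results);
    return;
  }
  chunks.forEach(function (chunk, index) {
    var worker = new Worker(scriptUrl);
    worker.addEventListener('message', function (event) {
      results[index] = event.data;
      worker.terminate(); // release the thread as soon as its chunk is done
      pending--;
      if (pending === 0) onAllDone(results);
    });
    worker.postMessage(chunk);
  });
}

console.log(splitIntoChunks([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3));
// [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```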

Other use cases could be analyzing video or audio sources with the help of multiple web workers, each working on a predefined part of the problem.


Conclusion

As with many things in the HTML5 spec, the web worker spec continues to evolve. If you plan to use web workers, it won’t hurt to give the spec a look.

The cross-browser support is fairly good for dedicated workers with current versions of Chrome, Safari, and Firefox. Even IE does not lag too far behind with IE10 taking the charge. However, shared workers are only supported on current versions of Chrome and Safari. Surprisingly, the latest version of the Android browser available in Android 4.0 does not support web workers, although they were supported in version 2.1. Apple also included web worker support starting with iOS 5.0.

Imagine the power associated with multiple threads in an otherwise single threaded environment. The possibilities are endless!


