Science is, loosely speaking, about acquiring and sharing (objective) knowledge. Although many definitions of science do not emphasize the sharing of knowledge, it is clearly a useful and widely accepted part of doing science. In fact, in most cases publishing your results in some form or other is more or less mandatory if you actually want to get paid. Lately, even just publishing your results is not always enough: more and more, open access publishing is becoming the norm. Your research should not just be accessible in theory; it should be accessible to anyone, anywhere and at any time.
Huge numbers of papers are published each year, distributed over many, many proceedings and journals. However, these papers miss part of the picture. These days, a huge portion of the findings in such papers are made possible by software. This software is, however, conspicuously absent in most cases, which makes it virtually impossible to reproduce the results (also see doi:10.1038/467753a). If software is made available at all, it is almost always through the personal webpage of one of the authors. Typically it will be written in the author's favourite language, using the author's favourite framework. The set of supported operating systems tends to match the set of operating systems the author used at the time of creating the software.
Let's say that the software behind a paper /has/ been made accessible. In (too) many cases you then have to track down (the right versions of) dependencies, make sure they all work on your platform, make sure the software itself runs on your platform, find the right data to feed the software, in the right format, etc., etc. In the (by no means certain) event that you manage to get the software to run the way you want it to, you can finally try it out, only to find (in many cases) that it does not quite serve your purpose. Next please!
In many cases you just want to be able to reproduce the results in the paper and try the software on some of your own data. Then, if it looks like it might be able to do what you want, or could be extended to do what you want, you simply want to get on with using the software for your own purposes, adapting it as needed. In essence, every minute spent on just getting the software to run is a minute that could have been spent on something more worthwhile. We must thus do better!
This leads us to web-based scientific software. Software based on web standards can run on virtually any device (provided the device's own software is not too old, although many of the relevant technologies are now quite mature). Furthermore, since web standards are widely recognized (whether de facto or formalized), they provide a very solid base for software that can be easily shared, can still be used in years or even decades, and can easily interoperate with other software. I will give an overview of what has been tried, show examples of how web graphics can be used, discuss issues I have come across, and give an idea of what could be done in the future.
There are some server-based attempts (the Image Processing On Line (IPOL) journal and Runmycode, for example), and in many cases one could write an application in such a way that it can be run through a web-based interface. However, this relies on someone supplying a server to run such applications, and does nothing to make it easier for users to run and adapt the software themselves. Then there are the issues with having to upload possibly sensitive data to someone else's server. Also, such methods are typically not well suited to interactive applications (it is possible to make an application available through a VNC-like environment, but this seems like a needlessly complex and baroque solution).
In some cases, people have actually gone so far as to simply share their entire systems as virtual machines (SHARE, for example). Although in a sense this is the ultimate way to ensure that someone else is able to run and use your code, somehow it does feel a little over the top. Also, although another user is saved the trouble of setting up the build system and such, there is nothing easy or convenient about having to work with a program through a virtual machine. And it definitely does not make it easier to use different programs together (just imagine having to run multiple VMs and somehow coupling them). So perhaps it is not a bad approach from an archival point of view, but whether it is convenient to use in practice is another matter.
An interesting system has been tried out by Elsevier. It is somewhat similar to the Image Processing On Line approach, in the sense that the author uploads code which can be (re)run on a server. However, the Elsevier approach aims more at integrating the code and data with the actual paper, rather than presenting them separately. Admittedly this is not really what happens in practice, as the code and data are simply shown in a sidebar rather than as actual interactive content within the paper, but at least one can view results next to the normal paper. Also, the authors do not need to create a single demo, but can upload multiple snippets of code to accomplish various tasks. Perhaps more importantly, one can apparently embed code snippets as well as results in web pages. This system was the result of a contest; the other finalists are also listed on-line.
The cases above form a nice motivation for wanting to make (interactive) scientific software that uses web technologies. Such software would be able to run out-of-the-box on a dizzying array of devices, can in principle be integrated with a narrative (if so desired), and places very few demands on the server (in fact, web applications can often be made to also run off-line).
Here I show a few examples of demos I developed, starting with a "glyph-based tensor field visualization". My work involves so-called "symmetric tensors". In a nutshell, you can think of symmetric tensors as matrices that describe a function on the unit sphere. So we consider a symmetric matrix
$$M = \begin{pmatrix} a & b & c \\ b & d & e \\ c & e & f \end{pmatrix}$$
to describe a function $f$ such that for a vector $v = (x, y, z)^T$ we have
$$f(v) = v^T M v = \begin{pmatrix} x & y & z \end{pmatrix} \begin{pmatrix} a & b & c \\ b & d & e \\ c & e & f \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}.$$
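Multiplying out the product gives $f$ explicitly as a polynomial in $x$, $y$ and $z$, and shows directly how $f$ behaves when $v$ is scaled by an arbitrary scalar $s$:

$$
\begin{aligned}
f(v) &= a x^2 + d y^2 + f z^2 + 2 b x y + 2 c x z + 2 e y z, \\
f(s v) &= (s v)^T M (s v) = s^2 \, v^T M v = s^2 f(v).
\end{aligned}
$$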
Clearly, scaling the vector $v$ simply scales the output (although in this case quadratically), so we typically stick to vectors of unit length. If we have an image with such a matrix at each position, we call it a tensor field (roughly speaking). If we plot each matrix as a kind of (3D) polar plot, we get an image like this (interactive demo using WebGL):
Although this data set is quite small and relatively simple, the interactivity of the view makes it much easier to get a feel for it. Traditionally, this kind of glyph-based rendering is mostly applied to a few slices of the data to avoid clutter (a way to select slices could of course be added to a view like the one above).
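A polar glyph of this kind can be built by sampling directions u on the unit sphere and placing a vertex at distance |f(u)| from the glyph centre along u (taking the absolute value, so that tensors that are not positive definite still give a valid radius, is an assumption of this sketch). The TypeScript below uses a simple latitude/longitude grid of directions; the names `quadraticForm` and `polarGlyph` and their parameters are hypothetical, not taken from the demo, and the triangle indices needed for an actual WebGL mesh are left out.

```typescript
// Symmetric 3x3 matrix ((a,b,c),(b,d,e),(c,e,f)), stored by its six distinct entries.
type Sym3 = { a: number; b: number; c: number; d: number; e: number; f: number };

// f(v) = v^T M v written out explicitly.
function quadraticForm(m: Sym3, x: number, y: number, z: number): number {
  return m.a * x * x + m.d * y * y + m.f * z * z
    + 2 * (m.b * x * y + m.c * x * z + m.e * y * z);
}

// Vertex positions (x, y, z triples) of a polar glyph for tensor m, centred at (cx, cy, cz).
// Directions are sampled on a grid of (rings + 1) latitudes by `segments` longitudes.
function polarGlyph(m: Sym3, cx: number, cy: number, cz: number, scale: number,
                    rings = 16, segments = 32): Float32Array {
  const out = new Float32Array((rings + 1) * segments * 3);
  let k = 0;
  for (let i = 0; i <= rings; i++) {
    const theta = (Math.PI * i) / rings;          // polar angle in [0, pi]
    for (let j = 0; j < segments; j++) {
      const phi = (2 * Math.PI * j) / segments;   // azimuth in [0, 2pi)
      const ux = Math.sin(theta) * Math.cos(phi);
      const uy = Math.sin(theta) * Math.sin(phi);
      const uz = Math.cos(theta);
      const r = scale * Math.abs(quadraticForm(m, ux, uy, uz)); // radius |f(u)|
      out[k++] = cx + r * ux;
      out[k++] = cy + r * uy;
      out[k++] = cz + r * uz;
    }
  }
  return out;
}
```

For the identity tensor every sample lies at distance `scale` from the centre, so the glyph is a sphere; anisotropic tensors stretch it along their dominant directions. The returned `Float32Array` could be passed as-is to `gl.bufferData`.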
Another demo I created applies certain (non-linear) filters in various ways to colour images. The end result uses Canvas to get at the image data and then uses typed arrays during processing (in a web worker).
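The pixel data that Canvas hands out via `getImageData` are 8-bit values, assumed here to be sRGB-encoded. Since Canvas offers no colour management, any processing that should happen in linear light (averaging, for example) has to decode and re-encode these values by hand. A minimal TypeScript sketch using the standard sRGB transfer function (IEC 61966-2-1); the function names are illustrative and not taken from the demo:

```typescript
// Decode an 8-bit sRGB value to linear light in [0, 1].
function srgbToLinear(v8: number): number {
  const c = v8 / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Encode linear light (clamped to [0, 1]) back to an 8-bit sRGB value.
function linearToSrgb(l: number): number {
  const x = Math.min(1, Math.max(0, l));
  const c = x <= 0.0031308 ? 12.92 * x : 1.055 * Math.pow(x, 1 / 2.4) - 0.055;
  return Math.round(c * 255);
}

// Convert RGBA bytes (laid out like ImageData.data) to linear Float32 values.
// The RGB channels are decoded; alpha is only rescaled to [0, 1].
function decodeImage(rgba: Uint8ClampedArray): Float32Array {
  const lut = new Float32Array(256);
  for (let i = 0; i < 256; i++) lut[i] = srgbToLinear(i); // all possible byte values
  const out = new Float32Array(rgba.length);
  for (let i = 0; i < rgba.length; i += 4) {
    out[i] = lut[rgba[i]];
    out[i + 1] = lut[rgba[i + 1]];
    out[i + 2] = lut[rgba[i + 2]];
    out[i + 3] = rgba[i + 3] / 255;
  }
  return out;
}
```

The lookup table in `decodeImage` avoids calling `Math.pow` for every pixel, since there are only 256 possible input values.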
Using Node.js I was able to easily turn this demo into a command-line version for batch processing, effectively getting the best of both worlds. By contrast, even though I made some effort to package the code as conveniently as possible, the Python version was a lot less accessible than the web-based version above: it required installing Python, NumPy, SciPy and PIL (which causes/caused trouble with Python 3), downloading the required test images, adjusting some paths and then running make (and if you run Windows, you first need to install MinGW/MSYS and hope it works).
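What makes this kind of reuse cheap is keeping the processing free of DOM dependencies: a function that only takes and returns typed arrays can run unchanged inside a web worker and under Node.js. The filters used in the demo are not spelled out here, so as a stand-in, here is a per-channel 3×3 median filter (a simple non-linear filter) on RGBA data laid out like `ImageData.data`; the function name `median3x3` is hypothetical:

```typescript
// Per-channel 3x3 median filter on RGBA bytes (4 per pixel, row-major, like ImageData.data).
// Pure function without DOM access, so it runs in a web worker as well as under Node.js.
// At the borders only the neighbours inside the image are used.
function median3x3(src: Uint8ClampedArray, width: number, height: number): Uint8ClampedArray {
  const dst = new Uint8ClampedArray(src.length);
  const values: number[] = [];
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const i = 4 * (y * width + x);
      for (let ch = 0; ch < 3; ch++) {
        values.length = 0;
        for (let dy = -1; dy <= 1; dy++) {
          const yy = y + dy;
          if (yy < 0 || yy >= height) continue;
          for (let dx = -1; dx <= 1; dx++) {
            const xx = x + dx;
            if (xx < 0 || xx >= width) continue;
            values.push(src[4 * (yy * width + xx) + ch]);
          }
        }
        values.sort((p, q) => p - q);
        dst[i + ch] = values[values.length >> 1]; // median (upper median for even counts)
      }
      dst[i + 3] = src[i + 3]; // alpha is copied unchanged
    }
  }
  return dst;
}
```

A worker's message handler and a Node.js batch script would then only differ in how they obtain the pixel data and where they send the result.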
Another way of visualizing tensors (symmetric matrices) is by drawing a line in the direction of the maximum response (the eigenvector associated with the largest eigenvalue). Here the anisotropy in a seismic data set is shown for different depths using a modified d3.js rendering to Canvas (see pull requests mbostock/d3#1568 and d3/d3-geo-projection#4 for the modifications):
Reference (for the tensor data): E. Debayle and Y. Ricard, Seismic observations of large-scale deformation at the bottom of fast-moving plates, Earth Planet. Sci. Lett., 2013.
The coastlines come from the Natural Earth data set (public domain). The light grey lines are plate boundaries and came with the original data distribution.
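For a map view only horizontal directions matter, so (as a simplifying assumption for this sketch) the tensor at each point can be treated as a 2×2 symmetric matrix ((a, b), (b, d)). The eigenvector belonging to its largest eigenvalue then has a closed form: it points at angle θ = ½·atan2(2b, a − d), and the largest eigenvalue is (a + d)/2 + √(((a − d)/2)² + b²). The hypothetical helpers below turn this into the endpoints of a line segment centred on a position:

```typescript
// Direction and magnitude of maximum response of the symmetric tensor ((a, b), (b, d)).
function principalDirection(a: number, b: number, d: number): { angle: number; lambdaMax: number } {
  const angle = 0.5 * Math.atan2(2 * b, a - d);              // angle of the major eigenvector (radians)
  const lambdaMax = (a + d) / 2 + Math.hypot((a - d) / 2, b); // largest eigenvalue
  return { angle, lambdaMax };
}

// Endpoints [x0, y0, x1, y1] of a line of the given length, centred at (x, y),
// along the major eigenvector. Lines have no arrowheads, so the sign of the eigenvector is irrelevant.
function tensorLine(x: number, y: number, a: number, b: number, d: number,
                    length: number): [number, number, number, number] {
  const { angle } = principalDirection(a, b, d);
  const hx = 0.5 * length * Math.cos(angle);
  const hy = 0.5 * length * Math.sin(angle);
  return [x - hx, y - hy, x + hx, y + hy];
}
```

On an actual map the angle still has to be related to the projection (for example to local east and north directions), which this sketch ignores.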
Ultimately the above demo should also allow rendering to SVG for export purposes. Still, although E. Debayle and Y. Ricard kindly provide the data and code required to generate static plots similar to the one above, even in its current limited form the demo already has a lot to offer compared with the original distribution of the data and code. If all is well you can interact with a plot of the data above right now, without even having to explicitly download the data and code. You can immediately get at least some idea of the data, evaluate different viewpoints, select depths that were not shown in the original paper, etc.
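For the planned SVG export, one simple option is to serialise the drawn line segments into a standalone SVG document string. The demo itself uses d3.js, which can also render to SVG directly; the hand-rolled `segmentsToSvg` below is only a hypothetical sketch with an assumed segment format, and it assumes the stroke colour string is trusted (it is not escaped):

```typescript
type Segment = [number, number, number, number]; // x1, y1, x2, y2 in SVG user units

// Serialise line segments into a standalone SVG document (e.g. for saving or printing).
function segmentsToSvg(segments: Segment[], width: number, height: number,
                       stroke = "black", strokeWidth = 1): string {
  const fmt = (v: number) => Number(v.toFixed(2)).toString(); // 2 decimals keeps files compact
  const lines = segments.map(([x1, y1, x2, y2]) =>
    `    <line x1="${fmt(x1)}" y1="${fmt(y1)}" x2="${fmt(x2)}" y2="${fmt(y2)}"/>`);
  return [
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}" viewBox="0 0 ${width} ${height}">`,
    `  <g stroke="${stroke}" stroke-width="${strokeWidth}" fill="none">`,
    ...lines,
    "  </g>",
    "</svg>",
  ].join("\n");
}
```

In a browser, the resulting string could be wrapped in a `Blob` of type `image/svg+xml` and offered for download.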
Some of the things I would like to be able to do in the future include making renderings like these of fibre tracts in the brain:
|Taken from Maarten H. Everts, Henk Bekker, Jos B.T.M. Roerdink, and Tobias Isenberg: Depth-Dependent Halos: Illustrative Rendering of Dense Line Data. IEEE Transactions on Visualization and Computer Graphics, 15(6):1299–1306, November/December 2009.|
Although it might be possible to do it using WebGL, it would require some work-arounds, as every line segment is actually an extruded trapezoid (the black parts are flat, the white parts taper off away from the viewer). Ideally geometry shaders would be available.
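A common work-around for the lack of geometry shaders is to expand every segment on the CPU into a quad of two triangles (six vertices), where each vertex carries both endpoints of its segment plus flags telling the vertex shader where to place it; the shader then performs the screen-space extrusion (and, for halos, any depth-dependent tapering). A sketch of building such a vertex buffer, with a hypothetical attribute layout:

```typescript
// Per-vertex layout (8 floats): segment start (x, y, z), segment end (x, y, z),
// t (0 = vertex sits at the start, 1 = at the end) and side (+1 or -1).
// A vertex shader can project both endpoints, take the screen-space normal of end - start
// (identical for all six vertices of a segment), pick the endpoint given by t
// and offset it by side * halfWidth along that normal.
const FLOATS_PER_VERTEX = 8;

// `segments` holds 6 floats per segment: x0, y0, z0, x1, y1, z1.
function expandSegments(segments: Float32Array): Float32Array {
  const n = Math.floor(segments.length / 6);
  const out = new Float32Array(n * 6 * FLOATS_PER_VERTEX);
  // Quad corners as [t, side]; triangles (start+, start-, end+) and (end+, start-, end-).
  const corners: ReadonlyArray<readonly [number, number]> =
    [[0, 1], [0, -1], [1, 1], [1, 1], [0, -1], [1, -1]];
  let k = 0;
  for (let s = 0; s < n; s++) {
    const seg = segments.subarray(6 * s, 6 * s + 6);
    for (const [t, side] of corners) {
      out.set(seg, k);      // both endpoints are copied into every vertex
      out[k + 6] = t;
      out[k + 7] = side;
      k += FLOATS_PER_VERTEX;
    }
  }
  return out;
}
```

Note that this stores every segment six times, which is exactly the kind of data duplication that instanced rendering would avoid.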
Some other work we do (or have done) that would be interesting to have on-line includes:
|Touch interfaces for working with simulation data and (large) point data sets.||Edge-bundling of graphs.|
|(Bio-)molecule visualization using different abstraction levels (at the same time).||(Example-based) hatching of 3D models.|
Also, it would be interesting to explore the potential of integrating code with prose, a bit like in this very simple example:
Eventually I would like to see a growth of modularised scientific web applications that can easily be integrated with, or presented as additional material to, online papers. For example, one could imagine a scheme in which data processing is done in web workers, while various web components show the data in tables, graphs, maps, etc.
Although a lot is possible already, there are still some limitations to using web technologies. Most of these boil down to performance issues or a lack of support for certain (advanced) functionality. Perhaps the most important limitation I came across is that WebGL (1.0) is based on OpenGL ES 2.0, and as a result does not support instanced rendering, depth manipulation in fragment shaders, or geometry shaders (along with a long list of similar "modern" OpenGL functionality). Instanced rendering would be particularly useful, as it essentially allows one to avoid a lot of costly data copying and draw calls (there is a community-approved WebGL extension for it). In fact, it might even pay to simply incorporate it in WebGL and "fake" it on any platforms where it is not supported, as this would at least allow this kind of common task to be done in native code. Depth manipulation in fragment shaders is a little more problematic, but there is a draft extension. Interestingly, the commonly used work-around using min/max blending would require an extension that has so far only been proposed.
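To illustrate the instancing point: in WebGL 1.0, instanced drawing is exposed through the `ANGLE_instanced_arrays` extension. The sketch below draws many copies of one glyph mesh with it and, when the extension is missing, falls back to one draw call per glyph, which is roughly what "faking" instancing amounts to (only done in JavaScript rather than native code). The two interfaces declare just the subset of `WebGLRenderingContext` and of the extension that is used; the function name, and the assumption that all buffers and attribute pointers are already set up, are specific to this sketch.

```typescript
// Subset of the ANGLE_instanced_arrays extension used here.
interface InstancedArrays {
  vertexAttribDivisorANGLE(index: number, divisor: number): void;
  drawArraysInstancedANGLE(mode: number, first: number, count: number, primcount: number): void;
}

// Subset of WebGLRenderingContext used here (keeps the sketch independent of DOM typings).
interface GLSubset {
  readonly TRIANGLES: number;
  getExtension(name: string): unknown;
  enableVertexAttribArray(index: number): void;
  disableVertexAttribArray(index: number): void;
  vertexAttrib3f(index: number, x: number, y: number, z: number): void;
  drawArrays(mode: number, first: number, count: number): void;
}

// Draw one copy of a glyph mesh (`vertexCount` vertices, already bound) per xyz triple in `offsets`.
// The instanced path assumes the offsets are also in a buffer already attached to attribute
// `offsetLoc` via vertexAttribPointer. Returns the number of draw calls issued.
function drawGlyphs(gl: GLSubset, vertexCount: number, offsetLoc: number, offsets: Float32Array): number {
  const count = Math.floor(offsets.length / 3);
  const ext = gl.getExtension("ANGLE_instanced_arrays") as InstancedArrays | null;
  if (ext) {
    gl.enableVertexAttribArray(offsetLoc);
    ext.vertexAttribDivisorANGLE(offsetLoc, 1);   // advance the offset once per instance
    ext.drawArraysInstancedANGLE(gl.TRIANGLES, 0, vertexCount, count);
    ext.vertexAttribDivisorANGLE(offsetLoc, 0);   // restore the default for other draws
    return 1;
  }
  // Fallback: with the attribute array disabled, the shader reads the constant value
  // set by vertexAttrib3f, so each glyph needs its own draw call.
  gl.disableVertexAttribArray(offsetLoc);
  for (let i = 0; i < count; i++) {
    gl.vertexAttrib3f(offsetLoc, offsets[3 * i], offsets[3 * i + 1], offsets[3 * i + 2]);
    gl.drawArrays(gl.TRIANGLES, 0, vertexCount);
  }
  return count;
}
```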
The filter demo presented few real problems, but it would have been really nice if Canvas had (basic) support for colour management. Although SVG has some support for specifying colours in different colour spaces (notably linearRGB and sRGB), Canvas has no support for colour management to speak of. Another problem I encountered was that in practice images had to be copied to/from the web worker, rather than being transferred by reference (due to an issue with Firefox), adding overhead. (It is probably possible to work around the issue, but you'd have to be pretty explicit about it.) Another fairly innocuous, but annoying, problem was that different browsers behave differently when it comes to letting a user save a Canvas. In particular, Chrome seems not to allow it. So you either have to expose this functionality explicitly, or you need to convert the output to an image again.
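For reference, the difference between copying and transferring lies in the second argument of `postMessage`: `ArrayBuffer`s listed there are handed over to the receiver instead of being cloned, and become detached (unusable) on the sending side. A minimal sketch; `MessagePoster` is a hypothetical stand-in for a `Worker` and the message shape is an assumption:

```typescript
// Anything with a postMessage(message, transfer) method, such as a Worker or MessagePort.
interface MessagePoster {
  postMessage(message: unknown, transfer: ArrayBuffer[]): void;
}

interface ImageMessage {
  width: number;
  height: number;
  pixels: ArrayBuffer; // RGBA bytes, 4 per pixel
}

// Send image data to a worker. With transfer = true the buffer is moved without copying and
// the sender can no longer use it; with transfer = false it is copied by structured cloning.
function sendImage(target: MessagePoster, pixels: ArrayBuffer, width: number, height: number,
                   transfer: boolean): void {
  const message: ImageMessage = { width, height, pixels };
  target.postMessage(message, transfer ? [pixels] : []);
}
```

On the worker side, the buffer can be wrapped in a `Uint8ClampedArray` for processing and sent back the same way, listing it in the transfer list again.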
With the Canvas-based demo showing the seismic data set I experienced hardly any problems, except for the minor inconvenience that support for dashing seems to be missing from certain browsers. Also, the very fact that the demo is Canvas-based is due to what was perhaps the most surprising problem I encountered: SVG rendering is far too slow to be useful in these kinds of contexts. To get a better idea of the problem, have a look at these benchmarks:
All benchmarks show a wide performance gap between Canvas and SVG, with Canvas typically being around four times faster than SVG (the largest and smallest factors being 8.9(!) and 1.7, respectively). An interesting anomaly is the benchmark comparing Bézier curve drawing in Canvas and SVG. Clearly, when drawing tens of thousands of glyphs, lines, etc., this becomes a bit of an issue.
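As for the missing dash support mentioned above, one work-around is to feature-detect `setLineDash` and, where it is absent, emit each dash as a separate sub-path. A sketch, where `PathTarget` is a hypothetical minimal interface covering the `CanvasRenderingContext2D` methods used:

```typescript
interface PathTarget {
  moveTo(x: number, y: number): void;
  lineTo(x: number, y: number): void;
  setLineDash?: (segments: number[]) => void; // absent in browsers without dash support
}

// Add a dashed line from (x0, y0) to (x1, y1) to the current path: `dash` units on, `gap` off.
// Uses native dashing when available, otherwise one sub-path per dash. The caller still calls
// stroke() and, if native dashing was used, resets it afterwards with setLineDash([]).
function dashedLine(ctx: PathTarget, x0: number, y0: number, x1: number, y1: number,
                    dash: number, gap: number): void {
  if (typeof ctx.setLineDash === "function") {
    ctx.setLineDash([dash, gap]);
    ctx.moveTo(x0, y0);
    ctx.lineTo(x1, y1);
    return;
  }
  const length = Math.hypot(x1 - x0, y1 - y0);
  if (length === 0 || dash <= 0 || gap < 0) return;
  const ux = (x1 - x0) / length;
  const uy = (y1 - y0) / length;
  for (let s = 0; s < length; s += dash + gap) {
    const e = Math.min(s + dash, length); // the last dash may be cut short
    ctx.moveTo(x0 + ux * s, y0 + uy * s);
    ctx.lineTo(x0 + ux * e, y0 + uy * e);
  }
}
```

Unlike native dashing, the fallback restarts the dash pattern at every call, so consecutive segments of a polyline will not share a continuous pattern.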
Science is often built on software, so we should not just be making papers with formulas and graphs and such; we should also give code a place in publishing. However, the usual way of distributing code (making it available in an arbitrary programming language, for an arbitrary platform, relying on an arbitrary number of arbitrary libraries) becomes a bit cumbersome in this context. After all, the code is just a means to an end; scientists are not programmers (at least not primarily). And although there are various efforts to facilitate code sharing, they all have their downsides. Web technologies can be used to overcome many of these hurdles, with the main downsides being performance (which is already quite good, but still some way from native code) and some limitations in the available features.
We have seen some examples of how web technologies, and in particular web graphics, allow for the creation of interactive online scientific demos. WebGL, although it has a limited feature set, can be extremely useful for all sorts of 3D visualizations. Canvas is similarly useful for 2D graphics, and possibly a little more feature-complete, although some more colour management features would not go astray, and the lack of support for dashing in certain browsers is a little awkward. SVG has a lot of potential, especially when one also wants printed results, but there are still some performance issues with current browsers. It is also important to realize that one can often fairly easily create both a web application and a command-line application by neatly separating the user interface from the rest of the code.