Friday, December 19, 2014

Using filtered SQL queries for building big data dashboards


    Dashbuilder is a tool I like to describe as a “micro” BI. It lets the user create dashboards and showcase their data using pie, bar or line charts as well as display data in a tabular form. Data could be loaded from plain text like CSV files or query from a database connection. When data is small enough, Dashbuilder can handle pretty well the whole set in memory as far as it doesn't exceed the 2MB size limit. However, most of the time, our data sets are bigger and we can't upload all the data for Dashbuilder to handle it by its own. Is in these cases where database backed queries can help us to implement nice drill down reports and charts without preloading all the data. 

   Let's take as an example a very simple stock exchange dashboard  which is fed from two database tables:


    The dashboard displays some indicators about several companies from several countries selling their shares at a given price on every day closing date. The dashboard displays 4 KPIs  (Key Performance Indicators) as you can see in the following screenshot: 



    All the indicators are displaying data coming from the two database tables defined above.


  • Bar chart - Average price per company
  • Area chart - Sales price evolution
  • Pie chart - Companies per country
  • Table report - Stock prices at closing date 

   At the end of this article [1] you'll find detailed instructions about how to download and install this example dashboard. What we're going to start discussing next is the two strategies we can use for building a dashboard. This is an important aspect to consider, specially if we're facing big data scenarios.


The in-memory strategy


   This strategy consists in creating a data provider which load all the data set rows by executing a single SQL query over the two tables.  

   SELECT C.NAME, C.COUNTRY, S.PRICE_PER_SHARE, S.CLOSING_DATE
  FROM COMPANY C JOIN STOCK S ON (C.ID=S.ID_COMPANY)

  Every single indicator on the dashboard will consume the same data set. When filters are executed from the UI no further SQLs are executed since all the calculations are done over the data set in memory. The following video shows a browser window and a bash console showing that only a single SQL is executed when the dashboard is initialized.





  Pros:
  • Data retrieval logic keeps very simple
  • Only a single data provider is needed
  • Faster configuration of  KPIs since all the data set properties are available at design time
  • Multiple indicators from a single data provider
   Cons:
  • Can't be applied on medium/large data sets due to poor performance


The native strategy


  The native approach consists in having a data provider for every indicator in the dashboard. instead of loading an handling all the data set in memory. Every KPI is told what data has to display. The next video  shows a full SQL based version of the sales stock dashboard. As you can see, every time the user filters on the dashboard, some SQL queries are executed. No data is hold in memory, the dashboard is always asking the DB for the data.



  As you can see, on every filter request the SQLs are parsed, injected with the filter values and re-executed. The SQL data providers are the following:

  Bar chart - Average price per company

    SELECT C.NAME, AVG(S.PRICE_PER_SHARE)
  FROM COMPANY C JOIN STOCK S ON (C.ID=S.ID_COMPANY)
  WHERE {sql_condition, optional, c.country, country}
  AND {sql_condition, optional, c.name, name}
  GROUP BY C.NAME

  Area chart - Sales price evolution

  SELECT S.CLOSING_DATE, AVG(S.PRICE_PER_SHARE)
  FROM COMPANY C JOIN STOCK S ON (C.ID=S.ID_COMPANY)
  WHERE {sql_condition, optional, c.country, country}
  AND {sql_condition, optional, c.name, name}
  GROUP BY CLOSING_DATE

  Pie chart - Companies per country

  SELECT COUNTRY, COUNT(ID)
  FROM COMPANY
  WHERE {sql_condition, optional, country, country}
  AND {sql_condition, optional, name, name}
  GROUP BY COUNTRY

  Table report

  SELECT C.NAME, C.COUNTRY, S.PRICE_PER_SHARE, S.CLOSING_DATE
  FROM COMPANY C JOIN STOCK S ON (C.ID=S.ID_COMPANY)
  WHERE {sql_condition, optional, c.country, country}
  AND {sql_condition, optional, c.name, name}


   As you can see every KPI is delegating the filter & group by operations to the database. The filter magic happens thanks to the {sql_condition} statements. Every time a filter occurs in the UI the dashbuilder core gets all the SQL data providers referenced by the KPIs and it parses/injects into those SQLs the current filter selections made by the user. The signature of the sql_condition clause is the following:

    {sql_condition, [optional | required], [db column], [filter property]}  where:

  • optional: if no filter exists for the given property then the condition is ignored. 
  • required: if no filter is present  then the SQL returns no data.
  • db column: the db column where the current filter is applied.
  • filter property: the UI property which selected values are taken. 

 Pros:
  • Support for high volumes of data. The database tables need to be properly indexed though.
 Cons:
  • The set up of the data providers is a little bit more tricky  as it requires to create SQL queries with the required filter, group by and sort operations for every KPI.

   When designing a dashboard never forget of thinking thoroughly about the origin, type and the volume of the data we want to display in order to go for the right strategy.


-----------------------------------------------------------------------------------------------------------

  
[1]    These are the steps to download an deploy the Stock Trade sample dashboard: 
  1. Download  & deploy the Dashbuilder webapp on your favorite app server - Installation instructions.  
          (You can also build from sources)
  1. Extract the contents of the following zip file into  dashbuilder.war/WEB-INF/deployments folder
  2. Create the stock trade database tables. Use or adapt  the H2 script file stocktrade-h2.sql provided.
  3. Start the app. The dashboard should be automatically deployed.

Wednesday, August 6, 2014

Dashbuilder 6.1.0 released

   It's been 7 months after the latest release of Dashbuilder. During this time we've spent most of the time testing the application on other platforms such as WebSphere or WildFly as well as adding several bug fixes. As a result, the current 6.1 is much more stable and offers wider compatibility with other platforms.

   New features added:
  • Support for the WildFly 8.x and WebSphere 8.x application servers
  • Ability to embed KPIs into third-party applications  (as we advanced a few months ago in this  blog entry)
  The links to the release artifacts can be found on the project website http://dashbuilder.org     

Wednesday, July 23, 2014

New tabular reports component

   As we mentioned in a previous post (Rich interactive dashboards in uberfire), the data viewer layer is not tied to just one type of visualization technology, but instead supports pluggable renderers. This means that, if so desired, each of the components of a specific dashboard can be configured with it's own specific rendering technology, independently of the renderer its fellows components might use.
On the other hand it's also possible to have only the tables report components, for example, to use a specific table rendering technology, while all the others use the default renderer (which, for now, is the Google Charts Visualization library). This is what we'll illustrate in this blog entry.

   Recently we've added a new table visualization component, which in a foreseeable near future will become the default rendering technology for table reports, replacing the previously mentioned Google library for this type of displayers.

   So, how do we tell the framework that it should visualize a table using this new renderer? Let's go back to an example from the previous post to illustrate this:

The sales pipeline dashboard.

   As you can see the 'Sales pipeline' dashboard that is shown in the above image consists of a line chart, a few pie charts, a bar chart, and finally, a table report. The latter is being visualized using the (still) default Google visualization library. Let's check out how the table report of this dashboard is setup:
The table report setup

   As you can see, this is fairly straightforward; through a sequence of API calls we tell the framework to configure a table renderer component, with its title, the default ordering column and sense, the page size, the columns it should display, and the data set it should work with.

   This setup displays the table as it can be seen in the first image, with the default google visualization library. Let's now change this to our new table component. For this to happen we need to adapt the above configuration as follows:

Set a specific renderer

   So, it's as simple as adding a call to 'renderer( <rendererID> )', to the component's setup to get the job done and that's that! The result is shown in the image below: 

New table component renderer.
   As you can see, the table report is now visualized with our new table component. It offers several improvements over the google table renderer:

  • it's based on a standard widget-set, so no dependency on third party (possibly closed-source) libraries,
  • the possibility to hide columns through a column selector pop up,
  • the ability to manually adjust the column widths,
  • the ability for the table to emit filtering events that occur within it (e.g. someone selects a specific country or product), so that other components can adapt their content accordingly (see also the entry on filtering: An introduction to displayer filtering )

Below we've included a small screen cast to illustrate some of these features:

Tuesday, July 22, 2014

An introduction to displayer filtering

   One of the most interesting features of interactive dashboards is the fact that they consist of data visualization components that can be made responsive to events that happen within some other data displayer component, which is usually being shown in the same dashboard. Equally important is that data visualization components can be told to notify to others that some kind of 'event' has happened within it, e.g. a data filter event in a pie chart, or some value that was selected inside a table report cell.

   So how is this achieved in dashbuilder?

   Check out the following example of what could be a sales dashboard:

An example sales dashboard

   We have a meter chart, a bubble chart, a line chart and a couple of bar charts in there, representing some possible sales indicators. Now imagine that a sales manager wants to check the correlation between all of these for a specific country (bubble chart), or for a specific product or employee (bar charts). If these charts were simply static visualizations, he/she wouldn't be able to do that, rendering the dashboard practically useless for detailed data interpretation.

  So, the importance of being able to configure the charts, whatever their type, and where necessary (it might not always be desirable nor useful), so that they can respond to external and / or 'self' events (i.e. filtering actions), and also be able to notify other charts of those is beyond doubt.

   As you might already know, all data visualization components in dashbuilder can be configured through a very straightforward and easy-to-use API. Take the bubble chart in the above dashboard, for instance, which is setup as follows:

The bubble chart setup

   Pay attention to the call to the 'filterOn( ... )' method, surrounded by the red rectangle, this is where all the filtering 'magic' happens. Let's dig a little deeper into what this method offers.
As you can see it accepts three boolean parameters:

  • The first one will configure the data displayer so that it will respond to the filtering events it generates itself, i.e. the event generated by the displayer will affect its own content. Say for a second we were to have this parameter to 'true' in the 'By product' bar chart configuration. If we were then to click on 'Product 11' for example, the result would be the following:


An 'auto'-filtering bar chart
   We observe that only the two bars corresponding to 'Product 11' have remained present in the 'By product' graph. It's up to the dashboard designer to determine if this effect is what is convenient for the type of dashboard and for the type of graph in its specific context. In this case we might wonder if, functionally speaking, it makes sense to have a bar chart auto-filter itself, but please note that the auto-filtering option also plays an important role in the ability to apply 'drill-down' to data displayers. However, this  will be the topic of a future post.

   Also note that when we clicked on the 'Product 11' bar, all the other graphs, with the exception of the meter chart, automatically adjusted their content to the newly set filter. This is because of the
  • Second filterOn parameter, which will configure the data displayer so that it will notify the filtering event that has occurred within it to others. Others? That is, every other displayer that has registered itself to listen to the events within a certain context (basically, although not strictly, a dashboard). This is achieved by coordinating displayers through a 'DisplayerCoordinator' component, as illustrated below:

    Coordinate displayers amongst themselves
    Every component that registers with this coordinator will be notified of the events that occur within the other components in the same coordinator instance, but only if its
  • Third filterOn() parameter, which tells a displayer to respond or not to the events that are notified throughout a specific dashboard, is set to true. For the sake of this example, this parameter was set to 'false' for the meter chart and, as you can see from the screenshot where the bar chart auto filter was demonstrated, the meter chart has not changed its content.
That's it for our brief introduction to displayer filtering, in a future post we will broaden this subject and also introduce the ability to 'drill down' in the data represented by a displayer. Stay tuned!

Wednesday, June 25, 2014

Rich Interactive Dashboards in Uberfire

    Uberfire is one of the latest & coolest projects within JBoss middleware. The Uberfire project was kicked off about two years ago. Despite it's still on the release 0.5  I think is mature enough to start bringing some of the Dashbuilder features to it. Uberfire leverages the GWT & Errai technologies and provides a rich framework to develop desktop-like rich internet applications.

 The next video (don't forget to select HD) shows a live showcase with some interactive sample dashboards built using the Dashbuilder client API.  All the dashboards are fed from a dynamically generated client data set. So there is no interaction with the backend. All the operations (group, filter, sort) are performed on the client side. The Dashbuilder architecture allows for handling data sets on both client & server though. For demo purposes we have gone for the client approach as it is easier to implement.

   In next blog entries we will walk through the architecture internals as well as the APIs. Today, I'm just going to give an overview of how these sample dashboards have been built.




   The four dashboards (see screenshots below) are fed from a dynamically generated data set (the sales opportunities data set - SalesDataSetGenerator.java),  which is a Bean type data set generator that registers a data set at Showcase initialization time.  The dashboards themselves are just GWT UI Binder widgets composed by a bunch of data viewer instances spread on the same page layout.

Sales goals
Sales goal dashboard
Sales pipeline
Sales distribution by country
Sales table reports

   Let's take for instance the first dashboard. The UI Binder template (SalesGoals.ui.xml) is used to lay out all the charts properly.  Each chart is assigned a unique identifier used within the dashboard  (SalesGoals.java).

Dashboard UI binder template

   The chart definition is composed by two parts, both defined using a very simple but powerful API.
  • A data set reference the chart is going to display. Here we can specify a plain data set or a data set lookup which is just a sequence of operations (filter, sort, group) performed over an existing data set.
  • A display configuration or how we want to visualize the data. There are several types of visualizations - pie, bar, line, area, meter, map bubble charts, tables, hierarchical trees, etc... The display settings will depend on the type of visualization chosen.
Data Displayer API sample

    As I said, we'll dive into the API internals in next blog entries. For now, just taking a look at the examples is the quickest way to figure out how this API works.

    One last note worth to mention is that, right now, the default rendering technology used is the Google Charts Visualization library. We went for Google because it was the quickest road to follow in order to get an early prototype. Nevertheless, the data viewer layer is not tied to any specific rendering technology as it supports the concept of pluggable renderers. So, in the future, it'll be possible to have several renderers available and have every chart decide which particular one to use.
 
   The project is hosted on Github. Despite it's in a very early stage of development we're pushing hard to bring many new features in the next few months:
  • Data visualization editors
  • Drag&drop dashboard composer
  • Renderer based on the D3 library
  • Support for real-time dashboards
  • RESTful API for remote dashboard interaction

That's all. I encourage you to clone the project, build, run it and play with the examples.

Enjoy it!


Tuesday, March 11, 2014

Embed a KPI chart into your web app

   Recently, a new cool feature has been added to the Dashbuilder echosystem: the ability to embed dashboard charts into your own web app via a lightweight easy to use JavaScript API.

   As you may know, Dashbuilder allows for the creation of custom feature complete dashboards. Now, once a dashboard is built, it's possible to pick up any KPI and paste it into your web application. 

   To see how this mechanism works, two out-of-the-box examples have been added to the tooling. To test them just type the following URL on a fresh Dashbuilder installation:

KPI embed sample


Report embed sample
   If you want to figure out how to use this JS lib just take a look at the HTML files provided. Both are displaying charts belonging to the pre-installed dashboards that come along with the tooling. 
  Notice, this brand new feature is only available in the latest sources in GitHub (soon in the forthcoming 6.1.0 Final release). So if you wanna play with it you'll need to build the release from sources, as described here.  

   In the near future, an additional RESTful API will be provided so that third-party applications can not only embed KPI mashups but also to make calls to the Dashbuilder data engine. REST support will empower Dashbuilder integration capabilities with remote apps, making possible for developers to retrieve/push data from dashboards. Some real world uses cases could be the following:
  • Alert management  based on KPI threasholds.
  • UI development reading the data coming from a given KPI.
  • Push data from an external app: ERP, CRM or the like, to feed a dashboard.
This and many others interesting features will be delivered throughout the next months. 

Stay tuned! 

Monday, February 24, 2014

The Dashbuilder demo gets Docker-ized

   During the last week I've been playing a little with Docker. Docker is an open-source virtualization technology that automates the deployment of any application as a lightweight, portable, self-sufficient container that will run virtually anywhere.  For those interested in getting all the details  I recommend watching this excellent presentation given by the Docker's founder Solomon Hykes.

   In a nutshell, what Docker provides is a platform for:
  • Creating custom images. an image could be an application, a custom Linux distro, a service or whatever packaged Linux software you can imagine. An image it's basically composed by a base Linux distro plus some software installed and configured on top of it. 
  • Sharing a custom image with others via the Docker's central repository - https://index.docker.io. You can push your own images, search for images and pull images created by others.
  • A daemon service (accessible via a REST API) plus a command utility package for creating containers and managing its life cycle: list the available containers, start/stop a container instance, etc.
   What I've done in Dashbuilder is to create a Docker image containing the latest demo of the project and push that image to the Docker shared repository. 


Installation

   Before going further, you need to install the Docker package on your Linux host. At the time of this writing Docker is only available for Linux based machines. Windows and Mac OSX support is still under development. Please read the Docker package installation.

Once you have Docker installed you can carry on with the demo installation as follows:
  1. Pull the image from the central repo:
    sudo docker pull dgutierr/dashbdemo 
    
  2. Run the image:
    sudo docker run -p 8080:8080 dgutierr/dashbdemo 
    
  3. Open the app:
    http://localhost:8080/dashbuilder  
    Users: root/root, demo/demo
    
      If you actually don't know how to interact with the app, please take a look at the Quick Start Guide.

   Enjoy!   


Image internals

   Our Docker image is based on the mattdm/fedora image, which is basically a Fedora 20 distro, plus the following software installed:
  • Open JDK 1.7.0,
  • Git client, and 
  • A git clone of the Dashbuilder demo repository.

Wednesday, January 29, 2014

First steps to create your dashboards

One of the most typical situations with Dashbuilder is that you already have some existing data you want to create dashboards from.


You can add new dashboards by configuration with Dashbuilder's composition tool, by following these steps:

Step 01 - You must first carefully think what data needs to be displayed to users and which is the desired format. It’s both important to have the right data and to display it in simple and concise way. Dashboards must be simple to use and to understand, focusing on displaying only relevant information.

At this stage, you must also take into account performance and volume concerns.

Step 02 - Make sure the data is available through an standard protocol and a known format, suitable to be queried. Typically, a database or an structured file would be enough. If this is not available in this format, you can always create a new connector or publish it through any custom integration.

Step 03 - Setup connectivity parameters. By using the tool with the right privileges, you can define connectors parameters (i.e. JDBC connections) and data providers configuration (i.e. SQL queries)

Step 04 -  Configure one or several workspaces with as many pages as needed. Set up navigation across them, for example, by using the provided menus. It’s important to realize every page or workspace will have its own access URL, that can be linked from any other application.

Step 05 - Add indicators (charts) and reports to pages, and configure them. By using the KPI indicator editor, you set them to display the right information at the right place.

Step 06 - Configure filter and drill down capabilities. This component controls how navigation is performed when users click on the charts, by filtering the data by the right property value.

Step 07 - Set the right permissions for each level of information. You can define which elements of the dashboards should be visible or even editable for a different range of users or roles.

Step 08 - Customize look'n' feel. Sometimes, a custom look and feel is required. There are different tools to modify the standard look, by providing CSS and graphical resources, as well as custom HTML content.

Step 09 - Test your dashboards for each role.

Step 10 - Gather user feedback and improve your dashboards based on it.

You can find more information about how to implement each of these steps at the project's documentation page http://dashbuilder.org/documentation.html

Tuesday, January 28, 2014

Process monitoring with jBPM integration

Dashbuilder is packaged and automatically ready to use with jBPM 6 workbench (https://www.jboss.org/jbpm). By doing so, you will get several process and task related dashboards, that are connected to jBPM history log.

They will provide, for example, an overview of the running and completed processes and tasks. Different sort of reports and indicators will become available for BPM monitoring, including time related (i.e. average task durations) or instance related (i.e amount of active processes).

You can find the instructions to install this flavour of Dashbuilder at http://docs.jboss.org/jbpm/v6.0/userguide/jBPMGettingStarted.html

Read more about this integration at http://docs.jboss.org/jbpm/v6.0/userguide/chap-bam.html

Dashbuilder 6.0.1 released

A new version of Dashbuilder has been release along with the Drools & jBPM platform.
This version contains only some bug fixes. You can download it from here , and as always, you are more than welcome to provide feedback about it.
We will continue working on the new generation of Dashbuilder, that will greatly improve its integration and reporting capabilities.