Thursday, 2 July 2009

Open Source Business Intelligence and Geospatial BI Software

The growing volume of digital data makes it difficult to control and master.
The abundance of formats (Excel files, XML, data stored in databases such as Oracle, MySQL or PostgreSQL) can be a constraint.
Human intelligence alone is not sufficient to solve complex cases where many parameters must be taken into account.

Quoting Wikipedia, "Business Intelligence refers to skills, technologies, applications and practices used to help a business acquire a better understanding of its commercial context. Business intelligence may also refer to the collected information itself".
Note that despite the word "Business" in the term, BI is not only used in commercial and economic contexts.

Here are some goals of Business Intelligence:
- Breaking down the barriers between formats so that data can be joined and cross-referenced, and building homogeneous infrastructures. Good performance in data processing is also needed, given the huge volumes involved.
- Giving us direct, graphical information about what we need. The selected information is usually displayed through graphs, reports and dashboards.
- Helping us make good decisions. Adding dimensions to data, not only relations, makes it possible to establish hierarchical relationships between them. This refines our analysis and helps us prioritize our actions. Online Analytical Processing (OLAP) reflects this approach.
- Synthesizing. Complex algorithms and statistical techniques can uncover patterns or even predict phenomena that a human being would not have detected at the "macroscopic" level. This is what we call data mining.
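
To make the idea of dimensions and hierarchies a bit more concrete, here is a minimal Python sketch of an OLAP-style "roll-up". Everything in it is made up for illustration: the sales figures, the city-to-country hierarchy and the `roll_up` helper are assumptions of mine, not part of any BI product. A real OLAP server does this on much larger data cubes.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, year, sales amount).
FACTS = [
    ("Lyon", 2008, 120.0),
    ("Paris", 2008, 300.0),
    ("Quebec", 2008, 80.0),
    ("Paris", 2009, 320.0),
    ("Quebec", 2009, 95.0),
]

# Hypothetical hierarchy on the "place" dimension: city -> country.
CITY_TO_COUNTRY = {"Lyon": "France", "Paris": "France", "Quebec": "Canada"}


def roll_up(facts, hierarchy):
    """Aggregate sales from the city level up to the country level, per year."""
    totals = defaultdict(float)
    for city, year, amount in facts:
        totals[(hierarchy[city], year)] += amount
    return dict(totals)


by_country = roll_up(FACTS, CITY_TO_COUNTRY)
```

The hierarchy is what lets us move from a detailed view (cities) to a summarized one (countries) without touching the underlying facts.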

This diagram was taken and translated from a French reference portal on business performance management. It illustrates the different components of Business Intelligence.

Broadly, BI is divided into two main domains: integration and valorization. Integration comes first in the BI chain: it consists of collecting and storing data, while valorization aims at distributing and exploiting it.

In the open source world, two integration tools stand out: Kettle and Talend.
  • Kettle is a component of Pentaho, a complete BI suite.
  • Talend is developed by a French company. It was named a "company to watch" by Intelligent Enterprise magazine.
Integration tools are also called ETL tools, for "Extract, Transform and Load":
  • Extract: they read from many data sources
  • Transform: they can process data and convert it between different formats
  • Load: they can write the resulting data to a destination
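
The three steps above can be sketched in a few lines of Python using only the standard library. This is a toy illustration under my own assumptions, not how Kettle or Talend work internally: the CSV content, the population figures, the `city` table and the three function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source: a small CSV export (e.g. saved from a spreadsheet).
CSV_SOURCE = """city,population
lyon,472000
paris,2200000
"""


def extract(text):
    """Extract: read rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names and convert populations to integers."""
    return [(row["city"].title(), int(row["population"])) for row in rows]


def load(records, connection):
    """Load: write the cleaned records into a database table."""
    connection.execute("CREATE TABLE city (name TEXT, population INTEGER)")
    connection.executemany("INSERT INTO city VALUES (?, ?)", records)
    connection.commit()


# Run the whole chain against an in-memory SQLite database.
connection = sqlite3.connect(":memory:")
load(transform(extract(CSV_SOURCE)), connection)
rows_loaded = connection.execute(
    "SELECT name, population FROM city ORDER BY name"
).fetchall()
```

Real ETL tools let you draw this kind of chain graphically and plug in many more sources and destinations, but the underlying idea is the same.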

The advantage of Kettle is that it is part of a complete BI suite. The other Pentaho modules are Mondrian (an OLAP server), Pentaho Reports, Pentaho Dashboards and Pentaho Weka for data mining.
Talend provides connectors to many valorization tools such as PALO, Jaspersoft or SpagoBI, so it integrates well into a complete BI environment. The Jaspersoft suite includes Talend under the name JasperETL.

Open source BI software is still young, but it is gaining more and more popularity among big companies.

Geographic data is like any other kind of data. To add a geographic dimension to a standard data set, you simply add a geometry column describing the spatial properties of each row. Just as you can compare strings with each other or perform mathematical operations on numbers, you can perform intersections, unions, splits, differences and so on with geometries.
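
As a rough illustration of a geometry column and of an operation on it, here is a small Python sketch. To stay self-contained it assumes a deliberately simplified geometry type, an axis-aligned rectangle that I call `Box`; the `parcels` table and the `intersection` helper are hypothetical too. Real spatial databases and libraries handle arbitrary points, lines and polygons.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    """A deliberately simplified geometry: an axis-aligned rectangle."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)


def intersection(a: Box, b: Box) -> Optional[Box]:
    """Return the overlapping rectangle, or None if the boxes do not overlap."""
    xmin, ymin = max(a.xmin, b.xmin), max(a.ymin, b.ymin)
    xmax, ymax = min(a.xmax, b.xmax), min(a.ymax, b.ymax)
    if xmin >= xmax or ymin >= ymax:
        return None
    return Box(xmin, ymin, xmax, ymax)


# Hypothetical table: an ordinary attribute plus a geometry column.
parcels = [
    {"name": "A", "geom": Box(0, 0, 4, 4)},
    {"name": "B", "geom": Box(2, 2, 6, 6)},
    {"name": "C", "geom": Box(10, 10, 12, 12)},
]

overlap_ab = intersection(parcels[0]["geom"], parcels[1]["geom"])
overlap_ac = intersection(parcels[0]["geom"], parcels[2]["geom"])
```

Parcels A and B share a 2 by 2 square, while A and C do not overlap at all, so the second call returns `None`.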

Integrating the geographic dimension into ETL tools has attracted the interest of GIS companies and the GIS community. Geopolitics and geomarketing are some of the domains in which Spatial OLAP analyses and geographic reports would be used. They would also be useful for tackling contemporary issues such as understanding how population migrations are correlated with climate change.

In the open source geospatial BI world, two integration tools stand out.
  • GeoKettle is based on Kettle by Pentaho. It was developed at Laval University in Canada by the team of Dr Badard.
  • Spatial Data Integrator is based on Talend and developed by CamptoCamp, a well-known French geospatial company.
The advantage of GeoKettle is that, like Kettle, it is part of a complete suite, in this case a geospatial BI suite. The other components of the suite are GeoMondrian, a spatial OLAP (SOLAP) server, and Spatialytics, for navigating SOLAP data cubes and dashboards.
The complete geospatial BI suite based on Pentaho will be presented at FOSS4G 2009.

Here is a set of operations you can accomplish with a spatial ETL tool:
  • Converting a complete folder of shapefiles into PostGIS tables
  • Transforming coordinate reference systems in bulk
  • Joining multiple data sources, such as a MySQL table and a geographic file
  • Controlling the quality of geographic data
Overall, spatial ETL tools help you build and maintain a solid spatial data infrastructure quickly and efficiently.
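
Two of the operations listed above, joining sources and bulk reprojection, can be sketched in plain Python. This is only an illustration under my own assumptions: the `features` and `attributes` data are hypothetical, and the conversion uses the standard spherical Web Mercator formula (EPSG:3857) written by hand, whereas a real spatial ETL tool relies on a full projection library.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used by spherical Web Mercator


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert WGS84 longitude/latitude (degrees) to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Hypothetical geographic source: point features keyed by an identifier.
features = {1: (0.0, 0.0), 2: (180.0, 0.0)}
# Hypothetical attribute source, such as rows read from a database table.
attributes = [{"id": 1, "name": "origin"}, {"id": 2, "name": "antimeridian"}]

# Join the two sources on "id" and reproject every geometry in one pass.
joined = [
    {**row, "xy": lonlat_to_web_mercator(*features[row["id"]])}
    for row in attributes
    if row["id"] in features
]
```

Each resulting row carries both its attributes and its reprojected coordinates, which is the kind of output you would then load into a spatial database.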

Most of the upcoming posts on this blog will deal with Spatial Data Integrator. I haven't tested GeoKettle, but I can say that SDI is really user-friendly. Even though SDI is not part of a complete geospatial BI suite, nothing prevents you from using the Canadian geospatial valorization tools GeoMondrian and Spatialytics alongside it.