Tuesday, 6 October 2009

[fr] Presentation of Spatial Data Integrator, a GIS data integration software (but not only...)

The PowerPoint below will be translated into English soon.

The reorganization of the State, in particular the merger of its decentralized services, brings out a wide range of issues related to managing data assets. Bringing people and activities together means that infrastructures have to converge as well.

The whole challenge is to keep the growing volume of data under control, to harmonize storage formats that could differ from one organization to another, and to standardize the methods used to document and trace data (which could rely on metadata records).

The pooling of data assets and methods will weigh heavily on how the steering team judges the quality of the merger. It will be seen as strategic and will receive a great deal of attention.

In this context, and because deadlines are short, the teams in charge of administering and adding value to data must be highly responsive. How easily they can meet the need for unification nevertheless depends on the means available. It is therefore essential that they have turnkey solutions that let them work efficiently on their organization's decision-support information system, following whichever project approach they have adopted.

During the Journées Nationales du Réseau Géomatique, which brought together GIS practitioners and managers from the Ministry of Ecology, Energy, Sustainable Development and the Sea and from the Ministry of Agriculture and Fisheries, I was invited to present one of these solutions: Spatial Data Integrator, a geographic data integration software (...but not only).

Here is the slideshow of that presentation. It is organized as follows:
-First, the tool is introduced fairly quickly...
-...then comes a simple but useful demo: handling rejected rows when joining an Excel file with a geographic file...
-...and finally, 4 use cases are covered, which can of course be transposed outside the public administration

The slides include many screenshots from the software that will help you reproduce the jobs.
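
As a rough illustration of the reject handling shown in the demo, here is a minimal Python sketch that is independent of Spatial Data Integrator. The sample rows, the `code` join key and the `join_with_rejects` function are all made up for the example; in the real job, the two inputs would be an Excel sheet and a geographic file.

```python
def join_with_rejects(table_rows, geo_rows, key):
    """Inner-join table_rows to geo_rows on `key`; return (joined, rejects)."""
    geo_by_key = {row[key]: row for row in geo_rows}
    joined, rejects = [], []
    for row in table_rows:
        match = geo_by_key.get(row[key])
        if match is None:
            rejects.append(row)  # no geographic feature carries this key
        else:
            joined.append({**match, **row})
    return joined, rejects


# Hypothetical sample data standing in for the Excel sheet and the geographic file.
excel_rows = [
    {"code": "2A041", "population": 2800},
    {"code": "99999", "population": 10},
]
geo_rows = [
    {"code": "2A041", "geometry": "POINT(9.16 41.39)"},
]

joined, rejects = join_with_rejects(excel_rows, geo_rows, "code")
print(joined)
print(rejects)
```

Keeping the rejected rows in their own list, rather than silently dropping them, is what makes it possible to review and correct them afterwards.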

Thursday, 2 July 2009

Open-source Business Intelligence and Geospatial BI software




The growing volume of digital data makes it difficult to control and to master.
The abundance of formats (Excel files, XML, data stored in databases such as Oracle, MySQL or PostgreSQL) can be a constraint.
Human intelligence alone is not sufficient to solve complex cases where many parameters must be taken into account.

Quoting Wikipedia, "Business Intelligence refers to skills, technologies, applications and practices used to help a business acquire a better understanding of its commercial context. Business intelligence may also refer to the collected information itself".
Note that despite the word "Business" in the term, BI is not used only in commercial and economic contexts.

Here are some goals of Business Intelligence:
- Breaking down the barriers between formats so that data can be joined and cross-referenced, and building homogeneous infrastructures. Data processing must also perform well, given the huge quantities involved.
- Giving us direct, graphical information about what we need. This selected information is usually displayed through graphs, reports and dashboards.
- Helping us make good decisions. Adding dimensions to data, not only relations, makes it possible to establish hierarchical relationships between them. It refines our analysis and helps us prioritize our actions. Online Analytical Processing (OLAP) reflects this approach.
- Synthesizing. Complex algorithms and statistical techniques uncover patterns, or even predict phenomena, that a human being would not have detected "macroscopically". That's what we call data mining.
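
To make the idea of dimensions and hierarchies a little more concrete, here is a small Python sketch of a "roll-up" along a geographic hierarchy. The sales figures, the city/region/country hierarchy and the `roll_up` helper are invented for illustration; a real OLAP server works quite differently under the hood.

```python
from collections import defaultdict

# Hypothetical fact rows: one measure (sales) plus a geographic dimension
# described at three hierarchical levels.
facts = [
    {"country": "France", "region": "Corse", "city": "Bonifacio", "sales": 120},
    {"country": "France", "region": "Corse", "city": "Ajaccio", "sales": 300},
    {"country": "France", "region": "Bretagne", "city": "Rennes", "sales": 450},
]


def roll_up(rows, levels, measure):
    """Sum `measure` for each distinct combination of the given levels."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[level] for level in levels)] += row[measure]
    return dict(totals)


by_region = roll_up(facts, ["country", "region"], "sales")
by_country = roll_up(facts, ["country"], "sales")
print(by_region)
print(by_country)
```

The same facts are summarized at two levels of the hierarchy simply by changing the list of levels, which is the kind of navigation an OLAP tool offers interactively.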

This diagram was taken and translated from piloter.org, a French reference portal on business performance management. It illustrates the different components of Business Intelligence.



Broadly, BI divides into two main domains: integration and valorization. Integration sits at the start of the BI chain. It consists of collecting and storing data, while valorization aims at distributing and exploiting it.





In the open-source world, two integration tools stand out: Kettle (from Pentaho) and Talend.
  • Kettle is a component of Pentaho, a complete BI suite.
  • Talend is developed by a French company. It was named a "company to watch" by Intelligent Enterprise magazine.
Integration tools are also called ETL tools, for "Extract, Transform and Load":
  • Extract: they read from many data sources
  • Transform: they can apply processing to data and convert it between formats
  • Load: they include "write" features
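
The three steps can be sketched in a few lines of Python using only the standard library. This is just an illustration of the Extract/Transform/Load idea, not how Kettle or Talend work internally; the CSV content, the field names and the in-memory SQLite target are assumptions made for the example.

```python
import csv
import io
import sqlite3

# Extract: read rows from a source (here an in-memory CSV with made-up content).
source = io.StringIO("name;population\nbonifacio;2800\najaccio;64000\n")
rows = list(csv.DictReader(source, delimiter=";"))

# Transform: normalize the text and convert the numbers to integers.
cleaned = [(row["name"].title(), int(row["population"])) for row in rows]

# Load: write the result into a target store (here an in-memory SQLite table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE towns (name TEXT, population INTEGER)")
conn.executemany("INSERT INTO towns VALUES (?, ?)", cleaned)
conn.commit()

loaded = conn.execute("SELECT name, population FROM towns ORDER BY name").fetchall()
print(loaded)
```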

The advantage of Kettle is that it's part of a complete BI suite. The other modules of Pentaho are Mondrian (an OLAP server), Pentaho Reports, Pentaho Dashboards and Pentaho Weka for data mining.
Talend provides connectors to many valorization tools such as Palo, Jaspersoft or SpagoBI. It fits well into a complete BI environment. The Jaspersoft suite includes Talend, where it has been renamed JasperETL.

Open-source BI software is still young, but it is gaining more and more popularity among big companies.

Geographical data is like any other kind of data. To add the geographical dimension to a standard data set, you just add a geometry column describing the shape and position attached to each row. While you can compare strings or perform mathematical operations on numbers, the operations you can perform on geometries are intersection, union, splitting, difference and so on.
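
As a toy illustration, the Python sketch below adds a "geometry" to ordinary rows and intersects two of them. To stay dependency-free it uses axis-aligned rectangles `(xmin, ymin, xmax, ymax)` instead of real geometries, and the rows and the `intersection` helper are invented for the example; real tools rely on proper geometry libraries.

```python
def intersection(a, b):
    """Return the rectangle shared by a and b, or None if they do not overlap."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin >= xmax or ymin >= ymax:
        return None
    return (xmin, ymin, xmax, ymax)


# Ordinary attribute rows, each extended with a geometry column.
rows = [
    {"name": "zone A", "geometry": (0, 0, 4, 4)},
    {"name": "zone B", "geometry": (2, 1, 6, 3)},
    {"name": "zone C", "geometry": (10, 10, 12, 12)},
]

overlap_ab = intersection(rows[0]["geometry"], rows[1]["geometry"])
overlap_ac = intersection(rows[0]["geometry"], rows[2]["geometry"])
print(overlap_ab)
print(overlap_ac)
```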

Integrating the geographical dimension into ETL tools has raised the interest of GIS companies and of the GIS community. Geopolitics and geomarketing are some of the domains in which Spatial OLAP analyses and geographical reports would be used. They would also be useful for tackling contemporary issues, such as understanding how population migrations are correlated with climate change.




In the open-source geospatial BI world, two integration tools stand out.
  • GeoKettle is based on Pentaho's Kettle. It was developed at Laval University in Canada by Dr Badard's team.
  • Spatial Data Integrator is based on Talend and developed by Camptocamp, a well-known French geospatial company.
The advantage of GeoKettle is that, like Kettle, it is part of a complete geospatial BI suite. The other components of the suite are GeoMondrian, a spatial OLAP server, and Spatialytics, for navigating SOLAP data cubes and dashboards.
The complete geospatial BI suite based on Pentaho will be presented at FOSS4G 2009.

Here is a set of operations you can accomplish with a spatial ETL:
  • Converting a whole folder of shapefiles into PostGIS tables
  • Transforming coordinate reference systems in bulk
  • Joining multiple data sources, such as a MySQL table with a geographic file
  • Controlling the quality of geographical data
Overall, spatial ETL tools help you build and maintain a solid spatial data infrastructure quickly and efficiently.
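
To give an idea of what the first operation involves without an ETL, here is a hedged Python sketch that only builds (and does not run) one command line per shapefile. It assumes the `shp2pgsql` loader shipped with PostGIS and the `psql` client; the SRID, the database name `gisdb`, the folder content and the `shapefile_commands` helper are hypothetical.

```python
import tempfile
from pathlib import Path


def shapefile_commands(folder, database, srid=4326):
    """Build one "shp2pgsql | psql" command line per shapefile found in folder."""
    commands = []
    for shp in sorted(Path(folder).glob("*.shp")):
        # Simplification: the table name is the lower-cased file name,
        # which assumes file names without spaces or special characters.
        table = shp.stem.lower()
        commands.append(
            f'shp2pgsql -s {srid} -I "{shp}" {table} | psql -d {database}'
        )
    return commands


# Demo on a temporary folder holding empty placeholder files.
with tempfile.TemporaryDirectory() as folder:
    for name in ("Roads.shp", "rivers.shp", "notes.txt"):
        (Path(folder) / name).touch()
    commands = shapefile_commands(folder, "gisdb")

for command in commands:
    print(command)
```

A spatial ETL does the equivalent through a graphical job, with the added benefit of logging and reject handling along the way.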

Most of the next posts on this blog will deal with Spatial Data Integrator. I haven't tested GeoKettle, but I can say that SDI is really user-friendly. Even though SDI is not part of a complete geospatial BI suite, nothing prevents you from using the Canadian geospatial valorization tools GeoMondrian and Spatialytics alongside it.

Thursday, 18 June 2009

FreeMind tip: explore and edit your XML files

We use XML files more and more to store data.
The XML format structures data hierarchically. Thanks to its validation rules, the compliance of data can be checked. Most programming languages can parse XML files in order to retrieve a specific piece of data.
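
For instance, here is a minimal sketch in Python, using the standard `xml.etree.ElementTree` module, that retrieves a name and a pair of coordinates from a tiny KML document. The placemark itself is made up for the example.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up KML document.
kml = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Bonifacio</name>
    <Point><coordinates>9.159,41.387,0</coordinates></Point>
  </Placemark>
</kml>"""

# KML elements live in a namespace, so lookups need a prefix mapping.
ns = {"kml": "http://www.opengis.net/kml/2.2"}
root = ET.fromstring(kml)

name = root.find("kml:Placemark/kml:name", ns).text
coords = root.find(".//kml:coordinates", ns).text
lon, lat, alt = (float(value) for value in coords.split(","))
print(name, lon, lat)
```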

Here are some examples of XML-based files in the GIS world:
  • KML, the Keyhole Markup Language, developed by Google.
  • GML, the Geography Markup Language.
  • GeoRSS, RSS feeds with embedded location data.
  • SLD, Styled Layer Descriptor, which allows advanced map rendering.
  • GetFeatureInfo responses returned by a WMS server.
You can read XML files in your web browser, but the view is limited and very static.
With FreeMind, you can add metadata such as web links and notes to nodes, which is useful if you'd like to collaborate on an XML file before releasing a final version of it.
You can also highlight nodes and add markers (icons, for example), which helps when you start learning a specific XML-based format.

FreeMind and XML


FreeMind is used to represent and organize ideas in a hierarchical, dynamic and graphical way. FreeMind files are XML-based files with a .mm extension.

Let's take the case of an SLD file. Its structure is rich and complex, and I'd like to explore it comfortably before editing it. FreeMind makes it easier to navigate. Here are the steps to import our SLD file into FreeMind.
The operation works for any XML file, so you can do the same with KML files.

1 - I open my SLD file in a text editor and copy all the text

2 - Then I simply paste it into FreeMind. Here is the result:


As you can see, my initial linear text has been rendered as a tree, which is much more readable and attractive. It's much more comfortable to edit as well.
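
A comparable tree view can be approximated in a few lines of Python, which is handy for a quick look at a file when FreeMind is not at hand. The SLD fragment below is deliberately tiny, simplified and made up, and `outline` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up SLD fragment (a real file declares namespaces and more).
sld = """<StyledLayerDescriptor version="1.0.0">
  <NamedLayer>
    <Name>roads</Name>
    <UserStyle>
      <FeatureTypeStyle>
        <Rule><LineSymbolizer/></Rule>
      </FeatureTypeStyle>
    </UserStyle>
  </NamedLayer>
</StyledLayerDescriptor>"""


def outline(element, depth=0):
    """Return the element tree as a list of indented lines."""
    tag = element.tag.split("}")[-1]  # drop a "{namespace}" prefix if present
    text = (element.text or "").strip()
    lines = ["  " * depth + tag + (f": {text}" if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


lines = outline(ET.fromstring(sld))
print("\n".join(lines))
```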

By default, after you paste your text, all the nodes are unfolded. You'd probably prefer to have all the nodes folded and to unfold only the ones you want.
To fold all the nodes, hover over the root node and select "fold all the nodes" from the navigation menu.

Now you can use FreeMind's features:
  • zooming, automatic folding and unfolding
  • adding and editing nodes
  • taking notes
  • adding attributes, icons...
  • attaching web links to nodes
  • filtering nodes by their icons or attributes
To export your mind map back to XML:
1 - select the node you want, most often the root node, and copy it
2 - then paste it into a text editor. Now you've got your XML. Note that the icons and attributes you added in FreeMind have no effect on the content of your XML, but beware of any "note" nodes you might have added: these will be treated as XML nodes.

FreeMind is also pretty efficient for taking notes.
One goal of the FreeMind project is to implement a mode in which people could collaborate on a shared file over the internet.

Friday, 5 June 2009

Unexpected Uses of OpenLayers

As I went through the websites listed in the OpenLayers Gallery, I was surprised by some unexpected uses of the JavaScript library. Discovering them filled me with enthusiasm, so I decided to write a post about these strange maps...

Here is the collection I noticed. If you know of others, why not share them!

Mathematics : Mandelbrot Fractal Browser



With this project, you can navigate through a Mandelbrot fractal.
You can zoom in or out. Throughout your navigation, the overview map keeps you from getting lost in this infinity of forms.
OpenLayers was obviously a very convenient technology for this kind of display.
This website makes intelligent use of OpenLayers' resolution configuration, zooming capabilities and ergonomics.

Biology : Genome browser



What if you could explore the genome in the same manner?
That's what this website lets you do.
Here, coordinates are replaced by base-pair positions, and each area of the genome is georeferenced.
Clicking on a region displays its characteristics.
Really nice!

The code is available on Google Code. If you're curious about it, check it out here.

Gaming : Pentamino puzzle



This website demonstrates extensive use of OpenLayers' vector capabilities.
Building such an interface is a real technical challenge.

Communication : Rosetta Project



The Rosetta Project aims to build an archive of all the languages in the world.
A very rich image of the Earth, with language labels emerging from the continents, helps you find your way through this tremendous collection.
With such an attractive and interactive homepage, you want to dig deeper into the subject.


These examples show localization in fields where it wasn't expected.
They also show some very clever uses of OpenLayers. For some of these applications, one might first have thought of other technologies such as Flash, but as we can see, the lightweight OpenLayers library really does the job well.

Wednesday, 27 May 2009

Talend Case Studies

Talend is a powerful open-source data integration software.
The official website includes a section with clear, screenshot-illustrated tutorials that let you explore the major features.
The PDF documentation (the user guide and the components reference guide, in French and English) is really complete.
In the components reference guide, you'll find scenarios for each component.
Live webinars are also held, during which Talend users from different organizations (public and private) explain how they use Talend. Past webinars remain accessible in the webinar archive.

As with any software, the best thing is to practice. To get a sense of the software's full potential, or even to figure out what could be processed, case studies are really helpful. So it's good news that Talend has published a case studies PDF. You'll probably find it useful to see how organizations used Talend in ambitious business intelligence projects where data integration and orchestration were prerequisites.
http://www.talend.com/document-download.php?doc=practosdi2fr&src=AdDeveloppez_may09

Friday, 15 May 2009

From GIS File Management to Database Management with PostgreSQL/PostGIS

One goal of data integration is to collect an organization's data into a single location.
One common difficulty for the data integrator is that data is scattered, which makes it hard to locate.
Another is preserving the structure and uniqueness of data, even once it is centralized.

PostgreSQL/PostGIS is a very interesting and convenient data warehouse for hosting an organization's pool of geographic data:
-First of all, it is open source and very well documented.
-It benefits from the contributions of a growing community; PostGIS will soon support rasters through the WKTRaster project.
-When users access PostGIS tables through applications like QGIS, they get quick access to the data, and you can prevent them from modifying the data structure, such as the names and types of the fields; this helps maintain your data quality.
-Automatic processes can be run server-side thanks to triggers. This is useful for keeping a history: imagine automatically adding the current date and the user name when data is inserted or updated.
-Roles and permissions are easy to manage, and more fine-grained and versatile than ACLs on a file server. Whereas a file server's ACLs only let you grant or revoke permission to access, read or write a file, PostgreSQL lets you grant privileges on reading (selecting), inserting, updating and deleting values.
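
To show what trigger-based history keeping looks like, here is a small sketch that runs anywhere with Python. Be aware that it uses SQLite (through the standard `sqlite3` module) as a stand-in, not PostgreSQL: PostgreSQL triggers call trigger functions and use a different syntax, and SQLite has no notion of a current database user, so only the date is stamped. The `parcels` table and the trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parcels (
    id INTEGER PRIMARY KEY,
    name TEXT,
    modified_at TEXT
);

-- Stamp the row with the current date and time after every insert.
CREATE TRIGGER parcels_stamp_insert AFTER INSERT ON parcels
BEGIN
    UPDATE parcels SET modified_at = datetime('now') WHERE id = NEW.id;
END;

-- Do the same whenever the name is updated.
CREATE TRIGGER parcels_stamp_update AFTER UPDATE OF name ON parcels
BEGIN
    UPDATE parcels SET modified_at = datetime('now') WHERE id = NEW.id;
END;
""")

# The client never sets modified_at; the trigger fills it in.
conn.execute("INSERT INTO parcels (name) VALUES ('Bonifacio')")
stamp = conn.execute("SELECT modified_at FROM parcels WHERE id = 1").fetchone()[0]
print(stamp)
```

The point is that the stamping happens on the database side, whatever client (QGIS, an ETL job, a script) writes the data.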

All these elements help make data management easier and ensure data quality.

Suppose I have managed to gather a tremendous quantity of geographic files in a set of folders. The question now is: how do I migrate my data into my PostgreSQL/PostGIS database?
Treating a database as the equivalent of a folder and a database table as the equivalent of a GIS file, I'd like to end up with a database structure that matches the initial folder tree as closely as possible.

In the next posts, I'll detail two ways to build our database "skeleton", each relying on a delimited file listing the databases' names. The first uses a DOS batch file; the second uses the ETL (Extract, Transform and Load) software Spatial Data Integrator.
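
Neither of those two methods is shown here, but the folder-to-database mapping itself can be sketched in Python. The sketch below walks a folder tree and writes a semicolon-delimited list of (database, table) pairs; the folder names, the list of GIS extensions and the `database_skeleton` helper are assumptions made for the example, and only the immediate parent folder is used as the database name.

```python
import csv
import io
import tempfile
from pathlib import Path

GIS_EXTENSIONS = {".shp", ".tab", ".mif"}  # assumption: adjust to your formats


def database_skeleton(root):
    """Yield (database, table) pairs: one database per folder, one table per GIS file."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in GIS_EXTENSIONS:
            yield path.parent.name, path.stem


# Demo on a temporary folder tree holding empty placeholder files.
with tempfile.TemporaryDirectory() as root:
    for relative in ("hydro/rivers.shp", "hydro/lakes.shp", "transport/roads.shp"):
        target = Path(root) / relative
        target.parent.mkdir(exist_ok=True)
        target.touch()
    pairs = list(database_skeleton(root))

# Write the pairs as a delimited file (kept in memory here).
output = io.StringIO()
csv.writer(output, delimiter=";").writerows(pairs)
print(output.getvalue())
```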

PS: you can transform GIS files into PostGIS tables, but the process is reversible: you can also convert PostGIS tables into GIS files.

Sunday, 26 April 2009

QGIS routine: get the attribute values of selected features

Here, we'll learn how to access the attribute values of a layer's selected features.

Retrieving these values serves many purposes:
Statistics:
-aggregation operations such as sum, average and so on

Actions:
-Opening a picture related to a point object
-Opening a web browser at a URL that includes one or more attribute values

Outputs:
-Exporting the values of selected features to a PDF report
-Opening a spreadsheet with these values in order to make graphs

Note: most of the actions mentioned above can also be accomplished using the QGIS actions available in the layer's properties.

Here is the QGIS routine that will allow you to access the attribute values of the active layer's selected features:

>>> myLayer=iface.activeLayer()
>>> objects=myLayer.selectedFeatures()
>>> object=objects[0]
>>> attributes=object.attributeMap()
>>> attributes[0].toString()
"Bonifacio"

>>> objects=myLayer.selectedFeatures()
This returns a list of the selected features.
>>> attributes=object.attributeMap()
The attributeMap() method returns the attribute values of the feature under consideration, in our case the first one (object=objects[0]).
It returns a dictionary whose keys are auto-incremented numbers (the field indexes). Note that, unfortunately, the key is not the attribute name.
>>> attributes[0].toString()
Each attribute value is a QVariant object. The toString() method makes it readable for the user. Here, we get the first attribute's value.

Most often, you would combine the previous "attribute name" routine with this one.
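
The combination can be sketched in plain Python, outside QGIS. The two structures below are stand-ins with made-up values: `field_names` plays the role of what an "attribute name" routine would give you (field index to field name), and `selected` plays the role of the index-keyed attribute maps, already converted to ordinary Python values. The `by_name` helper is hypothetical.

```python
# Stand-ins for what QGIS would return (made-up values):
# field index -> field name...
field_names = {0: "name", 1: "population"}
# ...and one index-keyed attribute map per selected feature.
selected = [
    {0: "Bonifacio", 1: 2800},
    {0: "Ajaccio", 1: 64000},
]


def by_name(attribute_map, names):
    """Re-key an index-keyed attribute map by field name."""
    return {names[index]: value for index, value in attribute_map.items()}


records = [by_name(attributes, field_names) for attributes in selected]

# Simple statistics over the selected features.
total = sum(record["population"] for record in records)
average = total / len(records)
print(records[0]["name"], total, average)
```

Once the values are keyed by name, the statistics, actions and outputs listed at the top of this post become straightforward to write.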