mercredi 27 mai 2009

Talend Case Studies

Talend is a powerful data integration opensource software.
The offical website includes a section with some clear, printscreened tutorials that let you explore the major functionalities.
The PDF documentations: user and components reference guide (in french and english) are really complete.
In the components reference guide, you'll find scenarios for each component.
Also, some webinars are animated live during which some Talend users from different organizations (public, private) explain how they use Talend. The webinars are still accessible in the webinar archives part.

Like for every software, the best thing is to practice. To approach all the software's potentials or even to figure out what could be processed, case studies are really helpful. So, it's a good news Talend published a case studies PDF. You'll probably find it useful to see how organizations used Talend in some ambitious business intelligence projects where data integration and orchestration were some prerequisites.
http://www.talend.com/document-download.php?doc=practosdi2fr&src=AdDeveloppez_may09

vendredi 15 mai 2009

From GIS File Management to DataBase Management with PostgreSQL/PostGIS

One goal of data Integration is to collect data from an organization into a single location.
One common difficulty for the data integrator is the spread of data that makes it difficult to locate.
Another one is to keep the data structure and unicity, even when data is centralized.

PostGIS/PostgreSQL is a very interesting and convenient datawarehouse for hosting an organization's pool of geographic Data:
-Fist of all, it is opensource, very well-documented.
-It takes advantage of the contributions of a growing community; PostGIS will soon support raster with the WKTRaster project.
-When you access PostGIS files through applications like QGIS, you guarantee the user a quick access to data; you can prevent him from modifying the data structure like the names, the types of the fields : this way, it maintains your data quality
-Automatic Processes can be performed on the server-side thanks to triggers. It's useful for historization : imagine automatically adding the current date, the user name when inserting / updating data.
-Roles and permissions are easily manageable and more fine-grained and versatile than ACL rights on a server. Whereas server's ACLs only allow you to give or revoke permissions on accessing/reading/writing a file, with PostgreSQL, you can grant privileges on reading (viewing), inserting, updating and deleting values.

All these elements contribute in easing Data Management, ensuring its quality.

Let's consider I managed to gather a tremendous quantity of geographic files in a set of folders ,now the question is: how to migrate my data into my PostgreSQL/PostGIS Database?
Assuming a database is equivalent to a folder and a database table to a GIS File, I'd like to get a database structure which would be as compliant as possible with the initial folder tree structure.

In the next posts, I'll detail two ways to get our database "skeleton", each one leaning on a delimited file with the databases' names. The first way uses a DOS Batch File, the second the ETL (Extract, Transform and Load) software Spatial Data Integrator.

PS: you can transform GIS Files into PostGIS tables, but it's also reversible: you can convert PostGIS Tables into GIS Files.