mercredi 15 août 2012

[Sextante Plugin] An R script for Neighborhood Detection


Des bâtiments regroupés en bloc à Paris
Artice in English

[Plugin Sextante] Un script R pour détecter les voisinages : Cet article explique comment réaliser un petit script R via le plugin Sextante pour QGIS afin d'établir des relations de voisinage entre objets en fonction de la distance qui les sépare. Il comporte une petite introduction sur l'écosystème QGIS, sur R aussi. Et puis bien sûr, des détails sur l'écriture d'un script R pour le plugin Sextante GIS Plugin.




Content:
  • An opensource spatial constellation
    • R
    • QGIS as a solar planet
    • Sextante GIS plugin as a glue
    • R and QGIS
  • Performing a neighborhood analysis with Sextante plugin
    • The R example scripts
    • Neighborhood by distance: my R script
    • Writing an R script
    • The complete Sextante R script
  • Usage in QGIS
    •  Quiet places where to sleep/live in Paris
    • Styling and visualizing
    • Statistics
    • Convex hulls

Download the R script directly here.

An opensource constellation

http://commons.wikimedia.org/wiki/File:Solar_sys.jpg

R

R is a statistical language from which a huge number of libraries, representing a multicoloured palette of domains, have proliferated, from finance to ecology, from spatial to morphometry.

Its popularity is increasing very fast. Even the New York Times wrote about it. Also, an article describes how it is used at google.

More and more companies search for data scientists that master R, besides SAS, the proprietary equivalent tool.

In the spatial domain, there are some functions that you'll never find in any opensource GIS software, be it GRASS or Sextante.

In this spatial R page, you might discover some tasks you've never heard of in the geospatial world, like spatial modeling, point patterns and small area estimations.


QGIS as a solar planet

More and more, QGIS tends to integrate famous tools. The first one was GRASS, then there was R, Sextante GIS, recently there was the Orfeo Toolbox for Imagery and Remote Sensing.

All of this constitutes an opensource spatial galaxy/ecosystem in which every planet (here, software) develops itself in contact with other planets, also stars (the technologies that gain popularity like Mapnik, Leaflet).


Sextante GIS in QGIS as a glue



Initially, Sextante GIS is a geospatial library.
Then, it became a plugin for QGIS, developed by Victor Olaya. It not only included Sextante functionalities but also functions from Orfeo Toolbox, R, GDAL, GRASS,...

The goal of this plugin is to indifferently execute functions from different libraries in a graphical environment since not everybody is familiar with command lines.

Also, the Sextante QGIS Plugin integrates a modeler in which you can chain functions and make QGIS a kind of automated factory for spatial tasks. Remember GRASS also has a modeler.

I think that with the improvement of his Sextante modeler, QGIS could become a very powerful spatial ETL and could compete with Spatial Data Integrator/Talend Open Studio, GeoKettle or FME (when it deals with spatial treatments).


R and QGIS


There are many plugins that use the R library: manageR by Carson Farmer, spqr by Barry Rowlingson, SDA4PP (Spatial Data analysis For Point Patterns, Home Range Estimation

- manageR is an R editor inside QGIS. Knowing R is a prerequisite for using this plugin.

- spqr produces graphics for stat visualizations .

- SDA4PP is a suite of functions for the analysis of point patterns: are the points clustered, seggregated? 

I think that the most interesting R plugins for QGIs are those that guide the user through a graphical interface and perform complex R tasks behind.



R in Sextante QGIS plugin


When you install the Sextante GIS plugin, you must activate the R scripts to get them available in the Sextante dock

- Also, you have to mention where R is installed.

- I preferred to move all the R scripts that initially were in C:\Documents and Settings\[username]\.qgis\python\plugins\sextante\r\scripts to C:/R/QGIS_SEXTANTE to make them more accessible. I changed the R scripts folder in the configuration dialog as you can see in the image above.

The Sextante R Scripts

If you don't know anything about R and if you want to use an R script but you don't have the approriate package to use it, all you have to know is how to install this package. It is very simple. I recommend you RStudio for this (if a library has not been installed in your standard R, it won't be available in Sextante QGIS R scripts)

Using an R script doesn't require any knowledge in R programming but it's preferrable to understand how it is made, and to see which function it uses. You can refer yourself to the documentation to get some details, by using R-Seek. Be studious: some functions sometimes require some theoretical knowledge "that could lead either to years of therapy, or to your Ph.D".



Performing a neighborhood analysis with Sextante plugin

The R example scripts

There a already 9 R scripts in the package as you can see in the image above. The system of these scripts are input-output systems where the inputs are always spatial data and where outputs can either be spatial data, plots (images), or console outputs.

- With the Ripley Rason script, you can create some envelopes that are a bit like convex hulls but stick more to the group of elements.

- You can create create either regular, either random points based on a polygon (generally, we use a grid)

- You can analyze the clustering of points using the F, G, K Functions scripts

- You can check whether you values are random - that means following a normal distribution - with the Kolgomorov-Smirnov normality test script

- You can make kind of same analysis but with spatial quadrats: consider a study zone with points. You divide this zone into a certain number of squares (quadrats). You count the points inside each square. Finally, you make a test to see if your points are randomly located (it's called the null hypothesis of Complete Spatial Randomness).

- There is an R script that makes a raster histogram. It illustrates that you can have rasters as inputs, not only vectors.



Neighborhood by distance: my R script

As you can see, the R scripts mainly focus on point patterns. I wanted to implement an R script that I had already written, but in an R environment, that focused on areas.

The goal of this R script is to specify a distance below which some polygons (for instance, buildings) are considered neighbors. Once it finds the neighborhood relationships, it counts the neighbors and affects each building to the group of neighbors it belongs to.



The output of this script is the same layer as the base one but enriched with two more attributes: neighbors count and neighbor group.

This script can be useful to determine the structure of the habitat and detect the buildings that are isolated or highly grouped based on a mimimum distance threshold.

Here, you'll find a complete description of the spdep package.

Here is the script, in R language:




Writing an R script

The goal was to transform this R script into a Sextante-QGIS R scripts.

If you click on "create a new R script", an edition window opens and you can have an assistant through the "edit script help" button. Unfortunately, I didn't understand how this script help worked. So, I inspired myself from the example scripts.

To create a script, create a new file, in my case, neighborhood_by_distance.rsx and edit it.

An R script is structured in two parts:
- the configuration part where you set the input, output, options.
- the code part where you use the variables set in the configuration part inside a standard R script.


The configuration part

Group

Each script belongs to a group of scripts. It's the group key, for instance,
##[datagistips]=group

means that the script belongs to the "datagistips" group. If the group doesn't exist, it will automatically be created.

Input

The inputs are either vector, either raster. I don't think you can mention you'd like a certain type of features: points, lines or polygons (maybe an improvement to be done as some analysises can only be done on a certain type of features?)

For vectors, you write:
##polygons=vector

polygons will be the label in the graphical window

For rasters:
##layer = raster

Options

In options, the user can indicate a numeric value or a string value.

For the numeric value
##distance=number 100

Here, the label will be distance and the default value is set to 100.
When the user will specify the value, he will type it manually or choose it from the different layer statistics

If the value is character, the user will have to type it.
 ##title=string France

Here, the label will be title and the default value is set to France

Fields

One very interesting thing is that you can use a field in a script. In the case of the Kolgomorov-Smirnov example script, you choose the numeric field from which to test the normality.
##field=field layer

Then, in a script
>lillie.test(layer[[field]])     
will integrate the field variable into the script

Outputs

For spatial data outputs, simply write
##output=output vector

Graphical outputs and console outputs will both be in HTML format

For graphical plots outputs, you must mention the tag
##showplots
(you could imagine producing some lovely plots with ggplot2)

For console output, you must precede the command by >
>lillie.test(layer[[field]])    


The complete Sextante R script

Below is the final R script. You can also download it here.

As you can see, you can instantly port your R code to a graphical environment so that every QGIS user, even a beginner, can use your process. Also, it's easier than developing a python QGIS plugin.

The plugin then appears in the Sextante Toolbox

Usage example in QGIS

http://commons.wikimedia.org/wiki/File:Paris_sunset.JPG


Quiet places where to sleep/live of Paris

Here, I executed the script on an extract of buildings in the 1st district of Paris that I downloaded from the Geofabrik OpenStreetMap data repository

You could use it to determine which buildings are the most isolated, where you would probably sleep one night or live, far from the busy, crowdy life of the City of Lights.

The distance that I chose was 15 meters. In such a big city like Paris, that's quite a lot.

The R script in action


Styling and visualizing

Once I had my neighbors layer, I could style it differently depending on the neighbors counts or the group it belonged to..

Clustered and isolated buildings on top of OSM data (with OpenLayers plugin)

Statistics

With the standard statistics tools incorporated in QGIS, you can analyze the spatial structure.


The results mention that there 24 groups of neighbors. The biggest group contains 23 buildings. Half of the neighbors contain between 0 and 7 neighbors.

You can also produce histograms of these values with the Statist plugin and see the distribution of counts for the neighbor groups



By using the request functionnality, I could locate buildings that where relatively isolated.

Buildings with less than 3 neighbors (distance of 15m) on top of Bing Aerial Imagery

Convex hulls

Finally, why not creating convex hulls to visualize the neighborhood groups and put some OSM data behind to see how it looks?



The output data on top of OpenStreetMap data (15 meters threshold)

When on top of OSM data, you can easily visualize the buildings groups. Remember that it all depends on the distance thereshold you used. With a 5m threshold, you get this:

The same but with a 5 meters threshold (nicer than with a 15m threshold, isn't it?)

These spatial groups could probably reflect the different residences that exist in this place. They could be useful to plan a more detailed OSM investigation and mapping on the ground (to map residences in OpenstreetMap, use landuse=residential, name=[residence name], or a relation of type site)

Also, you could use these blocks as a base layer for some aggregated statistics: number of buildings, area, perimeter, lacunarity, fractal dimension, and so forth..

6 commentaires:

  1. Really helpful post with great details. I really appreciate your job. Thanks for sharing.

    GIS alerting

    RépondreSupprimer
  2. Hi, i am trying to write a R script for sexante.
    I am not sure how to convert this syntax in R to R script sexante.

    coordinates(input1) <- ~Long+Lat
    proj4string(input1) <- CRS("+init=epsg:4326")

    where input1 is a dataframe in R. I can import the dataframe as a table in R script.

    RépondreSupprimer
  3. hi there,
    i am trying to follow your steps. but i am running into lots of trouble.Could you help me out.

    I have describe my problem in the qgis userforum. Please take a look,
    http://osgeo-org.1560.x6.nabble.com/finding-neighboring-polygon-within-a-distance-td5069898.html

    RépondreSupprimer
  4. @shreshai : on QGIS, if you measure the distance between two buildings, what do you find ? the projection system (4326) might be at issue

    RépondreSupprimer
  5. Thank you very much Mathieu. I also downloaded OSM data and yes it was in epsg:4326. I feel like a dumb !

    Problem solved. Thanks.

    RépondreSupprimer