What does it mean "to use Linked Data for Spatial Data"?






In the context of the SmartOpenData project, using linked data for spatial data means identifying ways to establish semantic connections between INSPIRE/GMES/GEOSS and spatially related Linked Open Data content in order to generate added value. The project's requirements come from the environmental research domain.

This will be achieved by making existing, relevant “INSPIRE-based” spatial data sets, services and appropriate metadata available through a new Linked Data structure. In addition, the proposed infrastructure will provide automatic search engines that crawl additional available geospatial resources (OGC and RDF structures) across the deep and surface web. An RDF structure describes the relation between two objects (for example, object A is next to object B, where A and B may be stored in different databases). The project will go to great lengths to avoid duplicating information. For example, the following GML snippet describes the country of Afghanistan:

 

<gml:featureMember>
<ogr:world fid="F0">
<ogr:geometryProperty>
<gml:Polygon>
<gml:outerBoundaryIs>
<gml:LinearRing>
<gml:coordinates> coordinate list </gml:coordinates>
</gml:LinearRing>
</gml:outerBoundaryIs>
</gml:Polygon>
</ogr:geometryProperty>
<ogr:NAME>Afghanistan</ogr:NAME>
<ogr:GMI_CNTRY>AFG</ogr:GMI_CNTRY>
<ogr:REGION>Asia</ogr:REGION>
</ogr:world>
</gml:featureMember>

 

Such data will be combined with other sources of data. To take an example from existing data, we might combine this GML with the data held by GeoNames about Afghanistan to produce this RDF:

 

<http://sws.geonames.org/1149361/> gn:name "Afghanistan" ;
ogr:gmi_cntry "AFG" ;
ogr:region "Asia" ;
ex:geometryGML "{GML literal}" .

 

So the URI of the country comes from GeoNames, while the other data comes directly from the already available GML and is returned as RDF triples even though it is stored as GML (an XML dialect). The value in linking the GML data with GeoNames is that the latter includes multiple alternative names for the country as well as copious links to Wikipedia about it. It is also easy from here to link to the DBpedia entry at http://dbpedia.org/resource/Afghanistan, which contains a great deal more information about Afghanistan.
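The mapping described above can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the namespace URIs are assumptions (the GML snippet omits its `xmlns` declarations), the `GEONAMES_URI` lookup table and the `feature_to_triples` function are hypothetical, and the predicates are kept as the prefixed names used in the RDF example rather than resolved to full URIs.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the original snippet does not declare them.
NS = {
    "gml": "http://www.opengis.net/gml",
    "ogr": "http://ogr.maptools.org/",
}

GML = """<gml:featureMember xmlns:gml="http://www.opengis.net/gml"
                   xmlns:ogr="http://ogr.maptools.org/">
  <ogr:world fid="F0">
    <ogr:geometryProperty>
      <gml:Polygon>
        <gml:outerBoundaryIs>
          <gml:LinearRing>
            <gml:coordinates> coordinate list </gml:coordinates>
          </gml:LinearRing>
        </gml:outerBoundaryIs>
      </gml:Polygon>
    </ogr:geometryProperty>
    <ogr:NAME>Afghanistan</ogr:NAME>
    <ogr:GMI_CNTRY>AFG</ogr:GMI_CNTRY>
    <ogr:REGION>Asia</ogr:REGION>
  </ogr:world>
</gml:featureMember>"""

# Hypothetical lookup from country name to an existing GeoNames URI.
# A real system would resolve this against GeoNames itself.
GEONAMES_URI = {"Afghanistan": "http://sws.geonames.org/1149361/"}


def feature_to_triples(gml_text):
    """Return (subject, predicate, object) tuples for one GML feature."""
    root = ET.fromstring(gml_text)
    feature = root.find("ogr:world", NS)
    name = feature.findtext("ogr:NAME", namespaces=NS)
    subject = GEONAMES_URI[name]  # re-use the identifier from the LOD cloud
    geometry = feature.find("ogr:geometryProperty/gml:Polygon", NS)
    return [
        (subject, "gn:name", name),
        (subject, "ogr:gmi_cntry", feature.findtext("ogr:GMI_CNTRY", namespaces=NS)),
        (subject, "ogr:region", feature.findtext("ogr:REGION", namespaces=NS)),
        # The geometry stays GML and is carried as a literal, not converted.
        (subject, "ex:geometryGML", ET.tostring(geometry, encoding="unicode")),
    ]


triples = feature_to_triples(GML)
for triple in triples:
    print(triple[0], triple[1], repr(triple[2])[:60])
```

The geometry is passed through unchanged as a literal, which mirrors the idea that the data remains stored as GML and is only exposed as triples on request.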

We've used the country of Afghanistan in this example because it's alphabetically the first country in the list for which we have GML readily available. The point is that by re-using existing identifiers available in the Linked Open Data cloud, SmartOpenData will immediately have access to many other data sources, and these will be available through SPARQL queries. But our goal goes much further. The project will build an infrastructure of objects and relationships with the added value of further links. By associating existing geospatial data with URIs used elsewhere, recording semantic relationships and linking across different data sets, the objects will gain greater context and therefore usefulness. Simple cases like the one above can be processed on the fly, but the project will precompute the results of computationally expensive queries to aid typical analysis and will develop methods to store this added-value information as triples, whether centralised, distributed, off-line or, where possible, calculated on the fly. The aim is to achieve the highest performance in resolving queries and delivering the required information and functionality with the minimum of data duplication. SmartOpenData will allow this data to be interrogated directly using linked data's query language, SPARQL, returning data as triples.

An example of the kind of spatial query that SmartOpenData will support is “which types of land cover are represented in specific protected areas?” Here we are working with two different data sets, potentially stored in different repositories, each of which could run to gigabytes in binary form. To support generic queries we would, in the worst case, have to compare every object against every other object, and it is not possible to provide that type of analysis online in real time. It is therefore necessary to build RDF structures in parallel to the existing relational data. Duplicating this data in a triple store may be necessary, but it immediately introduces a maintainability problem, because the copies need to be kept synchronised. Introducing semantic principles here will support better usability, though it will also generate a list of research challenges.
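A minimal sketch of this precomputation idea follows, under strong simplifying assumptions. The feature identifiers, the `ex:overlaps` predicate and the sample data are all hypothetical, and bounding-box overlap stands in for a real geometric intersection test. The expensive all-pairs comparison runs once, its results are stored as triples, and the land cover question is then answered by a cheap lookup.

```python
from itertools import product

# Hypothetical features; boxes are (min_x, min_y, max_x, max_y).
LAND_COVER = [
    ("ex:lc1", "forest", (0, 0, 4, 4)),
    ("ex:lc2", "wetland", (5, 5, 9, 9)),
    ("ex:lc3", "grassland", (3, 3, 6, 6)),
]
PROTECTED_AREAS = [
    ("ex:pa1", (1, 1, 3.5, 3.5)),
    ("ex:pa2", (8, 8, 12, 12)),
]


def boxes_overlap(a, b):
    """True if two axis-aligned boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def precompute_overlaps(land_cover, protected_areas):
    """Worst-case all-pairs comparison, run once ahead of query time."""
    triples = []
    for (lc_id, _, lc_box), (pa_id, pa_box) in product(land_cover, protected_areas):
        if boxes_overlap(lc_box, pa_box):
            triples.append((pa_id, "ex:overlaps", lc_id))
    return triples


def land_cover_types_in(area_id, triples, land_cover):
    """Answer the query from stored triples, with no geometry work."""
    types = {lc_id: lc_type for lc_id, lc_type, _ in land_cover}
    return sorted(
        {types[o] for s, p, o in triples if s == area_id and p == "ex:overlaps"}
    )


overlap_triples = precompute_overlaps(LAND_COVER, PROTECTED_AREAS)
print(land_cover_types_in("ex:pa1", overlap_triples, LAND_COVER))
print(land_cover_types_in("ex:pa2", overlap_triples, LAND_COVER))
```

The stored triples are derived data, so they would have to be recomputed whenever either source data set changes, which is the synchronisation problem noted above.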