From Data to Information: The Role of Spatial Data Query

by Rolf Becker,
Technical Director,
MAPS Geosystems, Sharjah, U.A.E.


Introduction

To distinguish data from information, the following definitions are proposed:

Information is task-critical knowledge or an answer to a query

It is obtained by cross-referencing data

As long as there is no question, there can be no information

Stored information turns into data

Deciding whether to turn right or left at the next intersection, or whether to build a high-speed railway link between Hamburg and Berlin, requires information, like any other decision: task-critical knowledge.

Interactive or direct data query

Information can be the number looked up in a telephone directory, or it can be the synthesis of a complex investigation. Say your car has broken down and you need to call a repair shop. Depending on the circumstances and the available data sources, you might be looking for a workshop that specialises in your brand of car and is within easy reach, reliable, reasonably priced, open at the weekend, and so on. To find this shop you might have to consult several data sources: directories, town maps, friends and so on. All the information obtained comes down to one telephone number, the number needed to contact this particular shop.

The more data sources or data layers are available to query, the more complex the analysis may become, but the less ambiguous the answer will be. To obtain information fast, we resort to automated procedures. Such procedures can be either direct or interactive.

In an interactive procedure, the search for information allows intermediate results to be evaluated, and it lets the operator introduce knowledge, experience and gut feeling (latent knowledge hidden somewhere in the brain). An interactive process also makes it possible to take into account data that is not sufficiently structured for automated information extraction. Typically, this applies where pictures and similar spatial data are involved. For instance, a procedure can automatically extract the distance between two points from a CAD drawing. However, a simple query, such as whether a point lies inside or outside a given area, requires visual interpretation.
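Once the area is stored as a closed polygon rather than as loose CAD linework, the inside/outside query no longer needs a human eye. A minimal sketch of the classic even-odd ray-casting test follows. It is one common technique, not necessarily the one any particular GIS uses. The notched polygon and the test points are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test: cast a ray from (x, y) towards +x and
    count how many polygon edges it crosses; an odd count means inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical area: a 10 x 10 square with a triangular notch cut into its top edge.
area = [(0, 0), (10, 0), (10, 10), (6, 10), (5, 5), (4, 10), (0, 10)]
print(point_in_polygon(2, 2, area))   # True
print(point_in_polygon(5, 8, area))   # False (the point lies in the notch)
print(point_in_polygon(12, 5, area))  # False
```

The notch shows why a simple bounding-box check is not enough. The point (5, 8) lies within the square's extent but outside the area itself.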

The interactive approach to information finding can be very lengthy and, more likely than not, requires experience and training. It is also prone to pitfalls. Because the results are not based on simultaneous cross-referencing of all the data, the proper weighting of the data cannot be taken into account, and this prevents blunder detection. (The difference between a strip adjustment and a block adjustment in aerial triangulation is a good analogy.)
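The adjustment analogy can be made concrete with the simplest possible case. Several observations of one quantity are combined in a single weighted step, with weights 1/σ². The worst-fitting observation is then rejected while its normalized residual exceeds a threshold. This is a deliberately simplified sketch. The threshold k = 3, the use of σ in place of the residual's own standard deviation, and the sample heights are all assumptions. Real block adjustments use much richer statistics.

```python
def adjust_with_blunder_detection(obs, sigmas, k=3.0):
    """Combine observations of one quantity in a single weighted adjustment
    (weights 1/sigma^2), then repeatedly reject the observation with the
    largest normalized residual |v|/sigma while it exceeds k."""
    active = list(range(len(obs)))
    rejected = []
    while True:
        weights = {i: 1.0 / sigmas[i] ** 2 for i in active}
        estimate = sum(weights[i] * obs[i] for i in active) / sum(weights.values())
        # Normalized residual of each remaining observation against the joint estimate
        worst = max(active, key=lambda i: abs(obs[i] - estimate) / sigmas[i])
        # Stop when everything fits, or when too few observations remain to judge.
        if abs(obs[worst] - estimate) / sigmas[worst] <= k or len(active) <= 2:
            return estimate, rejected
        active.remove(worst)
        rejected.append(worst)

# Hypothetical heights (m) of one point from four sources, with stated precisions
heights = [102.31, 102.28, 102.35, 103.90]
sigmas = [0.05, 0.05, 0.10, 0.10]
print(adjust_with_blunder_detection(heights, sigmas))  # estimate about 102.301 m, rejected [3]
```

The blunder is exposed only because all observations are compared against one joint estimate. Each value inspected on its own, as in an interactive step-by-step approach, looks plausible.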

While interactivity can be very beneficial and in many instances is a must, obtaining information under time constraints requires us to reduce interactivity as much as possible by resorting to direct data query.

Unlike the interactive approach, direct query provides the information in one step, in an 'ask question, get answer' fashion. The answer is not necessarily instantaneous, but it comes at least without human intervention. As a prerequisite, all data must be suitably structured and referenced, and appropriate query algorithms must be available. The disadvantage is that the validity of the query process cannot be traced, so the information has to be accepted 'blindly'.
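The repair-shop search described above shows what a direct query looks like once the data is structured and referenced. Every criterion, attribute and spatial alike, is applied in one pass, and the answer is a single telephone number. This is a minimal sketch. The `Workshop` record, its fields, the planar kilometre grid and all sample values are hypothetical.

```python
from dataclasses import dataclass
from math import hypot

@dataclass
class Workshop:
    name: str
    phone: str
    brands: frozenset   # car brands serviced
    open_weekend: bool
    hourly_rate: float  # hypothetical price per hour
    x: float            # position on a planar grid, km
    y: float

def find_workshop(shops, brand, here, max_km, max_rate):
    """Direct query: apply every criterion in one pass, return one phone number."""
    def distance(s):
        return hypot(s.x - here[0], s.y - here[1])

    candidates = [
        s for s in shops
        if brand in s.brands           # attribute criteria
        and s.open_weekend
        and s.hourly_rate <= max_rate
        and distance(s) <= max_km      # spatial criterion
    ]
    if not candidates:
        return None
    # Of all qualifying shops, pick the nearest one.
    return min(candidates, key=distance).phone

shops = [
    Workshop("A", "555-0101", frozenset({"VW", "Audi"}), True, 60.0, 2.0, 1.0),
    Workshop("B", "555-0102", frozenset({"VW"}), False, 45.0, 0.5, 0.5),
    Workshop("C", "555-0103", frozenset({"VW"}), True, 55.0, 4.0, 3.0),
    Workshop("D", "555-0104", frozenset({"Toyota"}), True, 40.0, 0.2, 0.1),
]
print(find_workshop(shops, "VW", (0.0, 0.0), max_km=10.0, max_rate=70.0))  # 555-0101
```

The user only states the question, meaning the brand, location, distance and price limits. How the criteria are cross-referenced stays hidden, which is both the strength and the 'blind acceptance' weakness noted above.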

Structured and referenced Data

Knowledge is retrieved by the brain; information is supplied by external means. Books and pictures can contain an unlimited amount of information, but extracting it requires a human brain. This is a cumbersome and time-consuming procedure that becomes impossible when large data volumes are involved and the available time is restricted. Tables, spreadsheets, timetables, telephone books, balance sheets, dictionaries, plans, maps and the like are all designed to present data in a structured way. Because the data is classified and referenced, we can obtain information 'at a glance', for instance by cross-referencing rows and columns. Think of a timetable or a town map. Still, to extract information, the data had to be read or 'viewed' into a human brain. Only the brain had the capacity to establish the necessary cross-references and derive information.

Spatial and Non-Spatial Data

Traditionally, there has been a duality in the storage, and consequently also in the processing, of spatial and non-spatial data.

Generation | Non-spatial data | Spatial data
1 | Books | Pictures
2 | Lists, tables, spreadsheets, graphs, dictionaries | Globes, maps, plans
3 | Spreadsheets | Digital maps
4 | RDBMS | GIS
5 | Combined systems (non-spatial and spatial)

This duality was maintained when computers began processing digital tables and spreadsheets, and their spatial equivalent, the digital map. Computers by themselves brought no fundamental change. Databases and digital maps store referenced data, but both still need visual evaluation to provide information.

This all changes with the introduction of RDBMS and GIS, both of which are able to create information automatically: RDBMS for attribute data and GIS for spatial data. Asked the right questions, both systems can give answers without human intervention, and potentially very fast.

From an RDBMS we expect to obtain information in an 'ask question, get answer' fashion, whereas GIS still carries the notion that visual interaction is required. This does not hold true. It is, however, essential to differentiate between a display as an analysis tool and a display as an information visualizer. GIS is excellent at presenting information visually, but its topological analysis capabilities make visual graphics redundant as an analysis tool.

Unfortunately, there are still exceptions. The increasing use of raster data has brought a new challenge. Some raster data, like digital orthomaps and scanned maps, can be spatially referenced. Even so, automated cross-referencing with other data layers is only possible in exceptional cases, e.g. multi-spectral images. What is still required is automatic object recognition. As long as the latter remains unresolved, visual cross-referencing of image data remains imperative.

Embedded GIS

The traditional difference in the data storage and processing for attribute and spatial data has distracted from the fact that the spatial location of an object is only one of its many attributes.

Along similar lines, GIS should be referred to as Spatial Information Technology rather than Geographic Information Systems. The word 'geographic' distracts from the increasing importance of GIS in non-geographic applications relating to mobile objects, whose location in space and time is among their most significant attributes. For instance, a car might have hundreds of attributes, but one of the most task-critical is its position in space and time.

RDBMS cross-reference non-spatial data very efficiently. With the help of topology, GIS does the same for spatial data.

GIS technology is restricted to spatially referenced objects. It first defines the geometry of an object and then attaches the object and its attributes in the form of a database entry. In other words, GIS treats the definition of an object as an attribute of a geometric element: a point, line or polygon. It is only logical that systems are emerging which attach the available spatial data to objects stored in databases, instead of attaching objects to spatial data. ESRI's Spatial Data Engine (SDE) is a forerunner of such systems that embed spatial data into an RDBMS. A new generation of software (ArcView, MapObjects, etc.) uses SDE and takes advantage of both the speed of the RDBMS and the spatial query capabilities of GIS. It can therefore extract and visualize information at a speed that was inconceivable until now.
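ESRI's SDE is not reproduced here. The following is only a minimal sketch of the underlying idea, using Python's built-in `sqlite3` module. Parcels are rows in an ordinary database table, and the geometry is just another column, stored here as a JSON vertex list (an assumption). A registered Python function makes a spatial predicate usable in SQL alongside an attribute condition. Table, column and function names and all data are hypothetical.

```python
import json
import sqlite3

def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test on a list of [x, y] vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

def contains_point(geom_json, x, y):
    """SQL-callable wrapper: geom_json is a JSON list of [x, y] vertices."""
    return 1 if point_in_polygon(x, y, json.loads(geom_json)) else 0

conn = sqlite3.connect(":memory:")
# Register the spatial predicate so that SQL statements can call it.
conn.create_function("contains_point", 3, contains_point)
conn.execute("CREATE TABLE parcel (id INTEGER PRIMARY KEY, owner TEXT, crop TEXT, geom TEXT)")

# Hypothetical parcels: attributes and geometry stored together, one row per object.
parcels = [
    (1, "Meyer", "wheat", [[0, 0], [100, 0], [100, 50], [0, 50]]),
    (2, "Schulz", "maize", [[100, 0], [200, 0], [200, 50], [100, 50]]),
]
conn.executemany(
    "INSERT INTO parcel VALUES (?, ?, ?, ?)",
    [(pid, owner, crop, json.dumps(geom)) for pid, owner, crop, geom in parcels],
)

# One statement mixes an attribute condition with a spatial predicate:
# "which maize parcel contains the point (150, 25), and who owns it?"
rows = conn.execute(
    "SELECT id, owner FROM parcel WHERE crop = ? AND contains_point(geom, ?, ?)",
    ("maize", 150.0, 25.0),
).fetchall()
print(rows)  # [(2, 'Schulz')]
```

A production system would use a spatial index and a standard geometry type instead of a Python callback scanning every row. The point here is only that location becomes one more attribute of the object.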

The latest developments in both RDBMS and GIS confirm this trend. Object-oriented, spatially enabled databases, Visual Information Systems (VIS) and the like are becoming buzzwords. GIS on its own is still struggling with the third dimension, but RDBMS-embedded GIS will tackle not only this problem but also the fourth dimension, time. Spatial positioning sensors like GPS, as well as communication links, will play an increasingly important role in information technology. They will lead to completely new applications in transportation, communication, distribution, warehousing, human resource management and more.

Benefits of embedded GIS

Embedded GIS processes non-spatial and spatial data simultaneously. It can take all data layers into consideration, each with its respective weight. It relies primarily on processing power rather than on pre-determined data structures, allowing more flexibility in introducing new algorithms adapted to the data type. It reduces interactive data analysis and therefore operator intervention. This means:

Better information can be provided, and fast.

Information can be obtained by persons who have no notion of computing, RDBMS, GIS or whatever. There is no end-user training requirement.

The data storage and processing can be done at a remote site (Internet/Intranet).

The user interface can be reduced to the answering of some prompted questions at the most.

Increased use of RDBMS technology will promote the use of data at a lower hierarchy level. This will reduce data conversion, and more data will become accessible.

For the first time we will be able to get information straight to the end-user, who will gain the knowledge required to act.

The dangers of computer-generated information

Computers and their software are not error-free. However, the main limitation of computer-generated information is not computer-related; it is caused by querying insufficient data. This might be because not all the required data exist, at least not in a referenced or compatible format. It may also be that their existence or whereabouts are unknown, or that their significance is not recognized.

As technology progresses, we will depend on computers more and more, whether we like it or not. Already today, when we fly in an airplane, we place our faith in a computer that takes vital decisions the pilot no longer controls.

Practical Application

Having started as a pure data supplier, a map maker, my company is becoming more and more involved in systems installation. This means not only supplying data, but also setting up procedures that let end-users access information without having to battle with the intricacies of computer technology. The following two projects are typical applications.

One application is for the Agricultural Guaranty Program of the European Community, which essentially regulates agricultural production by subsidising farmers. The purpose of this work is to check each farmer's subsidy application, determine the amount due as compensation and notify the farmer accordingly. Within a time frame of 8 months, orthomaps had to be made for an area of 20,000 km². On these maps, the locations of 850,000 parcels had to be identified and their surface areas determined. This had to be linked with a database containing the farmers' applications, and discrepancies had to be detected. Now, for each parcel, a large-scale A4-size map has to be produced, showing the adjoining parcels, the relevant attributes and the results of the verification.
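The core of the verification step can be sketched as comparing each declared area with the area measured from the parcel boundary digitized on the orthomap. The sketch below uses the shoelace formula for planar polygon area. The 3% tolerance, the function names and the sample parcel are hypothetical and do not reflect the programme's actual rules.

```python
def polygon_area(vertices):
    """Planar polygon area via the shoelace formula (projected coordinates, m)."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def check_parcel(declared_ha, vertices, tolerance=0.03):
    """Compare the declared area with the area measured on the orthomap.
    tolerance is a hypothetical relative threshold, not a programme rule.
    Returns (measured area in ha, relative deviation, discrepancy flag)."""
    measured_ha = polygon_area(vertices) / 10_000.0  # m² -> ha
    deviation = (declared_ha - measured_ha) / measured_ha
    return measured_ha, deviation, abs(deviation) > tolerance

# Hypothetical parcel digitized from an orthomap: 200 m x 120 m = 2.4 ha
parcel = [(0, 0), (200, 0), (200, 120), (0, 120)]
print(check_parcel(2.4, parcel))  # (2.4, 0.0, False)
print(check_parcel(2.6, parcel))  # flagged: declared area about 8% too large
```

Run over all parcels and joined to the application records, a check of this kind yields the list of discrepancies without anyone having to look at a map.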

The other project is for a land consolidation scheme covering 6,000 km², which is required because of the planned construction of a large reservoir. A very accurate DTM, and subsequently digital photomaps, had to be produced. About 100,000 parcels have to be digitized from existing cadastral records and their attributes entered into a database. The 210 km² reservoir, with a shoreline of 1,500 km, will directly affect about 10,000 parcels through total or partial submersion.

One of the embedded GIS applications determines the affected parcels and produces an individual large-scale plot for each of them, displaying its attributes and the inundated area.
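Determining the affected parcels amounts to intersecting each parcel with the area below the reservoir's water level on the DTM. A minimal raster-based sketch follows. It assumes each parcel has already been rasterized to the DTM grid cells it covers. The water level, the 25 m² cell area and the elevations are hypothetical.

```python
def classify_parcels(parcel_cells, water_level, cell_area_m2):
    """parcel_cells maps a parcel id to the DTM elevations (m) of the grid
    cells inside it. A cell counts as inundated if its ground lies below
    the water level. Returns {id: (status, inundated area in m²)}."""
    result = {}
    for pid, elevations in parcel_cells.items():
        flooded = sum(1 for z in elevations if z < water_level)
        if flooded == len(elevations):
            status = "total"
        elif flooded > 0:
            status = "partial"
        else:
            status = "none"
        result[pid] = (status, flooded * cell_area_m2)
    return result

# Hypothetical DTM samples and a hypothetical water level of 412.0 m
cells = {
    101: [405.2, 407.8, 409.1, 410.5],
    102: [410.9, 411.7, 412.4, 413.0],
    103: [415.3, 416.0, 417.2, 418.8],
}
status = classify_parcels(cells, water_level=412.0, cell_area_m2=25.0)
affected = {pid: r for pid, r in status.items() if r[0] != "none"}
print(affected)  # {101: ('total', 100.0), 102: ('partial', 50.0)}
```

The resulting dictionary is exactly what an automated plotting step would need: which parcels to plot, and how much of each is submerged.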

Both projects provide the end-user with a facility to extract information in huge quantities and practically without operator intervention. The GIS technology is embedded to such an extent that the end-user is unaware of its existence.

Conclusion

Information is a decision tool. The more accurate and the more accessible the information, the better the decisions. To obtain information automatically, we need structured data and suitable analysis algorithms, as provided by today's combined RDBMS and GIS technologies.

Highly automated data acquisition systems have contributed to a data explosion that most organizations are unable to cope with. Data processing software is becoming ever more powerful, but this gain is negated to a great extent by the increasing complexity of the processing systems. This situation can be overcome if systems are installed that permit the end-user to obtain the required information in an 'ask question, get answer' fashion. That means avoiding the cumbersome piecing together of partial information obtained through intermediary and lengthy procedures. Since visual analysis is no longer imperative, the merging of advanced GIS and RDBMS technology can contribute to an efficient use of the available data with minimum effort.

An automated teller machine (ATM) is an exemplary application. The 'hole in the wall' carries out a very complex transaction with an absolute minimum of operator intervention and no training requirement. This kind of user interface could be used extensively. It does not prevent interactivity; on the contrary, it can ask context-sensitive questions in a language and terminology familiar to the user.

Information is indispensable. It must be accurate, affordable, profitable and unobtrusive. Above all, it must be accessible. When looking for systems that should provide such information, we must look beyond the computer. The importance of the computer will continue to grow, but this is likely to happen in the background; for the end user, computer operation will become much less visible. Today, computers are still crude and very often extremely annoying machines that make us suffer a lot. Whatever improvement we witness is offset by the increased complexity of the systems. The demand to study the manuals is unrealistic, and the imperative that we have to be trained to use computers (the Germans even say: one has to learn to serve a computer) must be wrong. It leads to continuously keeping up with new developments. This, not to be confused with education, only cuts heavily into the time available to get on with the job at hand. Training people to use computers runs contrary to the computer's core asset: its programmability. Computers must be trained to be used by people.

A technology that demands our involvement is an immature technology. The more a technology matures, the less attention it needs from its user, until it virtually disappears. When we switch on the light, we no longer even think of electricity. Computer technology itself will become user-transparent, as has already happened in many of our appliances: the washing machine, the car, the camera, etc.

By the same token, GIS technology and other spatial information technologies, like photogrammetry and GPS, will become embedded in many of our day-to-day operations. There, completely user-transparent, spatial data query will help optimize our decision making.

Internet/Intranet-type user interfaces contribute greatly to this development by increasing the distance between computer and end-user, not only physically but, above all, mentally. We will access many applications without being in the least concerned about the underlying technology, however complex it might be.

Through the introduction of automatic spatial data query, GIS makes an essential contribution to making this a reality.
