Harvesting Big Geospatial Data from Natural Language Texts

Why do we want to harvest geospatial data from texts? The question matters because collections of natural language text are often not representative of the entire population. Even so, there are at least three respects in which geospatial data harvested from texts is valuable. First, it can provide information about human experience that is not available in other datasets. Second, geospatial data harvested from some natural language texts, such as social media posts, reflects near real-time situations and is valuable for applications such as disaster response.

Third, some geospatial data is available only in unstructured texts. Examples include events reported in newspapers, historical battles recorded in old archives, and business addresses contained in Web pages. In these cases, harvesting geospatial data from texts is necessary for enabling advanced spatial analysis. The goal of geoparsing is to recognize the place names, or toponyms, mentioned in texts, and to identify the corresponding place instances and their location coordinates.

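Geoparsing involves three tasks: identifying the place name, finding its geographic location, and resolving any ambiguity in the name. Consider the sentence “Washington was an important stop on the Southwest Trail.” The word “Washington” must first be recognized as a place name rather than, say, a person's name. Only then can it be tied to a location, and because many places share that name, the surrounding context is needed to decide which one is meant.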

“Harvesting geospatial data from unstructured texts has been frequently studied in geographic information retrieval (GIR) under the topic of geoparsing (Jones and Purves, 2008; Purves et al, 2018). The goal of geoparsing is to recognize the place names, or toponyms, mentioned in texts, and identify the corresponding instances and the location coordinates of the recognized place names (Freire et al, 2011; Gritta et al, 2018). A software tool developed for geoparsing is called a geoparser, which takes unstructured natural language texts as the input, and outputs structured geographic data with the recognized place names and their location coordinates. Some geoparsers, e.g., GeoTxt – Java (Karimzadeh et al, 2013), are published as Web services which provide easy access for general users through the Internet.”
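The input and output of a geoparser described above can be illustrated with a minimal sketch in Python. It is not how GeoTxt or any other published geoparser works. The gazetteer is a hypothetical two-entry table with rough, illustrative coordinates and populations; the function names (`recognize_toponyms`, `resolve_toponym`, `geoparse`) are invented for this example; and resolution uses a naive "most populous candidate" baseline that ignores context.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    label: str
    lat: float
    lon: float
    population: int


# Hypothetical toy gazetteer: values are rough and for illustration only.
GAZETTEER = {
    "Washington": [
        Candidate("Washington, D.C.", 38.90, -77.04, 700_000),
        Candidate("Washington, Arkansas", 33.77, -93.68, 200),
    ],
}


def recognize_toponyms(text):
    """Toponym recognition: find gazetteer names as whole words in the text."""
    found = []
    for name in GAZETTEER:
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((name, match.start()))
    return sorted(found, key=lambda item: item[1])


def resolve_toponym(name):
    """Toponym resolution: naively pick the most populous candidate."""
    return max(GAZETTEER[name], key=lambda c: c.population)


def geoparse(text):
    """Turn unstructured text into structured records with coordinates."""
    records = []
    for name, offset in recognize_toponyms(text):
        place = resolve_toponym(name)
        records.append({
            "toponym": name,
            "offset": offset,
            "resolved": place.label,
            "lat": place.lat,
            "lon": place.lon,
        })
    return records


records = geoparse("Washington was an important stop on the Southwest Trail.")
for record in records:
    print(record)
```

Because the population baseline never looks at the rest of the sentence, it always returns the same candidate for “Washington”, whatever the context. A context-aware resolver would need to use cues such as “Southwest Trail” to choose among the candidates.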

Mordecai: Full Text Geoparsing and Event Geocoding & Github