Extract & resolve geographic entities from unstructured text
In this article we will see how to install a great open-source tool called CLAVIN (Cartographic Location And Vicinity INdexer), which extracts geographic entities from unstructured text and resolves them to specific locations.
The installation will be done on Ubuntu 18.04.
Here is an example of what you can do: http://clavin.berico.us/clavin-web/
Here is the tool's description from the official website:
CLAVIN does not simply “look up” location names – it uses intelligent heuristics to identify exactly which “Springfield” (for example) was intended by the author, based on the context of the document. CLAVIN also employs fuzzy search to handle incorrectly-spelled location names, and it recognizes alternative names (e.g., “Ivory Coast” and “Côte d’Ivoire”) as referring to the same geographic entity.
Prerequisites
Install Maven
Update the package index and upgrade the installed packages:
sudo apt-get update -y
sudo apt-get upgrade -y
Install Java if necessary:
sudo apt-get install -y default-jdk
Verify it is correctly installed with:
java -version
Install Maven:
cd /opt/
sudo wget https://archive.apache.org/dist/maven/maven-3/3.6.0/binaries/apache-maven-3.6.0-bin.tar.gz
sudo tar -xvzf apache-maven-3.6.0-bin.tar.gz
sudo mv apache-maven-3.6.0 maven
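Before relying on the extracted files, it is worth checking that the archive downloaded intact. Apache publishes a `.sha512` checksum next to its release artifacts; the sketch below assumes you have fetched it (the archive URL with `.sha512` appended) into `/opt/` as well. The `verify_sha512` helper is hypothetical, not part of Maven or Ubuntu.

```shell
# Hypothetical helper: verify_sha512 ARCHIVE CHECKSUM_FILE
# Compares the SHA-512 digest of ARCHIVE with the first field of CHECKSUM_FILE.
# Only the first whitespace-separated field is used, so both "hash" and
# "hash  filename" layouts work; the comparison is case-insensitive.
verify_sha512() {
  local archive="$1"
  local checksum_file="$2"
  local expected actual
  expected=$(awk '{ print tolower($1); exit }' "$checksum_file")
  actual=$(sha512sum "$archive" | awk '{ print $1 }')
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK: $archive"
  else
    echo "MISMATCH: $archive" >&2
    return 1
  fi
}
```

For example: `verify_sha512 apache-maven-3.6.0-bin.tar.gz apache-maven-3.6.0-bin.tar.gz.sha512`. A mismatch means the download should be repeated before extracting.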
Set the environment variables by adding the following lines to the /etc/profile.d/mavenenv.sh file (creating it requires root privileges):
export JAVA_HOME=/usr/lib/jvm/default-java
export M2_HOME=/opt/maven
export PATH=${M2_HOME}/bin:${PATH}
Make the environment file executable:
sudo chmod +x /etc/profile.d/mavenenv.sh
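Creating the file and making it executable can also be scripted in one step. A minimal sketch, using a hypothetical `write_maven_env` helper that writes the same three lines shown above; run it as root when writing to the default path, or pass a writable path (e.g. `/tmp/mavenenv.sh`) to preview the result.

```shell
# Hypothetical helper: write_maven_env [TARGET]
# Writes the Maven environment variables to TARGET
# (default: /etc/profile.d/mavenenv.sh) and marks the file executable.
write_maven_env() {
  local target="${1:-/etc/profile.d/mavenenv.sh}"
  # The quoted 'EOF' delimiter keeps ${M2_HOME} and ${PATH} literal, so they
  # are expanded each time the file is sourced, not when it is written.
  cat > "$target" <<'EOF'
export JAVA_HOME=/usr/lib/jvm/default-java
export M2_HOME=/opt/maven
export PATH=${M2_HOME}/bin:${PATH}
EOF
  chmod +x "$target"
}
```

Quoting the heredoc delimiter matters here: without it, `${PATH}` would be frozen to its value at the moment the file was written.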
Load the variables into your current shell:
source /etc/profile.d/mavenenv.sh
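If you use zsh, which does not read /etc/profile.d by default, add this line at the end of your ~/.zshrc file so the variables are loaded in every new shell: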
source /etc/profile.d/mavenenv.sh
Verify it works with:
mvn --version
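The following steps also call `git`, `curl` and `unzip`, which a minimal Ubuntu install may not include. A small sketch that reports every missing command at once; `require_tools` is a hypothetical helper, not an existing utility.

```shell
# Hypothetical helper: require_tools CMD...
# Reports every CMD that is not on PATH (on stderr) and returns 1 if any is missing.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Commands used in this article
require_tools java mvn git curl unzip || echo "Install the missing commands before continuing."
```

On Ubuntu, `git`, `curl` and `unzip` are all available as apt packages of the same name.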
Install the CLAVIN REST API
Clone the CLAVIN REST API repo:
git clone https://github.com/Berico-Technologies/CLAVIN-rest
cd CLAVIN-rest
Edit the pom.xml file and add the following lines inside the <properties> tag.
Build the executable JAR:
mvn clean install
or, to build the JAR without installing it into your local Maven repository:
mvn package
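The JAR's file name contains the project version (the commands below assume `clavin-rest-0.3.0-SNAPSHOT.jar`), so it may differ if the repository has moved on. A sketch that locates the built JAR under `target/` instead of hard-coding the version; `find_clavin_jar` is hypothetical and assumes the artifact name starts with `clavin-rest-`.

```shell
# Hypothetical helper: find_clavin_jar [PROJECT_DIR]
# Prints the first target/clavin-rest-*.jar in PROJECT_DIR (default: .),
# skipping sources/javadoc/tests jars; returns 1 if nothing is found.
find_clavin_jar() {
  local dir="${1:-.}"
  local jar
  for jar in "$dir"/target/clavin-rest-*.jar; do
    case "$jar" in
      *-sources.jar|*-javadoc.jar|*-tests.jar) continue ;;
    esac
    # If the glob matched nothing it stays literal, so check the file exists.
    if [ -f "$jar" ]; then
      echo "$jar"
      return 0
    fi
  done
  echo "no clavin-rest-*.jar found in $dir/target (did the build succeed?)" >&2
  return 1
}
```

Usage: `java -Xmx4096m -jar "$(find_clavin_jar)" index clavin-rest.yml`, run from the `CLAVIN-rest` directory.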
Download the GeoNames data (allCountries.zip is a large file) and extract it:
curl -O http://download.geonames.org/export/dump/allCountries.zip
unzip allCountries.zip
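`allCountries.txt` is a tab-separated file with one place per line. Per the GeoNames dump readme it has 19 columns, including the GeoNames id (1), name (2), alternate names (4), latitude (5), longitude (6) and country code (9). A hypothetical `preview_geonames` helper that prints a few of those fields so you can sanity-check the extracted file, assuming that column layout:

```shell
# Hypothetical helper: preview_geonames [FILE] [N]
# Prints id, name, latitude, longitude and country code for the first N rows
# of a GeoNames dump (defaults: allCountries.txt, 5 rows), assuming the
# standard 19-column tab-separated layout. Output stays tab-separated.
preview_geonames() {
  local file="${1:-allCountries.txt}"
  local n="${2:-5}"
  head -n "$n" "$file" | cut -f 1,2,5,6,9
}
```

`cut` splits on tabs by default, which matches the GeoNames format, so no delimiter option is needed.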
Download the CLAVIN-rest YAML configuration file:
curl -O https://raw.githubusercontent.com/Berico-Technologies/CLAVIN-rest/master/clavin-rest.yml
Build the CLAVIN index of geographic names (also called a gazetteer) from the downloaded GeoNames data:
java -Xmx4096m -jar ./target/clavin-rest-0.3.0-SNAPSHOT.jar index clavin-rest.yml
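`-Xmx4096m` lets the indexing JVM grow its heap to 4096 MB, so the machine needs that much memory to spare. A sketch that compares the Linux-specific `MemAvailable` value in `/proc/meminfo` with the requested heap before you start; `has_available_memory_mb` is a hypothetical name, and the check ignores swap and any JVM memory beyond the heap.

```shell
# Hypothetical helper: has_available_memory_mb REQUIRED_MB [MEMINFO_FILE]
# Returns 0 if MemAvailable is at least REQUIRED_MB, 1 if not,
# and 2 if MemAvailable cannot be read.
has_available_memory_mb() {
  local required_mb="$1"
  local meminfo="${2:-/proc/meminfo}"
  local available_kb
  available_kb=$(awk '/^MemAvailable:/ { print $2 }' "$meminfo")
  if [ -z "$available_kb" ]; then
    echo "MemAvailable not found in $meminfo" >&2
    return 2
  fi
  # /proc/meminfo reports kB (1024 bytes); -Xmx's "m" suffix is 1024 kB.
  [ "$available_kb" -ge $((required_mb * 1024)) ]
}
```

For example: `has_available_memory_mb 4096 || echo "Less than 4096 MB available for indexing."`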
Run the REST server with the same JAR that was used to build the index:
java -Xmx2048m -jar ./target/clavin-rest-0.3.0-SNAPSHOT.jar server clavin-rest.yml
The API will be available at: http://localhost:9090/api/v0/geotag
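Once the server has finished loading the index, you can send it text to geotag. The request shape here is an assumption, not taken from this article: it POSTs the raw text with `Content-Type: text/plain` and prints the JSON response, so check the CLAVIN-rest README if your version expects something else. The `geotag` helper is hypothetical.

```shell
# Hypothetical helper: geotag TEXT [URL]
# POSTs TEXT to the CLAVIN-rest geotag endpoint (assumed to accept a
# text/plain body) and prints the response. --fail makes curl exit non-zero
# on HTTP errors; connection failures also give a non-zero status.
geotag() {
  local text="$1"
  local url="${2:-http://localhost:9090/api/v0/geotag}"
  # Reading the body from stdin (@-) sends the text verbatim, even if it
  # starts with "@", which curl would otherwise treat as a file name.
  printf '%s' "$text" | curl --silent --show-error --fail \
    -H 'Content-Type: text/plain' \
    --data-binary @- \
    "$url"
}

# Example (requires the server started above):
# geotag "I drove from Springfield to Abidjan, in Côte d'Ivoire."
```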