Downloading Geonames: A Step-by-Step Guide to Retrieving Lake Geonames Records for Canada
When working with geospatial data, accessing large datasets can be a challenge. One such dataset is GeoNames, which includes records for lakes worldwide. In this article, we will explore how to retrieve the GeoNames lake records for Canada in R, starting from the geonames package.
Introduction
The geonames package provides an interface to the GeoNames database, a comprehensive geospatial database that contains information about geographic features such as cities, countries, lakes, and more. The package allows users to search for specific records based on various criteria, including country, feature code, and latitude/longitude.
However, the geonames package relies on the GeoNames web services, which cap both the number of rows returned per request and the number of requests allowed per day. In this article, we will explore a way to work around these limits and retrieve all of the lake records available for Canada.
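For context, here is a minimal sketch of what paging through the API might look like. It assumes a registered GeoNames username (the value shown is a placeholder) and the paging parameters of the GeoNames search service, maxRows and startRow. The helper name lake_page_starts is hypothetical, and the network calls are left commented out because they need an account. The service also limits how far startRow can go, which is one reason the full set of lakes may not be reachable this way.

```r
# Sketch only: paging through lake records via the GeoNames search API.
# Assumes the geonames package is installed and a GeoNames account exists.

# Hypothetical helper: row offsets needed to page through `total` records.
lake_page_starts <- function(total, page_size = 1000) {
  if (total <= 0) return(numeric(0))
  seq(0, total - 1, by = page_size)
}

# Offsets for an assumed total of 2500 records
starts <- lake_page_starts(2500)

# The requests themselves need network access and a username,
# so they are left commented out here.
# library(geonames)
# options(geonamesUsername = "your_username")  # placeholder
# pages <- lapply(starts, function(s) {
#   GNsearch(country = "CA", featureCode = "LK", maxRows = 1000, startRow = s)
# })
# lakes_api <- dplyr::bind_rows(pages)
```

Each page costs API credits, so a large result set can use up the daily allowance quickly.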
Why Not Just Work with the CA Database Locally?
One possible solution is to download the entire Canadian database locally instead of relying on the geonames package’s API. This approach can be more efficient, especially if you plan to work extensively with the data in the future.
To achieve this, we will use the httr package to download the Canadian database from the GeoNames website and then read it into a data frame using read.csv().
Retrieving the CA Database
First, we need to download the CA database from the GeoNames website. We can do this using httr::GET(), which sends an HTTP request to the specified URL; combined with httr::write_disk(), it writes the response to a file on disk.
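library(httr)
library(tidyverse)

# Download the CA dump; overwrite = TRUE lets the script be re-run
res <- httr::GET(
  url = "http://download.geonames.org/export/dump/CA.zip",
  httr::write_disk("CA.zip", overwrite = TRUE),
  httr::progress()
)
httr::stop_for_status(res)  # stop if the download failed

# Extract CA.txt into the working directory
unzip("CA.zip")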
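Because the archive is large, it can be convenient to skip the download when a copy is already on disk. The following is a minimal sketch of that idea. The helper name fetch_once is hypothetical and not part of httr; the httr functions it calls are GET(), write_disk(), progress(), http_error(), and status_code().

```r
# Sketch: download a file only if it is not already on disk.
# `fetch_once` is a hypothetical helper name, not an httr function.
fetch_once <- function(url, dest) {
  if (file.exists(dest)) {
    message("Using existing copy: ", dest)
    return(invisible(dest))
  }
  res <- httr::GET(url, httr::write_disk(dest), httr::progress())
  if (httr::http_error(res)) {
    unlink(dest)  # do not keep an error page as if it were the archive
    stop("Download failed with HTTP status ", httr::status_code(res))
  }
  invisible(dest)
}

# Example calls (commented out because they need network access):
# fetch_once("http://download.geonames.org/export/dump/CA.zip", "CA.zip")
# unzip("CA.zip", list = TRUE)  # list the archive contents without extracting
```

Delete the local file whenever you want a fresh copy of the database.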
Reading the CA Database into a Data Frame
Once we have downloaded the database, we can read it into a data frame using read.csv(). We need to specify the file name, the delimiter (in this case, a tab), and the column names, because the file has no header row.
# readr::read_tsv() had trouble with this file when I tried it,
# so we use base read.csv() instead
read.csv(
  file = "CA.txt",
  header = FALSE,
  sep = "\t",
  quote = "",  # fields are not quoted; names may contain literal quotes
  col.names = c(
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "feature_class", "feature_code", "country", "cc2",
    "admin1_code", "admin2_code", "admin3_code", "admin4_code",
    "population", "elevation", "dem", "timezone", "modification_date"
  ),
  stringsAsFactors = FALSE
) %>% as_tibble() -> ca_geo

# Keep only the lake records (feature code "LK")
filter(ca_geo, feature_code == "LK")
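Place names can contain apostrophes and double quotes, and a stray quote character can confuse a reader that treats quotes as field delimiters. The sketch below illustrates reading tab-separated text with quoting disabled and then filtering by feature code. The three records are made up for illustration and are not taken from the GeoNames file.

```r
# Sketch: reading GeoNames-style tab-separated text with quoting disabled.
# The three records below are made up for illustration only.
sample_lines <- c(
  "1\tLac de l'Exemple\tLK",
  "2\tExample \"Big\" Lake\tLK",
  "3\tExample Town\tPPL"
)

sample_geo <- read.delim(
  text = paste(sample_lines, collapse = "\n"),
  header = FALSE,
  sep = "\t",
  quote = "",          # treat ' and " as ordinary characters
  comment.char = "",   # do not treat # as the start of a comment
  col.names = c("geonameid", "name", "feature_code"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose feature code marks a lake
lakes <- sample_geo[sample_geo$feature_code == "LK", ]
nrow(lakes)
```

The same quote = "" setting applies to read.csv() when reading the full CA.txt file.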
Resulting Data Frame
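The ca_geo data frame contains every GeoNames record for Canada, and the filter() call above returns only the lake records. We can store that result and view its first few rows using head().
# Store the lake records and view the first 10 rows
ca_lakes <- filter(ca_geo, feature_code == "LK")
head(ca_lakes, 10)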
This data frame includes various columns such as geonameid, name, latitude, and longitude.
Conclusion
In this article, we explored ways to retrieve the GeoNames lake records for Canada in R. Because the API limits behind the geonames package make it impractical to pull every record, we instead downloaded the entire Canadian database locally, read it into a data frame, and filtered it by feature code.
By following these steps, you can work with large geospatial datasets efficiently and effectively.
Last modified on 2024-09-05