To celebrate that I am now contributing to the R-bloggers.com blog aggregator, I am going to post a very simple function to check which species (both plants and animals) are considered “invaders” somewhere in the world. Basically, the function puts that question to the Global Invasive Species Database (GISD).
I coded this because a friend of mine asked me precisely that question [Yes, friends assume you should know this kind of stuff (and also why the plants on their balcony are dying) off the top of your head just because you are a biologist]. However, I do not know that many things, and I am too lazy to check all 250 species one by one on the GISD webpage. It is also good R practice, and I am OK with investing some work time in personal projects. Google (and other big companies) encourage their employees to spend 20% of their time working on projects that aren’t necessarily in their job descriptions in order to boost their innovation power, so that should be even more important in science!
I hope it can be useful to more people, so I uploaded the code as a Gist:
UPDATE: The function is now available in the taxize R package developed by the rOpenSci people!
# is.invasive()

## Description ##
# This function checks which species (both plants and animals) are considered "invaders" somewhere in the
# world. To that end, it queries GISD (http://www.issg.org/database/welcome/) and returns a value, either
# "Not invasive" or the brief description presented in GISD. Note that the webpage contains more
# information. Also note that the function won't tell you if a species is exotic in your area; a lot of
# exotic species are not considered invaders (yet). As expected, the function is only as good as the
# database, which I find quite reliable and well maintained. The database is also able to recognize a lot
# (but not all) of the species synonyms. This function worked for me, but I didn't test it intensively, and
# any change to the webpage's HTML design will make it return wrong values. Apply the usual disclaimers
# when using it.
# The function is slow (not optimized at all), so be patient with long lists of species.
# Author: Ignasi Bartomeus (nacho.bartomeus#gmail.com). Last updated 23 Nov 2012.

# Usage:
# is.invasive(sp, simplified.df = FALSE)

# Arguments:
# sp: a vector of species names in Latin (Genus species)
# simplified.df: if TRUE, returns a data.frame with the species name and the values "Invasive" or
# "Not invasive". I recommend checking the non-simplified version first (default), which contains raw
# information about the level of invasiveness.

# The function:
is.invasive <- function(sp, simplified.df = FALSE){
  require(plyr)
  require(XML)
  require(RCurl)
  # reformat sp list
  species <- gsub(" ", "+", sp)
  # create urls to parse
  urls <- paste("http://www.issg.org/database/species/search.asp?sts=sss&st=sss&fr=1&x=13&y=9&sn=",
                species, "&rn=&hci=-1&ei=-1&lang=EN", sep = "")
  # create a data.frame to store the output (status is filled in below)
  Out <- data.frame(species = sp, status = character(length(sp)), stringsAsFactors = FALSE)
  # loop through all species
  for(i in seq_along(urls)){
    print(paste("Checking species", i))
    # parse url and extract the fourth table on the page
    doc <- htmlTreeParse(urls[i], useInternalNodes = TRUE)
    tables <- getNodeSet(doc, "//table")
    tab <- readHTMLTable(tables[[4]])
    tt <- as.matrix(tab)
    if(length(grep("No invasive species currently recorded", tt, value = TRUE)) > 0){
      Out[i, 2] <- "Not invasive"
    } else {
      # row 12 of the table holds the brief GISD description
      if(!simplified.df){
        Out[i, 2] <- tt[12, 1]
      } else {
        Out[i, 2] <- "Invasive"
      }
    }
  }
  print("Done")
  Out
}

# Example:
sp <- c("Carpobrotus edulis", "Rosmarinus officinalis")
## first species is invasive, second one is not.
d <- is.invasive(sp)
d
d <- is.invasive(sp, simplified.df = TRUE)
d
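The URL-building step in the function above replaces each single space with "+", so a name with a stray trailing space or a double space is sent to GISD as a different query. Below is a minimal sketch of a more forgiving version of that step. The helper name `gisd_search_urls` is hypothetical (it is not part of the Gist), and the URL pattern is simply copied from the function, so it carries the same assumption that the GISD search page still accepts it.

```r
# Hypothetical helper (not part of the Gist): build one GISD search URL per
# species name, normalising whitespace first.
gisd_search_urls <- function(sp) {
  # Trim leading/trailing whitespace and collapse inner runs of whitespace
  # into a single "+", as the search form expects "Genus+species".
  clean <- gsub("\\s+", "+", trimws(sp))
  paste0("http://www.issg.org/database/species/search.asp?sts=sss&st=sss&fr=1&x=13&y=9&sn=",
         clean, "&rn=&hci=-1&ei=-1&lang=EN")
}

# Usage: no web request is made here, only the URLs are built.
urls <- gisd_search_urls(c("Carpobrotus  edulis ", "Rosmarinus officinalis"))
urls
```

This only guards against whitespace slips; a genuinely misspelled name still produces a valid-looking query.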
Thank you for the post :)
Can you please open the feed to show full post?
(settings -> reading -> full instead of summary)
Please update me by e-mail so I can refresh your post on r-bloggers.
Thanks,
Tal
Nice function! I’m still learning R, so maybe I could figure out how to modify this eventually, but it would be handy if it could also return the “Geographical Range” section from the GISD. Also, what happens with misspellings? Your code currently returns “Not invasive”, but maybe what it should say is “Not listed in database.” That way someone might have a hint that the species could still be invasive, but is either misspelled or not prominent enough to have made it into the database.
Thanks again!
Thanks,
You are right about misspellings, but I expect no such errors from scientists (partially kidding). That’s easy to fix: just change the text in the line [Out[i,2] <- "Not invasive"] to something clearer. I may change that in the Gist.
Regarding other database attributes, that will take some more coding, and I see two issues. One is that the information is stored in long text fields, which are not as useful as categorical variables for any analysis. The second is that if you need all this info, it is probably more efficient to ask the ISSG people for access to the database than to scrape it from the web.
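A minimal sketch of that change, with the classification step pulled out into a small helper so the label can be swapped without touching the loop. The names `gisd_status` and `not_found_label` are hypothetical and not part of the Gist. The marker string and the row-12 lookup are taken from the function as posted, so they depend on the same page layout.

```r
# Hypothetical helper (not part of the Gist): classify one species from the
# text cells of the GISD results table (the `tt` matrix in is.invasive()).
gisd_status <- function(cells, simplified = FALSE,
                        not_found_label = "Not listed in GISD") {
  cells <- as.matrix(cells)
  # Same marker text the original function searches for.
  no_record <- any(grepl("No invasive species currently recorded",
                         cells, fixed = TRUE))
  if (no_record) {
    return(not_found_label)
  }
  if (simplified) {
    return("Invasive")
  }
  # The original code reads the description from row 12, column 1; return NA
  # instead of failing if the table is shorter than expected.
  if (nrow(cells) >= 12) cells[12, 1] else NA_character_
}

# Usage with made-up cell contents (no web request involved):
missing_sp <- matrix(c("Search results",
                       "No invasive species currently recorded"), ncol = 1)
gisd_status(missing_sp)
```

Note that GISD gives the same answer for a misspelled name and for a correctly spelled species with no record, so the new label is more honest but cannot tell the two cases apart.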
Anyway, if you need "Geographical Range" for a lot of species for a particular project, let me know and we can work on it!
Pingback: is_invasive( ) got a new best friend: is_native( ) | Bartomeus lab