How to update and backup a MySQL database under version control and all within Rstudio

I am trying to have better workflows to ensure data quality and two important things for me are first, scripting as much as posible the data manipulation process, and second, backing up the database we use under version control (e.g. Git*). I succeeded on that, but it was a 11 TAB problem**, so I though would be good post it here for others, and for my future self.

One goal of the task is to be able to do everything within the same program (Rstudio) for simplicity, so the task implies connecting R to MySQL server via RMySQL, update the database, make a backup copy from R and commit changes.

1) Running mysqldump from R to make an initial backup: RMySQL can’t (that I know) run the most commonly used backup mysqldump. But you can do it by calling the shell (aka Terminal) from R with the function system{base}.

#first we can see which is our path on the shell/terminal
system("pwd") #it should be the working directory.
#Second, we make a mysqldump (has to be in one non broken line,
 #sorry beautiful 80 characters-max code lines)
system("/Applications/MAMP/Library/bin/mysqldump -u USER -h HOSP_IP -pPASWORD --port=8889 --skip-extended-insert DATABASE > DATABASE.sql")
#note that my mysqldump is located in applications folder because 
 #instaled MySQL using MAMP (highly recommended for an easy set up 
 #of a local MySQL environment in a Mac). 
 #Yours may be in /usr/local/mysq/bin/mysqldump
#The --skip-extended-insert makes a INSERT for each row, 
 #which makes version control lighter and more readable later on.

2) Put the backup under version control. Rstudio provide a nice hook to git, so you just need to ‘git init‘ the directory, ‘git add DATABASE.sql‘  and ‘git commit‘ the changes with a nice commit message.

3) Connecting R to MYSQL server to update it: There are good tutorials on this, so I’ll be brief. In this example I manipulated data within R (your.data.frame) and want to append the resulting dataframe to an existing table.

library(RMySQL)
source(psw.R) # this is a way to avoid committing your password. 
 #You can have an r file (added to .gitignore) with the line 
 #psw <- "mypassword", which you can call later from the code.
conn <- dbConnect(dbDriver("MySQL"), user = "USER", 
                  password = psw, dbname="DATABASE", 
                  port = 8889 , host= "HOST_IP") 
#this set up the connection.
dbListTables(conn) # show tables
dbWriteTable(conn, "TABLE", your.data.frame, append = TRUE, 
             overwrite=FALSE, row.name=FALSE) 
#adds the new rows at the bottom of the table.
dbDisconnect(conn) #always disconnect at the end.

4) Make another backup:

system("/Applications/MAMP/Library/bin/mysqldump -u USER -h HOSP_IP -pPASWORD --port=8889 --skip-extended-insert DATABASE > DATABASE.sql")

5) And finally you can make another commit reflecting the changes ‘git add DATABASE.sql‘  and ‘git commit’ with a nice commit message. You can compare the two versions now, and let anyone in the project play with it by pushing it to a central repo (e.g. Github).

I am sure there are other ways to do this, but I am pretty happy of how it works!

—-
*Git forces you to not only document changes, but explain why those changes are done in a nice story. So I prefer it to Audit Trail options in MySQL.
**Someone on the internet suggested to rate the difficulty of a problem in  the number of browser tabs you need to open in order to fix it. Sorry I forgot the source.

Advertisements

Is Uppsala city a Bumblebee hotspot?

Sorry, no data to backup my thoughts today… but I feel that the number of bumblebees I saw in the last two years doubles the previous 30 years of my life. Uppsala is a smallish city in Sweden. Has lots of flowers (along all the season), cool summers (from my spanish perspective I may even say cold) and patches of semi-natural grasslands and forests mixed within the city. And lots of bumblenees. Mostly B. terrestris, and B. lapidarius, but I also see frequently B. pascuorum and B. hypnorum, and is not rare to see B. hortorum*.

I don’t make an effort to look for them, but I saw them waiting for the train to pass (yes, railroads have a lot of flowers) or when I am playing with the kids. Walking in spring in the forest patches within the city implies watching your feet to avoid steeping into queens. I even spotted two different bumblebee nests (B. terrestris and B. lapidarius) using man-made structures in playgrounds. Common! I’ve not seen any another natural bumblebee nest in the rest of my life, despite looking for them in several occasions.

I was talking yesterday with a colleague that is surveying bumblebee in grasslands and forests around the city and he is frustrated because he sees very few bumblebees**. So it’s not only Sweden which has lots of bumblebees, but particularly small “green” cities. Maybe cities are really good habitat after all, at least for a few species (See also NYC bees). The same pattern can be seen in bird species, where a few species thrive in cities, so probably I am not saying anything new.


*Id’s on the fly
**4 individuals /8 hours in an area with some flowers in a nice sunny day! that’s the worst day, usually he gets ~20 though