Resources for learning R
I use R for statistical and visual analysis of data in behavior and physiology. Here is a collection of materials for those starting to work with R. If you have questions or problems you're interested in collaborating on, please get in touch (contact information available in my CV on the main page). Information is organized into recommended material, a collection of sample scripts, and materials from workshops I have organized (presentations, handouts, and scripts).
Workshops
R and Programming Skills Development (May 4-5, 2018), School of Life Sciences, Arizona State University
Open Source for Open Science 2017 (September 1-3, 2017), Ecology and Evolutionary Biology Program, Texas A&M University
Materials are hosted on the Texas A&M University EEB website
Open Source for Open Science: Evolution (November 7, 2014), Ecology and Evolutionary Biology Program, Texas A&M University
A co-organized one-day workshop to introduce members to R and other open-source tools for evolution-themed data analysis and visualization. Workshop schedule and materials.
Open Source for Open Science: An EEB Workshop (July 2014), Ecology and Evolutionary Biology Program, Texas A&M University
A 2.5-day workshop to introduce members to R and other open-source tools for data analysis and visualization. Workshop outline and materials.
Using R for Behavioral Analyses (April 2-5, 2012), Arizona State University
Behavioral data can quickly become too voluminous to manage in spreadsheet programs. Scripting languages such as R are an excellent way to wrangle behavioral data into shapes that can be analyzed and summarized.
- Data Overview (pdf) - General introduction to types of data, how to analyze them, and how to format data for reading files into R. Modified from a presentation put together by Melanie Frazier.
- ClarkExampleData.csv - Sample data for a basic scripting exercise.
- ClarkBasicIntroductoryScript.R
- The scripts below are for the slightly more complicated work of preparing data for behavioral analysis. Generally, the first step will be to convert "raw" scan data into frequencies of behaviors. After that, behaviors that are part of the same task should be added together to calculate frequencies of tasks. Then you can start to ask and address questions about those tasks.
- ClarkScansToFreqs.R
- ClarkColonyInfo.csv
- ClarkObservations.csv
- CalculatingTaskFrequencies.R - This is a pretty hairy file, and it's also out-of-date because as it turns out we want to address some different questions. However, there are a lot of useful functions in here, and it's good practice for interpreting and editing scripts. You'll need the .csv that's generated by the script "ClarkScansToFreqs.R". I have also added a file (below) that tells you what the cryptic column titles mean.
- BehaviorCodes.csv
- ClarkDOLData.csv
- ClarkDOLExample.R
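The two-step workflow described above (raw scans to behavior frequencies, then behaviors summed into task frequencies) can be sketched in a few lines of base R. This is only a minimal illustration, not the code in ClarkScansToFreqs.R or CalculatingTaskFrequencies.R. The data, the column names (individual, scan, behavior), the behavior codes, and the behavior-to-task groupings are all invented for the example.

```r
# Hypothetical raw scan data: one row per individual per scan.
# Column names and behavior codes are made up for illustration.
scans <- data.frame(
  individual = rep(c("a1", "a2"), each = 4),
  scan       = rep(1:4, times = 2),
  behavior   = c("BC", "BC", "FD", "RS",
                 "RS", "RS", "AL", "FD"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: which task each behavior code belongs to.
behavior_to_task <- c(BC = "brood_care", FD = "brood_care",
                      AL = "grooming",   RS = "inactive")

# Step 1: convert raw scans into frequencies of behaviors.
# Count each behavior per individual, then divide by that
# individual's total number of scans (margin = 1 means "by row").
behavior_counts <- table(scans$individual, scans$behavior)
behavior_freqs  <- unclass(prop.table(behavior_counts, margin = 1))

# Step 2: add together behaviors that belong to the same task.
task_of_column <- unname(behavior_to_task[colnames(behavior_freqs)])
task_names     <- unique(task_of_column)
task_freqs <- sapply(task_names, function(task) {
  rowSums(behavior_freqs[, task_of_column == task, drop = FALSE])
})

print(round(task_freqs, 2))
```

Each row of task_freqs is one individual and each column is one task, so the rows sum to 1. Keeping the behavior-to-task lookup in its own object (or its own .csv) means that regrouping behaviors later only requires editing the lookup, not the script.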
Recommended R Books and Websites
Many of these books may be at your university's library for browsing before you decide to fork over money. In the long run it’s often best to buy personal copies. Note that many books cover the same or similar subjects, especially basic introductions and statistics. It may be worthwhile to browse, but if you find yourself wasting too much time, just pick one up and run with it. Highly recommended books are *starred*.
Data Manipulation and Statistical Analysis
Read this section as a menu and pick relevant materials based on your interests and expertise.
- *Book: The R Book, by Michael Crawley. Amazon link. Fantastic all-around starting point and reference. From getting your data into R, to basic data manipulation, to basic graphics and major statistical procedures.
- *Book: Data Manipulation With R, by Phil Spector. Amazon link. Invaluable reference no matter what you do. Walks through all the basic R data classes and functions needed to manipulate them.
- *Book: A Handbook of Statistical Analyses Using R, by Brian Everitt and Torsten Hothorn. Amazon link. A clear, thorough, up-to-date introduction with plenty of examples. Most useful if you know what kind of statistic you want to use; less useful if you're still learning statistics.
- Book: Discovering Statistics Using R, by Andy Field, Jeremy Miles, and Zoe Field. Book companion website. I have not used this, but a friend recommends it and it looks comprehensive for a new graduate student wanting to learn statistics and R at the same time.
- Book: Mixed-Effects Models in S and S-Plus, by Jose Pinheiro and Douglas Bates. Amazon link. Mixed-effects models are useful for repeated-measures designs, where the same individuals or units are measured more than once. Only acquire this book if the subject is relevant to you.
- Tom Short's R Refcard (pdf) - A comprehensive, well-organized starter reference card from way back in 2004. The original site no longer works, but this copy comes from CRAN.
- Quick-R - Aimed at people who already know statistics and want to learn how to do their analyses in R. Has a great reference menu for the syntax needed for different types of statistical analyses. Also see their book suggestion page. The site has also been turned into a book (link on their website).
- Exploratory Data Analysis / Statistical Modeling - Miscellaneous how-to guides for various topics, including graphing. I haven't used this much, so I cannot fully attest to its quality.
Visualization
ggplot2. If you are new, start at the top of this list and work your way through.
- R: ggplot2 Intro - A walk-through introduction to ggplot2. I base some of my teaching material on this tutorial.
- Or, try this one. Again, just pick one or the other and run through it. It will take time, but the investment will pay off in the long run.
- For a more thorough, up-to-date walkthrough that illustrates ggplot2's full capabilities, see this publication: Ito, K., & D. Murphy. (2013) Application of ggplot2 to pharmacometric graphics. CPT: Pharmacometrics & Systems Pharmacology 2, e79. doi:10.1038/psp.2013.56
- The online documentation for ggplot2 is invaluable once you have practice with the ggplot2 syntax.
- The ggplot2 Wiki: Useful tricks, links to more publications, incredible case study examples, an FAQ section, constantly being edited and updated.
- *Book: ggplot2: Elegant Graphics for Data Analysis, by Hadley Wickham. Amazon link. Companion Website - the website is most useful if you've read the book and just need to refer back to specifics or examples. Note that this book is now out-of-date due to some major revisions to the ggplot2 package in 2012. But it’s what we have to work with.
- Book: Winston Chang's Cookbook for R website (book link listed on site). The chapter on graphs employs ggplot2, and the book goes into greater depth on topics covered on the website.
More generalized graphics resources
- *Book: R Graphics, Second Edition, by Paul Murrell. Amazon link. A thorough treatment of lattice and base graphics, which are alternatives to ggplot2. The type of data you wish to present will often determine which graphics package you use.
- Book: Lattice: Multivariate Data Visualization with R, by Deepayan Sarkar, author of the lattice package. Book link (Springer). Link to companion website (figures and code)
- Learn R - The blog of someone trying to learn to use R to create different types of graphs. Contains a fantastic collection of figures originally generated in lattice, then reproduced in ggplot2, along with code used to generate the figures. Great if you're trying to figure out how to create a particularly complex sort of visualization and want to determine your options.
Programming in General
- *Book: Practical Computing for Biologists, by Steven H.D. Haddock and Casey W. Dunn. Amazon link. An excellent introductory book that will help you spend less time clicking through files and more time doing the fun parts of Biology.
- Book: R for Everyone: Advanced Analytics and Graphics (Addison-Wesley Data & Analytics Series), by Jared Lander. Amazon link. Useful for those who have had some basic exposure to computer languages.
- Online guide to programming in R.
- A MATLAB-to-R translation guide (pdf), for those already familiar with MATLAB.
Important Reading on Data Management - Creating Metadata
Fegraus, E.H., Andelman, S., Jones, M.B., and Schildhauer, M. (2005) Maximizing the value of ecological data with structured metadata: An introduction to Ecological Metadata Language (EML) and principles for metadata creation. Bulletin of the Ecological Society of America 86, 158-168.
Borer, E.T., Seabloom, E.W., Jones, M.B., and Schildhauer, M. (2009) Some simple guidelines for effective data management. Bulletin of the Ecological Society of America 90, 205-214.
Misc. sample scripts for data analysis and visualization
- Sample Response Surface plot and analysis.R. When dietary experiments involve manipulation of two or more nutrients, the consequences should be examined in multidimensional "nutrient space." The first step is statistical analysis, completed here using the R package rsm, which - importantly! - restructures the axes of the predictor variables (e.g. amount of protein and carbohydrate) and then allows tests of linear, quadratic, and nutrient interaction effects on a dependent variable chosen by the investigator. The second step is a visual representation of the data. This relies on a non-parametric approach from the package fields, which has a function to fit thin-plate splines to the data and generate a nice image. I am currently working on developing equally beautiful figures that are "faceted" for further comparisons with a standardized Z-axis scale (e.g. if one wanted to compare males vs. females); the current state of this project can be found here on stackoverflow.com
- Basic Function Example. A powerful feature of R is the ability to repeat a calculation many times without altering the original data. This simple example shows how to write a function to repeatedly perform a calculation and output the results.
- Cricket Dietary Analysis Files. These are in-depth scripts used to generate plots and conduct statistical analyses of cricket feeding behavior. Note that some of the ggplot2 syntax is out-of-date, but the changes should not cause the code to break completely.
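To make the response-surface idea above more concrete, here is a rough base-R sketch of the statistical step: recode the two nutrient axes, then fit a model with linear, quadratic, and interaction terms. It is a stand-in, not the sample script. It uses lm() in place of the rsm package, and it predicts a surface from the fitted polynomial rather than fitting thin-plate splines with the fields package. All data are simulated, and the names diets, protein, carb, growth, and code_axis are hypothetical.

```r
# Simulated diet experiment: every value here is invented for illustration.
set.seed(1)
diets <- expand.grid(protein = c(10, 20, 30, 40),
                     carb    = c(10, 20, 30, 40))
diets <- diets[rep(seq_len(nrow(diets)), each = 3), ]  # 3 replicates per diet
rownames(diets) <- NULL

# Recode each nutrient axis to run from -1 to 1: center on the middle of
# the design and divide by the half-range (done by hand in this sketch).
code_axis <- function(x) (x - mean(range(x))) / (diff(range(x)) / 2)
diets$x1 <- code_axis(diets$protein)
diets$x2 <- code_axis(diets$carb)

# Simulated response with a single peak inside the design, plus noise.
diets$growth <- 5 + 1.0 * diets$x1 + 0.5 * diets$x2 -
  2.0 * diets$x1^2 - 1.0 * diets$x2^2 +
  0.3 * diets$x1 * diets$x2 +
  rnorm(nrow(diets), sd = 0.1)

# Second-order model: linear, quadratic, and nutrient interaction effects.
fit <- lm(growth ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2, data = diets)
print(round(coef(summary(fit)), 3))

# Predict the fitted surface over a regular grid in coded nutrient space.
grid_x1 <- seq(-1, 1, length.out = 25)
grid_x2 <- seq(-1, 1, length.out = 25)
grid <- expand.grid(x1 = grid_x1, x2 = grid_x2)
grid$predicted <- predict(fit, newdata = grid)

# Rows of the matrix follow x1 and columns follow x2.
surface <- matrix(grid$predicted, nrow = length(grid_x1))
```

Passing grid_x1, grid_x2, and surface to base R's contour() or image() would draw the fitted surface. Using the same zlim for every panel is one way to keep the Z-axis scale comparable across groups.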