This week we will use the R language and statistical environment to analyze air quality data from Pocatello, Idaho, obtained directly from the Environmental Protection Agency (EPA) Air Quality System (AQS) database. Our aim is to determine if any patterns exist in air quality with respect to year and time of year. Before lab, read this handout and visit the websites listed in the handout.
The US government sets standards for six air pollutants called criteria pollutants (see box below). The standards for pollutants are based on requirements for the protection of human health, under the Clean Air Act. The Clean Air Act defines two types of ambient air quality standards.
Criteria Pollutants
| Pollutant | Health concern | 
|---|---|
| Sulfur Dioxide (SO2) | Respiratory tract problems, permanent harm to lung tissue. | 
| Carbon Monoxide (CO) | Ability of blood to carry oxygen impaired, cardiovascular, nervous, and pulmonary systems affected. | 
| Nitrogen Oxide (NOx) | Respiratory illness and lung damage. | 
| Ozone (O3) | Respiratory tract problems (difficult breathing, reduced lung function), asthma, eye irritation, nasal congestion, reduced resistance to infection, possible premature aging of lung tissue. | 
| Lead | Retardation and brain damage, especially in children. | 
| Particulates | Eye & throat irritation, bronchitis, lung damage, and impaired visibility. | 
In today’s lab we will focus on one class of criteria pollutants, particulate matter. The size of particle pollutants is directly linked to its potential for causing health problems. The EPA is mainly concerned about particles that are 10 micrometers = 10 microns in diameter or smaller because these particles can easily pass through the throat and nose and enter the lungs. Once inhaled, these particles can affect the heart and lungs and cause serious health problems. Particulate pollution can be made more severe by the presence of atmospheric inversion layers that prevent the dispersion of relatively cool, polluted air trapped near the ground. Inversions are particularly problematic for urban areas adjacent to mountains during the winter (Fig 2.1). The EPA groups particle pollution into two categories PM10 and PM2.5.
Figure 2.1: Inversion effects in Salt Lake City
Particles between 0 and 10 microns in diameter are called PM10 particles. These “inhalable coarse particles” often have high concentrations near roadways and dusty industries. Wood stoves may also cause high levels of PM10 pollution. Long-term exposure to high levels of PM10 can increase the likelihood of developing chronic respiratory conditions such as bronchitis. Current EPA standards require that daily PM10 concentrations of 150 μg/m3 not be exceeded more than once per year, on average, over 3 years.
In July 1997 the EPA established a new standard for air quality, based on particulate matter that is less than 2.5 microns in diameter (PM2.5). This new standard was created in response to health studies indicating that these smaller particles can cause serious health problems. PM2.5 particles can be directly emitted from sources such as forest fires, or they can form when gases emitted from power plants, industries and automobiles react in the air. Current EPA standards have primary, secondary, and primary and secondary specifications for PM2.5 particles:
The EPA monitors criteria pollutants, often in cooperation with state agencies like the Idaho Department of Environmental Quality (IDEQ). When the health-based standards are not met by a site, that site is considered to be in non-attainment. Given a non-attainment designation, the EPA requires that the state develop a State Implementation Plan (SIP), to show how air quality will be improved. The Pocatello-Chubbuck area was designated as non-attainment because of particularly high PM10 levels that were recorded in the early 1990’s.
The steering committee that developed Pocatello’s SIP identified smoke from wood stoves and fireplaces as one of the causes of high PM10 in the Pocatello-Chubbuck area. In an attempt to reduce PM10 associated with wood smoke, the Pocatello and Chubbuck city councils passed local air quality ordinances that restrict the materials that can be burned, limit the use of wood stoves and fireplaces, and require that new buildings have a heating source other than a wood stove or fireplace. These regulations prohibit burning when PM10 exceeds 120 μg/m3, and called for a voluntary ban when PM10 exceeds 80 μg/m3.
The IDEQ has been monitoring PM10 and PM2.5 particles, in the Pocatello, since 1990, at the intersection of Garrett and Gould Roads.
R is a popular open source programming language and computational environment. It can be downloaded for free here. All ISU computer labs are currently equipped with R.
Open R on your workstation. To start with, we can use R as a calculator. Type 2 + 2 and click Enter.
[1] 4The output term [1] means, “this is the first requested element.” In this case there is just one requested element, \(4\), the solution to \(2 + 2\). If the output elements cannot be held on a single console line, then R would begin the second line of output with the element number comprising the first element of the new line. For instance, the command rnorm(20) will take 20 pseudo-random samples from a standard normal distribution. We have:
 [1] -0.03106545 -0.53677292  1.66554225  0.32432316 -0.37972590  2.05901542
 [7]  0.23694828  0.18449073  0.54924468 -0.28540694 -0.20888245  0.39782478
[13] -1.17042364 -0.86490787  0.22263080 -0.13161842 -0.35473139  0.30858531
[19] -0.57922130  0.54571807Like Python, which we learned about earlier this semester, R is an Object Oriented Programming (OOP) language. Type:
The result, 4, can be accessed by typing the object name: obj1 (or whatever I named the object).
[1] 4Here I create a numerical array of the numbers, 1, 5, and 7, using the function c, which means “combine”.
I can use R in various ways. For instance, here I add obj1 to every element of obj2, and take the natural log of the sum.
[1] 1.609438 2.197225 2.397895Notes to ourselves, and others, can be made at the R command line using the # symbol.
[1] 1.609438 2.197225 2.397895Today we will primarily be using R to obtain and analyze Pocatello PM10 and PM2.5 data from the massive US EPA Air Quality System (AQS) database and API1. Details concerning the AQS can be found here.
Download the GG10daily and GG25daily binary datasets which contain PM10 and PM2.5 data from 12/31/1980 to 12/31/2025 from IDEQ air sensors installed at the corner of Garret and Gould streets in Pocatello. The data were acquired, via R, from the EPA-AQS database. Set the R working directory directory to be the location of the downloaded files. Use forward slashes instead of backslashes to designate nested directories and files.
Your job is to produce a summary report on trends in air quality in Pocatello, Idaho. Assume that this report will be used to determine the eligibility of the City of Pocatello for Federal funds. If air quality standards are not met, the availability of Federal funds for things such as highway projects will be reduced. Run all code in Section 4 to obtain data necessary to answer the questions below.
nrow(GG10daily) and nrow(GG25daily), respectively.min(GG10daily$date) and max(GG10daily$date), and min(GG25daily$date) and max(GG25daily$date), respectively.xlab and ylab. Paste the figures into your report.year10 <- format(GG10daily$date, "%Y") # PM 10 years from data
year25 <- format(GG25daily$date, "%Y") # PM 2.5 years from data
# yearly averages of daily averages
yearly.avg10 <- tapply(GG10daily$avg, year10, mean) # annual mean for PM 10 
yearly.avg25 <- tapply(GG25daily$avg, year25, mean) # annual mean for PM 2.5 
#--------------------------- plot for PM 10 ------------------------------#
plot(unique(year10), yearly.avg10, type = "l", xlab = "", ylab = "")#--------------------------- plot for PM 2.5 -----------------------------#
plot(unique(year25), yearly.avg25, type = "l", xlab = "", ylab = "")gtr.than.150, yearly.count and moving_avg contain in the code below? To actually see the contents of these object you can type their names in the console and hit Enter.xlab and ylab for the plot resulting from step 4 below. Paste the figure into your report.#---------------------- Step 1 (ID days with high PM 10 conc.) ----------------------------#
gtr.than.150 <- ifelse(GG10daily$avg > 150, 1, 0) # ID days with > 150 ug/mm^3?
#----------------- Step 2 (Get annual counts of high PM 10 days) --------------------------#
yearly.count <- tapply(gtr.than.150, year10, sum) # count days with > 150 ug/mm^3, by year
#-------------- Step 3 (Get 3 yr moving averages, requires zoo package) -------------------#
install.packages("zoo")
library(zoo)
# k = 3, align = "right": years 1,2,3 used to calculate moving avg for yr 3, and so on...  
moving_avg <- rollmean(yearly.count, k = 3, fill = NA, align = "right")
#------------------------------ Step 4 (create plot) --------------------------------------#
plot(x = 1990:2025, y = moving_avg, type = "l", xlab = "", ylab = "")
abline(h = 1, lty = 2, col = 2)yearly.avg25 form Q 2. The object contains annual average PM2.5 concentrations. In the code below a three year moving average of concentrations is taken and a plot is generated.
xlab and ylab. Paste the figure into your report.moving_avg <- rollmean(yearly.avg25, k = 3, fill = NA, align = "right") 
plot(1998:2006, moving_avg, type = "l", xlab = "", ylab = "", ylim = c(6,15))
abline(h = 9, lty = 2, col = 2)
abline(h = 15, lty = 2, col = 1)
legend("topright", lty = 2, legend = c("Primary", "Secondary"), col = c(2,1))##############################################################################
#------------------------- time series analysis -----------------------------#
##################################$###########################################
#-------------------------- PM 10 -------------------------#
month10 <- format(GG10daily$date, "%m") # PM 10 months from data
month_yr10 <- paste(year10, month10, sep = "-") # PM 10 year AND months
monthavg10 <- tapply(GG10daily$avg, month_yr10, mean) # PM 10 averages for year AND month combinations
pmts10 <- ts(monthavg10, start = c(1990, 11),frequency = 12) 
plot(stl(pmts10, s.window = "periodic")) mos <- c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
tapply(GG10daily$avg, factor(month10, labels = mos), mean) # PM 10 monthly summary     Jan      Feb      Mar      Apr      May      Jun      Jul      Aug 
16.57457 17.54545 18.27233 18.50579 20.32230 22.57314 27.79355 34.09264 
     Sep      Oct      Nov      Dec 
26.58451 20.47404 16.35895 16.26393 #-------------------------- PM 2.5 -------------------------#
month25 <- format(GG25daily$date, "%m") # PM 2.5 months from data
month_yr25 <- paste(year25, month25, sep = "-") # PM 2.5 year AND months
monthavg25 <- tapply(GG25daily$avg, month_yr25, mean) # PM 2.5 averages for year AND month combinations
pmts25 <- ts(monthavg25, start = c(1998, 11),frequency = 12) 
plot(stl(pmts25, s.window = "periodic"))       Jan       Feb       Mar       Apr       May       Jun       Jul       Aug 
15.865625 15.123333  7.419658  5.107692  5.047967  5.442149  6.800000  8.262766 
      Sep       Oct       Nov       Dec 
 6.519540  5.656044  8.400000 17.557282 API stands for Application Programming Interface. An API provides glue code that allows one software framework (e.g., R) to work directly with a foreign software system.↩︎
The stl function provides a seasonal decomposition of a time series using a smoother algorithm. The function produces four related plots with names: data, seasonal, trend, and remainder. The seasonal component is found by direct application of a loess smoothing algorithm to monthly means. To find the trend, seasonal values are differenced, and those differences are smoothed. The remainder component is comprised of the residuals from the seasonal plus trend fit. Gray bars to the right of plots allow comparison of changes from the four perspectives. The height of the box in plot 1, compared to height of the box in plot 2, describes a comparable unit change in plot 1, compared to plot 2.↩︎