
Showing posts with label class. Show all posts

Saturday, October 26, 2013

Food Justice: Revisiting HEALTH

HEALTH is an organizational paradigm for Food Justice
Introduction: 
HEALTH has now been up and running for 5 years. Yay!
 

Okay, now that I've gotten the obligatory anniversary announcement out of the way, I want to draw attention back to a topic deserving of its own post:

What is "health"? What work does the acronym HEALTH perform?

In this post, I will elaborate a little on how I understand HEALTH after many additional years of life experience as an educator and activist, and why this understanding is preferable to the accepted definition and practice of "health." First, I will discuss the evolution of HEALTH from an organization to a blog to an experimental paradigm for coalition building. Second, I will juxtapose the self-centered normativity of "health" with the socialist politics of HEALTH. Third, I will break HEALTH down into several prerequisites and organizing points. I will conclude by acknowledging the difficulties of navigating this comprehensive vision of HEALTH and invite y'all to chime in with comments as to whether advocating HEALTH is as useful and unproblematic as I suggest.



1. The Evolution of a Vision (2005-2008)
Way back in 2005 I founded an organization on my college campus dedicated to addressing the intersections of oppressions. The club existed, on the one hand, to operate as an independent project for a course on Sustainable Buildings, and, on the other hand, to provide a much-needed outlet for animal advocacy on campus. According to the original constitution submitted on April 5, 2005:
H.E.A.L.T.H. is dedicated to ecological sustainability and conservation, the adoption of compassionate and ecologically responsible lifestyles, and global awareness through activism and education. The club will work to develop an environmental taskforce for Beloit College, create and enforce environmentally sound policies, and educate the campus and community about ways to live more harmoniously with the Earth, nonhuman animals, and humans in developing countries. H.E.A.L.T.H. will be involved with nonviolent, grassroots environmental and animal activism 
HEALTH was founded upon ecofeminist philosophy, which I had begun studying independently a year before. Ecofeminism, in a nutshell, is a body of work that argues that the domination of nature (at least in the Western tradition) is entangled with the domination of women (as well as people of color, the working class, queers, and animals) historically, materially, conceptually, and mythologically. Ecofeminists valuably demonstrate, like other radical theories, that the oppressions of humans and of nonhuman beings mutually reinforce one another, and that liberation is only possible when all are free of injustice. HEALTH was conceived out of this intersectional analysis.

Originally designed to address the unhealthy relationships between humans, animals, and the Earth, HEALTH would take on new meaning as an acronym during research for my interdisciplinary capstone project when I discovered the work of agrarian writer Wendell Berry and ecofeminist Chris Cuomo.

Wendell Berry's essays exemplified what thinkers like Fritjof Capra and David Orr called systems thinking. Systems thinking took into account the process, relationship, dynamism, wholeness, and complexity of "problems" (in contrast to mechanistic thinking, which addressed problems by dissecting them into static, discrete parts with simple, predictable, linear cause-and-effect relationships). The problem with mechanistic thinking (in modern, industrial science, economics, politics, and technology) is that it often creates new problems; it doesn't "solve for pattern."

In "Health is Membership," Berry calls for a return to the etymological root of "health" as the wholeness of belonging:

The word "health," in fact, comes from the same Indo-European root as ‘heal,’ ‘whole,’ and ‘holy.’ To be healthy is literally to be whole; to heal to make whole... our sense of wholeness is not just a sense of completeness in ourselves but also in a sense of belonging to others and to our place; it is an unconscious awareness of community, of having in common. (144)
[The contemporary] view of health that is severely reductive. It is, to begin with, almost frantically individualistic... One may presumably be healthy in a disintegrated family or community or in a destroyed or poisoned ecosystem.” (146)
In another essay, "Solving for Pattern," Berry discusses, in the concrete context of agriculture, the destructive logic of securing the health of one part of a system at the expense of the others who belong to that community:
Our dilemma in agriculture now is that the industrial methods that have so spectacularly solved some of the problem of food production have been accompanied by ‘side effects’... the irony of agricultural models that destroy, first, the health of the soil and, finally, the health of human communities. (267)
The real problem of food production occurs within a complex, mutually influential relationship of soil, plant, animals, and people. A real solution to that problem will therefore be ecologically, agriculturally, and culturally healthful... [I]t is impossible to sacrifice the health of the soil to improve the health of the plants, or to sacrifice the health of plants to improve the health of animals, or to sacrifice the health of animals to improve the health of people. (269, 274)
Chris Cuomo added depth to Berry's arguments, in part by drawing on an ecofeminist tradition critical of the pastoral romanticization of the heteronormative family and of settler colonialism. Cuomo offered an alternative route to ecological ethics that wasn't grounded in mechanistic utilitarianism, individualistic deontology, or apolitical care ethics. Cuomo proposed a eudaimonian ethic, based on the ancient Greek concept of flourishing, but applied to community as a social and ecological construct.
Humans cannot flourish without other humans, ecosystems, and species, and nothing in a biotic community can flourish on its own. Likewise, communities (both social and ecological) depend on the existence of other communities. Ethical objects therefore flourish as both social and ecological entities. To be extracted from community, human or otherwise, is to lack relationships and contexts that provide the meaning, substance and material for various sorts of lives.[*]
My ambition to build a coalition between clubs on campus and develop a sustainability taskforce, however, did not materialize. Several years of organizing campus events and actions taught me how difficult it was to put this holistic perspective into practice. Such a comprehensive message and focus was naturally complex to deliver, and HEALTH spread itself thin attempting to address issues such as animal liberation and indigenous sovereignty (which I had come to appreciate after studying in Australia). Given the general lack of interest in and availability for advocacy on campus, HEALTH could not sustain itself after I graduated.


2. The Evolution of a Vision (2008-2013)
 

South Central Farm (1994 - 2006) was the largest urban farm and CSA in the USA.
When I returned home from a summer working as an educator at an animal sanctuary, I was inspired to keep my holistic vision and advocacy alive by creating a blog. Having learned how difficult it was to manage an organization with potentially infinite possibilities, I narrowed the focus of HEALTH to a food justice blog that would encompass not only food sovereignty (whose importance I learned through a sustainability project in my community), but also ecological sustainability and animal liberation. Devoting HEALTH to food justice seemed a natural fit, since food is a site at which so many discourses of health (e.g., bodily, animal, ecological, communal, national) collide.

The original mission statement for HEALTH was posted on September 8, 2008:

HEALTH advocates ecological and social justice through campaigns in which the intersection of multiple oppressions in the production, distribution, and consumption of “food" can be addressed simultaneously... Health in its fullest sense cannot be achieved alone.
Over the next year, I compiled an array of resources, including introductory websites, documentary videos, peer-reviewed articles, academic journals, nonprofit organizations, blogs, and books covering animal, agricultural, ecological, and social justice. Although I tried to avoid it, the blog has admittedly leaned harder on the animal justice side of things. In the first two years, however, I did address gender, race, class, and sexuality injustices in food production, consumption, and distribution.

One post I'm particularly fond of is "Skinny Bitch and Bulimic Vegetarians," published in April of 2009. Of all my posts, this one most directly addressed the limits of advocating personal "health" (or at least the superficial performance of health). After diagnosing the fat-shaming elements of vegan outreach (particularly the aesthetic appeal of Skinny Bitch and PETA's campaign media), I shared my perspective on "health":

HEALTH cannot be achieved by individuals alone; true health is the consequence of an entire community flourishing mutually together. Modern reductionist approaches to health define "health" as something that can be achieved independent of Others and often at the expense of them (e.g., (over)fishing to consume more fish oil, enslaving people to pick tomatoes, wiping out wildlife to grow organic leafy greens, "curing" diseases by giving them first to millions of "animals"). Within this outlook, veg*n outreach that promotes veg*nism as good for "one's health" is playing into the liberal, antagonistic discourse of self-interest.
Since HEALTH must be achieved together it ought not, as much as possible, come at the expense of the health of Others. In this sense, appropriating mainstream means of advertising (i.e. using the promise of becoming a conventionally sexy and beautiful women) so as to exploit common insecurities over body-image (o)pressed into the minds of young women is not healthy. Exploiting, and thus perpetuating, oppression as a means to a "good" end can never be healthy, even if it promotes "health," because it ultimately subordinates the health of Others.


Friday, October 4, 2013

Re-Assessing Animal Rights: Resources


I've been thinking about the state of the animal defense movement* quite a lot after attending four conferences on organizing this summer. Perhaps for the better, the Animal Rights 2013 conference was not one of them. The conferences I attended were either organized by and for grassroots activists or were nearly silent on the status of animal others. Never have I learned so much or been so inspired. There I was exposed to alternative interpretations of the history and politics of the US and the modern world, and there I realized how white and superficial the analyses and strategies of mainstream animal activists often are.

This post is dedicated to providing resources to those open to re-assessing the history, politics, organization, tactics, theories, and language of the animal defense movement. I intend to write more about the presentations and drama I witnessed at these conferences, but for now I want to share some essays and presentations that have really challenged and inspired me to rethink my assumptions and my history of abstract theorizing, the kind valued in academic settings, especially in philosophy.


Re-Assessing Animal Defense
 

The History and Politics of the Animal Defense Movement
With the rise of the vegan movements, the politics of animal defense have become so personal that many activists have forgotten that vegan consumption is just one strategy, and not even the most important one. Meanwhile, large nonprofits have taken up reforms that do not challenge the source of animal oppression: animals' status as commodities. And animal defense is still often interpreted from the perspective of those who have made careers at nonprofits and universities. What of the history of the grassroots?




The Limits of Vegetarian Outreach
Vegetarian outreach has been a staple of the nonprofit animal defense movement since the 1990s, when activists realized that agribusiness accounted for over 95% of the animals killed and exploited. While there is much debate over how best to "sell" vegetarianism, critiques of the sufficiency of veganism as a "baseline" have been less frequent. Is vegan education our most effective tactic? Is "veganism" sufficient for animal liberation?

The Problem with Analogies to Human Oppression
Some animal activists draw logical analogies between institutional violence against nonhuman animals and violence against oppressed humans. The presumption is that the public will have a logical breakthrough: that violence against nonhuman animals is unjust, just as violence against oppressed humans is. Have the articulation and performance of these analogies borne the breakthroughs activists hoped for, or have they only alienated people further from the cause?
Critiques of Non-Profit Campaigns & Conferences
The hegemony of corporate non-profits has "hijacked" the strategy, language, and tactics of animal liberation. Non-profits generate funds and publicity for animals; however, they have also been notoriously conservative on matters of class, race, and gender in their organizations and campaigns. Their collusion with State power, capital, and white supremacy has built a large funding base, but are they building a movement upon the marginalization and oppression of humans?





The Intersections of Human and White Privilege in the ADM
The animal defense movement has remained one of the whitest social justice movements in the US for decades, even though people of color are no less compassionate and no less likely to be vegetarian. We've already looked at colonial campaigns, analogies that alienate, under-representation in leadership, and complicity in racist law enforcement. What analytic tools, strategies, and language can whites adopt and support to build coalitions across racialized experiences?




How to Disrupt Oppression
Once equipped with more sophisticated theory and more supportive of the leadership and projects of people of color and queer people, animal activists are on their way to building a movement that reaches beyond the single-issue identity politics of "animal rights." This is, of course, easier said than done. Because nearly all of us in the US have been colonised by white supremacist capitalist heteropatriarchy, it will take some effort on our part to challenge the "common sense" it has built into our brains-and-flesh. How can we resist these old habits?

Critiques of Ally, Intersectionality, and Privilege
Over the last ten years as the internet has made it easier to "call-out" animal activists for their complicity with racism and other oppressive systems, some mainstream organizations and many white activists have adopted the language of anti-oppression. Have white activists' identification as allies, acknowledgement of their privileges, and references to "intersectionality" transformed their activism or obscured privilege and power?

Are there any essays, talks, and books that have changed your advocacy for animals? Please share in the comments. I may add them to the list!


Loops revisited: How to rethink macros when using R


Sunday, April 7, 2013

Mastering Matrices

R has many ways to store information.  Most of the time, our data comes in the form of a dataset, which we bring into R as a data.frame object. However, there are times when we want to use matrices as well. This post will show you how matrices can be useful and how to manipulate them easily.

First of all, the big difference between matrices and dataframes is that every entry of a matrix must have the same class (numeric, character, etc.). In a dataframe, each column can have its own class. See my initial post about objects, here.

You can convert from one to the other using as.data.frame() or as.matrix(). Be careful, though: if you convert a dataframe whose columns have different classes, your matrix will be all character.
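Here is a minimal sketch of this coercion. It uses a small made-up dataframe; the column names and values are hypothetical, not the contents of mydata. typeof() reports the storage type of the resulting matrix:

```r
# Hypothetical dataframe mixing numeric and character columns
df <- data.frame(Age = c(23, 35, 41),
                 Weight = c(150, 180, 165),
                 Name = c("Ann", "Bob", "Cy"),
                 stringsAsFactors = FALSE)

# Converting the whole dataframe: every entry becomes character
m_all <- as.matrix(df)
typeof(m_all)   # "character"

# Dropping the character column first keeps the matrix numeric
m_num <- as.matrix(df[, c("Age", "Weight")])
typeof(m_num)   # "double"
```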


In order to have a numeric matrix, I'm going to just take the first 6 columns of the mydata dataframe. I can delete columns of a matrix or dataframe in two ways:
mydata.mat<-as.matrix(mydata[,1:6])
mydata.mat<-as.matrix(mydata[,-7])
These two lines are doing the exact same thing. In the first, I am subsetting the dataframe mydata by taking all rows and the first 6 columns of the dataframe, then I'm converting that subset to a matrix. In the second, I'm taking all rows and all columns except the 7th column. Note that if I wanted to drop even more columns, I would just use the c() function like this:
mydata.mat<-as.matrix(mydata[,c(-3,-7)])
Note now that since I have taken out the one character column in my dataframe before I convert it to a matrix, I will get a numeric matrix instead of a character matrix:


This kind of operation for deleting columns works the same way in both matrices and dataframes. Adding a column, however, works differently. In a dataframe, you can use the $ operator to identify columns: mydata$Married is the vector corresponding to the Married column. However, you can't use the $ operator on matrices. You will get the error "$ operator is invalid for atomic vectors", which I see all the time when I'm converting back and forth between dataframes and matrices and make a mistake:


All this message means is that the object you're using is a matrix (mydata.mat), and you can't use the $ operator on a matrix. If you get this message, you can either use as.data.frame() to convert your matrix to a dataframe, or you can adjust what you are doing to accommodate the rules of matrices. To add columns to a matrix, you use cbind(), and likewise for rows, rbind().

So let's say I want to add an age squared column. In the dataframe, I do:
mydata$agesq<-mydata$Age^2
which instantly names the new column "agesq". Now for a matrix, there are two ways to do this, via indexing by number or by name of the original column:
mydata.mat<-cbind(mydata.mat, mydata.mat[,2]^2)
mydata.mat<-cbind(mydata.mat, mydata.mat[,"Age"]^2)
In the first line, I'm taking all rows and the second column of the mydata.mat matrix and squaring it, then I'm column binding it to the original matrix. In the second line, I'm doing the exact same thing, except that instead of indexing with a number, I can use the name of the column "Age". I get the following after running both statements:



Notice that the last two columns of this matrix do not have names, which can be rectified by using the colnames() function:
colnames(mydata.mat)[7:8]<-c("AgeSq", "AgeSqAgain")
I don't want to rename everything, so I take the 7th and 8th columns and name those appropriately.
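Here is a small self-contained sketch of the points above, using a hypothetical two-column matrix. It shows the $ error, then indexing by column name instead. It also shows a shortcut you may find handy: naming the new column directly inside cbind(), so that no separate colnames() step is needed.

```r
# Hypothetical numeric matrix with named columns
mat <- cbind(Age = c(23, 35, 41), Weight = c(150, 180, 165))

# $ fails on a matrix; capture the error message instead of stopping
msg <- tryCatch(mat$Age, error = function(e) conditionMessage(e))
msg   # "$ operator is invalid for atomic vectors"

# Index the column by name with [ , "name"] instead
age <- mat[, "Age"]

# Naming the argument inside cbind() names the new column in one step
mat <- cbind(mat, AgeSq = mat[, "Age"]^2)
colnames(mat)   # "Age" "Weight" "AgeSq"
```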

Finally, what can matrices do for us? One important use of matrices is, of course, matrix multiplication, which is the machinery behind multivariable regression analysis. I'll do a post soon on regression analysis by hand in R. Another reason is that matrices are a great way to store values that you compute during the course of running a loop.
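To make the regression point a little more concrete, here is a sketch on simulated data; the variables x1, x2, and y are invented for illustration. The ordinary least squares estimates come from the matrix formula (X'X)^-1 X'y, computed here with crossprod() and solve(). They should match what lm() reports.

```r
# Simulated (hypothetical) data: y depends on two predictors plus noise
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 0.5 * x1 - 1.5 * x2 + rnorm(n)

# Design matrix: a column of 1s for the intercept, then the predictors
X <- cbind(Intercept = 1, x1 = x1, x2 = x2)

# Solve (X'X) b = X'y; crossprod(X) is t(X) %*% X
b <- solve(crossprod(X), crossprod(X, y))

# Compare with lm(); the two columns should agree up to rounding
fit <- lm(y ~ x1 + x2)
cbind(by_hand = drop(b), lm = coef(fit))
```

Solving the linear system is generally preferred to computing solve(crossprod(X)) and then multiplying, though both give the same estimates here.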

For example, say I want to show how great the central limit theorem is. I'll generate deviates from  some other distribution, say the Poisson, and take the mean of the draws each time. I'll do this 1000 times and then show what the histogram looks like.  In a problem like this, I'll use a loop.  I'll also use a matrix to store the mean each time.

Ok, we start out by initializing a matrix. We'll create a matrix of all NAs with 1000 rows and 3 columns using the matrix() function:
mat1<-matrix(NA, nrow=1000, ncol=3)

Next, we'll set up the for() loop. Let's look at it first and then go through the logic:
for(i in 1:nrow(mat1)){
  vec1<-rpois(1,1)     # 1 draw from a Poisson with lambda = 1
  vec2<-rpois(10,1)    # 10 draws
  vec3<-rpois(100,1)   # 100 draws
  mat1[i,]<-c(mean(vec1), mean(vec2), mean(vec3))   # store the three means in row i
}

So in the first line, we're saying for each value of i going from i=1 to i=nrow(mat1), do the stuff in the loop. We could have written 1:1000, but it's nice to leave it as nrow(mat1) since we may want to change the size of mat1 later and this way the loop will still be fine.

Next, we draw from a Poisson distribution three times, each time a larger number of draws (first 1 draw, then 10, then 100), and each time with a lambda of 1.

Finally, and this is where the matrix comes in, we'll take the mean of each one of those vectors and store it. We will store the three values in the ith row of the mat1 matrix, filling in all three columns.  In a longer way, I could have done:
mat1[i,1]<-mean(vec1)
mat1[i,2]<-mean(vec2)
mat1[i,3]<-mean(vec3)
and it would have come out the same, but the first way is nicer since it's more compact. Remember that matrices are just columns and rows of vectors, so you can always assign a vector to a row, as long as it's the same length. When you concatenate numbers (using the c() function), you make a vector, which is why it works.
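If you want to check the result numerically as well as visually, here is a reproducible version of the same loop. The seed value is arbitrary, and the three draws are folded into one line. With lambda = 1, a Poisson draw has variance 1, so the standard deviation of a mean of n draws should be close to 1/sqrt(n):

```r
# Reproducible rerun of the loop; the seed value is arbitrary
set.seed(2013)
mat1 <- matrix(NA, nrow = 1000, ncol = 3)
for (i in 1:nrow(mat1)) {
  # Means of 1, 10, and 100 Poisson(lambda = 1) draws, stored in row i
  mat1[i, ] <- c(mean(rpois(1, 1)), mean(rpois(10, 1)), mean(rpois(100, 1)))
}

# Every column should average close to lambda = 1
colMeans(mat1)

# The spread should shrink roughly like 1/sqrt(n): about 1, 0.32, and 0.1
sds <- apply(mat1, 2, sd)
sds
```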

Now, let's see how the old CLT is working by plotting some histograms:
par(mfrow=c(1,3))
hist(mat1[,1], main="n=1")
hist(mat1[,2], main="n=10")
hist(mat1[,3], main="n=100")

Again, with the histograms, I can plot one column at a time by subsetting the mat1 matrix:

Pretty nice! Other very helpful places to better understand matrices:



Sunday, March 17, 2013

Extracting Information From Objects Using Names()

One of the big differences between Stata and R is R's ability to handle many different types of objects at once, and to combine them or pull them apart. I had a post about objects last year, but I thought I'd show in this post how to extract information from objects you create in R.

For this example, I'll go back to a dataset I've used in the past called mydata.Rdata, which is available on the Code and Data Download site.

One function that is extremely useful to know is names(). The names() function returns the names of the components stored inside an object. So, for example, if you do

names(mydata)

where mydata is a dataframe object, you will get the names of the columns, which are the vectors that make up the dataframe. Note that names(mydata) is itself an object (because everything is an object in R): it is a character vector of length 7. You can save this vector and print out its class to verify this.
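You can see this without the real dataset. The sketch below uses a hypothetical seven-column dataframe called toy, whose columns are made up:

```r
# Hypothetical stand-in for mydata with seven columns
toy <- data.frame(ID = 1:3, Age = c(23, 35, 41), Height = c(64, 70, 66),
                  Weight = c(150, 180, 165), Married = c(0, 1, 2),
                  Race = c("Black", "White", "Hispanic"), Income = c(40, 55, 61))

col.names <- names(toy)
col.names          # the seven column names
class(col.names)   # "character"
length(col.names)  # 7
```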








But names() can be useful for much more than just column names, as we'll see in a moment.

But before we go on, let's take a moment to remember how subsetting works. In subsetting, you use square brackets to pull out exactly the element of an object that you want. So if I want to subset a dataframe, I can say

mydata.subset<-mydata[,c(1:2)]

which saves all the rows and only the first two columns of the mydata dataframe into the new object mydata.subset.

Now, let's combine the concept of using the names() function with the concept of subsetting to change one of the column names of our dataset:

names(mydata)[4]<-"Weight_lbs"

Here we are saying: of the names(mydata) object, take the fourth component and make it "Weight_lbs". Now, if you run names() on our dataframe, you'll find the change has been made.
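Renaming by position depends on the column order. The sketch below shows an alternative on a made-up dataframe. It matches on the current name instead, so it keeps working if the columns are later rearranged:

```r
# Hypothetical dataframe
toy <- data.frame(Age = c(23, 35), Height = c(64, 70), Weight = c(150, 180))

# Rename by position, as in the post ...
names(toy)[3] <- "Weight_lbs"

# ... or by matching the current name, which doesn't depend on column order
names(toy)[names(toy) == "Height"] <- "Height_in"
names(toy)   # "Age" "Height_in" "Weight_lbs"
```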




Ok, so now we'll see how the names() function can be used in other applications.

1. Summary objects

There are two ways to extract information from objects in R, using subsetting and using the "$" operator. 

Below, we summarize the Age vector and store the results in sum.vec. We print out the sum.vec object and then print out the corresponding names. Now we can extract the first element of the summary vector of Age using the [ ] operator.













This gives us the first element, which is the minimum. We could also do:

sum.vec[c(2,3,5)] 

for the 25th, 50th, and 75th percentiles.
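The mydata output can't be reproduced here, so this sketch uses five hypothetical ages. The summary vector's elements are named, which means you can also index them by name:

```r
age <- c(23, 35, 41, 29, 52)   # hypothetical ages
sum.vec <- summary(age)
names(sum.vec)   # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."

sum.vec[1]            # minimum, by position
sum.vec["Median"]     # median, by name
sum.vec[c(2, 3, 5)]   # 25th, 50th, and 75th percentiles
```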


The other way to extract is by using "$". For example, the summary() function on a table object gives you a chi-squared test of independence:












Here, you can extract any of the pieces of information that came out of the test, including the number of cases, the number of variables, the test statistic, etc. We can extract the p-value of the test statistic by using the "$" operator, like this:






Let's see how this can be useful in the next example.
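Here is a sketch of the table case on made-up counts; the variables group and outcome are hypothetical. As far as I can tell, summary() on a two-way table runs the same test as chisq.test() without the continuity correction, so the two p-values should agree:

```r
# Hypothetical 2 x 2 cross-tabulation
group   <- rep(c("A", "B"), each = 30)
outcome <- c(rep(c("yes", "no"), times = c(20, 10)),
             rep(c("yes", "no"), times = c(10, 20)))
tab <- table(group, outcome)

tab.sum <- summary(tab)
names(tab.sum)    # includes "statistic", "parameter", and "p.value"
tab.sum$p.value   # just the p-value

# Same test via chisq.test(), with the continuity correction turned off
chisq.test(tab, correct = FALSE)$p.value
```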

2. Regressions and statistical tests

The standard way to run a linear regression in R is with lm(). It looks like this:











But there's a lot more that R has calculated that is not shown here. We can see this by saving this linear regression as an object and running names() on it:




So we see that stored in reg.object are the coefficients, the residuals, the fitted values, the degrees of freedom, and a lot more. To find out what each of these components contains, look at the Value section of the help page by running ?lm. Now, to extract any of these components, like the residuals, use the "$" operator like this:

reg.object$residuals

You can make use of this extraction by taking the mean of the residuals

mean(reg.object$residuals)

or plotting their distribution:

hist(reg.object$residuals, main="Distribution of Residuals" ,xlab="Residuals")
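Here is a self-contained version of these extractions. It uses R's built-in mtcars data in place of mydata, and the model mpg ~ wt is just an example:

```r
# Built-in mtcars data stands in for the post's dataset
reg.object <- lm(mpg ~ wt, data = mtcars)
names(reg.object)   # "coefficients", "residuals", "fitted.values", ...

# With an intercept in the model, OLS residuals average to (numerically) zero
mean(reg.object$residuals)

# Fitted values plus residuals reproduce the outcome
all.equal(unname(reg.object$fitted.values + reg.object$residuals), mtcars$mpg)
```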

Don't forget that you can summarize regression objects using summary(), and get the names() of that summary too, like this:

summary(reg.object)
names(summary(reg.object))

which will give you more objects you can extract from your regression. You can use the names() function on the output of any statistical model or test function, such as aov(), t.test(), chisq.test(), etc.
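For example, still using the built-in mtcars data as a stand-in, the summary object holds R-squared and the coefficient table. A t.test() result has its own p.value component:

```r
reg.object <- lm(mpg ~ wt, data = mtcars)
reg.sum <- summary(reg.object)
reg.sum$r.squared                        # R-squared
reg.sum$coefficients["wt", "Pr(>|t|)"]   # p-value for the wt slope

# The same idea works for test objects, e.g. t.test() of mpg by transmission
tt <- t.test(mpg ~ am, data = mtcars)
names(tt)
tt$p.value
```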

3. Histograms and boxplots

Finally, let's go back to that histogram and save it into an object. The histogram object has its own components, which names() will list:





I showed how you can manipulate those in my post on histograms.

Similarly, for boxplot:













Here I've extracted the stats object which gives you the lower whisker, the lower hinge, the median, the upper hinge, and the upper whisker for each group, which you can see below.
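Both objects can also be created without drawing anything by passing plot = FALSE, which is handy when you only want the numbers. Here is a sketch on simulated data; x and g are hypothetical:

```r
set.seed(3)
x <- rnorm(200)                     # hypothetical data
g <- rep(c("a", "b"), each = 100)   # hypothetical grouping

# plot = FALSE returns the object without drawing anything
h <- hist(x, plot = FALSE)
names(h)        # includes "breaks", "counts", "density", "mids"
sum(h$counts)   # 200: every observation falls in exactly one bin

bp <- boxplot(x ~ g, plot = FALSE)
bp$stats   # 5 rows (lower whisker, lower hinge, median, upper hinge, upper whisker) x 2 groups
bp$names   # group labels, in column order
```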



Friday, November 30, 2012

Data types part 4: Logical class


First, an update: a commenter has asked me to post my code so that it is easier to practice the examples I show here. It will take me a little time to get all of my code for past posts well documented and readable, but I have uploaded the code and data for the last 4 posts, including this one, here:

Code and Data download site

Unfortunately, I could not find a way to attach it to Blogger, so sorry for the extra step.
_________________________________________________________________________

Ok, now on to Data types part 4: Logical

I started this series of posts on data types by saying that when you have a dataframe like this called mydata:




you can't do this in R:

Age<25

This fails because Age does not exist as an object in R (it only exists as a column inside mydata), and you get the error below:





But what happens when I do this?

mydata$Age<25

This is perfectly legal to do in R, but it's not going to drop observations. With this kind of statement, you are asking R to evaluate the logical question "Is it true that mydata$Age is less than 25?".  Well, that depends on which element of the Age vector, of course. Which is why this is what you get when you run that code:



At first glance, this looks like a character vector; it is a string of entries spelled out in letters, after all. But it's not of the character class, it's of the logical class. If you save this string of TRUE and FALSE entries into an object and print its class, this is what you get:



A logical vector can only take on two values, TRUE or FALSE (plus NA when a value is missing). We've seen evaluations of logical operations already, first in subsetting, like this:

mysubset<-mydata[mydata$Age<40,]

Check out my post on subsetting if this syntax is confusing. In a nutshell, R evaluates all rows and keeps only those that meet the criteria, which is only rows where Age has a value of under 40 and all columns.

Or here, in ifelse() statements:

mydata$Young<-ifelse(mydata$Age<25,1,0)

More on ifelse() statements here. The ifelse() function is really useful, but is actually overkill when you're just creating a binary variable. This can be done faster by taking advantage of the fact that logical values of TRUE always have a numeric value of 1, while logical values of FALSE always have a numeric value of 0.

That means all I need to do to create a binary variable for under age 25 is to convert my logical vector mydata$Ageunder25 (where I saved the result of mydata$Age<25) into numeric. This is very easy with R's as.numeric() function. I do it like this:

mydata$Ageunder25_num<-as.numeric(mydata$Ageunder25)

or directly without that intermediate step like this:

mydata$Ageunder25_num<-as.numeric(mydata$Age<25)

Let's check out the relevant columns in our dataframe:


We can see that the Ageunder25_num variable is an indicator of whether the Age variable is under 25.
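Here is a small sketch of these rules on hypothetical ages. One value is missing, to show that a comparison with NA gives NA rather than TRUE or FALSE:

```r
# Hypothetical ages, one of them missing
age <- c(23, 35, NA, 19)

under25 <- age < 25
class(under25)   # "logical"
under25          # TRUE FALSE NA TRUE: a missing age gives NA, not FALSE

# as.numeric() maps TRUE to 1 and FALSE to 0 (NA stays NA)
as.numeric(under25)   # 1 0 NA 1

# Arithmetic does the same coercion, so counting is a single call
sum(under25, na.rm = TRUE)   # 2 people known to be under 25
```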

Now the really, really useful part of this is that you can use this feature to turn on and off a variable depending on its value. For example, say you got your data and realized that some of the height values were in inches and some were in centimeters, like this:



Those heights of 152 and 170 are in centimeters, while everything else is in inches. There are various ways to fix this, but one way is to check which values are less than, say, 90 (probably a safe cutoff), and create a new column that keeps the values under 90 as they are but converts the values of 90 or more from centimeters to inches. We can do it this way:


mydata$Height_fixed_in<-as.numeric(mydata$Height_wrong<90)*mydata$Height_wrong +
  as.numeric(mydata$Height_wrong>=90)*mydata$Height_wrong/2.54

So the first half of the calculation is "turned on" when Height_wrong is less than 90: the logical statement is TRUE, which as.numeric() turns into a 1, and this 1 is multiplied by the original height. The second half of the statement is FALSE, i.e. 0, and 0 times anything is 0. If Height_wrong is 90 or greater, the first half is 0 and the second half is turned on, so Height_wrong is divided by 2.54 (the number of centimeters in an inch), converting it into inches. We get the result below:
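Here is the same calculation on a handful of made-up heights, two of them entered in centimeters, followed by an ifelse() version for comparison. Note that the + sits at the end of the first line. If it started the second line instead, R would treat the first line as a complete statement.

```r
# Hypothetical heights: mostly inches, two accidentally in centimeters
Height_wrong <- c(64, 152, 70, 170, 66)

Height_fixed_in <- as.numeric(Height_wrong < 90) * Height_wrong +
  as.numeric(Height_wrong >= 90) * Height_wrong / 2.54

round(Height_fixed_in, 1)   # 64.0 59.8 70.0 66.9 66.0

# ifelse() gives the same result and may read more plainly
identical(Height_fixed_in, ifelse(Height_wrong < 90, Height_wrong, Height_wrong / 2.54))   # TRUE
```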



Another useful way to use as.numeric() and the logical class to your advantage is a situation like this:


I have in my dataset the age of the last child born (and probably other characteristics of this child not shown), and then just the number of other children for each woman.  I want to get a total number of children variable.  I can do it simply in the following way. 

First, a note about the is.na() function. If you want to check whether a variable is missing in R, you don't use syntax like "if variable==NA" or "if variable==.". Comparing anything to NA with == just returns NA, so it will not flag missing values. What you want to use instead is is.na(variable), like this:

is.na(newdata$Child1age)

Which gives you a logical vector that looks like this:





If you want to check whether a variable is not missing, put the ! sign (meaning "not") in front, like this:

!is.na(newdata$Child1age)

We've seen this kind of thing before!  Now we can translate this logical vector into numeric and add it to the number of other children, like this:

newdata$Totalnumchildren<-as.numeric(!is.na(newdata$Child1age))+newdata$Numotherchildren

We get the following:


If we want those NAs to be 0, we can again use the is.na() function and replace the missing values of Totalnumchildren with 0, like this:

newdata$Totalnumchildren[is.na(newdata$Totalnumchildren)]<-0
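Here is a self-contained sketch of these steps on a tiny hypothetical newdata dataframe. It also shows why == NA can't detect missing values:

```r
# Hypothetical data: age of last child (NA if none) and number of other children
newdata <- data.frame(Child1age = c(4, NA, 10, NA),
                      Numotherchildren = c(2, 0, NA, 1))

# Comparing with == NA returns NA, never TRUE, so it cannot flag missings
newdata$Child1age == NA    # NA NA NA NA
is.na(newdata$Child1age)   # FALSE TRUE FALSE TRUE

# 1 if there is a last child, plus the number of other children
newdata$Totalnumchildren <- as.numeric(!is.na(newdata$Child1age)) +
  newdata$Numotherchildren

# Any NA in the inputs makes the sum NA; replace those with 0, as in the post
newdata$Totalnumchildren[is.na(newdata$Totalnumchildren)] <- 0
newdata$Totalnumchildren   # 3 0 0 1
```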






Wednesday, November 21, 2012

Data types, part 3: Factors!


In this third part of the data types series, I'll go over an important class that I've skipped so far: factors.

Factors are categorical variables that are super useful in summary statistics, plots, and regressions. They basically act like dummy variables that R codes for you.  So, let's start off with some data:



And let's check out what kinds of variables we have:


So we see that Race is a factor variable with three levels. I can see all the levels this way:


What this means is that R groups statistics by these levels. Internally, R stores the integer values 1, 2, and 3 and maps the character strings to them (in alphabetical order, unless I reorder the levels), i.e. 1=Black, 2=Hispanic, and 3=White. If I do a summary of this variable, it shows me the counts for each category, as below. R won't compute a mean or other numeric summary of a factor (mean() just returns NA with a warning), so keep that in mind. But you can always convert your factor to numeric, which I'll go over next week.
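A quick sketch of what's going on under the hood, using a made-up Race vector rather than the dataset above:

```r
# Hypothetical character data turned into a factor
Race <- factor(c("White", "Black", "Hispanic", "White", "Black"))

levels(Race)      # sorted alphabetically by default: Black, Hispanic, White

as.integer(Race)  # the integer codes R actually stores
# [1] 3 1 2 3 1

summary(Race)     # counts per level: Black 2, Hispanic 1, White 2

mean(Race)        # not meaningful for a factor: returns NA, with a warning
```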






If I plot age against race, the ordinary plot() command gives me a boxplot, since that is what makes sense for a categorical variable:

plot(mydata$Age~mydata$Race, xlab="Race", ylab="Age", main="Boxplots of Age by Race")

Finally, if I do a regression of age on race, notice how I instantly get dummy variables:

summary(lm(Age~Race, data=mydata))











Here Black is the reference category since it's the first level by alphabetical order.
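If you want to see the dummy variables R builds, model.matrix() shows the design matrix that lm() uses. A minimal sketch with a made-up data frame (the ages are invented):

```r
# Hypothetical data frame with a factor predictor
mydata <- data.frame(
  Age  = c(34, 29, 41, 25, 38, 30),
  Race = factor(c("Black", "Hispanic", "White", "Black", "White", "Hispanic"))
)

# An intercept plus one 0/1 dummy column for every level except the
# reference level (Black, the first level alphabetically)
model.matrix(Age ~ Race, data = mydata)

# The coefficient names match those dummy columns
coef(lm(Age ~ Race, data = mydata))
```

With these made-up numbers, the intercept is the mean age of the Black group (29.5), and each other coefficient is that group's difference from it.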







What if I want to run the same regression as before, but with Hispanic as the reference group? That's easy: I just relevel the factor like this and get the resulting regression output:

mydata$Race2<-relevel(mydata$Race, "Hispanic")

summary(lm(Age~Race2, data=mydata))










Notice how now the Hispanic category is the reference.
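relevel() only moves one level to the front. If you want to control the full order of the levels (the first one is the reference in a regression), you can also pass an explicit levels argument to factor(). A small sketch with a made-up vector:

```r
Race <- factor(c("Black", "Hispanic", "White", "Hispanic"))

# relevel() moves only the chosen level to the front: Hispanic, Black, White
levels(relevel(Race, ref = "Hispanic"))

# factor() with explicit levels sets the whole order: White, Hispanic, Black
Race3 <- factor(Race, levels = c("White", "Hispanic", "Black"))
levels(Race3)
```

Either way the data values themselves are unchanged; only the order of the levels, and therefore the reference category, differs.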




So that is great stuff. But it's really important to know how to manipulate factor variables. First, I can create factors using the factor() function. I notice from viewing my dataset above (or from running class(mydata$Married)) that marital status is numeric and coded as 0, 1, or 2. My codebook tells me that those values correspond to Single, Married, and Divorced/Widowed. We can fix that this way:

mydata$Married.cat<-factor(mydata$Married, labels=c("Single", "Married", "Divorced/Widowed"))

This creates an unordered factor in which the original values 0, 1, and 2 become Single, Married, and Divorced/Widowed (stored internally as the codes 1, 2, and 3).
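One thing to watch: with labels alone, factor() assigns the labels to the sorted values that actually occur in the data. If, say, nobody in a subset is divorced or widowed, there are only two observed values for three labels, and factor() stops with an error. Spelling out levels as well ties each code to its label explicitly. A sketch with made-up codes:

```r
# Hypothetical marital-status codes: 0 = Single, 1 = Married, 2 = Divorced/Widowed
Married <- c(0, 2, 1, 1, 0)

# levels= ties each code to its label explicitly
Married.cat <- factor(Married, levels = c(0, 1, 2),
                      labels = c("Single", "Married", "Divorced/Widowed"))
Married.cat

# A subset with no 2s still works and keeps Divorced/Widowed as an empty level
subset.cat <- factor(c(0, 1, 0), levels = c(0, 1, 2),
                     labels = c("Single", "Married", "Divorced/Widowed"))
table(subset.cat)
```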

Now let's say I want to create a variable that describes whether someone's weight is "Low", "Medium", or "High". In this case, I'll use the cut() function, which bins a numeric variable into a factor that I can label, and then wrap the ordered() function around cut() to order the levels:

mydata$Weight.type<-ordered(cut(mydata$Weight, c(0,135,165,200), labels=c("Low", "Medium", "High")))


If I print out the class and the contents of the variable, I see that it's an ordered factor with Low < Medium < High, which is what I want.
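A couple of details about cut(), shown with made-up weights: by default the intervals are closed on the right, so a weight of exactly 135 counts as "Low", and anything outside the breaks (above 200 here) becomes NA. cut() also has an ordered_result argument that returns an ordered factor directly, as an alternative to wrapping it in ordered():

```r
Weight <- c(120, 135, 150, 180, 210)

# Right-closed intervals by default: (0,135], (135,165], (165,200]
Weight.type <- cut(Weight, breaks = c(0, 135, 165, 200),
                   labels = c("Low", "Medium", "High"),
                   ordered_result = TRUE)
Weight.type   # 210 falls outside the breaks, so it becomes NA

# Ordered factors support comparisons between levels
Weight.type[1] < Weight.type[4]
```

If you don't want values above the top break to turn into NA, using Inf as the last break is one option.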

We can see the new additions to my dataset (I'm showing just the relevant columns):




One caveat with factors: if you start off with a level and then drop the only observations with that level, R still holds on to the level as a stored value, and this can mess up your later analysis. For example, I subset my data to the first 6 rows, which eliminates all Hispanic subjects, but R keeps Hispanic as a possible level:


mynewdata<-mydata[1:6,]
summary(mynewdata$Race)






So here Hispanic just has a 0 count, but is still a category.  This can be really annoying, like when you're making a barplot and the category is still showing up in the plot.

One quick way to get rid of this is the droplevels() function, which drops all unused levels. A commenter let me know that this function was introduced in R 2.12.0; before that, you needed a separate package called gdata. droplevels() works on a single factor or on a whole data frame, and for a data frame you can use the except argument to list the indices of columns whose levels should not be dropped:

finaldata<-droplevels(mynewdata)
summary(finaldata$Race)






You could do all this very efficiently in one step like this without changing the name of your dataframe:

mydata<-droplevels(mydata[1:6,])

which is why we love R!
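To round this out, here is a small sketch with a made-up two-column data frame showing droplevels() with and without except, plus two ways to drop unused levels from a single factor:

```r
# Hypothetical data frame with two factor columns
mydata <- data.frame(
  Race = factor(c("Black", "White", "Black", "Hispanic")),
  Sex  = factor(c("F", "F", "F", "M"))
)
mynewdata <- mydata[1:3, ]   # drops the only Hispanic row and the only M row

levels(mynewdata$Race)                       # still includes "Hispanic"
levels(droplevels(mynewdata)$Race)           # unused level dropped

# except = 1 protects column 1 (Race); Sex still loses its unused level
kept <- droplevels(mynewdata, except = 1)
levels(kept$Race)
levels(kept$Sex)

# On a single factor: droplevels() or indexing with drop = TRUE
levels(droplevels(mynewdata$Race))
levels(mydata$Race[1:3, drop = TRUE])
```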