Sandwiches and Surface Markers: How to Analyze Flow Cytometry Data

Different cells in an organism have specialized functions that require specific proteins. For example, cells in the eye need proteins that can detect light, and immune cells use a variety of proteins to detect and kill invading pathogens. Because each cell type makes its own characteristic set of proteins, we can use those proteins to sort a mixed group of cells (say, a blood sample, which contains many different kinds of cells) into types. Now imagine you want to look at many different proteins on a lot of individual cells, and you want to do it fast. How can a scientist analyze almost twenty proteins on each cell, from millions of single cells, in less than an hour? This isn’t the imaginary dream of a tired graduate student, or a magical machine available only to the richest labs: this is flow cytometry. Flow cytometry is the measurement (-metry) of cells (cyto-) as they flow through a fine stream of liquid. Flow cytometry machines (flow cytometers) are ubiquitous in medical clinics as well as on laboratory benches around the world, and scientists use these incredible tools to learn about cells every day.


How does it work? First, scientists suspend cells in liquid and load them into the machine. The machine then sends the cells in single file past an array of lasers, and detectors record how each cell scatters and gives off light. From these signals, the machine measures characteristics such as size, complexity, and production of certain proteins. To identify proteins that are in or on a cell, scientists use a variety of colored (fluorescent) markers that stick to specific proteins and act as “tags.” Tagging proteins is like adding colored Post-it notes to the files in a box (pink for important, yellow for bills, and so on). When you look in the box, it’s easy to see what types of files are present just from the colors. Similarly, much can be learned about cells by looking at these markers: up to 17 can be used at once [1], meaning information about 17 different proteins can be obtained from each cell. As you can imagine, however, 17 colors can generate extremely complex data. In this article, we’ll show you the most common ways to analyze flow cytometry data, so that you can read through research articles and draw your own conclusions.


How to Read a Flow Plot


To better understand the different levels of analysis that scientists perform on flow data, let’s imagine that we analyzed a plate of sandwiches with different characteristics: big, small, peanut butter, jelly, and every combination thereof. Each sandwich is counted as a “cell,” and each cell is represented as a dot on a graph.  


The first step is to look at all the components of the sample and remove the debris (dead cells, microscopic bits of dust, etc.); here, we’ll call them crumbs. To do this, we can use forward scatter (FSC) and side scatter (SSC), two parameters that reflect the size and complexity of cells. FSC measures the amount of laser light scattered forward past each cell, roughly along the laser’s path, and it gives information about the size of the cell. This can distinguish the crumbs from the intact sandwiches. SSC measures the amount of light scattered off to the side of the cell, which reflects the internal complexity of the cell. In the sandwich example, this complexity is akin to differentiating sandwiches made with white bread from those made with multigrain bread, which have lots of seeds, grains, and other bits present. Although not directly linked to any specific function in a cell, SSC can help to tell different populations of cells apart. To get an overview of the types of cells in the sample, FSC can be plotted on the X-axis and SSC on the Y-axis of a graph (Figure 1).

Figure 1: Forward scatter vs side scatter. Typically, scientists will plot FSC against SSC as a first-pass look at the items that make up their population. On the X-axis (FSC), individual events go from small to large, and on the Y-axis (SSC) individual events go from less complex to more complex. (Credit: Lynnea Waters)

The next step is to electronically filter the data to focus on only the cells we’re interested in. Using software programs that analyze flow cytometry data, a circle (or another shape) can be drawn around just the cells of interest. This boundary is called a “gate,” and it digitally files the cells inside it into a group for further analysis. For example, a gate can exclude the very small crumbs, so that further analysis looks only at whole sandwiches.
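For readers who like to see the logic spelled out, here is a minimal Python sketch of what gating software does with a gate. It assumes a simple rectangular gate (real programs also let you draw circles and freehand shapes), and the `Event` class, the event values, and the FSC cutoff of 50 are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One dot on the FSC vs SSC plot: a cell, a sandwich, or a crumb."""
    fsc: float  # forward scatter, roughly the size of the event
    ssc: float  # side scatter, roughly its internal complexity


def in_gate(event, fsc_min, fsc_max, ssc_min, ssc_max):
    """Return True if the event falls inside a rectangular gate."""
    return fsc_min <= event.fsc <= fsc_max and ssc_min <= event.ssc <= ssc_max


def apply_gate(events, **bounds):
    """Keep only the events inside the gate; everything outside is ignored."""
    return [e for e in events if in_gate(e, **bounds)]


# Hypothetical sample: two crumbs (small FSC) and three whole sandwiches.
sample = [Event(5, 3), Event(8, 40), Event(120, 30), Event(150, 80), Event(200, 60)]

# Gate out anything too small to be a whole sandwich (FSC below 50, an assumed cutoff).
sandwiches = apply_gate(sample, fsc_min=50, fsc_max=1000, ssc_min=0, ssc_max=1000)
print(f"{len(sandwiches)} of {len(sample)} events pass the gate")
```

Every later step of the analysis would then work only with the events kept in `sandwiches`.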

Figure 2: Peanut butter vs jelly. To look at two parameters simultaneously (like peanut butter and jelly), a population can be graphed on a two-parameter plot. On the X-axis, events range from less peanut butter to more peanut butter, and on the Y-axis, events range from less jelly to more jelly. (Credit: Lynnea Waters)

With this essential step done, other qualities of the cells can now be analyzed. In this experiment, we want to find out how many sandwiches with peanut butter AND jelly there are. To do that, let’s look at the level of peanut butter on the X-axis and the level of jelly on the Y-axis. This is a typical two-parameter flow plot, where each dot represents one sandwich and its position shows how much peanut butter and jelly that sandwich contains (Figure 2).

Typically, scientists will categorize the populations by the amount of a certain marker. For example, on the X-axis, everything to the left of the dotted vertical line could be categorized as peanut butter negative, or PB- for short. Everything to the right of the line would be peanut butter positive, or PB+. Similarly, for the Y-axis, there are jelly negative sandwiches, J-, below the dotted horizontal line, and jelly positive, J+, above the dotted horizontal line. By combining these two markers, the jelly-only sandwiches are labeled PB-J+, the peanut butter-only sandwiches are PB+J-, the peanut butter AND jelly sandwiches are PB+J+, and the sandwiches with neither are PB-J-.
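The quadrant labels can be written as a tiny Python sketch. The cutoffs of 100 are hypothetical stand-ins for the dotted lines, the function names are made up for illustration, and a value sitting exactly on a line is counted as negative here by assumption.

```python
def call(value, cutoff, marker):
    """Call one marker positive ('+') or negative ('-') relative to a cutoff line."""
    return f"{marker}+" if value > cutoff else f"{marker}-"


def quadrant(pb, jelly, pb_cutoff=100, jelly_cutoff=100):
    """Combine the peanut butter and jelly calls into one label such as 'PB+J-'.

    The default cutoffs of 100 are hypothetical stand-ins for the dotted lines.
    """
    return call(pb, pb_cutoff, "PB") + call(jelly, jelly_cutoff, "J")


# Hypothetical (peanut butter, jelly) amounts, one pair per sandwich.
for pb, jelly in [(20, 300), (250, 10), (250, 300), (20, 10)]:
    print(pb, jelly, quadrant(pb, jelly))
```

Each sandwich lands in exactly one of the four quadrants, which is why the four labels together account for the whole gated population.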


Gating on each of these populations, just as crumbs were gated out using FSC and SSC, gives both the number of each type of sandwich and its percentage of the total population. If we draw a gate on only one population (say, PB+J-), the sandwiches in that population can then be further categorized by additional markers (remember, up to 17 can be used!). For example, there might be a population that is PB+ J- Honey+ Banana+. This consecutive gating strategy gives a very detailed picture of the exact characteristics of the sandwiches in the sample.
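Counting and consecutive gating can be sketched the same way. In this hypothetical Python example, every sandwich is a dictionary of topping amounts, a single cutoff of 50 is assumed for every marker, and the second gate is applied only to the sandwiches that passed the first. Note the two kinds of percentage it reports: a population’s share of its parent gate and its share of the whole sample.

```python
# Hypothetical sandwiches: amount of each topping in arbitrary units.
SANDWICHES = [
    {"PB": 250, "J": 10, "Honey": 80, "Banana": 90},
    {"PB": 300, "J": 5, "Honey": 5, "Banana": 120},
    {"PB": 20, "J": 300, "Honey": 0, "Banana": 0},
    {"PB": 280, "J": 260, "Honey": 70, "Banana": 10},
]
CUTOFF = 50  # assumed: one positive/negative cutoff shared by every marker


def is_positive(sandwich, marker):
    """True if the sandwich has more of this topping than the cutoff."""
    return sandwich[marker] > CUTOFF


def gate(population, **wanted):
    """Keep sandwiches whose +/- status matches every requested marker.

    Example: gate(pop, PB=True, J=False) keeps only PB+J- sandwiches.
    """
    return [
        s for s in population
        if all(is_positive(s, marker) == want for marker, want in wanted.items())
    ]


def percent(part, whole):
    """Percentage of `whole` that ended up in `part`."""
    return 100 * len(part) / len(whole)


pb_only = gate(SANDWICHES, PB=True, J=False)              # first gate: PB+J-
pb_honey_banana = gate(pb_only, Honey=True, Banana=True)  # second gate, inside the first

print(f"PB+J-: n={len(pb_only)}, {percent(pb_only, SANDWICHES):.0f}% of total")
print(
    f"PB+ J- Honey+ Banana+: n={len(pb_honey_banana)}, "
    f"{percent(pb_honey_banana, pb_only):.0f}% of parent gate, "
    f"{percent(pb_honey_banana, SANDWICHES):.0f}% of total"
)
```

The same sandwich can be 50% of its parent gate but only 25% of the whole plate, so it is worth checking which denominator a paper is using.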

Figure 3: Histogram of jelly. A histogram shows how many sandwiches in a population have each amount of a given parameter. This population is split, with most sandwiches having either a little (left peak) or a lot (right peak) of jelly, and fewer sandwiches having a medium amount. (Credit: Lynnea Waters)

Another way to analyze flow cytometry data is to use a histogram. In this type of plot, the X-axis represents the amount of a single parameter (Figure 3), and the Y-axis shows how often we see sandwiches with that specific amount of that marker. In Figure 3, there are many sandwiches with a lot of jelly or a little jelly, but fewer sandwiches with a medium amount of jelly. On a histogram, this appears as two distinct peaks with a valley in the middle: the left peak represents the many sandwiches with little jelly, the valley represents the few sandwiches with a medium amount of jelly, and the right peak represents the many sandwiches with lots of jelly.
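Under the hood, a histogram is just a count of how many events fall into each range (or “bin”) of values. Here is a minimal Python sketch with made-up jelly amounts chosen to give the two-peak shape described above; the bin width of 20 and the scale maximum of 100 are arbitrary assumptions.

```python
BIN_WIDTH = 20   # assumed bin width, in arbitrary jelly units
MAX_VALUE = 100  # assumed top of the measurement scale


def histogram(values, bin_width=BIN_WIDTH, max_value=MAX_VALUE):
    """Count how many values fall into each bin [0, w), [w, 2w), ...

    Assumes every value lies between 0 and max_value.
    """
    n_bins = max_value // bin_width
    counts = [0] * n_bins
    for v in values:
        index = min(int(v // bin_width), n_bins - 1)  # a value at the top edge goes in the last bin
        counts[index] += 1
    return counts


# Hypothetical jelly amounts: many low, many high, few in between.
jelly = [5, 8, 12, 15, 18, 45, 82, 85, 88, 91, 95]
counts = histogram(jelly)

# Text version of the histogram: one row per bin, one '#' per sandwich.
for i, count in enumerate(counts):
    low, high = i * BIN_WIDTH, (i + 1) * BIN_WIDTH - 1
    print(f"{low:3d}-{high:3d} | {'#' * count}")
```

The printed rows show a tall bar at the low end, a tall bar at the high end, and almost nothing in the middle: the peak, valley, peak pattern from Figure 3.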


Finally, it is often useful to compare sandwiches from two different samples. For example, let’s look at sandwiches made by Amanda and sandwiches made by Daniel, and determine who makes sandwiches with more jelly (Figure 4). This can be done with an overlapping histogram. If the two histograms of jelly content (one from Amanda’s sandwiches and one from Daniel’s) are plotted on the same graph, we can see whether one of the sandwich artists uses more or less jelly than the other. Whichever histogram is shifted further to the right comes from the sample with more jelly. It looks like Amanda has the heavier hand when it comes to jelly!
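“Shifted to the right” can also be put into numbers. One common summary in flow cytometry is the median fluorescence intensity (MFI) of each sample, and comparing medians is a simple way to ask which histogram sits further right. The Python sketch below applies the standard library’s `statistics.median` to made-up jelly amounts for each sandwich maker; these are not the data behind Figure 4, and the function name is hypothetical.

```python
from statistics import median

# Hypothetical jelly amounts (arbitrary units); not the values behind Figure 4.
amanda = [60, 72, 75, 80, 88, 95]
daniel = [20, 28, 35, 40, 44, 52]


def heavier_hand(sample_a, sample_b, name_a, name_b):
    """Name the sample whose distribution sits further right, judged by its median."""
    med_a, med_b = median(sample_a), median(sample_b)
    if med_a == med_b:
        return "neither"
    return name_a if med_a > med_b else name_b


print(f"Amanda median: {median(amanda)}, Daniel median: {median(daniel)}")
print("More jelly:", heavier_hand(amanda, daniel, "Amanda", "Daniel"))
```

The median is used rather than the mean because a few unusually jelly-heavy sandwiches would drag the mean to the right without changing where most of the histogram sits.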

Figure 4: Overlapping histograms. By plotting histograms of the same parameter (jelly) from two different experiments (sandwiches made by Amanda and sandwiches made by Daniel) on the same graph, the amounts of jelly in each experiment can be directly compared to conclude that overall Amanda’s sandwiches have more jelly than Daniel’s. (Credit: Lynnea Waters)

Put your skills to the test!


Now that you know how to read flow plots, let’s analyze real data from a research paper. Shown below (Figure 5) is a figure from a paper by Terahara et al. [2]. In this paper, the authors used a modified human immunodeficiency virus (HIV-1) that infects cells and forces them to produce a protein, EGFP (enhanced green fluorescent protein), which makes them glow green under blue light. This allowed the authors to monitor which cells were infected by HIV-1. To use the same terminology from the sandwich example, these infected cells would be labeled EGFP+. The authors infected human blood cells with their modified HIV-1 and monitored the levels of EGFP and a viral protein, Gag p24, at different stages of infection to see how well the levels of EGFP and viral protein matched. By comparing levels of EGFP and Gag p24 at each time point in Figure 5, we can determine how long it took cells to produce these two proteins.

Figure 5: Scientists infected human blood cells with a version of HIV-1 that makes the cells turn green (EGFP+). They monitored the levels of EGFP and an HIV-1 protein (Gag p24) to observe how long it took each marker to be produced. The percentage of cells in each quadrant is written in pink. (Credit: Adapted from Terahara et al. 2012 [2] by Lynnea Waters)

The “Mock” panel (a control experiment containing blood cells that were not infected) does not have any cells that produce either EGFP or the viral protein. At Day 1, 17.6% of cells had started to produce the viral marker Gag p24 (bottom right square), but there was not yet any EGFP expression (EGFP- Gag p24+). These cells are infected by HIV-1 but don’t yet make any EGFP. By Day 2, a very small percentage of cells (0.63%, top right square) produced both EGFP and Gag p24 (EGFP+ Gag p24+). Finally, by Day 5, 19.4% of cells (top right square) expressed both markers.


What can we learn from this panel? In this experiment, it took about 5 days for infected cells to produce both markers, so future experiments that rely on EGFP to find infected cells should be done at day 5 or later. Additionally, because EGFP is not present in all of the HIV-1-infected (Gag p24+) cells, we can conclude that EGFP does not mark all infected cells. If someone claimed that this figure shows that EGFP marks every infected cell, you would know better!


Congratulations: you just interpreted your first flow cytometry plot! In the clinic, flow cytometry is a critical tool for diagnoses, especially for diseases of the blood. In the lab, flow cytometry is a powerful tool to measure differences in cell populations across different experiments. Because flow cytometry can analyze millions of cells, it allows scientists and clinicians to identify very small changes and extremely rare cells. This article has only scratched the surface of what’s possible with flow cytometry, but hopefully you now feel confident taking your newfound skills into the world of flow cytometry data and drawing your own conclusions.


Lynnea Rae Waters
Guest Contributor, Signal to Noise Magazine
PhD Candidate, UCLA Molecular Biology IDP: Immunity, Microbes and Molecular Pathogenesis




[1] Perfetto, S. et al. Seventeen-colour flow cytometry: unravelling the immune system. Nature Reviews Immunology 4, 648–655 (2004).

[2] Terahara, K. et al. Fluorescent Reporter Signals, EGFP, and DsRed, Encoded in HIV-1 Facilitate the Detection of Productively Infected Cells and Cell-Associated Viral Replication Levels. Frontiers in Microbiology 2 (2012).