Data on tags over time

How can we tell which programming languages and technologies are used by the most people? And which languages are growing and which are shrinking, so that we can tell which are most worth investing time in?

One excellent source of data is Stack Overflow, a programming question and answer site with more than 16 million questions on programming topics. By measuring the number of questions about each technology, we can get an approximate sense of how many people are using it. We’re going to use open data from the Stack Exchange Data Explorer to examine how the relative popularity of languages like R, Python, Java and JavaScript has changed over time.

Each Stack Overflow question has one or more tags, which describe its topic or technology. For instance, there are tags for languages like R or Python, and for packages like ggplot2 or pandas.

We’ll be working with a dataset that has one observation for each tag in each year. The dataset includes both the number of questions asked in that tag in that year, and the total number of questions asked in that year.
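
To make that shape concrete, here is a minimal sketch of what such a table could look like in R. The object name by_tag_year, the column names (year, tag, number, year_total) and all of the counts are assumptions made up for illustration; they are not taken from the real Stack Exchange data.

```r
# Hypothetical toy data: names and counts are invented for illustration only.
# One row per tag per year, as described above.
by_tag_year <- data.frame(
  year       = c(2015, 2015, 2015, 2016, 2016, 2016),
  tag        = c("r", "python", "java", "r", "python", "java"),
  number     = c(10, 30, 40, 24, 60, 48),               # questions with this tag in this year
  year_total = c(1000, 1000, 1000, 1200, 1200, 1200),   # all questions asked in this year
  stringsAsFactors = FALSE
)

# Inspect the structure: four columns, one observation per tag-year pair
str(by_tag_year)
head(by_tag_year)
```

Note that year_total repeats for every tag within the same year, since it describes the year as a whole rather than any single tag.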

Has R been growing or shrinking?

So far we’ve been learning and using the R programming language. Wouldn’t we like to be sure it’s a good investment for the future? Has it been keeping pace with other languages, or have people been switching out of it?

Let’s look at whether the fraction of Stack Overflow questions that are about R has been increasing or decreasing over time.
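
One way to compute that fraction with dplyr is sketched below. It assumes a table with the hypothetical columns year, tag, number and year_total, and it uses made-up counts rather than the real Stack Overflow numbers, so treat it as an outline of the steps and not as the actual result.

```r
library(dplyr)

# Hypothetical toy data: names and counts are invented for illustration only.
by_tag_year <- data.frame(
  year       = c(2015, 2015, 2016, 2016),
  tag        = c("r", "python", "r", "python"),
  number     = c(10, 30, 24, 60),
  year_total = c(1000, 1000, 1200, 1200),
  stringsAsFactors = FALSE
)

# Share of each year's questions that carry a given tag
by_tag_year_fraction <- by_tag_year %>%
  mutate(fraction = number / year_total)

# Keep only the rows for the R tag
r_over_time <- by_tag_year_fraction %>%
  filter(tag == "r")

print(r_over_time)
```

Dividing by year_total matters because the overall number of questions changes from year to year; a tag's raw count can rise even while its share of all questions falls.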

Visualizing change over time

Rather than looking at the results in a table, we often want to create a visualization. Change over time is usually visualized with a line plot.
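
A line plot of this kind can be sketched with ggplot2 as follows. The data frame r_over_time, its columns year and fraction, and the values in it are hypothetical placeholders standing in for the real per-year fractions.

```r
library(ggplot2)

# Hypothetical toy data: values are invented for illustration only.
r_over_time <- data.frame(
  year     = c(2015, 2016, 2017),
  fraction = c(0.010, 0.020, 0.025)
)

# Year on the x-axis, fraction of questions on the y-axis, joined by a line
p <- ggplot(r_over_time, aes(x = year, y = fraction)) +
  geom_line() +
  labs(x = "Year", y = "Fraction of questions tagged r")

print(p)
```

To compare several tags in one plot, a common approach is to keep more than one tag in the data and map the tag column to the color aesthetic.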

How about dplyr and ggplot2?