Problem Statement
Gender bias in earnings is an ongoing issue. In 2019, the American Association of University Women (AAUW) stated that women who work full-time earn about 80 percent of what their male counterparts make. This report identified the following problems which we attempt to investigate in our study:
This study also attempts to answer the following questions:
Data and Methodology
For this study, we would use the Women in Workforce data, which is historical data about women’s earnings, employment status, and specific occupations from 2013 to 2016, compiled from the Bureau of Labor Statistics and the Census Bureau. We intend to analyze the data using the following methodologies: trend analysis, descriptive analysis, data visualization techniques, and inferential analysis.
Proposed Analysis
We would investigate the gender pay gap over the years and the trend of total female earnings as a percentage of male earnings by age group over the same period. This investigation would provide insights into the gender pay gap faced by women of different age groups. We would conduct an exploratory analysis across the different fields/occupation groups to identify occupations with equal representation of male and female workers. Then, we would check if pay differences exist in these fields. In addition, we would highlight occupation types where women earn more than their male counterparts. With this information, we would investigate if these occupation types are male- or female-dominated.
Our Contribution
Our results would provide more insights on the gender pay gap across fields, occupations, and age groups. Our results would show the trend in the gender pay gap. Our findings would shed some light on the gender pay gap faced by women across all age groups. Finally, we hope that our study would contribute to the discussion on the gender pay gap.
We used the following packages:
library(readxl) #used to import Excel files into R
library(tidyverse) #used for data manipulation
library(dplyr) # used for data manipulation
library(DT) # used for displaying R data objects (matrices or data frames) as tables on HTML pages
library(knitr) #used to display an aligned table on the screen
library(kableExtra) #used to build tables with straightforward formatting options
library(ggplot2) # for data visualization
library(scales) # for scale_y_continuous(label = percent)
library(ggthemes) # for scale_fill_few('medium')
library(ggalt) #for the dumbbell plot
For this study, we use the Women in Workforce data, which is historical data about women’s earnings, employment status, and specific occupations from 2013 to 2016, compiled from the Bureau of Labor Statistics and the Census Bureau. The data was provided in March 2019 as part of the #TidyTuesday project to celebrate Women’s History Month.
The data is spread across 3 files (jobs_gender.csv, earnings_female.csv, and employed_gender.csv), which are described in the next tab.
The three datasets are first imported from CSV files into data frames named jobs_gender, earnings_female, and employed_gender.
This dataset contains information on the total number of male and female workers and the total estimated median earnings for all employees at various occupation levels, from 2013 to 2016. The dataset has 12 variables and 2,088 observations, across 8 major job categories, 23 minor job categories, and 522 occupation types. Some missing values are recorded as “NA”; these are handled during the data cleaning process.
More information on this dataset can be found here
#Import Dataset 1
jobs_gender <- read.csv("data/jobs_gender.csv", header = TRUE)
Checking the names of the data columns
#Check the names of the data columns
colnames(jobs_gender)
## [1] "year" "occupation"
## [3] "major_category" "minor_category"
## [5] "total_workers" "workers_male"
## [7] "workers_female" "percent_female"
## [9] "total_earnings" "total_earnings_male"
## [11] "total_earnings_female" "wage_percent_of_male"
Checking the dimension of the dataset
#Check the dimension of the dataset
dim(jobs_gender)
## [1] 2088 12
Counting the number of distinct values/observations
#Count the number of distinct values/observations
jobs_gender %>%
  summarise(across(c(occupation, minor_category, major_category), n_distinct))
## occupation minor_category major_category
## 1 522 23 8
Checking the number of missing values per column
#Check the number of missing values per column
colSums(is.na(jobs_gender))
## year occupation major_category
## 0 0 0
## minor_category total_workers workers_male
## 0 0 0
## workers_female percent_female total_earnings
## 0 0 0
## total_earnings_male total_earnings_female wage_percent_of_male
## 4 65 846
This dataset contains the historic information of female salary as a percent of male salary for various age groups, from year 1979 to 2011. The dataset has 3 variables with 264 observations. This dataset has no missing values.
The dataset can be found here
#Import Dataset 2
earnings_female <- read.csv("data/earnings_female.csv", header = TRUE)
Checking the names of the data columns
#Check the names of the data columns
colnames(earnings_female)
## [1] "Year" "group" "percent"
Checking the dimension of the dataset
#Check the dimension of the dataset
dim(earnings_female)
## [1] 264 3
Counting the number of missing values per column
#Count the number of missing values per column
colSums(is.na(earnings_female))
## Year group percent
## 0 0 0
This dataset shows the percentage of part-time and full-time employees for each year at the gender level. The dataset has 7 variables with 49 observations each, from year 1968 to 2016. This dataset has no missing values and can be accessed here
#Import Dataset 3
employed_gender <- read.csv("data/employed_gender.csv", header = TRUE)
Checking the names of the data columns
#Check the names of the data columns
colnames(employed_gender)
## [1] "year" "total_full_time" "total_part_time"
## [4] "full_time_female" "part_time_female" "full_time_male"
## [7] "part_time_male"
Checking the dimension of the dataset
#Check the dimension of the dataset
dim(employed_gender)
## [1] 49 7
Counting the number of missing values per column
#Count the number of missing values per column
colSums(is.na(employed_gender))
## year total_full_time total_part_time full_time_female
## 0 0 0 0
## part_time_female full_time_male part_time_male
## 0 0 0
We inspect the variable types in each dataset to see whether they are accurate or need to be changed. The code and output below indicate that the variable types in all three datasets are correct.
Dataset 1 - jobs_gender
glimpse(jobs_gender)
## Observations: 2,088
## Variables: 12
## $ year <int> 2013, 2013, 2013, 2013, 2013, 2013, 2013...
## $ occupation <fct> "Chief executives", "General and operati...
## $ major_category <fct> "Management, Business, and Financial", "...
## $ minor_category <fct> Management, Management, Management, Mana...
## $ total_workers <int> 1024259, 977284, 14815, 43015, 754514, 4...
## $ workers_male <int> 782400, 681627, 8375, 17775, 440078, 161...
## $ workers_female <int> 241859, 295657, 6440, 25240, 314436, 280...
## $ percent_female <dbl> 23.6, 30.3, 43.5, 58.7, 41.7, 63.5, 33.6...
## $ total_earnings <int> 120254, 73557, 67155, 61371, 78455, 7411...
## $ total_earnings_male <int> 126142, 81041, 71530, 75190, 91998, 9007...
## $ total_earnings_female <int> 95921, 60759, 65325, 55860, 65040, 66052...
## $ wage_percent_of_male <dbl> 76.04208, 74.97316, 91.32532, 74.29179, ...
Dataset 2 - earnings_female
glimpse(earnings_female)
## Observations: 264
## Variables: 3
## $ Year <int> 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, ...
## $ group <fct> "Total, 16 years and older", "Total, 16 years and olde...
## $ percent <dbl> 62.3, 64.2, 64.4, 65.7, 66.5, 67.6, 68.1, 69.5, 69.8, ...
Dataset 3 - employed_gender
glimpse(employed_gender)
## Observations: 49
## Variables: 7
## $ year <int> 1968, 1969, 1970, 1971, 1972, 1973, 1974, 197...
## $ total_full_time <dbl> 86.0, 85.5, 84.8, 84.4, 84.3, 84.4, 84.2, 83....
## $ total_part_time <dbl> 14.0, 14.5, 15.2, 15.6, 15.7, 15.6, 15.8, 16....
## $ full_time_female <dbl> 75.1, 74.9, 73.9, 73.2, 73.1, 73.2, 73.2, 72....
## $ part_time_female <dbl> 24.9, 25.1, 26.1, 26.8, 26.9, 26.8, 26.8, 27....
## $ full_time_male <dbl> 92.2, 91.8, 91.5, 91.2, 91.1, 91.4, 91.2, 90....
## $ part_time_male <dbl> 7.8, 8.2, 8.5, 8.8, 8.9, 8.6, 8.8, 9.4, 9.4, ...
The next stage of the data cleaning process is to check for missing values and make decisions on removal or data imputation, where appropriate. As shown in the Data Import and Description tab, the only data set with some missing values is Dataset 1 - jobs_gender. So, we focus on this dataset.
#Set female earnings to zero for occupations with no female workers
jobs_gender$total_earnings_female <- ifelse(jobs_gender$workers_female == 0, 0, jobs_gender$total_earnings_female)
#Remove rows that still have missing values for female earnings
jobs_gender_vector <- complete.cases(jobs_gender[,'total_earnings_female'])
jobs_gender <- jobs_gender[jobs_gender_vector,]
#Set male earnings to zero for occupations with no male workers
jobs_gender$total_earnings_male <- ifelse(jobs_gender$workers_male == 0, 0, jobs_gender$total_earnings_male)
#Set the wage percentage to zero where male earnings are zero, which avoids dividing by zero later
jobs_gender$wage_percent_of_male <- ifelse(jobs_gender$total_earnings_male == 0, 0, jobs_gender$wage_percent_of_male)
#Remove rows that still have missing values for male earnings
jobs_gender_m_vector <- complete.cases(jobs_gender[,'total_earnings_male'])
jobs_gender <- jobs_gender[jobs_gender_m_vector,]
Finally, we notice that the column “wage_percent_of_male” has a total of 846 missing values. Because this column is total female earnings expressed as a percentage of total male earnings, we calculate and impute it from the total_earnings_male and total_earnings_female columns.
#Calculate and impute results for wage_percent_of_male
jobs_gender$wage_percent_of_male <- ifelse(
  is.na(jobs_gender$wage_percent_of_male),
  jobs_gender$total_earnings_female / jobs_gender$total_earnings_male * 100,
  jobs_gender$wage_percent_of_male
)
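As a quick sanity check on this formula, the sketch below recomputes the percentage for the first four occupations shown in the glimpse(jobs_gender) output and compares it with the reported wage_percent_of_male values. The data frame name check and the variable max_diff are illustrative and not part of the original analysis.

```r
# Values copied from the first four rows of the glimpse(jobs_gender) output.
check <- data.frame(
  total_earnings_male   = c(126142, 81041, 71530, 75190),
  total_earnings_female = c(95921, 60759, 65325, 55860),
  wage_percent_of_male  = c(76.04208, 74.97316, 91.32532, 74.29179)
)

# Recompute female earnings as a percentage of male earnings.
check$recomputed <- check$total_earnings_female / check$total_earnings_male * 100

# Largest absolute difference between the reported and recomputed values.
max_diff <- max(abs(check$recomputed - check$wage_percent_of_male))
print(check)
print(max_diff)
```

The recomputed values agree with the reported ones up to rounding in the printed output, which supports using the same formula to fill in the missing entries.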
We examine the presence of outliers using box plots in two parts. First, we look at the number-of-workers variables. The box plot shows the presence of outliers. However, this is not surprising because the number of workers varies by occupation.
#Turn off scientific notation on the axis labels, then plot the worker counts
options(scipen=999)
boxplot(jobs_gender[c(5:7)])
Similarly, the box plot of the earnings variables shows the presence of outliers. This is also unsurprising because the median earnings vary across occupations. Hence, we do nothing to the outliers and proceed with the rest of the analysis.
#Plot the earnings variables (scientific notation is already turned off above)
boxplot(jobs_gender[c(9:11)])
Dataset 1 - jobs_gender
Variable | Description |
---|---|
year | Year |
occupation | Specific job/career |
major_category | Broad category of occupation |
minor_category | Fine category of occupation |
total_workers | Total estimated full-time workers > 16 years old |
workers_male | Estimated MALE full-time workers > 16 years old |
workers_female | Estimated FEMALE full-time workers > 16 years old |
percent_female | The percent of females for specific occupation |
total_earnings | Total estimated median earnings for full-time workers > 16 years old |
total_earnings_male | Estimated MALE median earnings for full-time workers > 16 years old |
total_earnings_female | Estimated FEMALE median earnings for full-time workers > 16 years old |
wage_percent_of_male | Female wages as percent of male wages - NA for occupations with small sample size |
Dataset 2 - earnings_female
Variable | Description |
---|---|
Year | Year |
group | Age group |
percent | Female salary percent of male salary |
Dataset 3 - employed_gender
Variable | Description |
---|---|
year | Year |
total_full_time | Percent of total employed people usually working full time |
total_part_time | Percent of total employed people usually working part time |
full_time_female | Percent of employed women usually working full time |
part_time_female | Percent of employed women usually working part time |
full_time_male | Percent of employed men usually working full time |
part_time_male | Percent of employed men usually working part time |
Our analysis and findings are presented in 3 categories: industry-based, gender-based, and time-series-based. These categories reflect the different levels of analysis considered in this study.
Do women earn more than men in certain industries?
We examine the industries where female workers earn more than their male counterparts. We observe that all 8 major categories contain occupations where women earn more than men. For a more in-depth analysis, we count the number of jobs within each industry that fit this criterion (female workers earn more than male workers). The 8 industries are then ranked by this count.
The Natural Resources, Construction, and Maintenance industry and the Production, Transportation, and Material Moving industry are ranked first and second, respectively. Although these industries are traditionally thought to be male-dominated, our findings reveal that female workers earn more than men in certain jobs within them.
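The ranking described above can be sketched as follows. This is a minimal illustration, not the code used for the report: it assumes that "women earn more" means wage_percent_of_male above 100, and it runs on a small made-up data frame (toy_jobs, with placeholder category and occupation names) that only mimics the columns of jobs_gender.

```r
library(dplyr)

# Hypothetical toy rows with the same column names as jobs_gender;
# the categories, occupations, and numbers are illustrative only.
toy_jobs <- data.frame(
  occupation = c("Occupation A", "Occupation B", "Occupation C",
                 "Occupation D", "Occupation E", "Occupation F"),
  major_category = c("Category X", "Category X", "Category Y",
                     "Category Y", "Category Y", "Category X"),
  wage_percent_of_male = c(104.2, 98.1, 101.5, 110.0, 87.3, 95.0)
)

# Keep occupations where female earnings exceed male earnings (assumed
# criterion), count distinct occupations per industry, and rank the industries.
women_earn_more <- toy_jobs %>%
  filter(wage_percent_of_male > 100) %>%
  distinct(major_category, occupation) %>%
  count(major_category, name = "n_occupations") %>%
  arrange(desc(n_occupations))

print(women_earn_more)
```

Using distinct() before count() means an occupation that meets the criterion in several years is counted once; whether the original ranking counted occupations or occupation-year rows is not stated, so this is an assumption.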
To check this traditional view, we explore all 8 industries to ascertain whether they are male- or female-dominated.
Are these industries male-dominated or female-dominated?
An examination of the gender ratio in the 8 industries where female workers earn more than male workers shows that these industries are male-dominated. Also, female workers have a very small representation in the top two industries: Natural Resources, Construction, and Maintenance; and Production, Transportation, and Material Moving.
Based on these findings, it could be suggested that the industries are making active efforts to attract more women employees by offering higher salaries.
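One way to compute the gender ratio per industry is to sum the worker counts within each major category and express female workers as a share of the total. The sketch below assumes a simple 50 percent cutoff for labelling an industry as male- or female-dominated; the cutoff, the toy_workers data frame, and its values are illustrative assumptions rather than the report's actual code or data.

```r
library(dplyr)

# Hypothetical toy rows with the same column names as jobs_gender.
toy_workers <- data.frame(
  major_category = c("Category X", "Category X", "Category Y", "Category Y"),
  workers_male   = c(900, 600, 300, 200),
  workers_female = c(100, 400, 700, 300)
)

# Sum workers per industry, then compute the female share of the workforce.
gender_share <- toy_workers %>%
  group_by(major_category) %>%
  summarise(
    workers_male   = sum(workers_male),
    workers_female = sum(workers_female)
  ) %>%
  mutate(
    percent_female = workers_female / (workers_male + workers_female) * 100,
    # Assumed rule: an industry is "dominated" by whichever gender exceeds 50%.
    dominated_by = ifelse(percent_female > 50, "female", "male")
  )

print(gender_share)
```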
How do women fare in the top-paying jobs?
We wanted to examine women’s representation in 2016 in the jobs considered the highest-paying of that year (the list of top-paying jobs was consolidated from sites such as Forbes and Business Insider).
We notice that most of these jobs, barring three (Pharmacists, Nurse anesthetists, and Financial managers), are predominantly male-driven. Notably, over 50 percent of these jobs have very poor female representation (50 percent or under).
Female only vs male only occupations
Historically, some jobs have always been shaped by gender bias. Jobs that involved heavy manual labor were always male-driven, whereas occupations such as midwifery were relegated to women.
We wanted to capture if these biases still exist or if the gender gap has been closed in recent years.
In 2013, 2014, and 2016, the Nurse midwives occupation had only female workers. Meanwhile, five minor job categories (Construction and Extraction; Material Moving; Installation, Maintenance, and Repair; Transportation; and Protective Service) contained occupations with only male workers during the same period.
These observations align with the traditional gender-based roles and unfortunately haven’t changed over the years.
Female only occupations:
## occupation
## 1 Nurse midwives
Male only occupations:
## minor_category
## 1 Construction and Extraction
## 2 Material Moving
## 3 Installation, Maintenance, and Repair
## 4 Transportation
## 5 Protective Service
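Lists like the two above can be obtained by filtering on the worker-count columns. The sketch below assumes that "female only" means workers_male equals zero and "male only" means workers_female equals zero; the toy_single data frame and its occupation and category names are made up for illustration.

```r
library(dplyr)

# Hypothetical toy rows with the same column names as jobs_gender.
toy_single <- data.frame(
  year           = c(2013, 2013, 2013, 2014),
  occupation     = c("Occupation P", "Occupation Q", "Occupation R", "Occupation P"),
  minor_category = c("Group 1", "Group 2", "Group 2", "Group 1"),
  workers_male   = c(0, 1200, 800, 0),
  workers_female = c(450, 0, 950, 500)
)

# Occupations that recorded no male workers in at least one year.
female_only <- toy_single %>%
  filter(workers_male == 0) %>%
  distinct(occupation)

# Minor categories containing at least one occupation with no female workers.
male_only <- toy_single %>%
  filter(workers_female == 0) %>%
  distinct(minor_category)

print(female_only)
print(male_only)
```

Note that the second result lists categories that contain at least one all-male occupation, not categories in which every occupation is all-male.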
Does Equal representation indicate Equal Pay?
For a more accurate comparison of the gender pay gap, we focus on occupations with approximately equal gender representation in the workforce. We see that four occupation types (Food cooking machine operators and tenders; Operations research analysts; Postal service mail sorters, processors, and processing machine operators; and Postal service clerks) have equal representation of men and women in the workforce. Despite this, women earn less than men in all four occupations. Among them, female food cooking machine operators and tenders face the widest gender pay gap: their wages are about 62.3 percent of male workers’ wages. Meanwhile, female postal service clerks earn about 95.03 percent of what their male counterparts earn.
These findings suggest that an equal representation of both genders in a field may not necessarily mean equal earnings or a lack of gender pay gap.
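A comparison of this kind can be sketched by keeping occupations whose female share is close to 50 percent and then ranking them by pay gap. The report does not state how "approximately equal" was defined, so the tolerance of 0.5 percentage points below is an assumption, as are the toy_balance data frame and its values.

```r
library(dplyr)

# Hypothetical toy rows with the same column names as jobs_gender.
toy_balance <- data.frame(
  occupation            = c("Occupation P", "Occupation Q", "Occupation R"),
  percent_female        = c(50.2, 49.7, 81.0),
  total_earnings_male   = c(40000, 52000, 45000),
  total_earnings_female = c(30000, 50000, 44000)
)

# Assumed definition of "approximately equal": within 0.5 points of 50%.
tolerance <- 0.5

balanced <- toy_balance %>%
  filter(abs(percent_female - 50) <= tolerance) %>%
  mutate(
    wage_percent_of_male = total_earnings_female / total_earnings_male * 100,
    pay_gap = 100 - wage_percent_of_male
  ) %>%
  arrange(desc(pay_gap))

print(balanced)
```

A wider or narrower tolerance would change which occupations count as balanced, so the choice should be reported alongside the results.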
Do women face wider pay gap as they get older?
The chart below provides insights into three of our objectives. Firstly, we see an upward trend in female earnings as a percentage of male earnings, despite some observed fluctuations. This shows that the gender pay gap has decreased over the years. Secondly, women of all age groups earn less than their male counterparts. None of the reported percentages reaches 100 percent, indicating that from 1979 to 2011 male workers earned more than female workers in every age category. This suggests that no matter what age group a woman belongs to, she is still subject to gender-based differences in earnings.
Furthermore, younger female workers between the ages of 16 and 24 face smaller gender-based pay differences than older female workers. This suggests that as women grow older, they tend to earn less relative to male workers. Based on this finding, it may be assumed that in jobs with a younger workforce, the earnings of the two genders are closer than in jobs or positions with older workers.
Surprisingly, the chart suggests that women between the ages of 35-64 faced a wider gender pay gap before 2001 and in 2011 compared to those who are 65 years and above.
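A trend chart of this kind can be drawn with one line per age group. The sketch below is only an illustration: it uses the nine values visible in the glimpse(earnings_female) output and assumes they all belong to the "Total, 16 years and older" group for 1979 to 1987, as the first two rows visibly do. The names trend, change, and p are illustrative.

```r
library(ggplot2)

# Values copied from the glimpse(earnings_female) output; assumed to be the
# "Total, 16 years and older" group for 1979 to 1987.
trend <- data.frame(
  Year    = 1979:1987,
  group   = "Total, 16 years and older",
  percent = c(62.3, 64.2, 64.4, 65.7, 66.5, 67.6, 68.1, 69.5, 69.8)
)

# Change in female earnings as a percentage of male earnings over the span.
change <- trend$percent[nrow(trend)] - trend$percent[1]
print(change)

# One line per group; the dashed line at 100 marks pay parity.
p <- ggplot(trend, aes(x = Year, y = percent, colour = group)) +
  geom_line() +
  geom_hline(yintercept = 100, linetype = "dashed") +
  labs(y = "Female earnings as % of male earnings")

# Call print(p) to draw the chart.
```

With the full earnings_female data frame in place of trend, the same plotting call would produce one coloured line per age group.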
Full-time/Part-time representation based on gender
The proportion of full-/part-time workers is examined across both genders to gain some insights on how the increase/decrease in full-time workers may have contributed to the gender pay gap. From 2000-2016, we notice that the percentage of men working full time is greater than the percentage of women working full time.
We see a dip in the full-time percentage for both male and female workers between 2008 and 2009, along with an increase in the part-time percentage over the same period, suggesting the likely effect of the 2008 financial crisis.
On the other hand, female workers are more represented as part-time workers than male workers. This may be an important contributor to the lower total female median earnings. Although beyond the scope of our study, this may be worth investigating.
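The gap between the genders can be quantified directly from the employed_gender columns. The sketch below uses the first seven rows visible in the glimpse(employed_gender) output (1968 to 1974), so it illustrates the calculation rather than the 2000 to 2016 period discussed above; the names emp, full_time_gap, and part_time_gap are illustrative.

```r
# Values copied from the first seven rows of the glimpse(employed_gender) output.
emp <- data.frame(
  year             = 1968:1974,
  full_time_female = c(75.1, 74.9, 73.9, 73.2, 73.1, 73.2, 73.2),
  part_time_female = c(24.9, 25.1, 26.1, 26.8, 26.9, 26.8, 26.8),
  full_time_male   = c(92.2, 91.8, 91.5, 91.2, 91.1, 91.4, 91.2),
  part_time_male   = c(7.8, 8.2, 8.5, 8.8, 8.9, 8.6, 8.8)
)

# Percentage-point gap in full-time work (men minus women).
emp$full_time_gap <- emp$full_time_male - emp$full_time_female

# Percentage-point gap in part-time work (women minus men).
emp$part_time_gap <- emp$part_time_female - emp$part_time_male

print(emp[, c("year", "full_time_gap", "part_time_gap")])
```

Because the full-time and part-time shares sum to 100 within each gender, the two gaps are equal in every year.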
This study examined the gender bias in earnings in the US, with a focus on female workers. Using the Women in Workforce data, we explored the trend and how women of different age groups are affected. We investigated the presence or absence of a gender pay gap in occupations with an equal representation of male and female workers.
Our results show that the gender pay gap has decreased over the years, but female workers still earn less than their male counterparts in every age category. Younger female workers between the ages of 16 and 24 face smaller gender-based pay differences than older female workers. Over the years, female workers have been more represented among part-time workers than male workers, which may be an important contributor to the lower total female median earnings.
In conclusion, an equal representation of both genders in an occupation type may not necessarily mean equal earnings or a lack of gender pay gap.
Practical Implications
Our study provides workers and stakeholders with more insights on gender-based earning differences across industries, occupations, and age groups in the US. We highlight how women of all ages are affected by this bias. We hope that other industries emulate the Natural Resources, Construction, and Maintenance industry and the Production, Transportation, and Material Moving industry in eliminating the gender pay gap and rewarding workers based on their abilities and contributions, rather than their gender.
Limitations of Study
The limited number of years covered by the data was a major limitation of our study. The data was also limited to a small number of variables, which made it impossible to draw generalizable inferences. In addition, we did not have access to up-to-date data; hence, our findings extend only to 2016 and do not give a good picture of what has happened in more recent years.
Future Research Direction
Although we found that female workers are more represented among part-time workers than male workers, we could not establish whether this is an important contributor to the lower total female median earnings, because that question is beyond the scope of our study and data. It is worth investigating and could be an area for future research.