Data Critique

How was the data generated?

We accessed our dataset through the World Bank’s Gender Data Portal (https://genderdata.worldbank.org/en/indicator/sg-vaw-ipve-zs?view=trend&geos=WLD). This global dataset reports rates and percentages for multiple indicators. The figures seen here were generated with a multilevel regression model that combined several different sources of data. Using these sources and the regression model, the United Nations Inter-Agency Working Group on Violence Against Women Estimation and Data (VAW-IAWGED) generated prevalence estimates of intimate partner violence against women and non-partner sexual violence against women. It is important to note that the VAW-IAWGED is a working group formed by the WHO, UN Women, UNICEF, UNSD, UNFPA, and UNODC. Hence, the creation of this data can be attributed to all of these organisations. The original data sources used to generate the final data are described below (under original sources).

What are the original sources?

The Gender Data Portal draws on a variety of data sources to build its dataset. First, it uses major international organizations and their collections of data. For example, the map on the front page is generated from a source called the International Labour Organization (ILO). Other examples include the World Health Organization, UNESCO, and more. The World Bank also relies on surveys, including specialized surveys on violence against women that use WHO multi-country study instruments. In addition to specialized surveys, the World Bank uses larger national health surveys, such as the DHS and RHS, with supporting data points coming from national crime victimization surveys or MICS. Lastly, the Portal takes data from national statistics offices and government agencies in order to collect country-wide data.

Who funded the creation of the dataset?

The creation of this dataset was funded by the United Nations Inter-Agency Working Group on Violence Against Women Estimation and Data (VAW-IAWGED) which is a working group formed by the WHO, UN Women, UNICEF, UNSD, UNFPA, and UNODC. These groups work to reveal rates of inequality and injustice worldwide. 

What information is left out?

We do not have data from every country, and not every country has the same amount of available data. Thus, some countries are missing information while others allow for a more thorough analysis. Furthermore, our dataset is specific to intimate partner violence against women, so it does not contain data on violence against men or people of any other gender identity. It also does not contain data on women under the age of 15 or over the age of 49, which leaves out a large population of those who experience partner violence. Any conclusions drawn can only be applied to women within the age range 15-49. In addition, collecting valid, reliable, and ethical data on domestic violence poses particular challenges because what constitutes violence or abuse varies across cultures and among individuals. Moreover, a culture of silence usually surrounds domestic violence and can affect reporting. As such, our dataset is limited to the available surveys that met the following WHO study criteria: (1) population-based, (2) representative at a national or subnational level, (3) conducted between 2000 and 2018, and available by 2019, and (4) used acts-based measures.

The country-level, regional, and global estimates are presented as 2018 data, but are derived from surveys spanning the period 2000 to 2018.

What are the ideological effects of how our sources have been divided into data?

Because data is not always readily available for all countries, the data seen was collected between 2000 and 2018. Our data has been divided by year, country, and indicator topic. Because the data was divided in this way, the time and place of the data are emphasized and meant to be distinguished. Grouping indicators into topics implies that all the indicators within a topic directly contribute only to that topic. If the dataset were our only source, we’d be missing data from regions smaller than countries, data from time periods smaller than years, data collected before 1960, data from countries that no longer exist, and data from excluded indicators.

What information is included in the dataset? What can it illuminate?

The dataset includes information about the proportion of women who experienced domestic violence (DV) and can be sorted by income level, age, and region. This can help reveal whether DV is more prevalent among those in a lower income bracket or in certain age ranges. This relates to our research question about how domestic violence varies with age and which demographic of women (age, ethnicity, region) experiences the most DV. The dataset also includes information about whether countries have legislation on DV and sexual harassment at work. This can be linked with a country’s DV rates to see whether legislation is associated with lower DV rates. If there is legislation but a high DV rate, cultural views or norms can be researched to see whether they have a bigger impact on DV rates. Additionally, many indicators contain more specific data that can help reveal specific phenomena. For example, one indicator is women who have not experienced spousal physical or sexual violence (% of currently married women age 15-49 who have been married only once). This can conversely reveal which country has the lowest DV rate and whether remarriage affects DV rates. In total, there are 27 indicators with specifics such as the one listed above. Each indicator can illuminate a different aspect of DV rates and can be used to support a research question. Apart from data specific to violence, there are also hundreds of indicators relating to finances, education, employment, and gender-role beliefs. For example, we would be able to compare countries’ female literacy rates side by side with their female DV rates, which could illuminate any potential connection between the two indicators. The data analyzed in our project is limited to a few select East Asian, Southeast Asian, and South Asian countries, such as China, the Philippines, and India.

The Three Levels of a Digital Humanities Project

Sources: We chose our main data set because of the plethora of numeric information it provides relating to dynamics between men and women all over the world. There were hundreds of indicators (variables) and over a hundred thousand observations. We also chose it because we could be sure that the data is reliable since it’s managed by organizations funded by multiple governments. This data is available for public use and is therefore easily replicable.

Processing: We processed our data with RStudio, a free development environment for R, a programming language built for statistical computing. Within a matter of minutes, we reduced the dataset to fewer than twenty indicators and fewer than fifteen economies (generally countries). RStudio also allows for easy data lookup with just a couple of lines of code, letting us inspect the data points to determine which variables would provide the most information and be the most relevant. We chose RStudio for data processing because of its quick and accurate computing.
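As a rough illustration of this kind of reduction and lookup, here is a minimal sketch written in Python rather than in R. It is not our actual script: the row layout, the function names, and the values are hypothetical placeholders, and only the indicator code SG.VAW.IPVE.ZS comes from the Portal link above.

```python
# Hypothetical long-format rows: (economy code, indicator code, year, value).
# The values are placeholders for illustration only, not real estimates.
ROWS = [
    ("IND", "SG.VAW.IPVE.ZS", 2018, 1.0),
    ("PHL", "SG.VAW.IPVE.ZS", 2018, 2.0),
    ("CHN", "SE.ADT.LITR.FE.ZS", 2018, 3.0),
    ("USA", "SG.VAW.IPVE.ZS", 2018, 4.0),      # economy outside the project scope
    ("IND", "XX.OTHER.INDICATOR", 2018, 5.0),  # made-up indicator we do not keep
]

# Assumed selections: a few economies and indicators of interest.
KEEP_ECONOMIES = {"CHN", "PHL", "IND"}
KEEP_INDICATORS = {"SG.VAW.IPVE.ZS", "SE.ADT.LITR.FE.ZS"}


def reduce_rows(rows, economies, indicators):
    """Keep only rows whose economy and indicator are both selected."""
    return [r for r in rows if r[0] in economies and r[1] in indicators]


def lookup(rows, economy, indicator):
    """Return (year, value) pairs for one economy and one indicator."""
    return [(r[2], r[3]) for r in rows if r[0] == economy and r[1] == indicator]


subset = reduce_rows(ROWS, KEEP_ECONOMIES, KEEP_INDICATORS)
print(len(subset), "rows kept out of", len(ROWS))
print(lookup(subset, "IND", "SG.VAW.IPVE.ZS"))
```

The same two steps, filtering down to the chosen economies and indicators and then looking up a single series, are what a couple of lines of R accomplish in RStudio.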

Presentation: We decided to present our numerical data with Tableau. Tableau offers numerous types of data representations, from simple tables to scatterplots to maps, so one reason we chose it was the ability to create all our visualizations in one place. Tableau also has many customization options that allow embedded visualizations to be interactive. We believed that interactivity could make our project more memorable and offer a deeper understanding of certain topics. We used WordPress to format our writing and our visualizations because it provides a simple and pleasant user interface and many options for formatting.

Our inspiration for our project was the “All One People and Under One King” project by Maeve Kane in 2018. This author handled the processing and presentation of her data particularly well, being very creative in her representation of personal connections.

Next Up

Learn about the sources we used!