- I want to work on “Geospatial Patterns of Protest Activities During Election Cycles”.
- I want to investigate how protest occurrences vary geographically across the phases of an election cycle: pre-election campaigns, election days, and post-election periods.
- Further, I want to apply a machine learning algorithm to predict the hotspot regions in the next election cycle.
- I have found one source, the “United States Election Assistance Commission” (https://www.eac.gov/research-and-data), and I am exploring whether it provides the election data.
- If anyone knows the exact source, it will make my life easy :).
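As a first step toward comparing phases, each protest record could be labeled by where its date falls relative to an election day. The sketch below is only an illustration under assumptions: the 90-day and 60-day windows, the function names, and the `(state, date)` record shape are all hypothetical choices, not something defined in these notes or in the ACLED data.

```python
from collections import Counter
from datetime import date

# Assumed window sizes; the notes do not define the phases precisely.
PRE_ELECTION_DAYS = 90
POST_ELECTION_DAYS = 60


def election_phase(event_date, election_day,
                   pre_days=PRE_ELECTION_DAYS, post_days=POST_ELECTION_DAYS):
    """Label an event date relative to a single election day."""
    delta = (event_date - election_day).days
    if delta == 0:
        return "election_day"
    if -pre_days <= delta < 0:
        return "pre_election"
    if 0 < delta <= post_days:
        return "post_election"
    return "outside_cycle"


def count_by_state_and_phase(events, election_day):
    """Count events per (state, phase); events is an iterable of (state, date)."""
    counts = Counter()
    for state, event_date in events:
        counts[(state, election_phase(event_date, election_day))] += 1
    return counts


if __name__ == "__main__":
    # Made-up example records, only to show the call shape.
    sample = [("Oregon", date(2020, 10, 1)), ("Georgia", date(2020, 11, 20))]
    print(count_by_state_and_phase(sample, date(2020, 11, 3)))
```

Counts per region and phase like these could later serve as the target for a hotspot model, but that part is not sketched here.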
9th Lecture – Advance Mathematical Stats
- ACLED data has a large number of columns, with many unique values in each column.
- To better understand the data, I went through the documentation on the ACLED website, https://acleddata.com/knowledge-base/guide-to-2023-acled-column-changes/.
- I am going through the documentation to get a better understanding of the data, and will then come up with questions to investigate with it.
5th Lecture – Advance Mathematical Stats
- Completed the programming part of the project for the problem statement “A Racial Disparity in Police Shootings Across US Counties.”
- Racial disparity in this project is defined as a race having a larger share of the shootings in a county than its share of the county population.
- I found that in Baltimore, Cook, and Fulton counties, Black people have faced continuous disparity in shootings throughout the recorded period, 2015-2023.
- Despite being perhaps one third of the population, they were the majority of, or sometimes the only, people shot by police.
- In another analysis, I looked for counties where at least 2 races have faced disparity for at least 6 years, and I found the following results.
- Orange: B, H
- El Paso: B, W
- Miami Dade: B, W
- Maricopa: B, N
- San Bernardino: B, W
- Alameda: B, H
All the above analyses are dynamic: we can find the counties where a continuous disparity was observed within a given time period, e.g., 2020-2024 or 2015-2019.
We can also find counties where multiple races have faced disparity in a non-continuous manner.
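The disparity definition and the "at least 2 races for at least 6 years" filter can be sketched as below. This is not the project's actual code; the function names and the `{(county, year): (shot_counts, population_counts)}` input shape are assumptions made for illustration, and the numbers in the example are made up.

```python
from collections import defaultdict


def shares(counts):
    """Turn raw counts per race into shares that sum to 1."""
    total = sum(counts.values())
    return {race: n / total for race, n in counts.items()} if total else {}


def disparity_races(shot_counts, population_counts):
    """Races whose share of shootings exceeds their share of the population."""
    shot = shares(shot_counts)
    pop = shares(population_counts)
    return {race for race, s in shot.items() if s > pop.get(race, 0.0)}


def disparity_years(records):
    """Map county -> race -> sorted list of years in which disparity was seen."""
    out = defaultdict(lambda: defaultdict(list))
    for (county, year), (shot, pop) in sorted(records.items()):
        for race in disparity_races(shot, pop):
            out[county][race].append(year)
    return out


def counties_with_multi_race_disparity(records, min_races=2, min_years=6):
    """Counties where at least min_races races show disparity in min_years years."""
    result = {}
    for county, by_race in disparity_years(records).items():
        races = sorted(r for r, yrs in by_race.items() if len(yrs) >= min_years)
        if len(races) >= min_races:
            result[county] = races
    return result


def continuous_in_period(flagged_years, start, end):
    """True if disparity was flagged in every year from start to end inclusive."""
    flagged = set(flagged_years)
    return all(year in flagged for year in range(start, end + 1))


if __name__ == "__main__":
    # Made-up counts for a hypothetical county, not real data.
    demo = {("X", y): ({"B": 2, "H": 2, "W": 1}, {"B": 10, "H": 20, "W": 70})
            for y in range(2015, 2021)}
    print(counties_with_multi_race_disparity(demo))
```

Changing `start` and `end` in `continuous_in_period` gives the period-specific view (for example 2015-2019), while `min_years` without a continuity check covers the non-continuous case.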
4th Lecture – Advance Mathematical Stats
- I wrote a Python script to fetch ethnicity and race data for all counties for the years 2015-2023 from data.census.gov using an API key.
- I transformed the data and calculated the race distribution in each county per year.
- In the Washington Post police shooting data, there were 4692 records where the county was missing. I imputed the counties with the help of the “geopy” library, using the city and state fields in the data.
- I transformed the Washington Post police shooting data and calculated the race distribution of people shot in each county per year.
- The next step is to work out how to compare the two distributions, to get a better idea of counties where a particular racial group experiences a disproportionate number of shootings despite not being the majority of the population.
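The county imputation step might look roughly like the sketch below. It is not the script used in the project: the row format (dicts with `city`, `state`, `county` keys), the helper names, and the choice to strip a trailing " County" from the geocoder result are assumptions. The geopy part uses `Nominatim` with `addressdetails=True` and reads the `county` field of the returned address, which may be absent for some places.

```python
def impute_counties(rows, geocode):
    """Return copies of rows with a missing 'county' filled in where possible.

    geocode is any callable taking a query string and returning a county
    name or None. Results are cached per (city, state) to avoid repeat lookups.
    """
    cache = {}
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input rows are not modified
        if not row.get("county") and row.get("city") and row.get("state"):
            key = (row["city"], row["state"])
            if key not in cache:
                cache[key] = geocode(f"{row['city']}, {row['state']}, USA")
            if cache[key]:
                row["county"] = cache[key]
        filled.append(row)
    return filled


def nominatim_county_lookup(user_agent):
    """Build a geocode callable backed by geopy's Nominatim (needs network)."""
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    # Keep to at most one request per second.
    limited = RateLimiter(geolocator.geocode, min_delay_seconds=1)

    def lookup(query):
        location = limited(query, addressdetails=True)
        if location is None:
            return None
        county = location.raw.get("address", {}).get("county")
        # Assumption: the shooting data stores "Sangamon", not "Sangamon County".
        if county and county.endswith(" County"):
            county = county[: -len(" County")]
        return county

    return lookup
```

A call would then be something like `impute_counties(rows, nominatim_county_lookup("my-project-name"))`, where the user agent string is a placeholder. With several thousand missing counties, caching by city and state keeps the number of network requests well below the number of records.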
3rd Lecture – Advance Mathematical Stats
- I had to go through the extensive documentation of the United States Census Bureau (https://data.census.gov/) to fetch the data.
- It took me a lot of time to understand the metadata and the specific codes under which the data is provided for races and ethnicities.
- I have made a Python program to fetch data through the API for a specific county and year combination, and I am also modifying the code to fetch data for all counties for the 2015-2023 time period.
- I want to identify counties where a particular racial group experiences a disproportionate number of shootings despite not being the majority of the population.
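A fetch of this kind could be sketched as follows. This is an illustration, not the program described above: it assumes the ACS 5-year endpoint and a handful of variable codes from table B03002 (Hispanic or Latino origin by race), which should be checked against the variable list for each year, and the function names are hypothetical. The Census API returns a JSON array of rows whose first row is the header.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed dataset: ACS 5-year estimates.
BASE_URL = "https://api.census.gov/data/{year}/acs/acs5"

# Assumed variable codes from table B03002; verify for each year before use.
VARIABLES = {
    "B03002_001E": "total",
    "B03002_003E": "white_non_hispanic",
    "B03002_004E": "black_non_hispanic",
    "B03002_012E": "hispanic",
}


def build_url(year, api_key, state="*", county="*"):
    """Build a query URL for one year; '*' means all states or all counties."""
    params = {
        "get": "NAME," + ",".join(VARIABLES),
        "for": f"county:{county}",
        "in": f"state:{state}",
        "key": api_key,
    }
    return BASE_URL.format(year=year) + "?" + urlencode(params, safe=":*,")


def parse_rows(payload, year):
    """Convert the header-plus-rows JSON payload into a list of dicts."""
    header, *rows = payload
    records = []
    for row in rows:
        raw = dict(zip(header, row))
        record = {"year": year, "name": raw["NAME"],
                  "fips": raw["state"] + raw["county"]}
        for code, label in VARIABLES.items():
            value = raw[code]
            record[label] = int(value) if value is not None else None
        records.append(record)
    return records


def fetch_year(year, api_key):
    """Download and parse one year of county data (needs network access)."""
    with urlopen(build_url(year, api_key), timeout=60) as response:
        return parse_rows(json.load(response), year)
```

Looping `fetch_year` over `range(2015, 2024)` with a real key in place of a placeholder would cover the 2015-2023 period, one request per year.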
2nd Lecture – Advance Mathematical Stats
- I want to compare the percentage of each race among people shot with the percentage of each race present in each US county.
- I want to identify counties where a particular racial group experiences a disproportionate number of shootings despite not being the majority of the population.
- For this analysis, I need the population by race and year. I have found a site, https://data.census.gov/, that provides data on different topics like
  - Business and Economy
  - Education
  - Employment
  - Families and Living Arrangements
  - Government
  - Health
  - Housing
  - Income and Poverty
  - Populations and People
  - Race and Ethnicity
- With respect to different geographies:
  - Nation
  - State
  - County
  - County Subdivision
  - Place
  - ZIP Code Tabulation Area
  - Metropolitan/Micropolitan Statistical Area
  - Census Tract
  - Block Group
  - All Geographies
- Census.gov also provides an API to access the data programmatically. Here is the link to the documentation:
API Documentation: https://www.census.gov/data/developers/guidance/api-user-guide.html
- Currently, I am exploring the API documentation, along with one more direction: whether income and poverty have any effect in counties where non-majority races are shot more often.
- I found 1 record disturbing: a young Black woman who was identified as mentally ill, was unarmed, and was not trying to flee was also shot in Illinois. The agency that shot her has just 1 shooting record.
id | date | threat_type | flee_status | armed_with | city | county | state | latitude | longitude | location_precision | name | age | gender | race | race_source | was_mental_illness_related | body_camera | agency_ids |
10693 | 6/7/2024 | | not | unarmed | Springfield | Sangamon | IL | 39.76468314 | -89.63039384 | block | Sonya Massey | 36 | female | B | photo | TRUE | TRUE | 23443 |
21st Class – Advance Mathematical Statistics
- The professor gave a walkthrough of the mathematics552 site.
- We went over the activity diagram and its components.
- We briefly discussed how to write a report on our project, describing things as if to a layman.
- We went through the Washington Post “police-shooting-data”, which has longitude and latitude information, brainstormed, analyzed the data at a surface level in class, and came up with some generic findings.
- I talked with the professor personally about how to analyze the data in depth and come up with the best problem statement.