Project 1: COVID-19 SITUATION IN CALIFORNIA 03/2020 - 11/2021
This project creates visualizations from a dataset on the COVID-19 situation in the state of California. Through the visualizations, we should be able to answer how hard COVID-19 hit California and how well California handled the pandemic.

Introduce the problem
Coronavirus disease 2019 (also known as COVID-19) is a contagious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), according to the World Health Organization (WHO). The first case was identified in the city of Wuhan, China, and the disease has since spread worldwide, leading to a global pandemic. According to WHO data updated in January 2022, there have been almost 400 million confirmed cases and over 5 million deaths. The pandemic has notably affected people's lives around the world, including in the US. California is one of the states in the United States hit hardest by COVID-19. The pandemic has harmed California in many respects, including healthcare, the economy, and quality of life. In this project, we focus on how COVID-19 affected California's healthcare system. Through this project, we will be able to answer two important questions:
How hard did COVID-19 hit California?
How well did California's healthcare system handle the COVID-19 situation?
Introduce the data
This dataset is provided by the Government of California and covers the COVID-19 situation in California from 03/2020 to 11/2021. I chose this dataset mainly because it contains valuable data that helps me answer the two questions stated in the introduction of the problem.
For each county and each date from March 2020 to November 2021, the dataset records the county name, the number of hospitalized COVID-19 patients, the number of ICU COVID-19 patients, and the number of available ICU beds.
Several features of this dataset are especially useful. First, it records the number of hospitalized COVID-19 patients every day from 03/2020 to 11/2021, which makes it easy to analyze the significant rise in hospitalizations. This helps answer the question of how hard COVID-19 hit California. Second, it records the availability of ICU beds for each county, which helps me analyze how California handled the COVID-19 situation. Finally, the data types suit the purpose of this project: almost every column is a float, except the county name column, which holds strings, and the date column, which holds datetime values.
I downloaded this dataset from Snowflake. Here is the link to my data set: https://gfa66890.us-east-1.snowflakecomputing.com/console#/data/tables?databaseName=PUBLIC_MKTPLC_VIEWS
Preprocessing the data
For the preprocessing step, I cleaned the dataset by removing the 2 columns that contain only null values and then removing every row that contains a missing value. Then I inspected the data types, the shape, the column names, and the occurrence of 0 values. Click the button below to read more about how I preprocessed the dataset and the explanation for each preprocessing step.
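The cleaning steps described above can be sketched with pandas roughly as follows. This is a minimal illustration, not the exact code used in the project: the function names preprocess and summarize are hypothetical, and the sketch assumes the data has already been loaded into a DataFrame.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop all-null columns, then drop rows with any missing value."""
    # Remove columns in which every value is null.
    cleaned = df.dropna(axis="columns", how="all")
    # Remove any remaining row that has at least one missing value.
    cleaned = cleaned.dropna(axis="index", how="any")
    return cleaned.reset_index(drop=True)


def summarize(df: pd.DataFrame) -> dict:
    """Collect the basic facts inspected after cleaning."""
    numeric = df.select_dtypes(include="number")
    return {
        "shape": df.shape,
        "columns": list(df.columns),
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Number of exact zeros in each numeric column.
        "zero_counts": (numeric == 0).sum().to_dict(),
    }
```

Dropping the all-null columns first matters: if rows were dropped first, every row would be removed, because each row is missing a value in those empty columns.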
Data Understanding/ Visualization
I will use the plotly package to create a graph that shows the remarkable day-by-day rise in the number of hospitalized COVID-19 patients. When you hover the mouse over the graph, it shows the number of hospitalized COVID-19 patients for that specific date.
I will use the Seaborn package to create another visualization that compares the availability of ICU beds across all counties, to see which counties were able to handle the COVID-19 situation well and which were not prepared for it.
I will also create a visualization that compares the number of ICU COVID-19 patients with the number of available ICU beds in Los Angeles County specifically, to see whether the available ICU beds could absorb the rise in ICU COVID-19 patients.
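A chart like this could be built along the following lines. This is a sketch under assumptions: the column names date and hospitalized_covid_patients are hypothetical stand-ins for the dataset's actual columns, and summing across counties to get a statewide daily total is one possible choice, not necessarily the one used in the project.

```python
import pandas as pd


def daily_hospitalized(df: pd.DataFrame) -> pd.DataFrame:
    """Sum hospitalized COVID-19 patients across counties for each date."""
    return (
        df.groupby("date", as_index=False)["hospitalized_covid_patients"]
        .sum()
        .sort_values("date")
        .reset_index(drop=True)
    )


def plot_daily_hospitalized(daily: pd.DataFrame):
    """Build an interactive line chart; hovering shows the date and count."""
    # Imported here so the aggregation helper works without plotly installed.
    import plotly.express as px

    return px.line(
        daily,
        x="date",
        y="hospitalized_covid_patients",
        title="Hospitalized COVID-19 patients per day",
        labels={
            "date": "Date",
            "hospitalized_covid_patients": "Hospitalized patients",
        },
    )
```

In a notebook, plot_daily_hospitalized(daily_hospitalized(df)).show() would render the figure, with plotly's default hover tooltip showing the x and y values of the nearest point.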
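One way to set up such a county comparison is sketched below. It assumes hypothetical column names county and icu_available_beds, and it compares counties by their mean number of available ICU beds over the whole period, which is an illustrative choice rather than the project's confirmed method.

```python
import pandas as pd


def mean_icu_beds_by_county(df: pd.DataFrame) -> pd.DataFrame:
    """Average available ICU beds per county, largest first."""
    return (
        df.groupby("county")["icu_available_beds"]
        .mean()
        .sort_values(ascending=False)
        .reset_index()
    )


def plot_icu_beds_by_county(means: pd.DataFrame):
    """Draw a horizontal bar chart with one bar per county."""
    # Imported here so the aggregation helper works without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(8, 12))
    sns.barplot(data=means, x="icu_available_beds", y="county", ax=ax)
    ax.set_xlabel("Mean available ICU beds")
    ax.set_ylabel("County")
    ax.set_title("ICU bed availability by county")
    fig.tight_layout()
    return fig
```

A horizontal layout is used because California has many counties, and county names are easier to read along the vertical axis.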
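The single-county comparison could be prepared roughly as follows. The column names county, date, icu_covid_patients, and icu_available_beds are assumptions, as is the spelling of the county value "Los Angeles"; the helper names are hypothetical.

```python
import pandas as pd


def county_icu_series(df: pd.DataFrame, county: str = "Los Angeles") -> pd.DataFrame:
    """Return one county's ICU patients and available beds, ordered by date."""
    cols = ["date", "icu_covid_patients", "icu_available_beds"]
    series = df.loc[df["county"] == county, cols]
    series = series.sort_values("date").reset_index(drop=True)
    # Positive margin: more available beds than ICU COVID-19 patients.
    series["bed_margin"] = series["icu_available_beds"] - series["icu_covid_patients"]
    return series


def shortage_dates(series: pd.DataFrame) -> list:
    """Dates on which ICU COVID-19 patients outnumbered available beds."""
    return series.loc[series["bed_margin"] < 0, "date"].tolist()


def plot_county_icu(series: pd.DataFrame, county: str = "Los Angeles"):
    """Plot ICU patients and available ICU beds on the same axes."""
    # Imported here so the data helpers work without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 5))
    ax.plot(series["date"], series["icu_covid_patients"], label="ICU COVID-19 patients")
    ax.plot(series["date"], series["icu_available_beds"], label="Available ICU beds")
    ax.set_xlabel("Date")
    ax.set_ylabel("Count")
    ax.set_title(f"ICU patients vs. available ICU beds in {county} County")
    ax.legend()
    return fig
```

Drawing both lines on shared axes makes the crossover points visible: wherever the patient line sits above the bed line, the margin is negative.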
Story Telling
The visualizations built from the dataset on the COVID-19 pandemic in California from March 2020 to November 2021 make it hard to deny that California is one of the states hit hardest by the outbreak. The remarkable increase in the number of hospitalized COVID-19 patients and ICU COVID-19 patients shows how severe the outbreak in California has been. Notably, at the peak of the outbreak, almost 9,000 COVID-19 patients were hospitalized in a single day. Even though the number of COVID-19 patients in ICUs increased dramatically, some counties, such as San Francisco, Los Angeles, and Tehama, prepared for that rise by keeping a good number of ICU beds available. Concretely, although Los Angeles is one of the counties with the most available ICU beds, when COVID-19 reached its peak around January 2021, the number of ICU beds was clearly not enough to handle the dramatic rise in patients transferred to the ICU. We can conclude that although some counties, such as Los Angeles, Tehama, and San Francisco, prepared by keeping a good number of ICU beds available, it was still not enough to absorb the sharp rise in COVID-19 patients transferred to ICUs. This shows how hard COVID-19 hit California, which answers the first question stated in the introduction.
No matter how hard COVID-19 hit, California has done a good job of keeping the situation under control. The number of ICU COVID-19 patients in Los Angeles decreased significantly, from 1,724 patients a day in January 2021 to 40 patients a day in July 2021. Moreover, the number of available ICU beds has remained stable.
Looking at the last visualization, we can see that since February 2021 the number of available ICU beds has been enough to handle the number of ICU COVID-19 patients. In Los Angeles County, the number of available ICU beds not only keeps up with the number of patients transferred to ICUs but exceeds it. This trend appears not only in Los Angeles County but also in almost all counties of California, so most counties no longer need to worry about a shortage of ICU beds in the fight against COVID-19. This shows how well California has handled the COVID-19 situation, which answers the second question stated in the introduction: how California's healthcare system handled this global pandemic.
References
Jupyter Notebook Tutorial in Python. https://plotly.com/python/ipython-notebook-tutorial/
Matplotlib: Visualization with Python. https://matplotlib.org/
Coronavirus Disease (Covid 19) Pandemic. https://www.who.int/emergencies/diseases/novel-coronavirus-2019