Ontario Schools COVID-19

(Written December 2021)

Crazy times...

Our family took the leap and moved out of downtown to the burbs. It was an area we gravitated to during lockdown, a place to roam with our son out in nature.

I knew there were great schools in the area, but what I didn't know was that these same schools had a high number of COVID cases. That piqued my interest, and I asked around to see what other parents might want to know...

"How does the school rank in comparison? Are things getting better? Are some districts/boards doing better than others because of their policy on online vs in-class learning? How do public, Catholic, and private schools compare?"

I wasn't sure exactly which questions I'd tackle, but I'd see what data I could get my mitts on and where that took me.

From a toolkit perspective, I had been looking for a reason to try out Elastic / Kibana, especially for the cool-looking heat maps.

Steps:

1. Create a free account with Elastic Cloud (note to self: I need to finish this within a couple of weeks, before my trial ends)

2. Find data...searching...searching

3. In Elastic, create "index patterns" for the maps and graphs, which are then combined into a dashboard

No code hosted on GitHub (this was done without any coding, using Elastic)

Elasticsearch, Kibana, Logstash...

Some basics first...

At one of my previous employers, the company used Elastic and Kibana for interactive analysis of batch job runtimes. I thought it was cool to be able to view metrics across a time horizon, pick out anomalies, drill down for detail, and gain actionable insights.

Historically, these tools were primarily used for searching and visualizing logs, but they have since been pushed into many other useful applications.



Data, data, data

The Ontario government records and publishes daily data on the number of COVID cases in schools. I used the data from September 2021 to December 2021.

I wanted to map the cases, and to do so I had to join the data with other sources containing the schools' addresses.
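Once both sources share a key, the join itself is simple. Here is a minimal Python sketch. It assumes the case data and the address source both carry a common `school_id` field. That field name, along with `lat`, `lon`, and `total_cases`, is hypothetical. In practice, the sources might only share school names, which would need normalizing first.

```python
from typing import Dict, List, Tuple

# Hypothetical record shapes: "school_id", "lat", "lon" and "total_cases"
# are illustrative names, not the actual columns of either source.
Record = Dict[str, object]


def join_cases_with_locations(
    cases: List[Record], locations: List[Record]
) -> Tuple[List[Record], List[Record]]:
    """Inner-join case rows to school coordinates on a shared school_id.

    Returns (joined, unmatched) so rows that could not be geolocated are
    set aside for review instead of silently dropped.
    """
    # Index the address source once so each case row is an O(1) lookup.
    by_id = {loc["school_id"]: loc for loc in locations}
    joined: List[Record] = []
    unmatched: List[Record] = []
    for case in cases:
        loc = by_id.get(case["school_id"])
        if loc is None:
            unmatched.append(case)
        else:
            # Merge both records; case fields win if a key appears in both.
            joined.append({**loc, **case})
    return joined, unmatched


if __name__ == "__main__":
    cases = [
        {"school_id": "A1", "reported_date": "2021-10-01", "total_cases": 3},
        {"school_id": "Z9", "reported_date": "2021-10-01", "total_cases": 1},
    ]
    locations = [{"school_id": "A1", "lat": 44.39, "lon": -79.69}]
    joined, unmatched = join_cases_with_locations(cases, locations)
    print(joined)     # one merged row with coordinates
    print(unmatched)  # the Z9 row, which had no address match
```

Keeping the unmatched rows separate makes it easy to see which schools still need a manual address lookup.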

A quirk of the data is that it reports the total number of COVID cases per school per day, so I wasn't able to validate the day-over-day increase in cases. The same case could be reported on multiple days, and I wouldn't know whether it was the exact same case or whether it had dropped off and been replaced by a new one.

This meant I could only report the total per day and get a sense of the general magnitude. That is not a true depiction of the daily increase in cases, nor an accurate picture over time.
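To make the quirk concrete, here is a small Python sketch with two hypothetical case histories for one school. Both produce identical daily totals. However, one has no new cases on day two, while the other has two. The case IDs are invented, since the published data only gives counts.

```python
from typing import Dict, List, Set

# Two hypothetical histories for one school. Each day lists the IDs of the
# cases on record that day; the published data only exposes the count.
same_cases: Dict[str, Set[str]] = {
    "2021-10-01": {"c1", "c2"},
    "2021-10-02": {"c1", "c2"},  # the same two cases reported again
}
replaced_cases: Dict[str, Set[str]] = {
    "2021-10-01": {"c1", "c2"},
    "2021-10-02": {"c3", "c4"},  # both dropped off, two new cases appeared
}


def daily_totals(history: Dict[str, Set[str]]) -> List[int]:
    """What the published data shows: one count per day, in date order."""
    return [len(history[day]) for day in sorted(history)]


def new_cases_per_day(history: Dict[str, Set[str]]) -> List[int]:
    """What we would like to know, but cannot derive from counts alone."""
    seen: Set[str] = set()
    result: List[int] = []
    for day in sorted(history):
        result.append(len(history[day] - seen))
        seen |= history[day]
    return result


print(daily_totals(same_cases), daily_totals(replaced_cases))  # [2, 2] [2, 2]
print(new_cases_per_day(same_cases), new_cases_per_day(replaced_cases))  # [2, 0] [2, 2]
```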

Elasticsearch

Using the free trial, I was able to load the data quickly. Because Elasticsearch is a document-oriented database, I needed to create an index that I could later search using index patterns. It was interesting to see that concepts such as NoSQL, graph-oriented databases, pipelining, ETL, etc. are all possible with the ELK stack. I didn't even touch Logstash. Still, my research showed me its power and purpose: setting up an ingestion pipeline, orchestrating transformations, and extending it with the many available plugins.
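The loading step can also be scripted. Here is a minimal Python sketch that assumes a hypothetical index named `ontario-school-cases` and the field names shown. The mapping declares `location` as a `geo_point` so Kibana can place documents on a map, and `reported_date` as a `date` so it can drive time filters. `to_bulk_ndjson` builds a newline-delimited body for Elasticsearch's `_bulk` API, with an action line followed by a source line for each document. Actually sending the requests is left out: for example, `PUT ontario-school-cases` with the mapping, then `POST _bulk` with `Content-Type: application/x-ndjson`.

```python
import json
from typing import Dict, Iterable, List

INDEX = "ontario-school-cases"  # hypothetical index name

# Mapping sketch; every field name here is an assumption.
MAPPING = {
    "mappings": {
        "properties": {
            "school": {"type": "keyword"},
            "school_board": {"type": "keyword"},
            "municipality": {"type": "keyword"},
            "reported_date": {"type": "date"},
            "total_cases": {"type": "integer"},
            "location": {"type": "geo_point"},  # needed for map layers
        }
    }
}


def to_bulk_ndjson(docs: Iterable[Dict], index: str = INDEX) -> str:
    """Serialize documents as a _bulk request body: an action line, then a
    source line, per document, ending with the required trailing newline."""
    lines: List[str] = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    doc = {
        "school": "Example PS",
        "reported_date": "2021-10-01",
        "total_cases": 3,
        "location": {"lat": 44.39, "lon": -79.69},
    }
    print(json.dumps(MAPPING, indent=2))
    print(to_bulk_ndjson([doc]), end="")
```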

Kibana

Now to the fun part... once I had the data, I was able to quickly map out the cases and visualize them on an interactive heat map that can be drilled into.

(Apologies in advance: you will have to click the play button to play each video.)

map2.mov

...or one with a sliding time scale to visualize the cases week over week

animate map1.mov
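A sliding time scale is essentially one heat layer per time step. As a rough illustration, and not Kibana's own implementation, this Python sketch bins hypothetical `(date, lat, lon, total)` rows into fixed grid cells for each ISO week. The 0.1-degree cell size is an arbitrary assumption. Because the source reports totals per day, each weekly sum counts case-days rather than new cases.

```python
import math
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]
Row = Tuple[str, float, float, int]  # hypothetical (ISO date, lat, lon, total)


def grid_cell(lat: float, lon: float, cell_deg: float = 0.1) -> Cell:
    """Index of the fixed-size lat/lon cell that contains a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def iso_week(day: str) -> str:
    """Label an ISO date string with its ISO year and week, e.g. 2021-W40."""
    year, week, _ = date.fromisoformat(day).isocalendar()
    return f"{year}-W{week:02d}"


def weekly_heat(rows: Iterable[Row], cell_deg: float = 0.1) -> Dict[str, Dict[Cell, int]]:
    """Sum daily totals per grid cell, producing one heat layer per week."""
    frames: Dict[str, Dict[Cell, int]] = defaultdict(lambda: defaultdict(int))
    for day, lat, lon, cases in rows:
        frames[iso_week(day)][grid_cell(lat, lon, cell_deg)] += cases
    # Plain dicts, ordered by week, so each frame can be drawn in sequence.
    return {week: dict(cells) for week, cells in sorted(frames.items())}
```

A finer `cell_deg` gives a sharper map but sparser cells; Kibana exposes a similar trade-off through its grid resolution setting.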

Using various graph templates, I started to glean some insights:


Municipalities

Note the increases in Barrie and Sudbury

muni.mov
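A chart like this boils down to summing each day's per-school totals within a municipality. Here is a minimal Python sketch. The row fields (`municipality`, `reported_date`, `total_cases`) are hypothetical names for the joined data. The grouping field is a parameter, so the same function would work for school boards.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def totals_by_group_and_day(
    rows: Iterable[Mapping], group_field: str
) -> Dict[str, Dict[str, int]]:
    """Sum per-school daily totals within each group, keyed by date.

    Field names are assumptions about the joined data, not the source schema.
    """
    out: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for row in rows:
        out[row[group_field]][row["reported_date"]] += row["total_cases"]
    # Convert the nested defaultdicts to plain dicts for charting or printing.
    return {group: dict(days) for group, days in out.items()}
```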

School districts

Note the decrease in Dufferin-Peel Catholic and the increase in York Region

school districts.mov
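One way to put a number on increases and decreases like these is to compare the average daily total over the first few days of the period with the last few. This is an assumed approach, not something Kibana computes. Here is a hedged Python sketch. The five-day window (roughly one school week) is a placeholder, and the per-board series are assumed to be in date order.

```python
from typing import Dict, List, Tuple


def window_change(series: List[int], window: int = 5) -> float:
    """Percent change between the mean of the first and last `window` days."""
    if len(series) < 2 * window:
        raise ValueError("series too short for two non-overlapping windows")
    early = sum(series[:window]) / window
    late = sum(series[-window:]) / window
    if early == 0:
        # No baseline: report any rise as unbounded, otherwise no change.
        return float("inf") if late > 0 else 0.0
    return 100.0 * (late - early) / early


def rank_boards(series_by_board: Dict[str, List[int]], window: int = 5) -> List[Tuple[str, float]]:
    """Boards sorted from the largest decrease to the largest increase."""
    changes = [(board, window_change(s, window)) for board, s in series_by_board.items()]
    return sorted(changes, key=lambda item: item[1])
```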

Postal code

Focusing on Toronto-area postal codes (those starting with the letter 'M') to see whether the trend indicates that cases are recent, occurred earlier, or have been consistent throughout

postal.mov
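Filtering to the Toronto area comes down to a prefix match on the postal code. Here is a minimal Python sketch. It normalizes spacing and case, keeps codes starting with 'M', and sums cases by forward sortation area (the first three characters). The `postal_code` and `total_cases` field names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def toronto_fsa_totals(rows: Iterable[Mapping], prefix: str = "M") -> Dict[str, int]:
    """Sum cases per forward sortation area for postal codes with a prefix."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        # Normalize "m5v 2t6" and "M5V2T6" to the same form before matching.
        code = row["postal_code"].replace(" ", "").upper()
        if code.startswith(prefix):
            totals[code[:3]] += row["total_cases"]
    return dict(sorted(totals.items()))
```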

Schools

Looking at a specific school's cases over time, you can see whether cases are happening more recently, happened earlier in the year, or have been consistent throughout

school.mov
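A simple way to label a school's pattern as recent, earlier, or consistent is to check how much of its case total falls in the second half of the period. This heuristic is assumed here; it is not a Kibana feature. Here is a minimal Python sketch. The 0.6 threshold is arbitrary, and with an odd number of days, the middle day counts toward the second half.

```python
from typing import Dict


def timing_profile(daily: Dict[str, int], threshold: float = 0.6) -> str:
    """Classify a school's daily totals (keyed by ISO date) by when they fall."""
    days = sorted(daily)
    if not days:
        raise ValueError("no data")
    mid = len(days) // 2
    first = sum(daily[d] for d in days[:mid])
    second = sum(daily[d] for d in days[mid:])
    total = first + second
    if total == 0:
        return "no cases"
    share_recent = second / total  # fraction of case-days in the later half
    if share_recent >= threshold:
        return "recent"
    if share_recent <= 1 - threshold:
        return "earlier"
    return "consistent"
```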


Once you've created the individual charts and graphs, Kibana lets you easily pull them together into an interactive dashboard.


What did I learn?

I didn't go too far in sourcing data to supplement what the Ontario government provides on COVID cases in schools. I did thoroughly enjoy dabbling with Elastic/Kibana as a non-programmatic way to quickly tap into data and produce meaningful, interactive maps and charts.

To summarize some of the trends observed:

Late September (increases)

Municipalities: Barrie, Sudbury

York Region District School Board

Mid to late October (decreases)

Municipalities: Brampton, Mississauga

Dufferin-Peel Catholic District School Board

There is potential for some interesting analysis correlating the number of COVID cases with specific events, e.g. when a school board switches from in-person to online learning or vice versa. Given the down trend in cases observed above for mid to late October, it would be interesting to know what the cause or correlation might have been.
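A first pass at that kind of analysis could compare average daily totals before and after a policy change. Here is a minimal Python sketch. It assumes a `daily` series keyed by ISO date (for one board, say) and a hypothetical change date. A before/after difference is only a correlation. On its own, it says nothing about cause.

```python
from datetime import date
from typing import Dict, Tuple


def before_after(daily: Dict[str, int], change_date: str) -> Tuple[float, float]:
    """Mean daily total strictly before, and on or after, a change date.

    `daily` maps ISO dates to reported daily totals; `change_date` is a
    hypothetical policy-change date, also in ISO format.
    """
    cutoff = date.fromisoformat(change_date)
    before = [v for d, v in daily.items() if date.fromisoformat(d) < cutoff]
    after = [v for d, v in daily.items() if date.fromisoformat(d) >= cutoff]
    if not before or not after:
        raise ValueError("need data on both sides of the change date")
    return sum(before) / len(before), sum(after) / len(after)
```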

But for now, I'm content to have dabbled with Elastic/Kibana. I realize that, from a technical perspective, there is much more I could explore, such as setting up a proper pipeline, should I wish to expand on this project.


From the perspective of a parent and viewer of the data, it's daunting to see the numbers keep increasing. As of December 24, 2021, Ontario is reporting 1,097 of 4,844 schools with cases (~23% of schools) and a cumulative total of 12,062 cases.
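For reference, the share of schools with cases works out as:

```latex
\frac{1097}{4844} \approx 0.2265 \approx 22.6\%
```

This rounds to the ~23% figure quoted above.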