Last month, a PLOS ONE paper was added to PLOS’ COVID-19 Collection. What makes this paper especially noteworthy is its author: Charit Narayanan is a senior at Mission San Jose High School in Fremont, California, not a university researcher, as is usually the case. We caught up with him for a Q&A on his PLOS ONE study, A novel cohort analysis approach to determining the case fatality rate of COVID-19 and other infectious diseases.
AV: How did you come up with the idea for the paper?
CH: During the days when the COVID-19 outbreak was largely confined to China and Italy and people were widely apprehensive about the prospect of the virus entering the United States, there was much talk about the case fatality rate (CFR): the proportion of people with confirmed infections who die. The most prevalent method at the time, dividing deaths by cases, intuitively seemed flawed to me. That led me to explore how I would go about evaluating the CFR. Surprisingly, the answer lay in my past work on user behavior.
Over the last year or so, I’ve been writing blog posts on how users engage with products such as Facebook and Instagram. One metric used to understand the health of products is the number of active users: people who use a product in a given timeframe. On any given day, the total composition of active users is a mix of people who have installed the app at different times.
Of the >2 billion people who use Facebook each month, some began using it a year ago; others, a couple of months ago; a few may have joined yesterday. Regardless of when they joined, the percentage of people from each cohort that return to the product remains roughly the same, that is, they are “retained” at a comparable rate. Similarly, the case fatality rates of different cohorts of infected people remain roughly equal, so we can apply the methodologies developed to understand product retention to determining the CFR.
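To make the contrast concrete, here is a minimal sketch of the two calculations on made-up numbers. It is a toy illustration of the cohort idea described above, not the model from the paper: the function names, the cohort sizes, and the death counts are all hypothetical. Each cohort is the group of cases with onset on the same day, and its CFR is tracked by the number of days since onset.

```python
from typing import List


def naive_cfr(total_deaths: int, total_cases: int) -> float:
    """Deaths to date divided by cases to date (the 'dividing deaths by cases' method)."""
    return total_deaths / total_cases


def cohort_cfr_table(cohort_sizes: List[int],
                     deaths_by_age: List[List[int]]) -> List[List[float]]:
    """Row i is cohort i; column k is the cohort's cumulative deaths k days
    after onset, divided by the cohort's size."""
    table = []
    for size, daily_deaths in zip(cohort_sizes, deaths_by_age):
        running = 0
        row = []
        for deaths in daily_deaths:
            running += deaths
            row.append(running / size)
        table.append(row)
    return table


# Hypothetical outbreak: case counts double each day, so the newest cohorts
# are large but have not been observed long enough for deaths to occur.
COHORT_SIZES = [100, 200, 400]
DEATHS_BY_AGE = [
    [0, 1, 1],  # cohort with onset on day 0, observed for 3 days
    [0, 2],     # cohort with onset on day 1, observed for 2 days
    [0],        # cohort with onset on day 2, observed for 1 day
]

COHORT_TABLE = cohort_cfr_table(COHORT_SIZES, DEATHS_BY_AGE)
NAIVE = naive_cfr(sum(sum(d) for d in DEATHS_BY_AGE), sum(COHORT_SIZES))

print("naive CFR:", round(NAIVE, 4))
for onset_day, row in enumerate(COHORT_TABLE):
    print("cohort", onset_day, row)
```

In this toy data, the two older cohorts show the same CFR one day after onset, while the naive ratio comes out lower than the oldest cohort's CFR because the newest cases dominate the denominator before their outcomes are known.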
AV: How long did you work on the paper?
CH: I initially formulated the idea as a blog post. It caught the attention of Prof. Pramod Srivastava at the University of Connecticut, who encouraged me to modify and submit it to PLOS ONE. From first formulating the idea to finishing the manuscript, I spent 200 hours, more or less. I can segment all my work into roughly four stages: preliminary research and background reading, devising the method and coding my model, conducting the analysis, and lastly, writing the manuscript.
AV: Can you tell us about your experience approaching this project using open data sets?
CH: Open datasets are very useful for research similar to mine. Without the Johns Hopkins data on COVID-19, I wouldn’t have been able to finish this project. The open data was updated on a daily basis and provided in a very intuitive format. Moreover, with Python code, it was possible to automatically access this data directly from the internet. The availability of easily accessible, comprehensive datasets has revolutionized research. Given the importance of data in today’s world, I would encourage students to develop familiarity with mining, analyzing, interpreting, and solving problems with data early on.
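As a rough sketch of what pulling this data with Python can look like, the snippet below parses a wide-format time series into per-country daily counts. Several things here are assumptions rather than details from the interview: the URL is where the Johns Hopkins CSSE repository is believed to publish its global confirmed-case series, the file is assumed to have four metadata columns followed by one column per date, and the helper names are hypothetical. The parsing is demonstrated on a small made-up sample so it runs without a network connection.

```python
import csv
import io
import urllib.request
from typing import Dict, List

# Assumed location of the Johns Hopkins CSSE global confirmed-case series.
JHU_CONFIRMED_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_time_series/"
    "time_series_covid19_confirmed_global.csv"
)


def fetch_text(url: str) -> str:
    """Download a text file; call fetch_text(JHU_CONFIRMED_URL) for live data."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def cumulative_by_country(csv_text: str) -> Dict[str, List[int]]:
    """Sum the cumulative counts of all provinces within each country."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    n_meta = 4    # assumed layout: Province/State, Country/Region, Lat, Long
    totals: Dict[str, List[int]] = {}
    for row in reader:
        if not row:
            continue
        country = row[1]
        counts = [int(float(value or 0)) for value in row[n_meta:]]
        if country in totals:
            totals[country] = [a + b for a, b in zip(totals[country], counts)]
        else:
            totals[country] = counts
    return totals


def daily_new(cumulative: List[int]) -> List[int]:
    """Convert a cumulative series into per-day increments."""
    if not cumulative:
        return []
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


# Made-up sample in the assumed layout, used instead of a live download.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Italy,41.9,12.6,0,2,5
Hubei,China,30.9,112.3,10,15,25
Beijing,China,40.2,116.4,1,1,3
"""

TOTALS = cumulative_by_country(SAMPLE)
print(TOTALS["China"], daily_new(TOTALS["China"]))
```

The daily increments are the natural input for a cohort-style analysis, since each day's new cases form one cohort.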
AV: You seem to have a huge interest in data. Can you tell us more about your TikTok analysis?
CH: When I wrote the article, in late 2019, TikTok had seemingly spontaneously appeared on the top charts of the app store. Was it just another one of those short-lived viral sensations that sporadically pop up, or was it here to stay? I looked to user data in order to understand the app’s dynamics. Analyzing a number of product metrics, such as downloads, cohort retention, and open rates, I found you could segment TikTok users into two groups: those who absolutely loved the product and those who gave it a try but ultimately didn’t get hooked. Additionally, it was acquiring new users quickly and users were growing increasingly engaged, a promising sign of long-term user retention (users are more likely to stay on the app).
I found that it largely built off the momentum of its initial virality and that adolescents (its core users) were primarily responsible for driving its initial success. This did indeed hold true, as the app is far more popular now than it was at the time I published the article. I suspect that over time, TikTok has been able to convert a greater proportion of new users to the highly engaged group.
AV: Where did you come by your interest in science? Does it run in your family or was it sparked in school?
CH: Influences from school, my family, and books/the internet have all shaped my interests. School has taught me discipline and provided a structured way of learning. For instance, my Psychology class introduced me to the study of judgement and decision making, and through my Human Geography class, I developed an interest in learning about cultural influences on behavior, voting patterns, etc. My family also has a strong background in research, and I have numerous relatives who have published scientific papers. Additionally, I owe a lot to the internet, which has helped contextualize my learning.
A decade from now, I see myself doing work to help solve social issues such as economic inequality and climate change. Ultimately, I want to do what I can to help the world achieve clean, sustainable economic growth while reducing the wealth gap that exists between the rich and the poor. In order to do so, not only is it necessary to be “data fluent”— able to approach questions and solve problems using data— but also to have a deep understanding of human behavior, society, and economics.
AV: How much of your data work is influenced by formal schooling vs individual interest? Motivations, etc?
CH: Considering I am mostly self-taught, my formal education has not had a strong influence on my analytical skills. I initially began learning data science by downloading open datasets and investigating trends in the spreadsheets themselves. Soon, tinkering with these large datasets in Excel grew tedious, and I realized that programming could help scale my analyses. Once I began learning Python, I understood the boundless possibilities offered by code and greatly preferred it to working manually in spreadsheets for its efficiency and simplicity. My paper required a fair amount of sophisticated code, and I’m very glad I learned programming beforehand, even though I’ve never taken a formal course.
I acquired an interest in research primarily because of my intrinsic curiosity. I’ve always liked telling stories and understanding behavior and phenomena through data. That is not to say school has not played any role in cultivating my interests; courses such as Human Geography and Economics have been instrumental in shaping my interests and analytical skills, and in the future, I plan to explore the applications of data science to the social sciences.
AV: Did you run into any barriers while seeking publication as a high-school student?
CH: Most of the roadblocks I encountered were during the stage of writing my manuscript. As I had never done anything comparable, I encountered difficulties structuring my thoughts, using the correct terminology, and even using LaTeX to transcribe my manuscript.
AV: Have you gotten any reactions on your paper?
CH: I’ve gotten quite a few. Thankfully, they are overwhelmingly positive, and I’m very grateful for the extensive constructive feedback I’ve received.
AV: Is there anything we could be doing better to help researchers working on COVID-19 related research in real time to help them advance faster?
CH: For epidemiological/disease modeling research such as mine, easily accessible and transparent data is imperative. Since I initially had trouble searching for scientific work relevant to my project, I believe that increasing the accessibility of ongoing and previous research in specific subfields (e.g., CFR modeling) is also vital, especially for novice researchers like myself.
AV: Are there any lessons we can learn from the wide availability of COVID-19 research that could apply to advancing all science faster in the future?
CH: I think the COVID-19 pandemic has taught us three valuable lessons about research: the availability of data, the dissemination of research, and widespread collaboration among researchers have enabled science to advance at a rapid pace.
AV: How do you think our lives are going to be impacted by COVID-19 in 1 year vs. in 5 years?
CH: In the short term, economic effects like mass unemployment, worsened wealth gaps, and socioeconomic racial disparities will unfortunately persist over the next several years. I don’t see public transportation, shopping establishments/restaurants, and many workplaces widely opening without restrictions any time soon. Even if things begin opening up quickly, people’s attitudes and behavior may not drastically change; most people, especially the elderly and other high-risk groups, will likely still quarantine and practice social distancing. Improved testing, contact tracing, and the successful development and administration of a vaccine globally will be paramount to containing COVID-19 in the future.
In the long term, I think countries will rethink globalization and attempt to become more self-reliant. Countries and corporations will invest more in internet infrastructure so people can work remotely.
AV: Besides doing science, what do you do for fun?
CH: I enjoy photography, listening to music/podcasts, and watching TV shows.
AV: Anything else you want to add?
CH: I am sincerely grateful to the community for helping me in so many different ways.
AV: Thank you for taking the time to speak with us, and good luck.
Each cell contains the case fatality rate (CFR) for a given cohort on a given day. Each row represents a cohort who experienced the onset of disease on the same day. From the PLOS ONE study A novel cohort analysis approach to determining the case fatality rate of COVID-19 and other infectious diseases.