Full project - link

Introduction & Background

The growing non-medical use of prescription drugs is a global health concern. The Canadian Non-Medical Use of Prescription Drugs Survey, with data collected by the RADARS® System, offers an opportunity to understand the situation. Our team explored the raw survey data through visualization and formulated three research questions for this report.

Data Wrangling / Feature Engineering

Assumptions

  1. We assumed that all surveys were answered truthfully and recorded without errors.

Questions/Problems

  1. There are 185 variables in total. How can we explore all of them?
  2. About 20% of the cells in the dataset are missing (NaN).

Solution

  • We used principal component analysis (PCA) and mutual information to decide which variables to keep.
  • We filled missing values with the most logical answers. For example, the pregnancy variables were ‘NA’ for all male respondents, so we filled them with 0.
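A minimal pandas sketch of the rule-based imputation described above. The column names (`SEX`, `PREG_CURRENT`, `PREG_TRIED`) and the `"M"` coding for male respondents are hypothetical placeholders, not the survey's actual variable names; missing values for female respondents are deliberately left untouched for other rules to handle.

```python
import numpy as np
import pandas as pd

# Hypothetical column names; the real survey codebook uses its own identifiers.
PREGNANCY_COLS = ["PREG_CURRENT", "PREG_TRIED"]


def impute_logical(df: pd.DataFrame) -> pd.DataFrame:
    """Fill structurally missing answers with their logically implied value."""
    out = df.copy()  # leave the raw data unchanged
    is_male = out["SEX"] == "M"
    # Pregnancy questions do not apply to male respondents, so 'NA' there means "no" (0).
    out.loc[is_male, PREGNANCY_COLS] = out.loc[is_male, PREGNANCY_COLS].fillna(0)
    return out


raw = pd.DataFrame(
    {
        "SEX": ["M", "F", "M", "F"],
        "PREG_CURRENT": [np.nan, 1, np.nan, np.nan],
        "PREG_TRIED": [np.nan, 0, np.nan, 1],
    }
)
clean = impute_logical(raw)
print(clean)
```

Rules like this encode domain knowledge about why a value is missing, which is why they are applied before any generic imputation.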

Research Question 1: What types of social groups best explain non-medical use?

Steps

  1. Applied principal component analysis (PCA) to reduce dimensionality and capture social groups.
  2. Took the top components from PCA, then calculated mutual information to find which component best explained the ‘NMU’ variable.
  3. Visualized each social group.
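A minimal NumPy sketch of steps 1 and 2, assuming a fully numeric, imputed feature matrix `X` and a binary `NMU` label. Here PCA is computed via SVD of the standardized data, and mutual information is a simple histogram estimate over quantile bins. The report does not state which libraries or settings the team used (for example, scikit-learn estimators or the number of bins), and the synthetic data below is made up purely for illustration.

```python
import numpy as np


def standardize(X: np.ndarray) -> np.ndarray:
    """Center each column and scale it to unit variance (constant columns are only centered)."""
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return (X - X.mean(axis=0)) / std


def pca_scores(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project standardized data onto its top principal components via SVD."""
    Xs = standardize(X)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    return Xs @ Vt[:n_components].T


def mutual_information(x: np.ndarray, y: np.ndarray, n_bins: int = 10) -> float:
    """Histogram estimate of MI (in nats) between a continuous x and a discrete y."""
    # Interior quantile edges give roughly equal-count bins for x.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    x_binned = np.digitize(x, edges)
    mi = 0.0
    for xv in np.unique(x_binned):
        p_x = np.mean(x_binned == xv)
        for yv in np.unique(y):
            p_xy = np.mean((x_binned == xv) & (y == yv))
            if p_xy > 0:
                mi += p_xy * np.log(p_xy / (p_x * np.mean(y == yv)))
    return float(mi)


def rank_components(X: np.ndarray, y: np.ndarray, n_components: int):
    """Return (component name, MI with y) pairs, highest MI first."""
    scores = pca_scores(X, n_components)
    mi = [mutual_information(scores[:, j], y) for j in range(n_components)]
    order = np.argsort(mi)[::-1]
    return [(f"PC{j + 1}", mi[j]) for j in order]


# Synthetic stand-in for the survey: a hidden group shifts three answers and drives NMU.
rng = np.random.default_rng(0)
n = 2000
group = (rng.random(n) < 0.3).astype(int)
X = rng.normal(size=(n, 6))
X[:, :3] += 3 * group[:, None]
flip = rng.random(n) < 0.05
nmu = np.where(flip, 1 - group, group)

ranking = rank_components(X, nmu, n_components=4)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Ranking components rather than raw variables means each high-scoring item already bundles many correlated answers, which is what lets a single component stand for a social group.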

The graph above shows the process of identifying the 'Top Non-Medical Drug Use Social Groups'.

  • Top Mutual Information of 'NMU': MI scores between each variable in the original dataset and 'NMU' as the response variable. Opioid, Codeine, and Cocaine were the variables that best explained NMU.
  • Principal Component Variance Capture Plots: PCA successfully captured about 90% of the total variance with just 100 components.
  • Top Mutual Information of 'NMU' with Principal Components: the highest score belonged to 'PC3', followed by 'PC10', and so on. Below it is an example of dissecting PC3, which reveals that PC3 corresponds to people from Quebec.
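The two diagnostics in the bullets above can be sketched in NumPy, under the assumption that PCA is run on standardized data: the cumulative explained-variance curve (how many components reach a 90% threshold) and reading a component's largest loadings to name the social group it represents. The column names `PROV_QC` and `LANG_FR` and the synthetic data are hypothetical; in the report it was PC3, not PC1, that was interpreted as Quebec respondents.

```python
import numpy as np


def standardize(X: np.ndarray) -> np.ndarray:
    """Center each column and scale it to unit variance (constant columns are only centered)."""
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return (X - X.mean(axis=0)) / std


def explained_variance_ratio(X: np.ndarray) -> np.ndarray:
    """Fraction of total variance captured by each principal component, largest first."""
    s = np.linalg.svd(standardize(X), compute_uv=False)
    return s**2 / np.sum(s**2)


def components_for(X: np.ndarray, threshold: float = 0.9) -> int:
    """Smallest number of components whose cumulative variance reaches the threshold."""
    cumulative = np.cumsum(explained_variance_ratio(X))
    return int(np.searchsorted(cumulative, threshold)) + 1


def top_loadings(X: np.ndarray, names: list, component: int, k: int = 3):
    """Variables with the largest absolute weight on one component (0-based index)."""
    _, _, Vt = np.linalg.svd(standardize(X), full_matrices=False)
    weights = Vt[component]
    order = np.argsort(np.abs(weights))[::-1][:k]
    return [(names[i], float(weights[i])) for i in order]


# Synthetic data with hypothetical column names, for illustration only.
rng = np.random.default_rng(1)
n = 2000
quebec = (rng.random(n) < 0.25).astype(float)
names = ["PROV_QC", "LANG_FR", "AGE", "INCOME"]
X = np.column_stack([
    quebec,                                              # lives in Quebec
    np.where(rng.random(n) < 0.9, quebec, 1 - quebec),   # French-speaking, mostly in Quebec
    rng.normal(size=n),                                  # unrelated answer
    rng.normal(size=n),                                  # unrelated answer
])

print(components_for(X, threshold=0.9))
print(top_loadings(X, names, component=0, k=2))
```

The sign of a loading vector is arbitrary in PCA, so the interpretation rests on which variables have large absolute weights, not on whether they are positive or negative.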

View interactive visualization_1 in Tableau - link

Top Non-Medical Drug Use Social Groups

View interactive visualization_2 in Tableau - link

Because each principal component can be interpreted as a specific social group, we can analyze that group closely without having to explore billions of possible combinations of groups.

View interactive visualization_3 in Tableau - link

Conclusion

  • Quebec was the social group most severely affected by non-medical drug use in Canada, and we also examined several other social groups.