library(tidyverse)
library(survey)
library(viridis)
library(table1)
library(kableExtra)

knitr::opts_chunk$set(
  warning = FALSE,
  message = FALSE,
  echo = FALSE,
  fig.width = 10
)

theme_set(theme_minimal() + theme(legend.position = "bottom"))

options(
  ggplot2.continuous.colour = "viridis",
  ggplot2.continuous.fill = "viridis"
)

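# Wrap the viridis scales in functions so ggplot2 can call them as the default discrete scales
scale_colour_discrete = function(...) scale_colour_viridis_d(option = "viridis", ...)
scale_fill_discrete = function(...) scale_fill_viridis_d(option = "viridis", ...)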

This analysis uses the cleaned data set produced by the processing steps described in the Data section. Our motivation is to understand the relationship between drug use and youth behaviors, so here we explore the association between drug use status and each health behavior variable.

There are three kinds of health-related variables, and we analyze each with a different method:

- Bi-level categorical variables: drunk driving, carrying a weapon, suicide attempt, physical fight, quitting smoking, early-age sexual intercourse, heavy screening use, enough sleep
- Multi-level categorical variables: smoking status, binge drinking status, seat belt use, GPA (grades in school)
- Quantitative variables: initial smoking age, initial drinking age, BMI

The table below summarizes the overall distribution of the key variables by drug use status.

Three-line-table

Table 1: Distribution of risky behaviors by drug use status
No Drug Use (N=13819) Light Dose (N=4257) Heavy Dose (N=1778) Overall (N=19854)
Drunk Driving
Yes 1333 (9.6%) 875 (20.6%) 652 (36.7%) 2860 (14.4%)
No 12486 (90.4%) 3382 (79.4%) 1126 (63.3%) 16994 (85.6%)
Text Driving
Yes 8191 (59.3%) 2892 (67.9%) 1282 (72.1%) 12365 (62.3%)
No 5628 (40.7%) 1365 (32.1%) 496 (27.9%) 7489 (37.7%)
Weapon Carrying
Yes 323 (2.3%) 179 (4.2%) 266 (15.0%) 768 (3.9%)
No 13496 (97.7%) 4078 (95.8%) 1512 (85.0%) 19086 (96.1%)
Suicide Attempt
Yes 1645 (11.9%) 1205 (28.3%) 668 (37.6%) 3518 (17.7%)
No 12174 (88.1%) 3052 (71.7%) 1110 (62.4%) 16336 (82.3%)
Quit Smoking
Never Smoke 12344 (89.3%) 2245 (52.7%) 359 (20.2%) 14948 (75.3%)
Yes 751 (5.4%) 1069 (25.1%) 656 (36.9%) 2476 (12.5%)
No 724 (5.2%) 943 (22.2%) 763 (42.9%) 2430 (12.2%)
Physical Fight
Yes 1620 (11.7%) 1080 (25.4%) 820 (46.1%) 3520 (17.7%)
No 12199 (88.3%) 3177 (74.6%) 958 (53.9%) 16334 (82.3%)
Sexual Intercourse
Yes 807 (5.8%) 839 (19.7%) 741 (41.7%) 2387 (12.0%)
No 13012 (94.2%) 3418 (80.3%) 1037 (58.3%) 17467 (88.0%)
Screening Use
Mean (SD) 3.65 (2.36) 3.86 (2.48) 3.80 (2.64) 3.71 (2.41)
Median [Min, Max] 3.50 [0, 10.0] 4.00 [0, 10.0] 3.50 [0, 10.0] 3.50 [0, 10.0]
Sleeping Time
Mean (SD) 6.71 (1.30) 6.36 (1.33) 6.16 (1.38) 6.59 (1.33)
Median [Min, Max] 7.00 [4.00, 10.0] 6.00 [4.00, 10.0] 6.00 [4.00, 10.0] 7.00 [4.00, 10.0]
Smoking Status
Never Smoker 13596 (98.4%) 3743 (87.9%) 1078 (60.6%) 18417 (92.8%)
Light Smoker 212 (1.5%) 499 (11.7%) 642 (36.1%) 1353 (6.8%)
Heavy Smoker 11 (0.1%) 15 (0.4%) 58 (3.3%) 84 (0.4%)
Binge Drinking
No Binge Drinking 13359 (96.7%) 3308 (77.7%) 921 (51.8%) 17588 (88.6%)
Light Binge Drinking 419 (3.0%) 792 (18.6%) 596 (33.5%) 1807 (9.1%)
Heavy Binge Drinking 41 (0.3%) 157 (3.7%) 261 (14.7%) 459 (2.3%)
Seat Belt Use
Never or Rarely 526 (3.8%) 307 (7.2%) 284 (16.0%) 1117 (5.6%)
Sometimes 1002 (7.3%) 515 (12.1%) 304 (17.1%) 1821 (9.2%)
Most of the time or Always 12291 (88.9%) 3435 (80.7%) 1190 (66.9%) 16916 (85.2%)
Grades in School
Mostly A's 7325 (53.0%) 1565 (36.8%) 431 (24.2%) 9321 (46.9%)
Mostly B's 4403 (31.9%) 1627 (38.2%) 663 (37.3%) 6693 (33.7%)
Mostly C's 1404 (10.2%) 744 (17.5%) 418 (23.5%) 2566 (12.9%)
Mostly Below C's 304 (2.2%) 172 (4.0%) 177 (10.0%) 653 (3.3%)
None of these grades 28 (0.2%) 10 (0.2%) 17 (1.0%) 55 (0.3%)
Not sure 355 (2.6%) 139 (3.3%) 72 (4.0%) 566 (2.9%)

Hypothesis Testing

Chi-square Test

For bi-level and multi-level categorical variables, we test whether drug use status and the variable of interest are independent, using the R \(\times\) C chi-square test with the hypotheses

\[ H_0:\text{There is no association between drug use and the health-related variable of interest} \]
\[ H_1:\text{There is an association between drug use and the health-related variable of interest} \]

In addition to the chi-square test, we use the Cramer’s V effect size to measure the magnitude of the association. It is computed as

\[ V = \sqrt{\frac{X^2}{n \cdot df}} \]

where \(X^2\) is the chi-square statistic, \(n\) is the sample size and \(df\) is the degrees of freedom.

The table below gives the conventional thresholds for interpreting the Cramer’s V effect size at each \(df\):

df small_effect medium_effect large_effect
1 0.10 0.30 0.50
2 0.07 0.21 0.35
3 0.06 0.17 0.29
4 0.05 0.15 0.25
5 0.04 0.13 0.22
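As a minimal sketch of the test described above, the chi-square statistic for one variable can be recomputed directly from the Drunk Driving counts in Table 1. The helper `cramers_v` is a hypothetical name introduced here, and it uses the standard textbook form of Cramer's V, which divides by \(n \cdot \min(r - 1, c - 1)\); this is an assumption about the definition, so its value need not match the `cramer_v_effect_size` column reported in the tables of this analysis.

```r
# Counts taken from Table 1 (Drunk Driving by drug use status).
drunk_driving <- matrix(
  c(1333, 875, 652,      # Yes
    12486, 3382, 1126),  # No
  nrow = 2, byrow = TRUE,
  dimnames = list(
    drunk_driving = c("Yes", "No"),
    drug_use = c("No Drug Use", "Light Dose", "Heavy Dose")
  )
)

# R x C chi-square test of independence.
test <- chisq.test(drunk_driving)

# Hypothetical helper: standard Cramer's V,
# sqrt(X^2 / (n * min(r - 1, c - 1))).
cramers_v <- function(tab) {
  x2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(nrow(tab) - 1, ncol(tab) - 1)
  sqrt(x2 / (n * k))
}

x2 <- unname(test$statistic)
df <- unname(test$parameter)
v <- cramers_v(drunk_driving)

print(round(c(statistic = x2, df = df, cramers_v = v), 4))
```

The statistic and degrees of freedom should come out close to the crude row of the drunk driving results, since both are based on the same 2 x 3 table of counts.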

Besides the crude analysis, which tests the association using all observations in the data set, we also perform a stratified analysis by grade, gender and race to investigate whether these demographic variables interact with or confound the crude association.
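The crude and stratified analysis can be sketched as one chi-square test on all rows followed by the same test within each stratum. The example below is only an illustration on simulated toy data: the data frame `toy`, its column names (`grade`, `drug_use`, `outcome`) and the helper `chisq_row` are all hypothetical and are not the objects used in this report.

```r
set.seed(1)

# Simulated toy data; column names are hypothetical, not the report's.
n <- 2000
toy <- data.frame(
  grade = sample(c("9th", "10th", "11th", "12th"), n, replace = TRUE),
  drug_use = sample(c("none", "light", "heavy"), n, replace = TRUE,
                    prob = c(0.7, 0.2, 0.1)),
  stringsAsFactors = FALSE
)
# Make the outcome more likely at higher drug use so the test has signal.
p_yes <- c(none = 0.10, light = 0.20, heavy = 0.35)[as.character(toy$drug_use)]
toy$outcome <- ifelse(runif(n) < p_yes, "yes", "no")

# Hypothetical helper: one chi-square test, returned as a one-row summary.
chisq_row <- function(d, label) {
  res <- chisq.test(table(d$drug_use, d$outcome))
  data.frame(
    stratum = label,
    statistic = unname(res$statistic),
    p.value = res$p.value,
    df = unname(res$parameter),
    stringsAsFactors = FALSE
  )
}

# Crude analysis on all rows, then the same test within each grade.
crude <- chisq_row(toy, "Overall")
by_grade <- do.call(
  rbind,
  lapply(split(toy, toy$grade), function(d) chisq_row(d, d$grade[1]))
)
result <- rbind(crude, by_grade)
rownames(result) <- NULL
print(result)
```

Comparing the per-stratum rows with the "Overall" row is what allows differences between strata to be read as possible interaction or confounding.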

ANOVA

To test whether there is an association between drug use and BMI, which is a continuous variable, we perform a one-way ANOVA with the hypotheses

\[ H_0:\text{The mean BMI of students is the same across all drug use statuses} \]
\[ H_1:\text{At least two drug use status groups have different mean BMI} \]

Bi-level

1. Drunk Driving

Table 2: Drug Use Status vs Drunk Driving
stratum statistic p.value df cramer_v_effect_size
crude
Overall 1099.24319 0 2 0.3327652
grade
9th Grade 207.30809 0 2 0.2691369
10th Grade 292.83687 0 2 0.3270364
11th Grade 255.36494 0 2 0.3195389
12th Grade 315.05617 0 2 0.4153781
sex
female 468.87966 0 2 0.2991335
male 656.12915 0 2 0.3741512
race
White 837.21536 0 2 0.3756641
Black or African American 53.02572 0 2 0.2281722
Hispanic/Latino 133.39145 0 2 0.2694694
All other Races 114.75455 0 2 0.3174118

From Table 2, the overall p-value and the p-value in every stratum are small enough to conclude that there is an association between drug use status and drunk driving. The effect size differs across grade, gender and race, implying some interaction, but the lowest effect size is still above 0.22, suggesting at least a moderate association. The effect sizes indicate that the association may be stronger among higher grades, male students and White students.

Overall:

Stratified Analysis:

All proportion plots show that the proportion of drunk driving increases with the level of drug use. This dose-response relationship further supports the existence of the association.
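The dose-response pattern can also be examined numerically. The sketch below is not part of the original analysis: it applies base R's `prop.trend.test` (a chi-squared test for trend in proportions) to the Drunk Driving counts from Table 1, assuming equally spaced scores 1, 2, 3 for the three ordered drug use levels.

```r
# Drunk-driving counts and group sizes from Table 1, ordered by dose.
yes <- c(none = 1333, light = 875, heavy = 652)
total <- c(none = 13819, light = 4257, heavy = 1778)

# Observed proportion of drunk driving at each drug use level.
props <- yes / total
print(round(props, 3))

# Chi-squared test for linear trend in proportions, with equally spaced
# scores 1, 2, 3 for the ordered drug use levels (an assumption).
trend <- prop.trend.test(x = yes, n = total, score = 1:3)
print(trend)
```

Unlike the R x C chi-square test, the trend test uses the ordering of the drug use levels, so it speaks more directly to whether the proportion rises with dose.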

2. Cell Phone Use While Driving

Table 3: Drug Use Status vs Cell Phone Use During Driving
stratum statistic p.value df cramer_v_effect_size
crude
Overall 184.161787 0.0000000 2 0.1362043
grade
9th Grade 2.514184 0.2844800 2 0.0296390
10th Grade 37.199427 0.0000000 2 0.1165605
11th Grade 96.546099 0.0000000 2 0.1964765
12th Grade 116.836271 0.0000000 2 0.2529522
sex
female 80.399822 0.0000000 2 0.1238688
male 104.714674 0.0000000 2 0.1494708
race
White 170.964904 0.0000000 2 0.1697597
Black or African American 6.351851 0.0417555 2 0.0789714
Hispanic/Latino 25.164216 0.0000034 2 0.1170408
All other Races 4.285642 0.1173234 2 0.0613403

From Table 3, not all p-values are small enough to conclude an association between drug use status and cell phone use while driving. Since younger students have much less opportunity to drive than older students, it makes sense that the association depends heavily on grade: it is not significant in 9th Grade (p = 0.28) and strengthens with each later grade. The association is also not significant among All other Races (p = 0.12) and only marginally significant among Black or African American students (p = 0.04). From a conservative perspective, we conclude that the association between drug use status and cell phone use while driving is weak, especially among Black or African American and All other Races students.

Overall:

Stratified Analysis:

The overall proportion of cell phone use while driving rises with the level of drug use. However, the increase is small and is no longer visible after stratifying by race. Female students tend to have a higher proportion of cell phone use while driving regardless of drug use status.

3. Carrying Weapon

Table 4: Drug Use Status vs Carrying Weapon
stratum statistic p.value df cramer_v_effect_size
crude
Overall 676.6931 0 2 0.2610880
grade
9th Grade 275.8729 0 2 0.3104700
10th Grade 172.8207 0 2 0.2512356
11th Grade 139.7170 0 2 0.2363566
12th Grade 125.8154 0 2 0.2624922
sex
female 221.2204 0 2 0.2054693
male 416.7429 0 2 0.2981856
race
White 257.4158 0 2 0.2083045
Black or African American 150.6375 0 2 0.3845794
Hispanic/Latino 190.5880 0 2 0.3221019
All other Races 151.6157 0 2 0.3648465

From Table 4, all p-values are small enough to conclude that there is an association between drug use status and carrying a weapon. The overall effect size is around 0.26 and the lowest effect size is greater than 0.2, so we are confident that the association is moderate. However, the large differences in effect size across gender and race imply that gender and race interact with the association. It is stronger among male than female students, and it is strongest among Black or African American students and weakest among White students.

Overall:

Stratified Analysis:

All plots show the proportion of weapon carrying rising with the level of drug use, with a sharp increase from light dose to heavy dose. 9th Grade has the highest proportion of weapon carrying regardless of drug use status, and male students have a higher proportion than female students at every drug use status. The proportion of Black or African American students carrying a weapon rises abruptly at heavy dose drug use.

4. Suicide Attempt

Table 5: Drug Use Status vs Suicide Attempt
stratum statistic p.value df cramer_v_effect_size
crude
Overall 1128.37931 0 2 0.3371464
grade
9th Grade 412.00584 0 2 0.3794171
10th Grade 370.52483 0 2 0.3678679
11th Grade 271.59791 0 2 0.3295387
12th Grade 159.41991 0 2 0.2954751
sex
female 716.88919 0 2 0.3698796
male 438.91473 0 2 0.3060149
race
White 735.17942 0 2 0.3520285
Black or African American 47.03447 0 2 0.2148956
Hispanic/Latino 189.10957 0 2 0.3208501
All other Races 192.74839 0 2 0.4113708

From Table 5, all p-values are small enough to reject the null hypothesis of no association between drug use status and suicide attempt. The overall effect size is around 0.34 and the lowest effect size is greater than 0.21, so the association is moderately strong. The differences in effect size across races indicate an interaction with race: the association is strongest for All other Races and weakest for Black or African American students.

Overall:

Stratified Analysis:

Based on the proportion plots, the proportion of suicide attempts increases as the level of drug use goes up. This trend also appears in each stratum of the stratified plot. The dose-response relationship supports our conclusion about the association between suicide attempt and drug use status.

5. Quit smoking

Table 6: Drug Use Status vs Quit Smoking
stratum statistic p.value df cramer_v_effect_size
crude
Overall 16.0233514 0.0003316 2 0.0808217
grade
9th Grade 17.1191810 0.0001917 2 0.1856872
10th Grade 7.7000134 0.0212796 2 0.1090922
11th Grade 1.9523859 0.3767427 2 0.0532514
12th Grade 0.9685637 0.6161395 2 0.0394928
sex
female 5.0835165 0.0787279 2 0.0650595
male 9.5952996 0.0082491 2 0.0875441
race
White 15.1210110 0.0005206 2 0.0950840
Black or African American 18.0007244 0.0001234 2 0.3297965
Hispanic/Latino 15.9987629 0.0003357 2 0.2177240
All other Races 1.9253531 0.3818694 2 0.0832959

For quit smoking, we focus on students who smoke so that the large number of non-smokers does not bias the results. Table 6 shows the results after filtering out the non-smokers. Although the overall p-value is small, several stratum p-values are not small enough to conclude an association. The overall effect size is only around 0.08, and the effect size varies widely across grades and races, so confounding and interaction heavily affect the analysis. We conclude that there is no meaningful association between drug use status and quit smoking.
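The effect of restricting the test to smokers can be illustrated with the Quit Smoking counts from Table 1. This is a sketch under the assumption that the filtering step simply drops the "Never Smoke" row; the object names are hypothetical.

```r
# Quit Smoking counts from Table 1, including students who never smoked.
quit_smoking <- matrix(
  c(12344, 2245, 359,   # Never Smoke
    751, 1069, 656,     # Yes
    724, 943, 763),     # No
  nrow = 3, byrow = TRUE,
  dimnames = list(
    quit = c("Never Smoke", "Yes", "No"),
    drug_use = c("No Drug Use", "Light Dose", "Heavy Dose")
  )
)

# Keep only students who have smoked before testing.
smokers_only <- quit_smoking[c("Yes", "No"), ]

res_all <- chisq.test(quit_smoking)
res_smokers <- chisq.test(smokers_only)

print(round(c(all_students = unname(res_all$statistic),
              smokers_only = unname(res_smokers$statistic)), 2))
```

With never-smokers included, the statistic mostly reflects the difference in smoking itself across drug use groups, which is why the restricted table is the relevant one for the quit-smoking question.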

Overall:

Stratified Analysis:

The proportion bar plots do not show a dose-response relationship, since the proportion neither increases nor decreases consistently across drug use status. Moreover, the differences in proportion between drug use statuses are small. These plots therefore support the conclusion from the analysis table that there is no meaningful association between quit smoking and drug use status.

6. Physical Fight

Table 7: Drug Use Status vs Physical Fight
stratum statistic p.value df cramer_v_effect_size
crude
Overall 1494.64346 0 2 0.3880251
grade
9th Grade 576.12126 0 2 0.4486649
10th Grade 477.08710 0 2 0.4174285
11th Grade 398.62480 0 2 0.3992320
12th Grade 271.52502 0 2 0.3856155
sex
female 848.99278 0 2 0.4025190
male 693.55868 0 2 0.3846751
race
White 884.81376 0 2 0.3861954
Black or African American 98.51405 0 2 0.3110059
Hispanic/Latino 296.64639 0 2 0.4018509
All other Races 227.42897 0 2 0.4468492

From Table 7, all p-values are small and the effect sizes are around 0.4, so the association between drug use status and physical fight is strong. The effect sizes are similar across strata, so any interaction effect is small.

Overall:

Stratified Analysis:

The plots show a positive relationship between drug use status and the proportion of physical fights: the proportion is higher when drug use is heavier. This trend holds in each stratum of the stratified plot. The plots are consistent with the statistical analysis table and strengthen the inference about the association between drug use status and physical fight.

7. Early-age Sexual Intercourse

Table 8: Drug Use Status vs Early-Age Sexual Intercourse
stratum statistic p.value df cramer_v_effect_size
crude
Overall 2215.3034 0 2 0.4723975
grade
9th Grade 925.9117 0 2 0.5687874
10th Grade 689.4462 0 2 0.5018033
11th Grade 541.0647 0 2 0.4651229
12th Grade 307.7284 0 2 0.4105191
sex
female 1189.1246 0 2 0.4763740
male 1026.5625 0 2 0.4679993
race
White 1402.3034 0 2 0.4861856
Black or African American 114.6841 0 2 0.3355607
Hispanic/Latino 456.4091 0 2 0.4984511
All other Races 298.2025 0 2 0.5116745

From Table 8, all p-values are small and the overall effect size is around 0.47, implying that the association between drug use status and early-age sex is strong. The effect size is large in every stratum, so the conclusion does not depend on grade, sex or race, although it is somewhat lower among Black or African American students (0.34) and in 12th Grade (0.41).

Overall:

Stratified Analysis:

The plots show a positive relationship between drug use status and the proportion of early-age sex: the proportion is higher when drug use is heavier. 9th Grade has the highest proportion of early-age sex regardless of drug use status, and male students have a higher proportion than female students at every drug use status. Hispanic students have a markedly higher proportion of early-age sex at heavy dose than at no drug use or light dose, which suggests that heavy drug use may be linked to riskier sexual behavior in this group.

8. Screening Use

Table 9: Drug Use Status vs Heavy Screening Use
stratum statistic p.value df cramer_v_effect_size
crude
Overall 30.524402 0.0000002 2 0.0554517
grade
9th Grade 25.638308 0.0000027 2 0.0946477
10th Grade 10.164083 0.0062072 2 0.0609281
11th Grade 16.490417 0.0002625 2 0.0812006
12th Grade 2.739169 0.2542126 2 0.0387310
sex
female 49.915034 0.0000000 2 0.0976001
male 2.420158 0.2981737 2 0.0227235
race
White 15.533604 0.0004236 2 0.0511702
Black or African American 4.444540 0.1083628 2 0.0660591
Hispanic/Latino 2.091141 0.3514913 2 0.0337394
All other Races 6.125398 0.0467613 2 0.0733340

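From Table 9, the overall p-value is small, but several strata (12th Grade, male, Black or African American and Hispanic/Latino) are not significant and every effect size is below 0.1, implying that there is not a strong association between screening use and drug use frequency.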

Overall:

Stratified Analysis:

All plots show little difference in screening use between the drug use frequency groups.

9. Sleeping Time

The density plot shows that heavy dose subjects sleep less than no drug use and light dose subjects. A larger share of heavy dose subjects also sleep less than 5 hours compared with no drug use and light dose subjects.

Table 10: Drug Use Status vs Enough Sleeping
stratum statistic p.value df cramer_v_effect_size
crude
Overall 224.523895 0.0000000 2 0.1503911
grade
9th Grade 67.607132 0.0000000 2 0.1536956
10th Grade 38.283535 0.0000000 2 0.1182468
11th Grade 26.206480 0.0000020 2 0.1023641
12th Grade 33.113961 0.0000001 2 0.1346651
sex
female 90.411458 0.0000000 2 0.1313548
male 137.899449 0.0000000 2 0.1715275
race
White 136.895936 0.0000000 2 0.1519065
Black or African American 8.871195 0.0118480 2 0.0933277
Hispanic/Latino 68.553378 0.0000000 2 0.1931790
All other Races 16.573826 0.0002518 2 0.1206284

From Table 10, all p-values are small. However, the overall effect size is only around 0.15, suggesting that the association is not strong.

Overall:

Stratified Analysis:

The “enough sleeping” plots show that heavy dose subjects tend to have the lowest proportion of enough sleep in all grades and races. Male subjects tend to get enough sleep more often at no drug use and light dose, whereas they get enough sleep less often at heavy dose. This suggests an association between sleeping time and drug use frequency.

Multi-level

10. Smoking Status

Table 11: Drug Use Status vs Smoking Status
stratum statistic p.value df cramer_v_effect_size
crude
Overall 3588.1901 0 4 0.8502443
grade
9th Grade 1147.1991 0 4 0.8953639
10th Grade 1001.8200 0 4 0.8554470
11th Grade 906.6144 0 4 0.8514702
12th Grade 543.8701 0 4 0.7718132
sex
female 1762.6172 0 4 0.8202162
male 1804.7565 0 4 0.8775600
race
White 2396.6740 0 4 0.8988777
Black or African American 245.6196 0 4 0.6944900
Hispanic/Latino 511.0736 0 4 0.7459370
All other Races 546.7436 0 4 0.9798170

From Table 11, all p-values are small and the effect sizes are around 0.8, which means there is a strong association between drug use and smoking status. The association is weaker in 12th Grade than in the other grades. Among races, All other Races has the strongest association and Black or African American has the weakest.

Overall:

Stratified Analysis:

The plots show that heavy dose subjects tend to smoke more: they include a larger share of light and heavy smokers than no drug use and light dose subjects. The share of heavy smokers is higher among 9th Grade and male subjects. In addition, among heavy dose subjects, the share of heavy smokers is higher among White students than among Hispanic and Black students.

11. Binge Drinking

Table 12: Drug Use Status vs Binge Drinking
stratum statistic p.value df cramer_v_effect_size
crude
Overall 4031.0318 0 4 0.9011853
grade
9th Grade 1331.0011 0 4 0.9644271
10th Grade 1015.7536 0 4 0.8613754
11th Grade 914.2769 0 4 0.8550609
12th Grade 715.6054 0 4 0.8853223
sex
female 1958.3677 0 4 0.8645627
male 2040.1988 0 4 0.9330475
race
White 2685.8813 0 4 0.9515674
Black or African American 162.7432 0 4 0.5653090
Hispanic/Latino 771.7241 0 4 0.9166245
All other Races 541.1635 0 4 0.9748042

From Table 12, all p-values are small and the overall effect size is as high as 0.9, which indicates a very strong association between drug use and binge drinking. The association is stronger in 9th Grade than in the other grades, and stronger among male than female students. Among races, Black or African American students have a relatively weak association compared with the other race groups.

Overall:

Stratified Analysis:

As drug use increases, the proportion of no binge drinking decreases and the proportions of both light and heavy binge drinking increase. The trend is similar across grade and gender. Among races, at every drug use status the proportion of no binge drinking is higher among Black or African American students than among other races, and the proportions of both light and heavy binge drinking are lower.

12. Seat belt use

Table 13: Drug Use Status vs Seat Belt Use
stratum statistic p.value df cramer_v_effect_size
crude
Overall 758.34933 0 4 0.3908775
grade
9th Grade 301.94172 0 4 0.4593479
10th Grade 192.13099 0 4 0.3746252
11th Grade 171.55200 0 4 0.3703873
12th Grade 156.53345 0 4 0.4140647
sex
female 384.79732 0 4 0.3832352
male 367.71545 0 4 0.3961170
race
White 547.86098 0 4 0.4297653
Black or African American 57.11642 0 4 0.3348999
Hispanic/Latino 139.06322 0 4 0.3891048
All other Races 52.95343 0 4 0.3049300

From Table 13, all p-values are small and the effect size is generally around 0.4, which indicates a strong association between drug use and seat belt use. The association is strongest in 9th Grade (0.46), followed by 12th Grade (0.41), and is stronger among White students than among other races. The association is very similar for male and female students.

Overall:

As drug use increases, the proportions of never and sometimes using a seat belt increase, and the proportion of always using a seat belt decreases. The trend is similar across grade, gender and race, with a slightly higher proportion of never than sometimes using a seat belt among male students at heavy dose.

13. Grades in school

Table 14: Drug Use Status vs Grades in School
stratum statistic p.value df cramer_v_effect_size
crude
Overall 1056.01928 0 6 0.5739685
grade
9th Grade 335.68761 0 6 0.6052035
10th Grade 318.89934 0 6 0.5999429
11th Grade 331.21759 0 6 0.6401861
12th Grade 136.69654 0 6 0.4793813
sex
female 628.98011 0 6 0.6079389
male 456.57736 0 6 0.5510378
race
White 742.64017 0 6 0.6204197
Black or African American 59.61303 0 6 0.4287210
Hispanic/Latino 124.03560 0 6 0.4594828
All other Races 182.43724 0 6 0.7076317

From Table 14, all p-values are small and the overall effect size is around 0.6, which means that there is a strong association between drug use and GPA. The association is strongest in 11th Grade and weakest in 12th Grade. It is also stronger among female than male students, and stronger among All other Races than among the other race groups.

Overall:

Stratified Analysis:

As drug use increases, the proportion of Mostly A’s decreases, the proportion of Mostly B’s changes little, and the proportions of Mostly C’s and below C’s increase. The trend is particularly pronounced in 11th Grade compared with other grades, and among All other Races compared with White, Black or African American, and Hispanic/Latino students.

Continuous Variable

14. Smoke initial age

15. Alcohol initial age

16. BMI

##                  Df Sum Sq Mean Sq F value   Pr(>F)    
## drug_use_freq     2   1841   920.4   32.45 8.54e-15 ***
## Residuals     18923 536712    28.4                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 928 observations deleted due to missingness

More heavy dose subjects started to smoke at ages 9 to 10 and 11 to 12 than no drug use and light dose subjects. Among light dose subjects, the largest share started to smoke at 13 or 14 years old. Among heavy dose subjects, the largest share started to smoke at 15 or 16 years old. The larger share of very early starters suggests that heavy dose subjects tend to start smoking earlier.

A similar pattern appears for the age of first alcohol use. More heavy dose subjects started to drink alcohol at 8 years old or younger and at 11 or 12 years old than no drug use and light dose subjects. Most no drug use and light dose subjects started to drink alcohol at 15 or 16 years old, whereas most heavy dose subjects started earlier, at 13 or 14 years old. Heavy dose subjects therefore also tend to start drinking alcohol earlier.

The density plots of BMI are all right-skewed, so the mean is greater than the median. The highest percentage of observations falls in the 20-30 BMI range. The box plot shows that the medians of the three drug use groups are similar. However, the one-way ANOVA table shows a statistically significant difference in mean BMI among the three drug use frequency groups (F = 32.45, p = 8.54e-15), so we reject the null hypothesis of equal means, although the similar medians suggest that the difference is small in magnitude.
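As a rough check of the one-way ANOVA table for BMI, the F statistic and p-value can be re-derived from the printed sums of squares and degrees of freedom. The sketch below uses the rounded values from the output, so it matches the table only approximately. Eta-squared is an addition that is not part of the original analysis; it is included as one common way to express the size of the group difference.

```r
# Values copied from the one-way ANOVA table for BMI above.
ss_between <- 1841
ss_resid <- 536712
df_between <- 2
df_resid <- 18923

# Mean squares and F statistic.
ms_between <- ss_between / df_between
ms_resid <- ss_resid / df_resid
f_value <- ms_between / ms_resid

# Upper-tail p-value of the F distribution.
p_value <- pf(f_value, df_between, df_resid, lower.tail = FALSE)

# Eta-squared: share of BMI variance explained by drug use group.
eta_sq <- ss_between / (ss_between + ss_resid)

print(signif(c(F = f_value, p = p_value, eta_sq = eta_sq), 3))
```

A very small p-value together with a very small eta-squared would mean that the group means differ statistically but by little relative to the overall spread of BMI, which is what a large sample can produce.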