Hello, and welcome to the course!
My name is Justin Bois, and I'm a lecturer in the Division of Biology and Biological Engineering at Caltech. My goal for you in this course is to hone and extend your statistical thinking skills by working through real data sets.
Before we dive into the two main case studies, it is important to review what you learned in Statistical Thinking I and II. I thought a great way to do that would be to play around with a couple of data sets from my colleagues in the biological sciences at Caltech.
The first data set we will practice with comes from the lab of David Prober, a leading expert on sleep. In this study, the researchers in Prof. Prober's lab studied the activity of zebrafish larvae. Each fish was put in its own little well and recorded with a camera. Whenever a fish moves, the system detects and records the movement, indicated here by the red flashes in the video. The more movement, the more wakeful the fish.
These fish are interesting because some of them have a mutation in a gene involved in producing melatonin, an important hormone for sleep regulation. Fish that have the mutation are called mutants, and those that do not are called wild type.
If we look at the mean activity of the fish over time, we see that compared to wild type the mutant fish are more active at night, which is indicated by the gray regions on the plot. Our goal with this warm-up analysis is to quantify the effect of this mutation on wakefulness.
In the exercises, you will use nighttime active bouts as a metric for wakefulness of the fish. An active bout is a period of time where a fish is consistently active. The length of an active bout is the number of consecutive minutes that a fish is active.
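To make the definition of a bout length concrete, here is a minimal sketch. It assumes the activity record has already been reduced to one active/inactive flag per minute; the function name `active_bout_lengths` and the sample array are made up for illustration and are not part of the course materials.

```python
import numpy as np


def active_bout_lengths(is_active):
    """Return the length, in minutes, of each run of consecutive active minutes.

    `is_active` is a 1D sequence with one truthy/falsy entry per minute
    (an assumed format, not necessarily how the real data are stored).
    """
    active = np.asarray(is_active, dtype=bool)

    # Pad with inactive minutes on both ends so every bout has a start and an end
    padded = np.concatenate(([False], active, [False])).astype(int)
    changes = np.diff(padded)

    starts = np.where(changes == 1)[0]   # inactive -> active transitions
    ends = np.where(changes == -1)[0]    # active -> inactive transitions

    return ends - starts


# Made-up record of ten minutes: 1 means the fish was active in that minute
activity = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
bouts = active_bout_lengths(activity)
print(bouts)
```

For this made-up record, the fish has three bouts, lasting 2, 3, and 1 minutes.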
This is enough background about the zebrafish experiment to get started, and I'll let you dive into the exercises in a moment.
But before I do, I want to remind you about some of the concepts about probability distributions you learned in Statistical Thinking I and II. Generically speaking, a probability distribution is a mathematical description of outcomes. But distributions are easier to think about as stories.
You learned about the stories behind the Uniform, Binomial, Poisson, Normal, and Exponential distributions in Statistical Thinking I.
Here, I will review the Exponential distribution. Its story has to do with Poisson processes. For a Poisson process, the timing of an event, called an "arrival," is completely independent of when the previous event arrived. The waiting time between arrivals is Exponentially distributed.
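If you want to see this story play out numerically, the following is a rough sketch using simulated arrivals. The rate, the total time, and the seed are arbitrary choices for illustration. It scatters arrivals independently and uniformly over a long time window, which is one standard way to simulate a Poisson process, and then checks two properties of the Exponential distribution: the mean waiting time should be close to 1/rate, and the fraction of waits longer than the mean should be close to exp(-1).

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Arbitrary illustration values, not taken from any real data set
rate = 2.0             # mean number of arrivals per unit time
total_time = 50_000.0  # length of the observation window

# Each arrival time is drawn independently of all the others
n_arrivals = rng.poisson(rate * total_time)
arrival_times = np.sort(rng.uniform(0, total_time, size=n_arrivals))

# Waiting times are the gaps between successive arrivals
waiting_times = np.diff(arrival_times)

mean_wait = waiting_times.mean()
frac_longer_than_mean = np.mean(waiting_times > 1 / rate)

print("mean waiting time:", mean_wait, "expected about:", 1 / rate)
print("fraction of waits > 1/rate:", frac_longer_than_mean,
      "expected about:", np.exp(-1))
```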
In Statistical Thinking I, we considered incidents at nuclear power plants as a Poisson process, so the time between them should be Exponentially distributed. To check, we plotted the empirical cumulative distribution function, or ECDF. Each point on the ECDF represents the fraction of observations less than or equal to the value on the x-axis.
When we overlay the theoretical CDF, we see good agreement, which suggests that nuclear incidents are indeed described by a Poisson process.
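As a reminder of how such a comparison can be set up, here is a small sketch. The `ecdf()` function below is one common way to write it and may differ in detail from the version you wrote earlier. The data are simulated stand-ins, not the nuclear incident data, and the mean of 87 is a made-up number.

```python
import numpy as np


def ecdf(data):
    """Compute x, y values for the ECDF of a one-dimensional data set."""
    x = np.sort(data)
    # y[i] is the fraction of observations less than or equal to x[i]
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


rng = np.random.default_rng(seed=0)

# Simulated stand-in for observed waiting times (made-up mean)
samples = rng.exponential(87.0, size=5000)

x, y = ecdf(samples)

# Theoretical Exponential CDF, with the mean estimated from the samples
cdf_theor = 1 - np.exp(-x / samples.mean())

# Largest vertical gap between the ECDF and the theoretical CDF
max_gap = np.max(np.abs(y - cdf_theor))
print("largest gap between ECDF and theoretical CDF:", max_gap)
```

A small gap means the two curves nearly coincide. To see the overlay rather than a single number, you could plot `x` against `y` as points and `x` against `cdf_theor` as a line with matplotlib.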
To generate the x and y values for the ECDF, I used the `ecdf()` function you wrote in Statistical Thinking I. In fact, you wrote lots of useful functions in the prequels to this course.
Throughout this course, all of them are available through the `dc_stat_think` module, which we import as `dcst`. The functions all have complete docstrings, so you can read the documentation for any of them by typing a question mark after its name in your IPython console, as shown here for the `pearson_r()` function.
So, in this course, to compute an ECDF, call `dcst.ecdf()`. To install the `dc_stat_think` module on your own machine, you can run `pip install dc_stat_think` on the command line.
Ok, I think you're ready to apply hacker stats to learn about fish sleep!