Documentation for Johnson County’s DDJ Early Intervention System¶
Contents:
Johnson County Early Intervention System¶
DSSG has partnered with Johnson County and Salt Lake County to build a prototype early intervention system (EIS) for individuals who repeatedly cycle through multiple systems, including jails, EMS, and mental health services. Currently, there is little coordination between these systems to address each person’s underlying needs. An accurate Early Intervention System will quickly identify individuals at risk of contact with any or all of the systems so that our partners can provide them with appropriate services and interventions.
To achieve this goal, we developed models that assign each individual who makes contact with one system a risk score for making future contact with another system. These models produce ranked lists of at-risk individuals who may receive follow-up care or interventions. The models provide proactive risk warnings at points of contact (e.g., EMS dispatch, jail bookings).
See this blog post for a broad overview of the project. The paper Reducing Incarceration through Prioritized Interventions is under review and provides a more in-depth description of the project’s implementation and its results. More specific documentation of the code itself can be built with Sphinx by running make -C doc html, or the public documentation may be viewed online at johnson-county-ddj.readthedocs.io.
Installation¶
Use Git to clone the repository.
Setting up the Virtual Environment¶
The scripts, notebooks, and other tools in this repository rely on a specific Python environment combining Python 2.7 and a set of package versions specified in requirements.txt. To ensure that the code runs on your machine, follow the steps outlined below to set up and activate a Python virtual environment with the required configuration:
ONE: Ensure that you have Python 2.7 installed on your machine and that you know the directory where it is installed.
TWO: If you do not have virtualenv installed, install it using:
$ pip install virtualenv
THREE: Create the virtual environment to use with this software. First change your working directory to the directory where you would like to install the virtual environment. Then, create a virtual environment with the following command, replacing /usr/bin/python2.7 with the location of your Python 2.7 installation and venv with the name you would like to give to the environment:
$ virtualenv -p /usr/bin/python2.7 venv
FOUR: Activate the virtual environment, replacing venv with the directory you just created:
$ source venv/bin/activate
To make activating the virtual environment in the future easier, consider adding an alias to your .bashrc or .bash_profile:
alias venv="source /PATH/TO/VIRTUAL/ENVIRONMENT/venv/bin/activate"
FIVE: To configure the virtual environment to use the correct packages and versions, run the following commands, pointing to the requirements.txt file in the repository for the final one:
$ pip install numpy==1.11.2
$ pip install scipy==0.18.1
$ pip install -r requirements.txt
If this fails, you may need to open requirements.txt and install each package individually. For example:
$ pip install collate==0.1.0
SIX: To set up the virtual environment for use within Jupyter Notebooks, run the following command:
$ ipython kernel install --user
Installed kernelspec python2 in /home/USER/.local/share/jupyter/kernels/python2
Copy the kernelspec to a directory where ipython will find it and give it a name you will recognize as your virtual environment (venv in this example):
$ mkdir -p ~/.ipython/kernels
$ mv ~/.local/share/jupyter/kernels/python2 ~/.ipython/kernels/venv
Then, edit the kernel.json file in the directory you just created, changing the JSON key called display_name to the name of your virtual environment (e.g., venv).
SEVEN: When you are finished working with the tools in this repository, deactivate your virtual environment with:
$ deactivate
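The kernel.json edit described in step SIX can also be scripted. The following is a minimal sketch using only the standard json module; the function name set_kernel_display_name is hypothetical and not part of this repository, and the path shown in the usage note simply follows the venv example above.

```python
import json


def set_kernel_display_name(kernel_json_path, display_name):
    """Rewrite the display_name key of a Jupyter kernel.json file in place.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    # Load the existing kernelspec so that all other keys are preserved.
    with open(kernel_json_path) as handle:
        spec = json.load(handle)

    # Only the display name changes; argv, language, etc. stay as installed.
    spec["display_name"] = display_name

    with open(kernel_json_path, "w") as handle:
        json.dump(spec, handle, indent=1)
    return spec
```

For the example above, this would be called as set_kernel_display_name(os.path.expanduser("~/.ipython/kernels/venv/kernel.json"), "venv") after importing os.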
Configuration¶
A number of configuration files are used to set up database credentials and specify various names and parameters.
Credentials¶
There are several configuration files that contain confidential information (like credentials) that cannot be committed to the repository. Example files are provided in the config directory for both database and s3 credentials. Simply remove the example_ prefix and populate each appropriately.
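As a rough illustration of the step above, the example files could be copied into place with a few lines of Python. This is only a sketch: the function name activate_example_configs is hypothetical, and it copies rather than renames so that the templates stay in the repository and existing credential files are never overwritten. The copies still have to be filled in by hand.

```python
import os
import shutil


def activate_example_configs(config_dir, prefix="example_"):
    """Copy each example_* file in config_dir to the same name without the prefix.

    Hypothetical helper for illustration. Files that already exist are skipped
    so that real credentials are never overwritten. Returns the names created.
    """
    created = []
    for name in sorted(os.listdir(config_dir)):
        source = os.path.join(config_dir, name)
        # Only plain files that carry the prefix are templates.
        if not name.startswith(prefix) or not os.path.isfile(source):
            continue
        target_name = name[len(prefix):]
        target = os.path.join(config_dir, target_name)
        if not target_name or os.path.exists(target):
            continue
        shutil.copyfile(source, target)
        created.append(target_name)
    return created
```

Calling activate_example_configs("config") from the repository root would then leave one editable copy per example file.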
Constants¶
The file pipeline/default_profile.yaml contains a number of constants that are used throughout the pipeline codebase, including the paths to the credential files above as well as specific table and column names. It generally does not need to be modified for use with Johnson County’s data.
Experiment configuration¶
The yamls folder contains the experiment configurations that get passed to the pipeline preprocessing and modeling scripts. Each YAML file within that folder specifies one experimental configuration. See the yamls/default_sample.yaml file for an example configuration. Several important categories are broken out as block comments in that sample configuration:
- Type of Experiment rarely changes; the unit entity is a ‘person’.
- Temporal parameters specify the time blocking, including start dates, the prediction window (in days), and the “fake today”, which is the date on which the experiment is simulated to run.
- Labeling details specify which labels to consider as outcomes. The current set of labels is based entirely on interactions with a given data provider. Label names are underscore-delimited, with each component naming a data provider. All but the final component specify the population of interest and currently exclude anyone who has interacted with any of the other providers (this is something that we want to change). The final component specifies the outcome of interest. For example, jims_mh_jims assigns positive labels to people who have previously interacted with criminal justice and mental health (but not EMS) and who have a new interaction with the criminal justice system.
- Feature Selection specifies the feature groups that should be included in the models.
- Model selection specifies the models to run and the parameter values to try for each. All model parameters may be lists; the lists are combined as a cross-product so that all possible combinations get tested.
- Output file details specify where the outputs should go.
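Two of the conventions in the list above can be illustrated in a few lines of Python: splitting an underscore-delimited label name into its population and outcome, and expanding lists of model parameters into their cross-product. This is a sketch rather than the pipeline's own code. The function names are hypothetical, the parameter names used in the usage note are illustrative only, and the sketch assumes provider names contain no underscores.

```python
import itertools


def parse_label(label_name):
    """Split a label such as 'jims_mh_jims' into (population, outcome).

    All but the last component name the population's providers; the last
    component names the outcome provider. Hypothetical helper.
    """
    parts = label_name.split("_")
    if len(parts) < 2:
        raise ValueError("expected a population and an outcome: %r" % label_name)
    return parts[:-1], parts[-1]


def expand_parameter_grid(parameters):
    """Return one dict per combination of the listed parameter values.

    Scalar values are treated as one-element lists. Hypothetical helper.
    """
    names = sorted(parameters)
    value_lists = [
        parameters[name] if isinstance(parameters[name], list) else [parameters[name]]
        for name in names
    ]
    # itertools.product yields every combination, one value per parameter.
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]
```

For instance, parse_label("jims_mh_jims") separates the population providers jims and mh from the outcome jims, and a grid with two values for one parameter and three for another expands to six configurations.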
Evaluation configuration¶
The variables of interest for evaluation may be specified in the pipeline/evaluation/eval_profile.yaml file. These determine which metrics get calculated for each model that was run.
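Because the models produce ranked lists, metrics computed at a list size k are a natural fit, and the evaluation package includes a module named make_precision_recall_at_k_graphs. The exact definitions used by the pipeline are not shown in this documentation, so the following is only a sketch of the standard precision-at-k and recall-at-k calculations; the function name is hypothetical.

```python
def precision_recall_at_k(scores, labels, k):
    """Compute precision and recall among the k highest-scoring individuals.

    scores -- risk scores, one per individual (higher means more at risk)
    labels -- true outcomes, 1 for positive and 0 for negative
    k      -- size of the ranked list that would receive an intervention
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank individuals from highest to lowest risk score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_k = ranked[:k]
    hits = sum(label for _, label in top_k)
    total_positives = sum(labels)
    # Precision: share of the list that is truly positive.
    precision = hits / float(len(top_k)) if top_k else 0.0
    # Recall: share of all positives that made it onto the list.
    recall = hits / float(total_positives) if total_positives else 0.0
    return precision, recall
```

Sweeping k from 1 to the population size and plotting both values gives the kind of precision-recall-at-k curve the module name suggests.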
Data extraction and cleaning¶
ETL is handled by drake, which is expected to be called from the top (or root) directory of this repository. All paths assume that drake was invoked from the root of the repository.
There are four conceptual steps: extract, clean, deduplicate, and process.
Extract raw data¶
The scripts here expect to find a zipped data dump in a data directory in the repository root. They extract that dump and restore it into the database. The database configuration is specified within a config directory in the repository root.
Clean relevant tables¶
The raw tables, as restored from the database dump, get placed into the public schema of the database. Before being used by the pipeline, all tables go through a cleaning and normalization process. All scripts contained within the etl/data_cleaning directory are executed at this point.
Deduplicate identities¶
Deduplication is handled by superdeduper. A SQL script creates a master “entries” table by combining all relevant columns from all the data sources, and then superdeduper is called with the saved configuration in etl/dedupe/config.yaml. After a dedupe_id is appended to the entries table, the apply_results.py script goes back to the specified tables in the clean schema to append a dedupe_id column.
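The last step, copying each record's dedupe_id from the entries table back onto a source table, can be sketched as follows. This is not the code in apply_results.py: it uses an in-memory SQLite database so that it is self-contained, and the entries columns source_table and source_id, as well as the function name, are assumptions made purely for illustration.

```python
import sqlite3


def append_dedupe_id(connection, table, key_column):
    """Add a dedupe_id column to `table` and fill it from the entries table.

    Assumes (hypothetically) that entries has the columns source_table,
    source_id and dedupe_id. Table and column names are interpolated into
    the SQL, so they must come from trusted configuration, not user input.
    """
    cursor = connection.cursor()
    cursor.execute("ALTER TABLE {0} ADD COLUMN dedupe_id INTEGER".format(table))
    # Correlated subquery: look up the matching entry for every source row.
    cursor.execute(
        "UPDATE {0} SET dedupe_id = ("
        " SELECT e.dedupe_id FROM entries e"
        " WHERE e.source_table = ? AND e.source_id = {0}.{1})".format(table, key_column),
        (table,),
    )
    connection.commit()
```

Rows without a matching entry keep a NULL dedupe_id, which makes records that were never deduplicated easy to find afterwards.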
Further processing¶
Finally, a few SQL scripts are used to create computed tables for convenience. These include things like a timeline of events. All scripts contained within the etl/data_processing directory are executed at this point.
The Pipeline: Feature building, modeling, and evaluation¶
There are three major steps to the pipeline: feature building, modeling, and model evaluation. Each step is a submodule of pipeline and has its own run command-line interface, designed to be run from the repository root as python -m pipeline.component.run with command-line arguments. Alternatively, all three steps may be run with a single invocation of python -m pipeline.run.
The -h flag will show the help for each command. A broad overview of each component is provided here, with more specific inline documentation in the code and exposed as module documentation below.
Preprocessing: Feature building¶
The command python -m pipeline.preprocessing.run yamls/default_sample.yaml will use the sample experiment configuration to build the required feature table. The feature tables are timestamped with the time at which the command was run.
Modeling¶
The command python -m pipeline.modeling.run yamls/default_sample.yaml will use the sample experiment configuration and the most recently created feature tables to train all the models specified in the configuration at the given splits.
Evaluation¶
The command python -m pipeline.evaluation.run will evaluate all unprocessed models it finds in the database and compute the metrics found in the default evaluation configuration file.
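When the three steps need to be driven from a script instead of typed by hand, the commands above can be assembled and run in order. The sketch below only wraps the invocations already listed in this section; the function names pipeline_commands and run_pipeline are hypothetical, and no command-line flags beyond those shown above are assumed.

```python
import subprocess
import sys


def pipeline_commands(experiment_config, python=sys.executable):
    """Return the three step commands, in the order they should run."""
    return [
        [python, "-m", "pipeline.preprocessing.run", experiment_config],
        [python, "-m", "pipeline.modeling.run", experiment_config],
        # Evaluation takes no experiment file in the example above.
        [python, "-m", "pipeline.evaluation.run"],
    ]


def run_pipeline(experiment_config, repo_root="."):
    """Run each step from the repository root, stopping at the first failure."""
    for command in pipeline_commands(experiment_config):
        # check_call raises CalledProcessError on a non-zero exit status.
        subprocess.check_call(command, cwd=repo_root)
```

Running the steps one at a time like this makes it clear which stage failed; python -m pipeline.run remains the simpler choice when that is not a concern.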
Module contents¶
pipeline.preprocessing package¶
Subpackages¶
pipeline.preprocessing.features package¶
- class pipeline.preprocessing.features.abstract.TimeBoundedFeature(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.SimpleFeature
- class pipeline.preprocessing.features.emsfeatures.CountOfEms(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.emsfeatures.Destination(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.emsfeatures.DifferentResidenceOfCityRecorded(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.emsfeatures.EverHomelessness(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.emsfeatures.Homelessness(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.emsfeatures.LastMonthEmsCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.emsfeatures.LastWeekEmsCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.emsfeatures.LastYearEmsCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.emsfeatures.NoTreamentRequiredCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.emsfeatures.PrimaryImpression(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.emsfeatures.RefusedCareCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.emsfeatures.TransportedSum(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.emsfeatures.TreatedRefusedTransportCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.emsfeatures.TreatedSum(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.emsfeatures.TreatedTransferredCareCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.emsfeatures.TreatedTransportedALSCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.emsfeatures.TreatedTransportedBLSCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.emsfeatures.TreatedTransportedByLawEnforcementCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.emsfeatures.TriageOfEms(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.ArrestingAgencyCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.AvgBailAmount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.BailTypeCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.BailedOutCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.CaseTypeCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.CountOfJims(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.CurrentChargesCoarseFinding
  Bases: pipeline.preprocessing.features.abstract.SimpleFeature
- class pipeline.preprocessing.features.jimsfeatures.CurrentChargesDrugOffense
  Bases: pipeline.preprocessing.features.abstract.SimpleFeature
- class pipeline.preprocessing.features.jimsfeatures.CurrentChargesFelonyOrMisdemeanor
  Bases: pipeline.preprocessing.features.abstract.SimpleFeature
- class pipeline.preprocessing.features.jimsfeatures.CurrentChargesFindingTrialOccurred
  Bases: pipeline.preprocessing.features.abstract.SimpleFeature
- class pipeline.preprocessing.features.jimsfeatures.CurrentChargesFoundOrPleadGuilty
  Bases: pipeline.preprocessing.features.abstract.SimpleFeature
- class pipeline.preprocessing.features.jimsfeatures.LastMonthAvgBailAmount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.LastMonthBailTypeCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.LastMonthCaseTypeCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.LastMonthJimsCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.LastWeekAvgBailAmount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.LastWeekBailTypeCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.LastWeekCaseTypeCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.LastWeekJimsCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.LastYearAvgBailAmount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.LastYearBailTypeCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.LastYearCaseTypeCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.LastYearJailDaysAvg(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.LastYearJailDaysStddev(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.LastYearJailDaysSum(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.LastYearJimsCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.jimsfeatures.ProbationType(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.mentalhealthfeatures.CountOfMentalHealth(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.mentalhealthfeatures.Diagnoses(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.mentalhealthfeatures.Discharge(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.mentalhealthfeatures.ImportantDiagnoses(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.mentalhealthfeatures.LastMonthMhCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.mentalhealthfeatures.LastWeekMhCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.mentalhealthfeatures.LastYearMhCount(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.mentalhealthfeatures.LastYearMhDaysAvg(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.mentalhealthfeatures.LastYearMhDaysStddev(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.mentalhealthfeatures.LastYearMhDaysSum(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.mentalhealthfeatures.MostCommonTherapistNumber(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.mentalhealthfeatures.NumOfMHServices(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.mentalhealthfeatures.NumOfUniqueMHServices(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.mentalhealthfeatures.NumberOfTherapists(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.mentalhealthfeatures.Program(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.mentalhealthfeatures.ProgramsDischarges(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.mentalhealthfeatures.Referral(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.mentalhealthfeatures.ServicesRecieved(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.miscfeatures.AvgDaysBetweenEvents(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.miscfeatures.IntersectionsPublicService(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.miscfeatures.StdDaysBetweenEvents(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.person.Age(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.person.AgeDiscrete(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.person.AgeFirstInteractionPublicService(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.person.AgeFirstInteractionPublicServiceDiscrete(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.person.AgeFirstInteractionPublicServiceInYears(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.person.AgeLastInteractionPublicService(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.person.AgeLastInteractionPublicServiceDiscrete(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.person.AgeLastInteractionPublicServiceInYears(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.person.AgeYears(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.person.Gender
  Bases: pipeline.preprocessing.features.abstract.SimpleFeature
- class pipeline.preprocessing.features.person.Race(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.SimpleFeature
- class pipeline.preprocessing.features.rsifeatures.AvgIntervalFromInAndDisposition(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.rsifeatures.CountOfRSI(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.rsifeatures.ResidencyRecordedMost(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
- class pipeline.preprocessing.features.rsifeatures.TransportedBy(**kwargs)
  Bases: pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingBooking182
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingBooking365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingBooking730
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingBooking90
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingBookingBooking182
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingBookingBooking365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingBookingBooking730
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingBookingBookingBooking365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingBookingBookingBooking730
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingBookingEms365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingBookingMh365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingEms182
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingEms30
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingEms365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingEms7
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingEms730
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingEms90
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingEmsBooking182
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingEmsBooking365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingEmsBooking730
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingEmsEms365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingMh182
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingMh730
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
BookingMhBooking365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsBooking182
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsBooking30
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsBooking365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsBooking730
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsBookingBooking365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsBookingEms182
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsBookingEms365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsEms182
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsEms30
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsEms365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsEms7
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsEms730
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsEms90
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsEmsBooking365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsEmsEms182
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsEmsEms30
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsEmsEms365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsEmsEms90
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsEmsEmsEms182
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsEmsEmsEms365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsEmsEmsEms90
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsEmsMh90
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsMh182
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsMh365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
EmsMh90
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
MhBooking182
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
MhBooking365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
MhBooking730
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
MhBooking90
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
MhBookingBooking365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
MhEms182
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
MhEms365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
MhEms90
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
MhMh182
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
MhMh365
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
-
class
pipeline.preprocessing.features.seqfeatures.
MhMh90
(**kwargs)[source]¶ Bases:
pipeline.preprocessing.features.abstract.TimeBoundedFeature
Submodules¶
pipeline.preprocessing.feature_processor module¶
pipeline.preprocessing.feature_table_builder module¶
-
pipeline.preprocessing.feature_table_builder.
generate_fake_todays
(fake_today, prediction_window, start_date)[source]¶ Given a final prediction window start date, the length of the prediction windows, and a training start date, return the start and end dates for all prediction windows as a dictionary.
Parameters: - fake_today (datetime) – start date for the final prediction window
- prediction_window (int) – length of the prediction windows in days
- start_date (datetime) – start date for the training period
Returns: start and end dates for all prediction windows
Return type: dict
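The window generation described above can be illustrated with a minimal sketch. It is not the project's implementation: the function name `sketch_prediction_windows` is hypothetical, and it assumes that windows are stepped backward from `fake_today` in `prediction_window`-day increments, that no window starts before `start_date`, and that the returned dictionary maps each window's start date to its end date.

```python
from datetime import datetime, timedelta


def sketch_prediction_windows(fake_today, prediction_window, start_date):
    """Hypothetical stand-in for generate_fake_todays (assumptions in the lead-in)."""
    step = timedelta(days=prediction_window)
    windows = {}
    window_start = fake_today
    # Walk backward from the final window until we would pass start_date.
    while window_start >= start_date:
        # Map each window's start date to its end date.
        windows[window_start] = window_start + step
        window_start -= step
    return windows


windows = sketch_prediction_windows(
    fake_today=datetime(2016, 1, 1),
    prediction_window=365,
    start_date=datetime(2013, 1, 1),
)
window_starts = sorted(windows)
```

Here `window_starts` lists the start date of every window from the earliest to the final one.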
-
pipeline.preprocessing.feature_table_builder.
generate_feature_table
(config, fake_today, prediction_window, start_date, feature_timestamp)[source]¶
Module contents¶
pipeline.evaluation package¶
Submodules¶
pipeline.evaluation.eval_old_models module¶
pipeline.evaluation.evaluation module¶
pipeline.evaluation.make_precision_recall_at_k_graphs module¶
pipeline.evaluation.run module¶
-
pipeline.evaluation.run.
main
(eval_config_file)[source]¶ Runs evaluation code to generate metrics for models in the models table, stash them in a csv, and upload them in bulk to the metrics table.
Parameters: eval_config_file (str) – path to evaluation configuration file
Returns: None
Return type: None
pipeline.evaluation.user_timeline module¶
pipeline.evaluation.utils module¶
Module contents¶
pipeline.modeling package¶
Submodules¶
pipeline.modeling.feature_model_grabber module¶
-
class
pipeline.modeling.feature_model_grabber.
FeatureModelGrabber
(fake_today, prediction_window, config, feature_timestamp, s3_profile, discard_model)[source]¶ -
-
compile_results
(res, bulk_model_list, force_write=False)[source]¶ After a model is run, compile the model information and the predictions, and temporarily stash them in csvs. If more than 49 models have been stashed or this is the last model to be run, copy the csvs to the models and predictions tables, remove the csvs, and return an empty list.
Parameters: - self (FeatureModelGrabber) – inherit object properties
- res (dict) – dictionary of model information
- bulk_model_list (list) – list of model info to be saved to database
- force_write (bool) – should the stashed info be saved to the database regardless of length?
Returns: list of models run since last write
Return type: list
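The stash-then-flush behaviour described for compile_results can be sketched as follows. This is an illustration only: `stash_and_maybe_flush` and its `flush` callback are hypothetical names, the in-memory list stands in for the temporary csvs, and the callback stands in for the database copy. Only the threshold of 49 and the `force_write` flag come from the description above.

```python
def stash_and_maybe_flush(record, stash, flush, threshold=49, force_write=False):
    """Hypothetical helper: stash a record and flush in bulk when needed.

    Returns the list of records stashed since the last flush, which is
    empty right after a flush.
    """
    stash.append(record)
    if len(stash) > threshold or force_write:
        # Hand a copy of the batch to the writer, then start a fresh stash.
        flush(list(stash))
        return []
    return stash


batches = []  # stands in for bulk writes to the database
stash = []
n_models = 120
for i in range(n_models):
    is_last = i == n_models - 1
    stash = stash_and_maybe_flush(
        {"model_id": i}, stash, batches.append, force_write=is_last
    )
batch_sizes = [len(batch) for batch in batches]
```

Batching this way keeps the number of database writes small while `force_write` guarantees that the final, partially filled batch is not lost.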
-
connect_to_s3
()[source]¶ Open a connection to s3 and return the resource objects and a dictionary of s3 configuration details.
Returns: s3 resource and s3_config
Return type: boto3 resource and dict
-
csv_to_database
(file_name, table_name)[source]¶ Given a csv name and a database table name, append the contents of the csv to the database table and remove the csv.
Parameters: - file_name (str) – name of csv to save to database
- table_name (str) – name of the database table to copy to
Returns: None
Return type: None
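A minimal sketch of the append-and-remove pattern described for csv_to_database follows. It uses the standard library `sqlite3` module purely as a stand-in; the database driver the project actually uses is not shown in this section. The function name `append_csv_to_table`, the headerless csv layout, and the `models` table are all assumptions made for illustration.

```python
import csv
import os
import sqlite3
import tempfile


def append_csv_to_table(conn, file_name, table_name):
    """Hypothetical stand-in: append headerless csv rows to a table, then delete the csv."""
    with open(file_name) as handle:
        rows = list(csv.reader(handle))
    if rows:
        placeholders = ", ".join("?" for _ in rows[0])
        # Table names cannot be bound as parameters, so table_name must be trusted.
        query = "INSERT INTO {} VALUES ({})".format(table_name, placeholders)
        conn.executemany(query, rows)
        conn.commit()
    os.remove(file_name)


# Demonstration with an in-memory database and a temporary csv.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (model_id TEXT, auc TEXT)")
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as handle:
    handle.write("1,0.71\n2,0.68\n")
append_csv_to_table(conn, path, "models")
row_count = conn.execute("SELECT COUNT(*) FROM models").fetchone()[0]
csv_still_exists = os.path.exists(path)
```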
-
export_data_table
(table, end_date, label, feature_names)[source]¶ Save a data set as an HDF table for later reuse.
Parameters: - table (pandas DataFrame) – the DataFrame to save
- end_date (a date format of some kind) – end of labeling period
- label (str) – name of the column containing labels
- feature_names (list) – names of the columns containing features
Returns: the prefix of the HDF filename
Return type: str
-
export_metadata
(end_date, label, feature_names)[source]¶ Construct and export metadata for a matrix. Return a unique identifier based on this metadata to be used as a filename.
Parameters: - end_date (str) – the end date of the labeling period for the matrix
- label (str) – name of the column containing labels
- feature_names (list) – names of the columns containing features
Returns: unique identifier for the matrix
Return type: str
-
generate_uuid
(metadata)[source]¶ Generate a unique identifier given a dictionary of matrix metadata.
Parameters: metadata (dict) – metadata for the matrix
Returns: unique name for the file
Return type: str
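One common way to derive a reproducible identifier from a metadata dictionary is to hash a canonical serialization of it. The sketch below shows that idea under stated assumptions: the function name `metadata_to_identifier`, the JSON serialization, and the choice of MD5 are illustrative and are not confirmed to match what generate_uuid does.

```python
import hashlib
import json


def metadata_to_identifier(metadata):
    """Hypothetical sketch: hash a canonical JSON form of the metadata."""
    # sort_keys makes the result independent of dictionary ordering;
    # default=str lets dates and other non-JSON values be serialized.
    canonical = json.dumps(metadata, sort_keys=True, default=str)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# Example metadata; the keys mirror the export_metadata parameters.
id_a = metadata_to_identifier(
    {"end_date": "2016-01-01", "label": "booking", "feature_names": ["MhMh90"]}
)
id_b = metadata_to_identifier(
    {"feature_names": ["MhMh90"], "label": "booking", "end_date": "2016-01-01"}
)
id_c = metadata_to_identifier(
    {"end_date": "2016-01-01", "label": "booking", "feature_names": ["MhMh365"]}
)
```

Identical metadata yields the same identifier regardless of key order, so a matrix that has already been exported can be recognized by its filename.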
-
iterate_train_test
(iterable)[source]¶ Iterate over prediction window start dates, returning the start dates for train and test data for the current model.
Parameters: iterable (list) – list of prediction window start dates
Returns: train date and test date
Return type:
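The pairing of train and test dates can be pictured with the sketch below. It assumes, without confirmation from the source, that each model trains on one prediction window and tests on the window that immediately follows it; the generator name `pairwise_train_test` is hypothetical.

```python
def pairwise_train_test(window_start_dates):
    """Hypothetical sketch: yield (train_date, test_date) for consecutive windows."""
    ordered = sorted(window_start_dates)
    # Pair each start date with the next one in chronological order.
    for train_date, test_date in zip(ordered, ordered[1:]):
        yield train_date, test_date


pairs = list(pairwise_train_test(["2015-01-01", "2013-01-01", "2014-01-01"]))
```

With three window start dates this yields two train-test pairs, and the earliest window is never used for testing.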
-
pickle_results
(res_dict, clf)[source]¶ Pickle the model object locally, upload it to s3, and delete the local copy.
Parameters: - self (FeatureModelGrabber) – inherit object properties
- res_dict (dict) – dictionary of model information
- clf (model) – model object
Returns: path to pickle file
Return type: str
-
write_matrix_pairs
(train_test_combos)[source]¶ Given a list of train-test pairs, write them locally, check s3 for an existing set, combine the sets, remove duplicates, and upload new copy to s3.
Parameters: train_test_combos (list) – list of dictionaries with keys ‘train’ and ‘test’ with filenames of HDF matrices as values
Returns: None
Return type: None
-
write_to_csv
(df, column_order, file_name)[source]¶ Given a dataframe, a specific column order, and a csv filename, enforce the column order on the dataframe and then append the data to the specified csv file.
Parameters: - self (FeatureModelGrabber) – inherit object properties
- df – data output by the modeling process containing either model information or predictions
- column_order (list) – the order of columns in the relevant database table
- file_name (str) – the name of the csv file to write to
Returns: None
Return type: None
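Enforcing a column order before appending matters because a bulk copy into a database table typically matches columns by position. The sketch below illustrates the idea with the standard library `csv` module and a list of dictionaries in place of a dataframe; the function name `append_rows_in_order` and the example columns are hypothetical.

```python
import csv
import os
import tempfile


def append_rows_in_order(rows, column_order, file_name):
    """Hypothetical sketch: append rows to a csv using a fixed column order."""
    with open(file_name, "a") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=column_order,
            extrasaction="ignore",  # drop keys that are not table columns
            lineterminator="\n",
        )
        # No header is written, so repeated appends produce a clean file.
        writer.writerows(rows)


fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
columns = ["person_id", "score"]
append_rows_in_order([{"score": 0.9, "person_id": 7}], columns, path)
append_rows_in_order([{"person_id": 8, "score": 0.4, "extra": "x"}], columns, path)
with open(path) as handle:
    contents = handle.read()
os.remove(path)
```

Both appended rows end up with `person_id` first and `score` second, whatever order the keys had in the input.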
-