pipeline.modeling package

Submodules

pipeline.modeling.feature_model_grabber module

class pipeline.modeling.feature_model_grabber.FeatureModelGrabber(fake_today, prediction_window, config, feature_timestamp, s3_profile, discard_model)

add_labels_to_feature_sets(feature_sets, labels)

combine_models_labels_features(models, labelled_features)

compile_results(res, bulk_model_list, force_write=False)

After a model is run, compile the model information and the predictions and temporarily stash them in CSVs. If more than 49 models have been stashed, or this is the last model to be run, copy the CSVs to the models and predictions tables, remove the CSVs, and return an empty list.

Parameters:
    self (FeatureModelGrabber) – inherit object properties
    res (dict) – dictionary of model information
    bulk_model_list (list) – list of model info to be saved to database
    force_write (bool) – should the stashed info be saved to the database regardless of length?
Returns:	list of models run since last write
Return type:	list
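The stash-and-flush behaviour described above can be sketched as follows. This is a minimal illustration, not the method's actual implementation: the `flush` callable and the `max_stashed` parameter are hypothetical stand-ins for the step that copies the stashed CSVs to the database and for the threshold of 49 mentioned in the description.

```python
def compile_results(res, bulk_model_list, flush, force_write=False, max_stashed=49):
    """Stash one model's results and flush once the batch is large enough.

    `flush` is a hypothetical callable that stands in for copying the
    stashed CSVs to the database tables and removing the CSVs.
    """
    bulk_model_list.append(res)
    # Flush when more than `max_stashed` models are waiting, or when the
    # caller signals that this is the last model via `force_write`.
    if len(bulk_model_list) > max_stashed or force_write:
        flush(bulk_model_list)
        return []
    return bulk_model_list
```

A caller would reassign its list on every call, for example `stash = compile_results(res, stash, flush)`, and pass `force_write=True` for the final model so nothing is left unwritten.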

connect_to_s3()

Open a connection to s3 and return the resource objects and a dictionary of s3 configuration details.

Returns:	s3 resource and s3_config
Return type:	boto3 resource and dict

csv_to_database(file_name, table_name)

Given a csv name and a database table name, append the contents of the csv to the database table and remove the csv.

Parameters:
    file_name (str) – name of csv to save to database
    table_name (str) – name of the database table to copy to
Returns:	None
Return type:	None
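As an illustration of the append-then-remove pattern, the sketch below uses the standard library's `sqlite3` and `csv` modules. The database backend used by the pipeline is not stated here, so SQLite, the extra `connection` argument, and the assumption that the CSV has a header row are all choices made for the example only.

```python
import csv
import os
import sqlite3


def csv_to_database(connection, file_name, table_name):
    """Append the rows of a headered CSV to an existing table, then delete the CSV.

    `connection` is an assumption of this sketch (a sqlite3 connection);
    the documented method takes only `file_name` and `table_name`.
    """
    with open(file_name, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        rows = list(reader)
    columns = ", ".join('"{}"'.format(name) for name in header)
    placeholders = ", ".join("?" for _ in header)
    # Table and column names cannot be bound as parameters, so they are
    # interpolated; only do this with trusted names.
    sql = 'INSERT INTO "{}" ({}) VALUES ({})'.format(table_name, columns, placeholders)
    with connection:  # commits on success, rolls back on error
        connection.executemany(sql, rows)
    # Remove the CSV only after the insert has been committed.
    os.remove(file_name)
```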

export_data_table(table, end_date, label, feature_names)

Save a data set as an HDF table for later reuse.

Parameters:
    table (pandas DataFrame) – the DataFrame to save
    end_date (a date format of some kind) – end of labeling period
    label (str) – name of the column containing labels
    feature_names (list) – names of the columns containing features
Returns:	the prefix of the HDF filename
Return type:	str

export_metadata(end_date, label, feature_names)

Construct and export metadata for a matrix. Return a unique identifier based on this metadata to be used as a filename.

Parameters:
    end_date (str) – the end date of the labeling period for the matrix
    label (str) – name of the column containing labels
    feature_names (list) – names of the columns containing features
Returns:	unique identifier for the matrix
Return type:	str

extract_train_x(feature_set, full_feature_table)

generate_feature_group(feature_sets)

generate_feature_group_combinations(feature_groups)

generate_model_parameter_list()

generate_uuid(metadata)

Generate a unique identifier given a dictionary of matrix metadata.

Parameters:	metadata (dict) – metadata for the matrix
Returns:	unique name for the file
Return type:	str
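One common way to derive a stable identifier from a metadata dictionary is to hash a canonical serialization of it. The sketch below takes that approach; the choice of JSON with sorted keys and SHA-256 is an assumption of the example, and the actual method may compute its identifier differently.

```python
import hashlib
import json


def generate_uuid(metadata):
    """Return a deterministic hex digest for a dictionary of matrix metadata."""
    # sort_keys makes the serialization independent of key insertion order;
    # default=str covers values such as dates that json cannot encode natively.
    serialized = json.dumps(metadata, sort_keys=True, default=str)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()
```

Because the digest depends only on the metadata's content, two matrices built from the same end date, label, and feature names would map to the same filename, which makes it possible to detect that a matrix already exists.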

get_feature_sets(feature_names_dict)

iterate_train_test(iterable)

Iterate over prediction window start dates, returning the start dates for train and test data for the current model.

Parameters:	iterable (list) – list of prediction window start dates (named prediction_window_start_dates in the docstring)
Returns:	train date and test date
Return type:
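The description does not say how train and test dates are paired. The sketch below assumes, purely for illustration, that each start date serves as the train date and the next one in the list as the test date; the real method may pair them differently.

```python
def iterate_train_test(iterable):
    """Yield (train_date, test_date) pairs from consecutive start dates.

    Assumption of this sketch: the test date is the start date that
    immediately follows the train date in the input.
    """
    dates = list(iterable)
    # zip stops at the shorter sequence, so the last date is only ever a test date.
    for train_date, test_date in zip(dates, dates[1:]):
        yield train_date, test_date
```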

load_feature_name_dictionary()

load_table(train_or_test, feature_timestamp)

load_test_table()

load_train_table()

parameter_generator(params_lst)

pickle_results(res_dict, clf)

Pickle the model object locally, upload it to s3, and delete the local copy.

Parameters:
    self (FeatureModelGrabber) – inherit object properties
    res_dict (dict) – dictionary of model information
    clf (model) – model object
Returns:	path to pickle file
Return type:	str
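The pickle, upload, delete sequence can be sketched with the standard library alone. In this illustration the `upload` callable and the `key_name` argument are hypothetical: they stand in for the s3 upload step and for whatever name the method derives from `res_dict`. Returning the key is also an assumption, since the local file no longer exists once the function returns.

```python
import os
import pickle
import tempfile


def pickle_results(clf, key_name, upload):
    """Pickle `clf` to a temporary file, hand it to `upload`, then delete it.

    `upload(key_name, local_path)` is a hypothetical callable standing in
    for the s3 upload; it is not part of the documented signature.
    """
    local_path = os.path.join(tempfile.gettempdir(), key_name)
    with open(local_path, "wb") as handle:
        pickle.dump(clf, handle)
    try:
        upload(key_name, local_path)
    finally:
        # Delete the local copy even if the upload raises.
        os.remove(local_path)
    return key_name
```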

run(labels)

upload_file_to_s3(key_name, bucket, local_file_path)

write_matrix_pairs(train_test_combos)

Given a list of train-test pairs, write them locally, check s3 for an existing set, combine the sets, remove duplicates, and upload new copy to s3.

Parameters:	train_test_combos (list) – list of dictionaries with keys ‘train’ and ‘test’ with filenames of HDF matrices as values
Returns:	None
Return type:	None
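The combine-and-deduplicate step can be sketched on its own, leaving out the local write and the s3 round trip. The helper name `merge_matrix_pairs` is hypothetical and does not appear in the module; it only illustrates one way to merge an existing set of pairs with new ones while preserving order.

```python
def merge_matrix_pairs(existing_pairs, new_pairs):
    """Combine two lists of {'train': ..., 'test': ...} dicts without duplicates."""
    merged = []
    seen = set()
    for pair in list(existing_pairs) + list(new_pairs):
        # Dicts are not hashable, so the (train, test) tuple is used as the key.
        key = (pair["train"], pair["test"])
        if key not in seen:
            seen.add(key)
            merged.append(pair)
    return merged
```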

write_to_csv(df, column_order, file_name)

Given a dataframe, a specific column order, and a csv filename, enforce the column order on the dataframe and then append the data to the specified csv file.

Parameters:
    self (FeatureModelGrabber) – inherit object properties
    df – data output by the modeling process containing either model information or predictions
    column_order (list) – the order of columns in the relevant database table
    file_name (str) – the name of the csv file to write to
Returns:	None
Return type:	None
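To show the idea of enforcing a column order before appending, here is a dependency-free sketch that takes a list of dictionaries in place of a dataframe. That substitution, and the choice to write a header only when the file is new or empty, are assumptions of the example rather than documented behaviour.

```python
import csv
import os


def write_to_csv(rows, column_order, file_name):
    """Append dict rows to `file_name`, writing columns in `column_order`.

    `rows` is a list of dicts standing in for the dataframe in this sketch.
    """
    # Write the header only once, when the file is new or still empty.
    write_header = not os.path.exists(file_name) or os.path.getsize(file_name) == 0
    with open(file_name, "a", newline="") as handle:
        # fieldnames fixes the output order regardless of each dict's key order.
        writer = csv.DictWriter(handle, fieldnames=column_order)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```

Fixing the order at write time matters because a bulk copy into a database table typically matches CSV columns by position, not by name.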

pipeline.modeling.feature_model_grabber.chunker(seq, size)

pipeline.modeling.feature_model_grabber.write_dataframe_chunks(df_name, df)

pipeline.modeling.models module

class pipeline.modeling.models.ConfigError

class pipeline.modeling.models.Model(model_name, model_params, label, training_data, testing_data, cols_to_use, config)

compute_confusion_matrix(predicted_labels, labels)

define_model(model, parameters, n_cores=0)

get_data(df, undersample=False)

get_feature_importance(clf, model_name)

get_test_data()

get_training_data()

run()

pipeline.modeling.run module

pipeline.modeling.run.main(config_file_name, feature_timestamp, discard_models)

Replaces template placeholder with values.

Parameters:	config_file_name (str) – path to config yaml file
Returns:	None – always returns None as default
Return type:	None
