spacekit.preprocessor.prep

class spacekit.preprocessor.prep.HstCalPrep(data, y_target, X_cols=[], norm_cols=['n_files', 'total_mb'], rename_cols=['x_files', 'x_size'], tensors=True, normalize=True, random=None, tsize=0.2, encode_targets=True)

prep_mem_bin(): Main calling function.

class spacekit.preprocessor.prep.JwstCalPrep(data, y_target='imgsize_gb', X_cols=[], norm_cols=[], exp_mode='image', tensors=True, normalize=True, random=None, tsize=0.2, encode_targets=False, **log_kws)

Class for preprocessing JWST calibration pipeline metadata prior to training neural networks for estimating memory footprint.

Parameters:

data (pandas.DataFrame) – training dataset to be preprocessed
y_target (str, optional) – target column name (dependent variable), by default “imgsize_gb”
X_cols (list, optional) – feature column names (independent variables), by default []
norm_cols (list, optional) – columns on which to apply normalization, by default []
exp_mode (str, optional) – model training set, one of “image”, “spec”, “tac”, or “fgs”, by default “image”
tensors (bool, optional) – convert model inputs into tensors, by default True
normalize (bool, optional) – apply normalization, by default True
random (int, optional) – random seed for train-test splits, by default None
tsize (float, optional) – test size ratio, by default 0.2
encode_targets (bool, optional) – encode target values (for categorical classifiers), by default False

classify_targets(): Creates a temporary target class column, ‘mem_bin’, based on the maximum RAM levels specified by the memory_classes property.
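Binning a continuous memory target into discrete classes can be sketched as follows. This is a minimal illustration, not spacekit's implementation: the bin boundaries in HYPOTHETICAL_BOUNDS_GB and the helper classify_mem are assumptions standing in for whatever levels the memory_classes property actually defines.

```python
from bisect import bisect_left

# Hypothetical upper bounds (GB) separating the memory classes.
# spacekit reads its own levels from the memory_classes property; these are illustrative only.
HYPOTHETICAL_BOUNDS_GB = [2.0, 8.0, 16.0]


def classify_mem(values, bounds=HYPOTHETICAL_BOUNDS_GB):
    """Map each RAM value to a class index.

    Class k holds values in (bounds[k-1], bounds[k]]; anything above the
    last bound falls into the open-ended top class, len(bounds).
    """
    return [bisect_left(bounds, v) for v in values]


print(classify_mem([0.5, 2.0, 3.1, 20.0, 100.0]))  # → [0, 0, 1, 3, 3]
```

With three boundaries the sketch yields four classes (0 to 3), and a value exactly on a boundary is assigned to the lower class.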

prep_data(existing_splits=False, stratify=False)

Splits data into training (X_train) and test (X_test) sets and applies a PowerTransform normalization to each.

Parameters:

existing_splits (bool, optional) – split the data using the values in the ‘split’ column, by default False
stratify (bool, optional) – stratify splits according to the target class distribution (mem_bin), by default False
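The two splitting options of prep_data can be illustrated with a small standard-library sketch. It is not spacekit's code: split_indices is a hypothetical helper, and the “train”/“test” values assumed for the ‘split’ column are placeholders. When stratifying, rows are grouped by target class and each group is split separately at the tsize ratio, so class proportions are roughly preserved in both sets.

```python
import random
from collections import defaultdict


def split_indices(labels, tsize=0.2, seed=None, stratify=False, split_col=None):
    """Return (train_idx, test_idx) as sorted row positions.

    split_col: optional per-row values from an existing split column.
    The values "train" and "test" are hypothetical placeholders; when
    split_col is given it is used as-is and no random split is made.
    """
    if split_col is not None:
        train = [i for i, s in enumerate(split_col) if s == "train"]
        test = [i for i, s in enumerate(split_col) if s == "test"]
        return train, test

    rng = random.Random(seed)  # seeded RNG so splits are reproducible
    # One group holding every row, or one group per class when stratifying.
    groups = defaultdict(list)
    for i, y in enumerate(labels):
        groups[y if stratify else None].append(i)

    train, test = [], []
    for idx in groups.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * tsize)  # per-group test size
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


labels = [0] * 10 + [1] * 5  # e.g. mem_bin values
train, test = split_indices(labels, tsize=0.2, seed=0, stratify=True)
print(sum(labels[i] == 0 for i in test), sum(labels[i] == 1 for i in test))  # → 2 1
```

Rounding per group means small classes may receive slightly more or fewer test rows than an exact tsize fraction would give.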

prep_targets(): Main calling function.

class spacekit.preprocessor.prep.Prep(data, y_target, X_cols=[], tensors=True, normalize=True, random=None, tsize=0.2, encode_targets=True, norm_params=None)

Base class for preprocessing datasets prior to training a machine learning model. This class can be used directly or subclassed for additional custom preprocessing. Subclasses for the HST and JWST skopes, such as HstCalPrep and JwstCalPrep, are also available.

Parameters:

data (pandas.DataFrame) – training dataset to be preprocessed
y_target (str) – target column name (dependent variable)
X_cols (list, optional) – feature column names (independent variables), by default []
tensors (bool, optional) – convert model inputs into tensors, by default True
normalize (bool, optional) – apply normalization, by default True
random (int, optional) – random seed for train-test splits, by default None
tsize (float, optional) – test size ratio, by default 0.2
encode_targets (bool, optional) – encode target values (for categorical classifiers), by default True
norm_params (dict, optional) – normalization parameters (see apply_normalization for acceptable key-val pairs), by default None
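To make the normalize option concrete, here is a sketch of a power-transform normalization using a swapped-in technique: a Yeo-Johnson transform whose λ is picked by a grid search over the log-likelihood, followed by standardization, with both steps fitted on training values and then reused on test values. This is not spacekit's PowerTransform and does not reflect the keys accepted by norm_params; YeoJohnsonScaler, its λ grid, and the helper names are hypothetical.

```python
import math


def yeo_johnson(x, lmbda):
    """Yeo-Johnson transform of a single value."""
    if x >= 0:
        if abs(lmbda) < 1e-12:
            return math.log1p(x)
        return ((x + 1) ** lmbda - 1) / lmbda
    if abs(lmbda - 2) < 1e-12:
        return -math.log1p(-x)
    return -((1 - x) ** (2 - lmbda) - 1) / (2 - lmbda)


def log_likelihood(xs, lmbda):
    """Profile log-likelihood of lmbda under a normal model of the transformed data."""
    n = len(xs)
    ys = [yeo_johnson(x, lmbda) for x in xs]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    if var == 0:
        return -math.inf
    # Log-Jacobian term: (lmbda - 1) * sum(sign(x) * log1p(|x|)).
    jac = sum(math.copysign(math.log1p(abs(x)), x) for x in xs)
    return -n / 2 * math.log(var) + (lmbda - 1) * jac


class YeoJohnsonScaler:
    """Hypothetical single-feature scaler: fit on training values, reuse on test values."""

    def fit(self, xs):
        grid = [i / 100 for i in range(-200, 201)]  # lambda candidates in [-2, 2]
        self.lmbda = max(grid, key=lambda l: log_likelihood(xs, l))
        ys = [yeo_johnson(x, self.lmbda) for x in xs]
        self.mean = sum(ys) / len(ys)
        self.std = math.sqrt(sum((y - self.mean) ** 2 for y in ys) / len(ys)) or 1.0
        return self

    def transform(self, xs):
        return [(yeo_johnson(x, self.lmbda) - self.mean) / self.std for x in xs]


train = [0.5, 1.2, 3.0, 8.5, 20.0, 55.0]  # right-skewed feature, e.g. sizes
scaler = YeoJohnsonScaler().fit(train)
z_test = scaler.transform([100.0])  # same lambda, mean and std applied to unseen data
```

Fitting only on the training rows keeps test-set statistics from leaking into the parameters. A production version would fit each feature column separately and use a proper optimizer rather than a fixed grid.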

class spacekit.preprocessor.prep.SvmPrep(data, y_target='label', X_cols=[], tensors=True, normalize=False, random=None, tsize=0.2, encode_targets=False, norm_params=None)