spacekit.preprocessor.prep

class spacekit.preprocessor.prep.HstCalPrep(data, y_target, X_cols=[], norm_cols=['n_files', 'total_mb'], rename_cols=['x_files', 'x_size'], tensors=True, normalize=True, random=None, tsize=0.2, encode_targets=True)

prep_mem_bin()

Main calling function.

class spacekit.preprocessor.prep.JwstCalPrep(data, y_target='imgsize_gb', X_cols=[], norm_cols=[], exp_mode='image', tensors=True, normalize=True, random=None, tsize=0.2, encode_targets=False, **log_kws)

Class for preprocessing JWST calibration pipeline metadata prior to training neural networks for estimating memory footprint.

Parameters:
  • data (pandas.DataFrame) – training dataset to be preprocessed

  • y_target (str, optional) – target column name (dependent variable), by default “imgsize_gb”

  • X_cols (list, optional) – feature column names (independent variables), by default []

  • norm_cols (list, optional) – columns on which to apply normalization, by default []

  • exp_mode (str, optional) – model training set (image, spec, tac, fgs), by default “image”

  • tensors (bool, optional) – convert model inputs into tensors, by default True

  • normalize (bool, optional) – apply normalization, by default True

  • random (int, optional) – random seed for train-test splits, by default None

  • tsize (float, optional) – test size ratio, by default 0.2

  • encode_targets (bool, optional) – encode target values (categorical classifiers), by default False

classify_targets()

Creates temporary target class ‘mem_bin’ based on max RAM levels specified by memory_classes property.
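
As an illustration of binning a continuous memory estimate into classes by max RAM level, here is a minimal sketch. The thresholds in MEMORY_CLASSES and the helper name mem_bin() are hypothetical; the actual levels come from the memory_classes property, whose values are not listed in this reference, and spacekit's own implementation may differ.

```python
from bisect import bisect_left

# Hypothetical upper bounds (in GB) for each memory class. The real values
# are defined by the memory_classes property and are not shown in this page.
MEMORY_CLASSES = [2.0, 8.0, 16.0, 64.0]


def mem_bin(imgsize_gb, limits=MEMORY_CLASSES):
    """Return the index of the smallest class whose max RAM covers imgsize_gb."""
    # bisect_left gives the first index i where limits[i] >= imgsize_gb.
    # Values larger than every limit are clamped into the last class.
    return min(bisect_left(limits, imgsize_gb), len(limits) - 1)


sizes = [0.5, 2.0, 7.9, 12.0, 100.0]
bins = [mem_bin(s) for s in sizes]
```

Each value is assigned to the lowest class that can hold it, so a value exactly equal to a limit stays in that class rather than moving up.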

prep_data(existing_splits=False, stratify=False)

Splits data into training (X_train) and test (X_test) sets and applies a PowerTransform normalization to each.

Parameters:
  • existing_splits (bool, optional) – Split the data using values in ‘split’ column, by default False

  • stratify (bool, optional) – Stratify splits according to target class distribution (mem_bin), by default False
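
To make the roles of tsize, random, existing_splits and stratify concrete, the following standard-library sketch splits a list of row dictionaries. It is not spacekit's implementation: split_rows() is a hypothetical helper, the "train"/"test" labels in the 'split' column are an assumption, and the normalization step is omitted.

```python
import random
from collections import defaultdict


def split_rows(rows, tsize=0.2, seed=None, stratify_key=None, existing_splits=False):
    """Return (train, test) lists of row dicts."""
    if existing_splits:
        # Reuse assignments already stored in a 'split' column
        # (the 'train'/'test' labels are assumed for this sketch).
        train = [r for r in rows if r["split"] == "train"]
        test = [r for r in rows if r["split"] == "test"]
        return train, test

    rng = random.Random(seed)  # a fixed seed makes the split reproducible

    # Group rows by target class when stratifying, else use a single group.
    groups = defaultdict(list)
    for r in rows:
        groups[r[stratify_key] if stratify_key else None].append(r)

    train, test = [], []
    for members in groups.values():
        members = members[:]  # copy so the caller's list is not shuffled
        rng.shuffle(members)
        n_test = round(len(members) * tsize)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Twenty rows with two evenly sized classes in 'mem_bin'.
rows = [{"imgsize_gb": float(i), "mem_bin": i % 2} for i in range(20)]
train, test = split_rows(rows, tsize=0.2, seed=42, stratify_key="mem_bin")
```

With stratification, each class contributes the same fraction of its rows to the test set, so the class distribution of mem_bin is preserved in both splits.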

prep_targets()

Main calling function.

class spacekit.preprocessor.prep.Prep(data, y_target, X_cols=[], tensors=True, normalize=True, random=None, tsize=0.2, encode_targets=True, norm_params=None)

Base class for preprocessing data sets prior to training a machine learning model. This class can be used directly or subclassed for additional custom preprocessing. Existing subclasses for HST and JWST skopes are also available.

Parameters:
  • data (pandas.DataFrame) – training dataset to be preprocessed

  • y_target (str) – target column name (dependent variable)

  • X_cols (list, optional) – feature column names (independent variables), by default []

  • tensors (bool, optional) – convert model inputs into tensors, by default True

  • normalize (bool, optional) – apply normalization, by default True

  • random (int, optional) – random seed for train-test splits, by default None

  • tsize (float, optional) – test size ratio, by default 0.2

  • encode_targets (bool, optional) – encode target values (categorical classifiers), by default True

  • norm_params (dict, optional) – normalization parameters (see apply_normalization for acceptable key-val pairs), by default None

class spacekit.preprocessor.prep.SvmPrep(data, y_target='label', X_cols=[], tensors=True, normalize=False, random=None, tsize=0.2, encode_targets=False, norm_params=None)