spacekit.preprocessor.prep
- class spacekit.preprocessor.prep.HstCalPrep(data, y_target, X_cols=[], norm_cols=['n_files', 'total_mb'], rename_cols=['x_files', 'x_size'], tensors=True, normalize=True, random=None, tsize=0.2, encode_targets=True)
- class spacekit.preprocessor.prep.JwstCalPrep(data, y_target='imgsize_gb', X_cols=[], norm_cols=[], exp_mode='image', tensors=True, normalize=True, random=None, tsize=0.2, encode_targets=False, **log_kws)
Class for preprocessing JWST calibration pipeline metadata prior to training neural networks for estimating memory footprint.
- Parameters:
data (pandas.DataFrame) – training dataset to be preprocessed
y_target (str, optional) – target column name (dependent variable), by default “imgsize_gb”
X_cols (list, optional) – feature column names (independent variables), by default []
norm_cols (list, optional) – columns on which to apply normalization, by default []
exp_mode (str, optional) – model training set (image, spec, tac, fgs), by default “image”
tensors (bool, optional) – convert model inputs into tensors, by default True
normalize (bool, optional) – apply normalization, by default True
random (int, optional) – random seed for train-test splits, by default None
tsize (float, optional) – test size ratio, by default 0.2
encode_targets (bool, optional) – encode target values (categorical classifiers), by default False
- classify_targets()
Creates temporary target class ‘mem_bin’ based on max RAM levels specified by the memory_classes property.
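As a rough illustration of this kind of binning (not spacekit's actual implementation), the sketch below assigns each memory value to the first class whose maximum RAM level covers it. The thresholds in `MEMORY_CLASSES` and the helper `classify_memory` are hypothetical; the real levels come from the memory_classes property and are not listed in this reference.

```python
from bisect import bisect_left

# Hypothetical max RAM level (GB) per class; the real values are defined by
# the memory_classes property and are an assumption here.
MEMORY_CLASSES = [2, 8, 16, 64]


def classify_memory(imgsize_gb, bounds=MEMORY_CLASSES):
    """Return the index of the smallest class whose max RAM covers the value."""
    # bisect_left finds the first bound >= imgsize_gb; values above the
    # largest bound are clamped into the last class.
    return min(bisect_left(bounds, imgsize_gb), len(bounds) - 1)


sizes = [0.5, 2.0, 7.9, 20.0, 100.0]
mem_bin = [classify_memory(s) for s in sizes]
print(mem_bin)  # [0, 0, 1, 3, 3]
```

Clamping oversized values into the top class is a design choice made for this sketch only; the library may handle them differently.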
- class spacekit.preprocessor.prep.Prep(data, y_target, X_cols=[], tensors=True, normalize=True, random=None, tsize=0.2, encode_targets=True, norm_params=None)
Base class for preprocessing data sets prior to training a machine learning model. This class can be used directly or subclassed for additional custom preprocessing. Existing subclasses for HST and JWST skopes are also available.
- Parameters:
data (pandas.DataFrame) – training dataset to be preprocessed
y_target (str) – target column name (dependent variable)
X_cols (list, optional) – feature column names (independent variables), by default []
tensors (bool, optional) – convert model inputs into tensors, by default True
normalize (bool, optional) – apply normalization, by default True
random (int, optional) – random seed for train-test splits, by default None
tsize (float, optional) – test size ratio, by default 0.2
encode_targets (bool, optional) – encode target values (categorical classifiers), by default True
norm_params (dict, optional) – normalization parameters (see apply_normalization for acceptable key-val pairs), by default None
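To make the roles of random and tsize concrete, here is a minimal, dependency-free sketch of a seeded train-test split. It assumes a simple shuffle-and-slice strategy and uses a hypothetical helper, `train_test_split_rows`; it is not the code path Prep itself uses.

```python
import random


def train_test_split_rows(rows, tsize=0.2, seed=None):
    """Shuffle row indices with a seeded RNG and hold out a tsize fraction."""
    rng = random.Random(seed)  # seed=None gives a different split each run
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = round(len(rows) * tsize)  # number of rows held out for testing
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = list(range(10))
train, test = train_test_split_rows(rows, tsize=0.2, seed=42)
print(len(train), len(test))  # 8 2
```

Passing the same integer seed reproduces the same partition, which is the usual reason to set random to something other than None when comparing models.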