spacekit.extractor.load
- spacekit.extractor.load.load_datasets(filenames, index_col='index', column_order=None, verbose=1)
Import one or more dataframes from csv files and merge them along axis 0, so that the rows of each file are appended. The datasets are assumed to share the same index_col name and identical column names. This is not strictly enforced, but the function does not handle missing data or NaNs.
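As a rough illustration of the row-wise merge described above, the sketch below stacks the rows of several CSV sources that share an index column and identical column names. It uses only the standard library rather than pandas, `stack_csv_rows` is a hypothetical helper (not spacekit's implementation), and the column names and visit names are made up for the example.

```python
import csv
import io


def stack_csv_rows(sources, index_col="index"):
    """Stack rows from several CSV sources that share the same columns.

    Hypothetical stand-in for a row-wise (axis 0) merge; not spacekit code.
    """
    columns = None
    rows = []
    for source in sources:
        reader = csv.DictReader(source)
        if columns is None:
            columns = reader.fieldnames
        elif reader.fieldnames != columns:
            # Mismatched columns would leave gaps, which are not handled here.
            raise ValueError("all sources must share identical column names")
        if index_col not in (columns or []):
            raise ValueError(f"missing index column: {index_col}")
        rows.extend(reader)
    return columns, rows


# Made-up data: two files with the same header, three rows in total.
first = io.StringIO("index,n_exposures,label\nvisit_a,4,0\nvisit_b,2,1\n")
second = io.StringIO("index,n_exposures,label\nvisit_c,6,0\n")
columns, merged = stack_csv_rows([first, second])
```

Raising on mismatched headers is a design choice of this sketch; it makes the "identical column names" assumption explicit instead of silently producing gaps.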
- spacekit.extractor.load.stratified_splits(df, target='label', v=0.85)
Splits Pandas dataframe into feature (X) and target (y) train, test and validation sets.
- Parameters:
df (Pandas dataframe) – preprocessed SVM regression test dataset
target (str, optional) – target class label for alignment model predictions, by default “label”
test_size (float, optional) – size of the test set, by default 0.2
val_size (float, optional) – create a validation set separate from train/test, by default 0.1
- Returns:
data, labels: features (X) and targets (y) split into train, test, validation sets
- Return type:
tuples of Pandas dataframes
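To make the idea of a stratified train/test/validation split concrete, here is a minimal sketch in plain Python. It assumes the 0.2 and 0.1 fractions listed in the parameters above; how the `v` argument in the signature maps onto those fractions is not stated here, so it is left out. `stratified_indices` is a hypothetical helper that returns row positions rather than dataframes, and it is not spacekit's implementation.

```python
import random
from collections import defaultdict


def stratified_indices(labels, test_size=0.2, val_size=0.1, seed=0):
    """Split positions of `labels` into train/test/val, class by class."""
    by_class = defaultdict(list)
    for position, label in enumerate(labels):
        by_class[label].append(position)

    rng = random.Random(seed)
    train, test, val = [], [], []
    for positions in by_class.values():
        # Shuffle within each class so every split keeps the class ratio.
        rng.shuffle(positions)
        n_test = round(len(positions) * test_size)
        n_val = round(len(positions) * val_size)
        test.extend(positions[:n_test])
        val.extend(positions[n_test:n_test + n_val])
        train.extend(positions[n_test + n_val:])
    return train, test, val


# Made-up, imbalanced labels: 80 negatives and 20 positives.
labels = [0] * 80 + [1] * 20
train, test, val = stratified_indices(labels)
```

Because each class is split separately, the 80/20 class balance of the made-up labels carries over to all three subsets.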
- spacekit.extractor.load.read_channels(channels, w, h, d, exp=None, color_mode='rgb')
Loads PNG image data and converts to 3D arrays.
- Parameters:
channels (tuple) – image frames (original, source, gaia)
w (int) – image width
h (int) – image height
d (int) – depth (number of image frames)
exp (int, optional) – expand array dimensions, i.e. reshape to (exp, w, h, 3), by default None. SVM predictions require exp=3; set to None for training.
color_mode (str, optional) – RGB (3 channel images) or grayscale (1 channel), by default “rgb”
- Returns:
image pixel values as array
- Return type:
numpy array
- class spacekit.extractor.load.ImageIO(img_path, format='png', data=None, name='ImageIO', **log_kws)
Bases: object
Parent class for image file input/output operations
- check_format(format)
Checks the format type of img_path (png, jpg or npz) and initializes the format attribute accordingly.
- load_multi_npz(i='img_index.npz', X='img_data.npz', y='img_labels.npz')
Load numpy arrays from individual feature/image data, label and index compressed files on disk. As the counterpart function to save_multi_npz, keys within each file are expected to be named as follows:
i: “train_idx”, “test_idx”, “val_idx”
X: “X_train”, “X_test”, “X_val”
y: “y_train”, “y_test”, “y_val”
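- load_npz(npz_file=None, keys=['index', 'images', 'labels'])
Loads the arrays stored under the names in keys from a single .npz compressed numpy file.
- Parameters:
npz_file (optional) – the .npz file to load, by default None
keys (list, optional) – names of the arrays to load from the file, by default [“index”, “images”, “labels”]
- Returns:
If three keys are passed into the keyword arg keys, a tuple of 3 arrays matching these keys is returned. If only 2 keys are passed, returns 2 arrays matching the 2 keys.
- Return type:
arrays or tuple of arrays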
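The key-based behaviour described above can be sketched with NumPy alone. This is an illustrative stand-in, not the method's actual implementation: `arrays_from_npz` is a hypothetical function, the archive is built in memory so no files are needed, and the array contents are made up.

```python
import io

import numpy as np


def arrays_from_npz(npz_file, keys=("index", "images", "labels")):
    """Return one array per requested key from an .npz archive."""
    with np.load(npz_file) as archive:
        return tuple(archive[key] for key in keys)


# Build a tiny in-memory archive so the example needs no files on disk.
buffer = io.BytesIO()
np.savez_compressed(
    buffer,
    index=np.array(["visit_a", "visit_b"]),
    images=np.zeros((2, 4, 4, 3)),
    labels=np.array([0, 1]),
)

# Three keys give three arrays.
buffer.seek(0)
index, images, labels = arrays_from_npz(buffer)

# Two keys give two arrays.
buffer.seek(0)
images_only, labels_only = arrays_from_npz(buffer, keys=("images", "labels"))
```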
- split_arrays_from_npz(v=0.85)
Loads images (X), labels (y) and index (i) from a single .npz compressed numpy file. Splits into train, test, val sets using 70-20-10 ratios.
- Returns:
train, test, val tuples of numpy arrays. Each tuple consists of an index, feature data (X, for images these are the actual pixel values) and labels (y).
- Return type:
tuples
- class spacekit.extractor.load.SVMImageIO(img_path, w=128, h=128, d=9, inference=True, format='png', data=None, target='label', v=0.85, **log_kws)
Bases: ImageIO
Subclass for loading Single Visit Mosaic total detection .png images from local disk into numpy arrays and performing initial preprocessing and labeling for training a CNN or generating predictions on unlabeled data.
- Parameters:
ImageIO (class) – ImageIO parent class
Instantiates an SVMImageIO object.
- Parameters:
img_path (string) – path to local directory containing png files
w (int, optional) – image pixel width, by default 128
h (int, optional) – image pixel height, by default 128
d (int, optional) – channel depth, by default 9
inference (bool, optional) – determines how to load images (set to False for training), by default True
format (str, optional) – format type of image file(s), png, jpg or npz, by default “png”
data (dataframe, optional) – used to load mlp data inputs and split into train/test/validation sets, by default None
target (str, optional) – name of the target column in dataframe, by default “label”
v (float, optional) – size ratio for validation set, by default 0.85
- detector_prediction_images(X_data, exp=3)
Load image files from pngs into numpy arrays. Image arrays are reshaped into the appropriate dimensions for generating predictions in a pre-trained image CNN (no data augmentation is performed).
- Parameters:
X_data (Pandas dataframe) – input data (assumes index values are the image filenames)
exp (int, optional) – expand image array shape into its constituent frame dimensions, by default 3
- Returns:
image name index, arrays of image pixel values
- Return type:
Pandas Index, numpy array
- detector_training_images(X_data, exp=None)
Load image files from class-labeled folders containing pngs into numpy arrays. Image arrays are not reshaped since this assumes data augmentation will be performed at training time.
- get_labeled_image_paths(i)
Creates lists of negative and positive image filepaths, assuming the image files are in subdirectories named according to the class labels, e.g. “0” and “1” (similar to how Keras flow_from_directory works). Note: this method expects 3 images in the subdirectory, two of which have the suffixes _source and _gaia appended, and a very specific path format: {img_path}/{label}/{i}/{i}_{suffix}.png where i is typically the full name of the visit. This may be made more flexible in future versions but for now is more or less hardcoded for SVM images generated by the spacekit.skopes.hst.svm.prep or corrupt modules.
- Parameters:
i (str) – image filename
- Returns:
image filenames for each image type (original, source, gaia)
- Return type:
tuples
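The path format quoted above can be illustrated with a short sketch. It assumes the unsuffixed original frame is named `{i}.png`, which the description implies but does not state outright. `labeled_image_paths` is a hypothetical helper, and the directory and visit names are made up.

```python
from pathlib import PurePosixPath


def labeled_image_paths(img_path, label, i, suffixes=("source", "gaia")):
    """Build the three expected PNG paths for one visit.

    Assumes the original frame is named `{i}.png` with no suffix.
    """
    visit_dir = PurePosixPath(img_path) / str(label) / i
    original = visit_dir / f"{i}.png"
    extras = tuple(visit_dir / f"{i}_{suffix}.png" for suffix in suffixes)
    return (original, *extras)


# Made-up visit "visit_a" stored under the positive class folder "1".
paths = labeled_image_paths("img", 1, "visit_a")
```

The returned tuple follows the (original, source, gaia) order used for the `channels` argument of `read_channels`.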
- load_from_data_splits(X_train, X_test, X_val)
Read in train/test files and produce X-y data splits.
- Parameters:
X_train (numpy.ndarray) – training image inputs
X_test (numpy.ndarray) – test image inputs
X_val (numpy.ndarray) – validation image inputs
- Returns:
train, test, val nested lists each containing an index of the visit names and png image data as numpy arrays.
- Return type:
nested lists
- spacekit.extractor.load.save_dct_to_txt(data_dict)
Saves the key-value pairs of a dictionary to text files on local disk, with each key as a filename and its value(s) as the contents of that file.
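One way to picture the behaviour described above is the sketch below. Several details are assumptions made for illustration: the `.txt` extension, writing list values one item per line, and the extra `output_dir` argument (the documented function takes only `data_dict`). `save_dict_to_txt` is a hypothetical helper, not the spacekit function.

```python
import tempfile
from pathlib import Path


def save_dict_to_txt(data_dict, output_dir):
    """Write one text file per key; list values go one item per line."""
    output_dir = Path(output_dir)
    written = []
    for key, value in data_dict.items():
        # Treat a scalar value as a one-item list so both cases share a path.
        items = value if isinstance(value, (list, tuple)) else [value]
        path = output_dir / f"{key}.txt"
        path.write_text("\n".join(str(item) for item in items) + "\n")
        written.append(path)
    return written


# Write to a temporary directory and read the files back.
with tempfile.TemporaryDirectory() as tmp:
    files = save_dict_to_txt({"visits": ["visit_a", "visit_b"], "count": 2}, tmp)
    contents = {path.name: path.read_text() for path in files}
```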