2. Documentation for the som module
- class som.SOM(x: int, y: int, alpha_start: float = 0.6, sigma_start: Optional[float] = None, seed: Optional[int] = None)
Class implementing a self-organizing map with periodic boundary conditions. It has the following methods:
- cycle(vector: ndarray, verbose: bool = True)
Perform one iteration of adapting the SOM towards a chosen data point
- Parameters
vector (np.ndarray) – current data point
verbose (bool) – verbosity control
- distance_map(metric: str = 'euclidean')
Get the distance map of the neuron weights. Every cell is the normalized average of the distances between that neuron and all other neurons.
- Parameters
metric (str) – distance metric to be used (see scipy.spatial.distance.cdist)
- Returns
normalized sum of distances for every neuron to its neighbors, stored in SOM.distmap
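The distance-map idea can be illustrated with a minimal NumPy sketch. It is not the module's implementation: the function name sketch_distance_map, the (x, y, n_features) weight layout, the fixed Euclidean metric, and the normalization by the largest value are all assumptions made for illustration.

```python
import numpy as np

def sketch_distance_map(weights):
    """Average distance of every neuron to all other neurons, scaled to a maximum of 1.

    weights: array of shape (x, y, n_features); this layout is an assumption.
    """
    x, y, n_features = weights.shape
    flat = weights.reshape(x * y, n_features)
    # Pairwise Euclidean distances between all neuron weight vectors
    pair = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
    # Average over the other neurons (the self-distance is zero)
    mean_dist = pair.sum(axis=1) / (x * y - 1)
    # Normalizing by the maximum is an assumed convention
    return (mean_dist / mean_dist.max()).reshape(x, y)

weights = np.array([[[0.0], [1.0], [3.0]]])  # a 1 x 3 map with 1 feature
print(sketch_distance_map(weights))
```

Neurons with high values lie far from the rest of the map in weight space, which is what a distance-map plot highlights.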
- fit(data: ndarray, epochs: int = 0, save_e: bool = False, interval: int = 1000, decay: str = 'hill', verbose: bool = True)
Train the SOM on the given data for several iterations
- Parameters
data (np.ndarray) – data to train on
epochs (int, optional) – number of iterations to train; if 0, epochs=len(data) and every data point is used once
save_e (bool, optional) – whether to save the error history
interval (int, optional) – interval of epochs to use for saving training errors
decay (str, optional) – type of decay for alpha and sigma. Choose from ‘hill’ (Hill function) and ‘linear’, with ‘hill’ having the form
y = 1 / (1 + (x / 0.5) **4)
verbose (bool) – verbosity control
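The Hill function given for the 'hill' decay can be evaluated directly. The sketch below assumes that x is the fraction of training completed (0 at the start, 1 at the end) and that the result scales alpha_start multiplicatively; neither detail is stated above, and the name hill_decay is hypothetical.

```python
import numpy as np

def hill_decay(x):
    """Evaluate y = 1 / (1 + (x / 0.5) ** 4).

    x is assumed to be the training progress in [0, 1].
    """
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + (x / 0.5) ** 4)

alpha_start = 0.6  # default value from the SOM constructor
for progress in (0.0, 0.25, 0.5, 0.75, 1.0):
    # Scaling alpha_start by the decay factor is an assumption
    alpha = alpha_start * hill_decay(progress)
    print(f"progress={progress:.2f}  alpha={alpha:.4f}")
```

Under these assumptions the factor stays close to 1 early in training, passes through 0.5 at the midpoint, and falls to 1/17 at the end.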
- get_neighbors(datapoint: ndarray, data: ndarray, labels: ndarray, d: int = 0) → ndarray
Return the labels of the neighboring data instances at distance d for a given data point of interest
- Parameters
datapoint (np.ndarray) – descriptor vector of the data point of interest to check for neighbors
data (np.ndarray) – reference data to compare datapoint to
labels (np.ndarray) – array of labels describing the target classes for every data point in data
d (int) – length of Manhattan distance to explore the neighborhood (0: same neuron as data point)
- Returns
found neighbors (labels)
- Return type
np.ndarray
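A neighbor lookup of this kind can be sketched in NumPy as follows. This is an illustration under assumptions, not the module's code: sketch_get_neighbors is a hypothetical name, weights are assumed to have shape (x, y, n_features), and "at distance d" is read as exactly d (periodic Manhattan distance between winner neurons).

```python
import numpy as np

def sketch_get_neighbors(weights, datapoint, data, labels, d=0):
    """Labels of the data points whose winner neuron lies exactly d steps away."""
    x, y, n_features = weights.shape
    flat = weights.reshape(x * y, n_features)

    def winners(points):
        # Grid coordinates of the closest neuron for every row in points
        dist = np.linalg.norm(points[:, None, :] - flat[None, :, :], axis=-1)
        return np.column_stack(np.unravel_index(dist.argmin(axis=1), (x, y)))

    target = winners(datapoint[None, :])[0]
    coords = winners(data)
    # Manhattan distance on the grid with wrap-around at the borders
    delta = np.abs(coords - target)
    delta = np.minimum(delta, np.array([x, y]) - delta)
    return labels[delta.sum(axis=1) == d]

weights = np.arange(4.0).reshape(2, 2, 1)        # 2 x 2 map, 1 feature
data = np.array([[0.0], [1.1], [2.9], [0.2]])
labels = np.array(["a", "b", "c", "d"])
print(sketch_get_neighbors(weights, np.array([0.1]), data, labels, d=0))
```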
- initialize(data: ndarray, how: str = 'pca')
Initialize the SOM neurons
- Parameters
data (numpy.ndarray) – data to use for initialization
how (str) – how to initialize the map; available: 'pca' (via the first 4 eigenvalues) or 'random' (via normally distributed random values in the shape of data)
- Returns
initialized map in SOM.map
- load(filename: str)
Load a SOM instance from a pickle file.
- Parameters
filename (str) – filename (best to end with .p)
- Returns
updated instance with data from filename
- plot_class_density(data: ndarray, targets: Union[list, ndarray], t: int = 1, name: str = 'actives', colormap: str = 'gray', example_dict: Optional[dict] = None, filename: Optional[str] = None)
Plot a density map only for the given class
- Parameters
data (np.ndarray) – data to visualize the SOM density (number of times a neuron was winner)
targets (list, np.ndarray) – array of target classes (0 to len(targetnames)) corresponding to data
t (int) – target class to plot the density map for
name (str) – target name corresponding to target given in t
colormap (str) – colormap to use, select from matplotlib sequential colormaps
example_dict (dict) – dictionary containing names of examples as keys and corresponding descriptor values as values. These examples will be mapped onto the density map and marked
filename (str) – optional, if given, the plot is saved to this location
- Returns
plot shown or saved if a filename is given
- plot_density_map(data: ndarray, colormap: str = 'gray', filename: Optional[str] = None, example_dict: Optional[dict] = None, internal: bool = False)
Visualize the data density in different areas of the SOM.
- Parameters
data (np.ndarray) – data to visualize the SOM density (number of times a neuron was winner)
colormap (str) – colormap to use, select from matplotlib sequential colormaps
filename (str) – optional, if given, the plot is saved to this location
example_dict (dict) – dictionary containing names of examples as keys and corresponding descriptor values as values. These examples will be mapped onto the density map and marked
internal (bool) – if True, the current plot will stay open to be used for other plot functions
- Returns
plot shown or saved if a filename is given
- plot_distance_map(colormap: str = 'gray', filename: Optional[str] = None)
Plot the distance map after training.
- Parameters
colormap (str) – colormap to use, select from matplotlib sequential colormaps
filename (str) – optional, if given, the plot is saved to this location
- Returns
plot shown or saved if a filename is given
- plot_error_history(color: str = 'orange', filename: Optional[str] = None)
Plot the training reconstruction error history that was recorded during the fit
- Parameters
color (str) – color of the line
filename (str) – optional, if given, the plot is saved to this location
- Returns
plot shown or saved if a filename is given
- plot_point_map(data: ndarray, targets: Union[list, ndarray], targetnames: Union[list, ndarray], filename: Optional[str] = None, colors: Optional[Union[list, ndarray]] = None, markers: Optional[Union[list, ndarray]] = None, colormap: str = 'gray', example_dict: Optional[dict] = None, density: bool = True, activities: Optional[Union[list, ndarray]] = None)
Visualize the SOM with all data as points around the neurons
- Parameters
data (np.ndarray) – data to visualize with the SOM
targets (list, np.ndarray) – array of target classes (0 to len(targetnames)) corresponding to data
targetnames (list, np.ndarray) – names describing the target classes given in targets
filename (str, optional) – if provided, the plot is saved to this location
colors (list, np.ndarray, None; optional) – if provided, different classes are colored in these colors
markers (list, np.ndarray, None; optional) – if provided, different classes are visualized with these markers
colormap (str) – colormap to use, select from matplotlib sequential colormaps
example_dict (dict) – dictionary containing names of examples as keys and corresponding descriptor values as values. These examples will be mapped onto the density map and marked
density (bool) – whether to plot the density map with winner neuron counts in the background
activities (list, np.ndarray, None; optional) – list of activities (e.g. IC50 values) to use for coloring the points accordingly; high values will appear in blue, low values in green
- Returns
plot shown or saved if a filename is given
- save(filename: str)
Save the SOM instance to a pickle file.
- Parameters
filename (str) – filename (best to end with .p)
- Returns
saved instance in file with name filename
- som_error(data: ndarray) → float
Calculate the overall error as the average difference between the winning neurons and the data points
- Parameters
data (np.ndarray) – data to calculate the overall error for
- Returns
normalized error
- Return type
float
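One way to picture this error is as the mean distance between each data point and the weight vector of its winner neuron. The sketch below assumes that reading, a Euclidean distance, and weights of shape (x, y, n_features); the exact normalization used by som_error is not specified above, and the function name is hypothetical.

```python
import numpy as np

def sketch_som_error(weights, data):
    """Mean Euclidean distance between each data point and its closest neuron."""
    flat = weights.reshape(-1, weights.shape[-1])
    # Distance of every data point to every neuron
    dist = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    # Keep only the distance to the winner, then average over the data
    return float(dist.min(axis=1).mean())

weights = np.arange(4.0).reshape(2, 2, 1)  # neuron weights 0, 1, 2, 3
data = np.array([[0.5], [3.0]])
print(sketch_som_error(weights, data))
```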
- transform(data: ndarray) → ndarray
Transform data into the SOM space
- Parameters
data (np.ndarray) – data to be transformed
- Returns
transformed data in the SOM space
- Return type
np.ndarray
- winner(vector: ndarray) → ndarray
Compute the winner neuron closest to the vector (Euclidean distance)
- Parameters
vector (np.ndarray) – vector of current data point(s)
- Returns
indices of winning neuron
- Return type
np.ndarray
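The winner search amounts to an argmin over Euclidean distances. A minimal sketch for a single vector follows, assuming weights of shape (x, y, n_features); find_winner is a hypothetical name and not part of the module.

```python
import numpy as np

def find_winner(weights, vector):
    """Grid indices of the neuron whose weights are closest to vector."""
    # Euclidean distance of the vector to every neuron on the grid
    dist = np.linalg.norm(weights - vector, axis=-1)
    # Convert the flat argmin position back to (row, column) indices
    return np.array(np.unravel_index(np.argmin(dist), dist.shape))

weights = np.zeros((3, 4, 2))
weights[2, 1] = [1.0, 1.0]
print(find_winner(weights, np.array([0.9, 1.1])))
```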
- winner_map(data: ndarray) → ndarray
Get the number of times each neuron in the trained SOM is the winner for the given data.
- Parameters
data (np.ndarray) – data to compute the winner neurons on
- Returns
map with winner counts at corresponding neuron location
- Return type
np.ndarray
- winner_neurons(data: ndarray) → ndarray
For every datapoint, get the winner neuron coordinates.
- Parameters
data (np.ndarray) – data to compute the winner neurons on
- Returns
winner neuron coordinates for every datapoint
- Return type
np.ndarray
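The relation between per-point winner coordinates and the winner-count map can be sketched as below. The function names and the (x, y, n_features) weight layout are assumptions for illustration; the module's own methods may differ in detail.

```python
import numpy as np

def sketch_winner_neurons(weights, data):
    """Winner neuron coordinates, one (row, column) pair per data point."""
    x, y, n_features = weights.shape
    flat = weights.reshape(x * y, n_features)
    dist = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return np.column_stack(np.unravel_index(dist.argmin(axis=1), (x, y)))

def sketch_winner_map(weights, data):
    """Count how often each neuron is the winner for the given data."""
    x, y, _ = weights.shape
    coords = sketch_winner_neurons(weights, data)
    counts = np.zeros((x, y), dtype=int)
    # np.add.at accumulates correctly when the same neuron wins repeatedly
    np.add.at(counts, (coords[:, 0], coords[:, 1]), 1)
    return counts

weights = np.arange(4.0).reshape(2, 2, 1)
data = np.array([[0.1], [0.2], [2.9], [1.1]])
print(sketch_winner_neurons(weights, data))
print(sketch_winner_map(weights, data))
```

A count map of this form is the kind of input a density plot needs.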
- som.man_dist_pbc(m: ndarray, vector: ndarray, shape: tuple = (10, 10)) → ndarray
Manhattan distance calculation of coordinates with periodic boundary condition
- Parameters
m (np.ndarray) – array / matrix (reference)
vector (np.ndarray) – array / vector (target)
shape (tuple, optional) – shape of the full SOM
- Returns
Manhattan distance from vector to m
- Return type
np.ndarray
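With periodic boundary conditions, the per-axis offset between two grid positions is the shorter of the direct path and the path that wraps around the edge. The following sketch illustrates that calculation; the name sketch_man_dist_pbc is hypothetical, and it assumes integer coordinates in the range 0 to shape - 1 along each axis.

```python
import numpy as np

def sketch_man_dist_pbc(m, vector, shape=(10, 10)):
    """Manhattan distance from vector to each row of m on a wrapped grid."""
    m = np.asarray(m)
    vector = np.asarray(vector)
    dims = np.asarray(shape)
    # Direct offset along each axis
    delta = np.abs(m - vector)
    # Take the wrap-around path wherever it is shorter
    delta = np.minimum(delta, dims - delta)
    return delta.sum(axis=-1)

coords = np.array([[0, 0], [9, 9], [5, 5]])
print(sketch_man_dist_pbc(coords, np.array([0, 0])))
```

On a 10 x 10 map, the neuron at (9, 9) is only two steps from (0, 0) because both axes wrap, whereas (5, 5) is the farthest possible position at ten steps.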