2. Documentation for the som module

class som.SOM(x: int, y: int, alpha_start: float = 0.6, sigma_start: Optional[float] = None, seed: Optional[int] = None)

Class implementing a self-organizing map with periodic boundary conditions. It has the following methods:

cycle(vector: ndarray, verbose: bool = True)

Perform one iteration in adapting the SOM towards a chosen data point

Parameters
  • vector (np.ndarray) – current data point

  • verbose (bool) – verbosity control
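
As an illustration of what one such iteration can look like, here is a minimal, dependency-free sketch of a generic SOM update step on a grid with periodic boundaries. It is not the module's implementation: the name cycle_sketch, the Gaussian neighborhood function, and the fixed alpha and sigma arguments are assumptions, and plain nested lists stand in for the NumPy arrays the module works with.

```python
import math


def cycle_sketch(weights, vector, alpha=0.6, sigma=1.0):
    """One generic SOM update step (hypothetical helper, not the module's code).

    weights[r][c] is the weight vector (list of floats) of neuron (r, c).
    The grid is updated in place and the winner coordinates are returned.
    """
    rows, cols = len(weights), len(weights[0])
    # 1. Find the winner: the neuron whose weights are closest to the vector.
    wr, wc = min(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: math.dist(weights[rc[0]][rc[1]], vector),
    )
    # 2. Pull every neuron towards the vector, weighted by its grid distance
    #    to the winner (Manhattan distance with wrap-around at the edges).
    for r in range(rows):
        for c in range(cols):
            dr, dc = abs(r - wr), abs(c - wc)
            grid_dist = min(dr, rows - dr) + min(dc, cols - dc)
            # Assumption: Gaussian neighborhood function of width sigma.
            h = math.exp(-((grid_dist / sigma) ** 2))
            weights[r][c] = [
                w + alpha * h * (v - w) for w, v in zip(weights[r][c], vector)
            ]
    return wr, wc


# Toy 1 x 3 map with one-dimensional weights.
grid = [[[0.0], [1.0], [2.0]]]
win = cycle_sketch(grid, [0.0], alpha=0.5, sigma=1.0)
```

In this toy run the last neuron sits two cells from the winner on the flat grid but only one cell away once the map wraps around, so it receives the same neighborhood weight as the middle neuron.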

distance_map(metric: str = 'euclidean')

Get the distance map of the neuron weights. Every cell is the normalized average of the distances between that neuron and all other neurons.

Parameters

metric (str) – distance metric to be used (see scipy.spatial.distance.cdist)

Returns

normalized average distance from every neuron to all other neurons, stored in SOM.distmap
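
The computation described above can be sketched without dependencies as follows. This is an illustration only: the name distance_map_sketch is hypothetical, only the Euclidean metric is covered, and scaling by the largest cell is an assumed reading of "normalized".

```python
import math


def distance_map_sketch(weights):
    """Normalized mean Euclidean distance from each neuron to all others.

    weights[r][c] is the weight vector (list of floats) of neuron (r, c).
    """
    rows, cols = len(weights), len(weights[0])
    flat = [w for row in weights for w in row]
    # Mean distance to the other neurons (the self-distance of 0 adds nothing).
    means = [
        sum(math.dist(w, other) for other in flat) / (len(flat) - 1)
        for w in flat
    ]
    # Assumption: "normalized" means scaled so that the largest cell equals 1.
    top = max(means) or 1.0  # avoid dividing by zero if all weights coincide
    scaled = [m / top for m in means]
    # Restore the 2-D grid layout.
    return [scaled[r * cols:(r + 1) * cols] for r in range(rows)]


# Toy 1 x 3 map with one-dimensional weights.
dmap = distance_map_sketch([[[0.0], [1.0], [3.0]]])
```

Cells with high values mark neurons whose weights lie far from the rest of the map.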

fit(data: ndarray, epochs: int = 0, save_e: bool = False, interval: int = 1000, decay: str = 'hill', verbose: bool = True)

Train the SOM on the given data for several iterations

Parameters
  • data (np.ndarray) – data to train on

  • epochs (int, optional) – number of iterations to train; if 0, epochs=len(data) and every data point is used once

  • save_e (bool, optional) – whether to save the error history

  • interval (int, optional) – interval of epochs to use for saving training errors

  • decay (str, optional) – type of decay for alpha and sigma. Choose from ‘hill’ (Hill function) and ‘linear’, with ‘hill’ having the form y = 1 / (1 + (x / 0.5) ** 4)

  • verbose (bool) – verbosity control
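
The ‘hill’ formula given for decay can be written as a small function. This is a sketch under assumptions: the name hill_decay is hypothetical, x is taken to be the fraction of training completed (0 at the start, 1 at the end), and the resulting factor is assumed to scale the starting value of alpha.

```python
def hill_decay(x: float) -> float:
    """Hill-type decay factor y = 1 / (1 + (x / 0.5) ** 4)."""
    return 1.0 / (1.0 + (x / 0.5) ** 4)


alpha_start = 0.6  # default alpha_start of the SOM constructor

# Assumption: x is the fraction of training completed and the factor
# multiplies the starting value.
schedule = [(x, alpha_start * hill_decay(x)) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
for x, alpha in schedule:
    print(f"x = {x:.2f}  alpha = {alpha:.4f}")
```

Under this reading the factor stays close to 1 during the first quarter of training, passes 0.5 at the halfway point, and falls to 1/17 at the end.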

get_neighbors(datapoint: ndarray, data: ndarray, labels: ndarray, d: int = 0) → ndarray

Return the labels of the neighboring data instances at distance d for a given data point of interest

Parameters
  • datapoint (np.ndarray) – descriptor vector of the data point of interest to check for neighbors

  • data (np.ndarray) – reference data to compare datapoint to

  • labels (np.ndarray) – array of labels describing the target classes for every data point in data

  • d (int) – length of Manhattan distance to explore the neighborhood (0: same neuron as data point)

Returns

found neighbors (labels)

Return type

np.ndarray

initialize(data: ndarray, how: str = 'pca')

Initialize the SOM neurons

Parameters
  • data (numpy.ndarray) – data to use for initialization

  • how (str) – how to initialize the map; available options: pca (via the first 4 eigenvalues) or random (normally distributed random values in the shape of data)

Returns

initialized map in SOM.map

load(filename: str)

Load a SOM instance from a pickle file.

Parameters

filename (str) – filename (best to end with .p)

Returns

updated instance with data from filename

plot_class_density(data: ndarray, targets: Union[list, ndarray], t: int = 1, name: str = 'actives', colormap: str = 'gray', example_dict: Optional[dict] = None, filename: Optional[str] = None)

Plot a density map only for the given class

Parameters
  • data (np.ndarray) – data to visualize the SOM density (number of times a neuron was winner)

  • targets (list, np.ndarray) – array of target classes (0 to len(targetnames)) corresponding to data

  • t (int) – target class to plot the density map for

  • name (str) – target name corresponding to target given in t

  • colormap (str) – colormap to use, select from matplotlib sequential colormaps

  • example_dict (dict) – dictionary containing names of examples as keys and corresponding descriptor values as values. These examples will be mapped onto the density map and marked

  • filename (str) – optional, if given, the plot is saved to this location

Returns

plot shown or saved if a filename is given

plot_density_map(data: ndarray, colormap: str = 'gray', filename: Optional[str] = None, example_dict: Optional[dict] = None, internal: bool = False)

Visualize the data density in different areas of the SOM.

Parameters
  • data (np.ndarray) – data to visualize the SOM density (number of times a neuron was winner)

  • colormap (str) – colormap to use, select from matplotlib sequential colormaps

  • filename (str) – optional, if given, the plot is saved to this location

  • example_dict (dict) – dictionary containing names of examples as keys and corresponding descriptor values as values. These examples will be mapped onto the density map and marked

  • internal (bool) – if True, the current plot will stay open to be used for other plot functions

Returns

plot shown or saved if a filename is given

plot_distance_map(colormap: str = 'gray', filename: Optional[str] = None)

Plot the distance map after training.

Parameters
  • colormap (str) – colormap to use, select from matplotlib sequential colormaps

  • filename (str) – optional, if given, the plot is saved to this location

Returns

plot shown or saved if a filename is given

plot_error_history(color: str = 'orange', filename: Optional[str] = None)

Plot the training reconstruction error history that was recorded during the fit

Parameters
  • color (str) – color of the line

  • filename (str) – optional, if given, the plot is saved to this location

Returns

plot shown or saved if a filename is given

plot_point_map(data: ndarray, targets: Union[list, ndarray], targetnames: Union[list, ndarray], filename: Optional[str] = None, colors: Optional[Union[list, ndarray]] = None, markers: Optional[Union[list, ndarray]] = None, colormap: str = 'gray', example_dict: Optional[dict] = None, density: bool = True, activities: Optional[Union[list, ndarray]] = None)

Visualize the SOM with all data as points around the neurons

Parameters
  • data (np.ndarray) – data to visualize with the SOM

  • targets (list, np.ndarray) – array of target classes (0 to len(targetnames)) corresponding to data

  • targetnames (list, np.ndarray) – names describing the target classes given in targets

  • filename (str, optional) – if provided, the plot is saved to this location

  • colors (list, np.ndarray, None; optional) – if provided, different classes are colored in these colors

  • markers (list, np.ndarray, None; optional) – if provided, different classes are visualized with these markers

  • colormap (str) – colormap to use, select from matplotlib sequential colormaps

  • example_dict (dict) – dictionary containing names of examples as keys and corresponding descriptor values as values. These examples will be mapped onto the density map and marked

  • density (bool) – whether to plot the density map with winner neuron counts in the background

  • activities (list, np.ndarray, None; optional) – list of activities (e.g. IC50 values) to use for coloring the points accordingly; high values will appear in blue, low values in green

Returns

plot shown or saved if a filename is given

save(filename: str)

Save the SOM instance to a pickle file.

Parameters

filename (str) – filename (best to end with .p)

Returns

saved instance in file with name filename

som_error(data: ndarray) → float

Calculates the overall error as the average difference between the winning neurons and the data points

Parameters

data (np.ndarray) – data to calculate the overall error for

Returns

normalized error

Return type

float
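
A minimal sketch of such an error measure, assuming that "difference" means the Euclidean distance between a data point and the weights of its winning neuron. The name som_error_sketch is hypothetical, and nested lists replace the NumPy arrays used by the module.

```python
import math


def som_error_sketch(weights, data):
    """Mean Euclidean distance between each data point and its winner neuron."""
    flat = [w for row in weights for w in row]
    # The winner is the closest neuron, so its distance is the minimum.
    errors = [min(math.dist(w, v) for w in flat) for v in data]
    return sum(errors) / len(errors)


weights = [[[0.0, 0.0], [3.0, 4.0]]]  # toy 1 x 2 map
data = [[0.0, 0.0], [3.0, 0.0]]
error = som_error_sketch(weights, data)
```

Here the first point coincides with a neuron (distance 0) and the second lies 3 units from its closest neuron, so the mean error is 1.5.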

transform(data: ndarray) → ndarray

Transform data into the SOM space

Parameters

data (np.ndarray) – data to be transformed

Returns

transformed data in the SOM space

Return type

np.ndarray

winner(vector: ndarray) → ndarray

Compute the winner neuron closest to the vector (Euclidean distance)

Parameters

vector (np.ndarray) – vector of current data point(s)

Returns

indices of winning neuron

Return type

np.ndarray
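
The winner search can be illustrated with a short, dependency-free sketch. The name winner_sketch is hypothetical, only a single vector is handled, and nested lists stand in for the NumPy arrays the module uses.

```python
import math


def winner_sketch(weights, vector):
    """Return (row, col) of the neuron whose weights are closest to vector."""
    best, best_dist = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = math.dist(w, vector)  # Euclidean distance
            if d < best_dist:
                best, best_dist = (r, c), d
    return best


# Toy 2 x 2 map with two-dimensional weights.
weights = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
]
idx = winner_sketch(weights, [0.9, 0.2])
```

The vector [0.9, 0.2] lies closest to the weights [1.0, 0.0], so the neuron at row 1, column 0 wins.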

winner_map(data: ndarray) → ndarray

Get the number of times each neuron in the trained SOM is the winner for the given data.

Parameters

data (np.ndarray) – data to compute the winner neurons on

Returns

map with winner counts at corresponding neuron location

Return type

np.ndarray

winner_neurons(data: ndarray) → ndarray

For every datapoint, get the winner neuron coordinates.

Parameters

data (np.ndarray) – data to compute the winner neurons on

Returns

winner neuron coordinates for every datapoint

Return type

np.ndarray
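
The relation between per-point winner coordinates and the winner count map can be sketched as follows. All function names are hypothetical, and nested lists replace the NumPy arrays that the module returns.

```python
import math
from itertools import product


def closest_neuron(weights, vector):
    """(row, col) of the neuron with the smallest Euclidean distance."""
    cells = product(range(len(weights)), range(len(weights[0])))
    return min(cells, key=lambda rc: math.dist(weights[rc[0]][rc[1]], vector))


def winner_neurons_sketch(weights, data):
    """Winner coordinates for every data point, in input order."""
    return [closest_neuron(weights, v) for v in data]


def winner_map_sketch(weights, data):
    """Grid of counts: how often each neuron wins for the given data."""
    counts = [[0] * len(weights[0]) for _ in weights]
    for r, c in winner_neurons_sketch(weights, data):
        counts[r][c] += 1
    return counts


weights = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
]
data = [[0.1, 0.1], [0.9, 0.2], [1.0, 0.1], [0.2, 0.9]]
coords = winner_neurons_sketch(weights, data)
counts = winner_map_sketch(weights, data)
```

A count map of this kind is the sort of quantity the density plots visualize: neurons that win often correspond to densely populated regions of the data.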

som.man_dist_pbc(m: ndarray, vector: ndarray, shape: tuple = (10, 10)) → ndarray

Manhattan distance calculation of coordinates with periodic boundary condition

Parameters
  • m (np.ndarray) – array / matrix (reference)

  • vector (np.ndarray) – array / vector (target)

  • shape (tuple, optional) – shape of the full SOM

Returns

Manhattan distance from vector to m

Return type

np.ndarray
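
To show what the periodic boundary condition means for the distance, here is a dependency-free sketch. The name man_dist_pbc_sketch is hypothetical, and it takes a list of coordinate pairs and returns a plain list rather than an np.ndarray.

```python
def man_dist_pbc_sketch(m, vector, shape=(10, 10)):
    """Manhattan distance from each coordinate in m to vector on a wrapped grid."""
    dists = []
    for point in m:
        total = 0
        for p, v, size in zip(point, vector, shape):
            delta = abs(p - v)
            # Going around the edge may be shorter than the direct path.
            total += min(delta, size - delta)
        dists.append(total)
    return dists


d = man_dist_pbc_sketch([(0, 0), (9, 9), (5, 5)], (0, 0))
```

On a 10 x 10 map the corner (9, 9) is only two steps from (0, 0) because both axes wrap around, while (5, 5) is the farthest possible cell at distance 10.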