MLBoh

mlboh.mlboh.manual_parallel_cv_processes(estimator, X, y, cv, metric, max_workers=4, *args, **kwargs)

Run cross-validation manually, fitting and scoring the folds in parallel on a pool of worker processes.

Parameters:
  • estimator (BaseEstimator) – Machine learning estimator pipeline to use

  • X (np.ndarray) – Full feature matrix, before any train/test split

  • y (np.ndarray) – Full label (ground-truth) array, before any train/test split

  • cv – Cross-validation strategy providing the train and test indices of each fold

  • metric (callable) – Scikit-learn-style metric function, called with y_true as first argument and y_pred as second

  • max_workers (int, default 4) – Maximum number of worker processes used for the parallelization

Returns:

score – Array with one metric value per fold, computed on the predictions of the provided ML model.

Return type:

np.ndarray
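A minimal sketch of how a process-based manual cross-validation could be organized, not taken from the MLBoh source. It assumes `cv` is an iterable of `(train_idx, test_idx)` pairs and leaves out `*args`/`**kwargs`; `MeanRegressor` and `mean_absolute_error` are hypothetical stand-ins for a real estimator and metric.

```python
import copy
from concurrent.futures import ProcessPoolExecutor

import numpy as np


class MeanRegressor:
    """Hypothetical toy estimator: always predicts the mean of the training labels."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


def mean_absolute_error(y_true, y_pred):
    # Follows the documented convention: metric(y_true, y_pred).
    return float(np.mean(np.abs(y_true - y_pred)))


def _train_and_score(estimator, X, y, train_idx, test_idx, metric):
    # Fit a private copy so the caller's estimator is never modified.
    model = copy.deepcopy(estimator)
    model.fit(X[train_idx], y[train_idx])
    return metric(y[test_idx], model.predict(X[test_idx]))


def manual_parallel_cv_processes(estimator, X, y, cv, metric, max_workers=4):
    # Assumption: cv is an iterable of (train_idx, test_idx) pairs.
    folds = list(cv)
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # Every argument is pickled and sent to a worker process.
        futures = [
            pool.submit(_train_and_score, estimator, X, y, train_idx, test_idx, metric)
            for train_idx, test_idx in folds
        ]
        # Read results in fold order, regardless of completion order.
        return np.array([future.result() for future in futures])


if __name__ == "__main__":
    X = np.arange(8, dtype=float).reshape(-1, 1)
    y = np.arange(8, dtype=float)
    # Two hand-made folds: each half of the data is held out once.
    cv = [(np.arange(4, 8), np.arange(0, 4)), (np.arange(0, 4), np.arange(4, 8))]
    scores = manual_parallel_cv_processes(MeanRegressor(), X, y, cv, mean_absolute_error, max_workers=2)
    print(scores)  # prints [4. 4.]
```

Because the arguments travel to the workers by pickling, the estimator, the metric and the data must all be picklable, and on platforms that start workers with the "spawn" method the call has to sit under an `if __name__ == "__main__":` guard, as in the sketch.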

mlboh.mlboh.manual_parallel_cv_threads(estimator, X, y, cv, metric, max_workers=4, *args, **kwargs)

Run cross-validation manually, fitting and scoring the folds in parallel on a pool of worker threads.

Parameters:
  • estimator (BaseEstimator) – Machine learning estimator pipeline to use

  • X (np.ndarray) – Full feature matrix, before any train/test split

  • y (np.ndarray) – Full label (ground-truth) array, before any train/test split

  • cv – Cross-validation strategy providing the train and test indices of each fold

  • metric (callable) – Scikit-learn-style metric function, called with y_true as first argument and y_pred as second

  • max_workers (int, default 4) – Maximum number of worker threads used for the parallelization

Returns:

score – Array with one metric value per fold, computed on the predictions of the provided ML model.

Return type:

np.ndarray
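The thread-based variant can be sketched the same way, again under the assumption that `cv` yields `(train_idx, test_idx)` pairs and without `*args`/`**kwargs`. This is an illustration, not the MLBoh implementation; `MeanRegressor` and `mean_absolute_error` are hypothetical helpers.

```python
import copy
from concurrent.futures import ThreadPoolExecutor

import numpy as np


class MeanRegressor:
    """Hypothetical toy estimator: always predicts the mean of the training labels."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


def mean_absolute_error(y_true, y_pred):
    # metric(y_true, y_pred), as documented above.
    return float(np.mean(np.abs(y_true - y_pred)))


def _train_and_score(estimator, X, y, train_idx, test_idx, metric):
    # Threads share memory: fitting a private copy keeps the folds independent.
    model = copy.deepcopy(estimator)
    model.fit(X[train_idx], y[train_idx])
    return metric(y[test_idx], model.predict(X[test_idx]))


def manual_parallel_cv_threads(estimator, X, y, cv, metric, max_workers=4):
    # Assumption: cv is an iterable of (train_idx, test_idx) pairs.
    folds = list(cv)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input folds.
        scores = pool.map(
            lambda fold: _train_and_score(estimator, X, y, fold[0], fold[1], metric),
            folds,
        )
        return np.array(list(scores))


X = np.arange(8, dtype=float).reshape(-1, 1)
y = np.arange(8, dtype=float)
cv = [(np.arange(4, 8), np.arange(0, 4)), (np.arange(0, 4), np.arange(4, 8))]
scores = manual_parallel_cv_threads(MeanRegressor(), X, y, cv, mean_absolute_error, max_workers=2)
print(scores)  # prints [4. 4.]
```

Threads avoid pickling the data, but under CPython's global interpreter lock they typically only overlap folds when the estimator spends its time in code that releases the lock, such as many NumPy routines.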

Functions

mlboh.mlboh._train_and_score(estimator, X, y, train_idx, test_idx, metric, *args, **kwargs)

Fit the estimator pipeline on a single fold and score its predictions on the test set.

Parameters:
  • estimator (BaseEstimator) – Machine learning estimator pipeline to use

  • X (np.ndarray) – Full feature matrix, before any train/test split

  • y (np.ndarray) – Full label (ground-truth) array, before any train/test split

  • train_idx (list) – Indices of the rows forming the train set

  • test_idx (list) – Indices of the rows forming the test set

  • metric (callable) – Scikit-learn-style metric function, called with y_true as first argument and y_pred as second

Returns:

score – Value of the metric computed on the model's predictions for the test set.

Return type:

float

Notes

This function fits an internal copy of the provided estimator. This is particularly important with thread-based parallelism, in which ALL the involved variables are SHARED among the threads: if the pipeline is not copied, a “slow thread” could find the estimator already fitted by another thread on a different fold and predict with that model instead of its own, silently corrupting the fold scores.
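A race between threads cannot be reproduced on demand, so this sketch replays one unlucky interleaving sequentially to illustrate the failure mode described in the note. The two "threads" are simulated by ordinary statements, and `MeanRegressor` is a hypothetical toy estimator.

```python
import copy

import numpy as np


class MeanRegressor:
    """Hypothetical toy estimator: always predicts the mean of the training labels."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


X = np.arange(8, dtype=float).reshape(-1, 1)
y = np.arange(8, dtype=float)
train_a, test_a = np.arange(4, 8), np.arange(0, 4)  # fold handled by "thread A"
train_b = np.arange(0, 4)  # training rows of "thread B"

# Shared estimator, unlucky interleaving replayed step by step.
shared = MeanRegressor()
shared.fit(X[train_a], y[train_a])  # thread A fits...
shared.fit(X[train_b], y[train_b])  # ...thread B refits the same object...
pred_shared = shared.predict(X[test_a])  # ...and A predicts with B's model.

# Private copies: the same interleaving cannot leak state between folds.
template = MeanRegressor()
model_a = copy.deepcopy(template).fit(X[train_a], y[train_a])
model_b = copy.deepcopy(template).fit(X[train_b], y[train_b])
pred_copy = model_a.predict(X[test_a])

print(pred_shared)  # prints [1.5 1.5 1.5 1.5]
print(pred_copy)  # prints [5.5 5.5 5.5 5.5]
```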