resqpy.multi_processing.function_multiprocessing

resqpy.multi_processing.function_multiprocessing(function: Callable, kwargs_list: List[Dict[str, Any]], recombined_epc: Union[Path, str], cluster, consolidate: bool = True, require_success: bool = False, tmp_dir_path: Union[Path, str] = '.', backend: str = 'dask', clean_up: bool = True) → List[bool]

Calls a function concurrently with the specified sets of arguments.

Parameters
  • function (Callable) – the wrapper function to be called (a minimal sketch follows this parameter list); it must return:
      - index (int): the index of the kwargs in the kwargs_list;
      - success (bool): whether the function call was successful, however that is defined;
      - epc_file (Path or str): the epc file path where the objects are stored;
      - uuid_list (list of str): list of UUIDs of relevant objects

  • kwargs_list (list of dict) – A list of keyword argument dictionaries that are used when calling the function

  • recombined_epc (Path or str) – A pathlib Path or path string indicating where the combined epc file will be saved

  • cluster – the Dask cluster to run the function on; for a single machine, a LocalCluster can be used; with a job queuing system, a JobQueueCluster can be used, such as an SGECluster, SLURMCluster, PBSCluster or LSFCluster

  • consolidate (bool) – if True and an equivalent part already exists in a model, it is not duplicated and the uuids are noted as equivalent

  • require_success (bool) – if True and any instance fails, then an exception is raised

  • tmp_dir_path (Path or str) – path of the directory in which the temporary directory will be created; defaults to the current working directory

  • backend (str) – the joblib parallel backend to use; defaults to 'dask', in which case a Dask cluster must be passed as the cluster argument

  • clean_up (bool, default True) – if True, the temporary directory used during multi processing is deleted; if False, it is left in place with its contents (to facilitate debugging)
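
The following is a minimal sketch of a wrapper conforming to the return contract described for the function parameter above. The name example_wrapper, the parent_tmp_dir argument, and the assumption that an index keyword argument is supplied for each call are illustrative only, not a documented part of the resqpy API:

    from typing import List, Tuple

    import resqpy.model as rq


    def example_wrapper(index: int, parent_tmp_dir: str = '.') -> Tuple[int, bool, str, List[str]]:
        """Hypothetical wrapper showing the four required return values."""
        epc_file = f'{parent_tmp_dir}/wrapper_{index}.epc'
        uuid_list: List[str] = []
        success = False
        try:
            model = rq.new_model(epc_file)  # each call builds objects in its own model
            # ... create RESQML objects here, appending str(obj.uuid) to uuid_list ...
            model.store_epc()
            success = True
        except Exception:
            pass  # leave success as False for this call
        return index, success, epc_file, uuid_list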

Returns

success_list (List[bool]) – a boolean list indicating, for each entry in kwargs_list, whether the function call was successful

Notes

A multiprocessing pool is used to call the function multiple times in parallel; once all results are returned, they are combined into a single epc file. This function uses the Dask backend by default, so a Dask cluster must be set up and passed as the cluster argument. Dask must be installed in the Python environment, as it is not a dependency of the project. More information can be found at https://resqpy.readthedocs.io/en/latest/tutorial/multiprocessing.html
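
For illustration, a local run using the hypothetical wrapper sketched above might look like the following; it assumes dask.distributed is installed and that function_multiprocessing passes each call's position in kwargs_list as an index keyword argument, as the return contract above implies:

    from dask.distributed import LocalCluster

    from resqpy.multi_processing import function_multiprocessing

    cluster = LocalCluster(n_workers = 4)  # Dask cluster on the local machine
    kwargs_list = [{'parent_tmp_dir': '.'} for _ in range(10)]  # one dict per call
    success_list = function_multiprocessing(example_wrapper,
                                            kwargs_list,
                                            recombined_epc = 'combined.epc',
                                            cluster = cluster)
    cluster.close()

Each entry in kwargs_list produces one call of the wrapper, and the returned success_list reports, per call, whether the wrapper signalled success.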