MPI job fails due to startup timeout #13

@Zhengsx-p

Dear Yang Zhong:
I'm trying to run the GaAs example on a supercomputing platform. On the login node, executing
HamEPC --config EPC_input.yaml
produces the following output:

################################################## Mobility Calculation ##################################################
Sampling with Cauchy distribution.
Using ERTA method to calculate mobility.

However, when I launch it through MPI with mpirun -np 1 HamEPC --config EPC_input.yaml, there is no output at all. In addition, when I submit the job through an sbatch script that calls mpirun -np ncores HamEPC --config EPC_input.yaml, the .log file shows the following error:

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 654045 RUNNING AT l11c79n1
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================
I_MPI_JOB_TIMEOUT = -1 second(s): job ending due to startup timeout
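
For comparing this failure across several job logs, the termination banner above can be pulled apart with a short script. This is only a minimal sketch that assumes the banner layout is exactly as shown in this log; the function name `parse_bad_termination` and the pattern `BANNER_RE` are hypothetical helpers, not part of HamEPC or Intel MPI.

```python
import re

# Pattern written against the banner lines shown in the log above
# (assumption: other logs use the same layout).
BANNER_RE = re.compile(
    r"RANK (?P<rank>\d+) PID (?P<pid>\d+) RUNNING AT (?P<host>\S+)\s*\n"
    r"=\s*KILLED BY SIGNAL: (?P<signal>\d+) \((?P<name>[^)]+)\)"
)


def parse_bad_termination(log_text):
    """Return rank, PID, host, and signal from the banner, or None if absent."""
    m = BANNER_RE.search(log_text)
    if m is None:
        return None
    return {
        "rank": int(m["rank"]),
        "pid": int(m["pid"]),
        "host": m["host"],
        "signal": int(m["signal"]),
        "signal_name": m["name"],
    }


# Banner lines copied from the log in this issue.
sample = """\
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 654045 RUNNING AT l11c79n1
=   KILLED BY SIGNAL: 9 (Killed)
"""
info = parse_bad_termination(sample)
print(info)
```

Signal 9 is SIGKILL, which a process cannot catch, so the rank was stopped from outside rather than exiting through an error raised by the program itself.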

The job failed to start and was terminated because of the startup timeout.
I have tried reinstalling HamEPC, and I am using Intel MPI 2021.8.0, but the issue persists. Could you please provide some guidance on how to resolve this error?

Thanks,
Shixuan Zheng
