Remote workers exist to help get tasks completed. They give users the flexibility to run their tasks across different networks.
Every worker shares two main characteristics: the ability to work on a task and the ability to report its current state to an API. A worker is assigned one or several tasks at a time. At a fixed time interval, it sends its master, or API, information about its progress and how it is operating. Schedulers tell each API when to refresh the state of its workers, so that users know what is happening with their tasks.
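The two worker abilities described above can be sketched roughly as follows. This is only an illustration: the `Worker` and `Master` classes, their method names, and the shape of the report dictionary are all assumptions, not part of any real system described here.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical worker: holds one or more tasks and reports its state."""
    worker_id: str
    tasks: dict = field(default_factory=dict)  # task name -> progress in [0.0, 1.0]

    def work(self, task: str, step: float) -> None:
        # Advance one task, capping progress at 1.0 (complete).
        self.tasks[task] = min(1.0, self.tasks.get(task, 0.0) + step)

    def report(self, now: float) -> dict:
        # Build the status message that would be sent to the master (API).
        return {"worker_id": self.worker_id, "sent_at": now, "progress": dict(self.tasks)}


class Master:
    """Hypothetical master: keeps the latest report received from each worker."""

    def __init__(self) -> None:
        self.latest = {}

    def receive(self, report: dict) -> None:
        self.latest[report["worker_id"]] = report
```

In a real deployment the report would travel over the network and be sent on a timer. Here the caller passes the timestamp explicitly so the behaviour is easy to follow.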
Remote workers are started in groups by default. Each worker can be in good condition, overwhelmed, offline, or permanently damaged. Delays in completing tasks or in reporting the current state are the criteria for detecting an abnormal worker state. The longer the delay, the more likely it is that the worker is permanently lost.
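One possible way to turn those delays into the four states is sketched below. The text does not say which delay leads to which state, so the mapping here (late tasks mean overwhelmed, late reports mean offline, very late reports mean permanently damaged) and every threshold value are assumptions made for illustration.

```python
from enum import Enum


class WorkerState(Enum):
    GOOD = "good condition"
    OVERWHELMED = "overwhelmed"
    OFFLINE = "offline"
    DAMAGED = "permanently damaged"


def classify(report_delay: float, task_delay: float,
             report_limit: float = 60.0, task_limit: float = 300.0,
             damaged_limit: float = 3600.0) -> WorkerState:
    """Map observed delays (in seconds) to a state. All limits are assumed values."""
    if report_delay > damaged_limit:
        return WorkerState.DAMAGED      # silent for so long that recovery is unlikely
    if report_delay > report_limit:
        return WorkerState.OFFLINE      # state reports have stopped arriving
    if task_delay > task_limit:
        return WorkerState.OVERWHELMED  # still reporting, but tasks are running late
    return WorkerState.GOOD
```

For example, `classify(report_delay=5, task_delay=600)` yields `WorkerState.OVERWHELMED` under these assumed limits, because the worker is still reporting but its tasks are late.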
Masters work with workers and schedulers to sustain a virtual heartbeat mechanism. If, for example, the mechanism runs every five seconds, a communication delay of over a minute could be interpreted as the worker having crashed. Depending on a global system setting, all jobs that worker is currently executing are suspended, killed, or finished, and the master marks them as incomplete. If the communication delay exceeds a certain interval (the total timeout), no further updates about the worker's state are expected, and communication is considered permanently lost. However, an update that arrives in less than the total timeout may restore all previously suspended jobs.
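The timeout logic might look something like the sketch below. It models only the "suspend" choice for affected jobs. The class name, method names, and the 600-second total timeout are assumptions, while the 60-second crash threshold follows the example in the text.

```python
class HeartbeatMonitor:
    """Hypothetical master-side tracker for a single worker's heartbeats."""

    def __init__(self, crash_timeout: float = 60.0, total_timeout: float = 600.0):
        self.crash_timeout = crash_timeout  # silence after which the worker counts as crashed
        self.total_timeout = total_timeout  # silence after which the worker counts as lost
        self.last_seen = 0.0
        self.status = "alive"
        self.running = set()
        self.suspended = set()              # jobs marked incomplete

    def assign(self, job: str) -> None:
        self.running.add(job)

    def check(self, now: float) -> str:
        """Meant to be called by the scheduler on each refresh."""
        if self.status == "lost":
            return self.status
        silence = now - self.last_seen
        if silence > self.crash_timeout:
            # Suspend whatever the worker was executing.
            self.suspended |= self.running
            self.running.clear()
            self.status = "lost" if silence > self.total_timeout else "crashed"
        return self.status

    def heartbeat(self, now: float) -> None:
        """Record a status update received from the worker."""
        if self.check(now) == "lost":
            return  # past the total timeout: the update is ignored
        self.last_seen = now
        if self.status == "crashed":
            # The update arrived in time, so suspended jobs are restored.
            self.running |= self.suspended
            self.suspended.clear()
            self.status = "alive"
```

Timestamps are passed in rather than read from a clock, which keeps the sketch deterministic. A real monitor would more likely use a monotonic clock and track many workers at once.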
Users can set the timers used by the heartbeat mechanism. They can also configure what happens to the tasks assigned to an abnormal worker. Crashed workers can be suspended and their tasks killed. The same applies to offline workers, but they can also be paused until they come back online, since no tasks are lost in the process.
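These user-facing settings could be grouped as in the sketch below. The field names, the `TaskPolicy` values, and the default total timeout are hypothetical. The five-second interval and one-minute crash threshold come from the earlier example, and the rule that pausing is reserved for offline workers follows the description above.

```python
from dataclasses import dataclass
from enum import Enum


class TaskPolicy(Enum):
    SUSPEND = "suspend"
    KILL = "kill"
    PAUSE = "pause"  # hold everything until the worker is back online


@dataclass(frozen=True)
class HeartbeatSettings:
    """Hypothetical settings object; all names and defaults are assumptions."""
    interval: float = 5.0          # seconds between heartbeats
    crash_timeout: float = 60.0    # silence that counts as a crash
    total_timeout: float = 600.0   # silence after which the worker is given up on
    crashed_policy: TaskPolicy = TaskPolicy.KILL
    offline_policy: TaskPolicy = TaskPolicy.PAUSE

    def __post_init__(self) -> None:
        # Timers only make sense in increasing order.
        if not 0 < self.interval < self.crash_timeout < self.total_timeout:
            raise ValueError("expected 0 < interval < crash_timeout < total_timeout")
        # Pausing is described only for offline workers.
        if self.crashed_policy is TaskPolicy.PAUSE:
            raise ValueError("pausing is only available for offline workers")
```

Validating the values at construction time means a misconfigured timer order is caught before any worker is started, rather than surfacing later as a spurious crash report.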