Scheduling most problematic in cluster?

  • Scheduling most problematic in cluster?


    I mostly keep my services stateless, so there's no problem when a customer requires me to run my app on multiple machines behind a load balancer (not so much for performance as for failover). BTW, I always use sticky sessions. The nodes don't know about each other, and everything works fine.

    But things get really complicated when there's a stateful service, such as a scheduling service, which I run into from time to time. Of course, the scheduling service is duplicated on both nodes, but I cannot allow both nodes to process the same scheduled task (which is usually stored in the db). So I end up with a mess: periodically polling the tasks in the db and marking the task currently being processed with some flag in the db, so that the other node knows the first one is already processing it, and so on.

    How do you handle this situation? Scheduling is such a common thing that I guess there's something elegant out there?!


  • #2
    As I understand it, the tasks to be executed are stored in the database. I also suppose they have an executed (boolean) or last_executed (timestamp) column. Then, when a server is triggered to execute a task, it does a SELECT ... FOR UPDATE and checks whether the task has been executed in the meantime by another server. If the task has not been executed, it is processed and the task table is updated accordingly. If the select shows that the task has already been executed, you just skip it. If you are using an Oracle database, you might add NOWAIT, so that the second server fails immediately instead of blocking on the row lock.
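
    To make the "only one server gets the task" idea concrete, here is a minimal sketch in Python. Note that it swaps in a different technique: SQLite (used here only so the example runs without a database server) has no SELECT ... FOR UPDATE, so the claim is done with a single conditional UPDATE and a check of the affected row count. The table layout (`tasks`, `status`, `claimed_by`) and the function names are assumptions for illustration, not something from the original post.

    ```python
    import sqlite3


    def create_schema(conn):
        # Hypothetical task table; 'status' plays the role of the executed flag.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS tasks ("
            "id INTEGER PRIMARY KEY, "
            "status TEXT NOT NULL DEFAULT 'pending', "
            "claimed_by TEXT)"
        )


    def try_claim(conn, task_id, node):
        """Return True only if this node moved the task from 'pending' to 'running'."""
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "UPDATE tasks SET status = 'running', claimed_by = ? "
                "WHERE id = ? AND status = 'pending'",
                (node, task_id),
            )
        # Exactly one row changes for the winner; a later caller matches no row.
        return cur.rowcount == 1
    ```

    Each node would call `try_claim` before running a task and simply skip the task when it returns False. On a database that supports it, the same check could be done with SELECT ... FOR UPDATE inside a transaction, as described above.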

    Another thing: you might also want to distribute task execution across the servers in the cluster. By this I mean that each task is executed on one server only. You also want to keep the task execution status in the database. So, in the rare case when a task could not be executed for some reason, you can see this in the database and execute it on another server. It also helps if the task execution is "idempotent": if the task somehow gets executed twice, you have just wasted some CPU cycles, but you haven't damaged your data.
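
    A rough sketch of the status tracking and idempotency points, again in Python with SQLite purely so it is runnable. Everything named here (`tasks`, `results`, `run_task`, the status values) is hypothetical. A failed task is recorded as 'failed' so that any node can pick it up again, and the result is written with INSERT OR REPLACE keyed on the task id, so running the same task twice leaves one row rather than two.

    ```python
    import sqlite3


    def setup(conn):
        # Hypothetical schema: one row per task, one result row per task.
        conn.execute(
            "CREATE TABLE tasks (id INTEGER PRIMARY KEY, "
            "status TEXT NOT NULL DEFAULT 'pending', node TEXT, error TEXT)"
        )
        conn.execute("CREATE TABLE results (task_id INTEGER PRIMARY KEY, value TEXT)")


    def run_task(conn, task_id, node, work):
        """Claim a pending or failed task, run it, and record the outcome."""
        with conn:
            claimed = conn.execute(
                "UPDATE tasks SET status = 'running', node = ?, error = NULL "
                "WHERE id = ? AND status IN ('pending', 'failed')",
                (node, task_id),
            ).rowcount == 1
        if not claimed:
            return "skipped"  # another node has it, or it is already done
        try:
            value = work(task_id)
            with conn:
                # Idempotent write: a repeated run overwrites the same row.
                conn.execute(
                    "INSERT OR REPLACE INTO results (task_id, value) VALUES (?, ?)",
                    (task_id, value),
                )
                conn.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))
            return "done"
        except Exception as exc:
            with conn:
                # Leave a visible trace so another node can retry the task.
                conn.execute(
                    "UPDATE tasks SET status = 'failed', error = ? WHERE id = ?",
                    (str(exc), task_id),
                )
            return "failed"
    ```

    This sketch does not handle a node that crashes while a task is 'running'; that case would need something extra, such as a claim timestamp with a timeout.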