  • Extending Job Parameters

    I would like to extend JobParameters to store some transient, non-identifying parameters that will be passed to each job. To do this, I need to rely on the following assumption:
    the JobParameters referenced in JobExecution/StepExecution will always be the instance I passed to the JobLauncher, rather than a new instance constructed from persisted data (even in a re-run situation).

    I know this is the case in the current milestone (1.0m4). Can I rely on this assumption in coming versions?

    Thanks a lot

    Adrian

  • #2
    I don't see that changing any time soon. But I am curious about your use case: what non-identifying information do you need to store that the Step also needs access to?

    • #3
      In fact, what I am trying to do is provide "real" parameters to the job, rather than just the "identifier" that JobParameters currently acts as.

      For example, in a batch that exports certain data from the DB to a file, I may put only a sequence number in JobParameters to act as the identifier of the job, while I put the requesting user's ID, the date range to export, the export format, etc. in as 'non-identifying job parameters'.

      There are always cases where I need to pass something to the job execution that should not be considered an identifier of the job. For example, I may want to store the requesting user's ID in my task, while a re-run of the same job may not necessarily be launched by the same user.
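A minimal sketch of the split described above, in plain Java. `LaunchRequest` and `instanceKey` are hypothetical names, not Spring Batch types: only the identifying parameters feed the job instance key, so two launches that differ only in `userId` or `format` resolve to the same instance. (For reference, later Spring Batch releases, from 2.2 on, added a per-parameter identifying flag, e.g. `JobParametersBuilder.addString(key, value, false)`; that did not exist in 1.0m4 and this sketch does not depend on it.)

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical launch request separating identifying from non-identifying parameters. */
final class LaunchRequest {
    private final Map<String, String> identifying;
    private final Map<String, String> nonIdentifying;

    LaunchRequest(Map<String, String> identifying, Map<String, String> nonIdentifying) {
        // TreeMap sorts keys, so the derived instance key is deterministic.
        this.identifying = Collections.unmodifiableMap(new TreeMap<>(identifying));
        this.nonIdentifying = Collections.unmodifiableMap(new TreeMap<>(nonIdentifying));
    }

    /** Only identifying parameters decide which job instance a launch belongs to. */
    String instanceKey(String jobName) {
        return jobName + ":" + identifying;
    }

    /** Transient values (user ID, date range, format, ...) made available to the steps. */
    Map<String, String> nonIdentifying() {
        return nonIdentifying;
    }
}
```

In this model a new `sequence` value is what creates a new instance; changing only the user ID or format does not.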

      • #4
        You realize that by doing that you are sacrificing the ability to restart properly in the case of failure, right? If the non-identifying parameters contain data relevant to the work being done (e.g. a date range), then when the job is restarted with a different non-identifying date range, the restart may have undesired results...
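To make the restart hazard concrete, here is a toy model (a hypothetical class in plain Java, not Spring Batch) in which the saved checkpoint is keyed only by the job instance. A January run fails after three items; restarting the same instance with a different, non-identifying February range resumes at position 3 of the new range, so 1 to 3 February are silently skipped.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy job: the checkpoint (items already committed) is stored per job instance. */
final class CheckpointedDateRangeJob {
    // instance key -> number of items already committed (stands in for persisted step state)
    private final Map<String, Integer> checkpoints = new HashMap<>();

    /**
     * Processes one item per day in [from, to], resuming from the saved checkpoint.
     * failAfter < 0 means "run to completion"; otherwise stop after that many items.
     * Returns the days actually processed by this call.
     */
    List<LocalDate> run(String instanceKey, LocalDate from, LocalDate to, int failAfter) {
        List<LocalDate> days = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            days.add(d);
        }
        // The checkpoint is a position only; it knows nothing about which range produced it.
        int start = checkpoints.getOrDefault(instanceKey, 0);
        List<LocalDate> processed = new ArrayList<>();
        for (int i = start; i < days.size(); i++) {
            if (failAfter >= 0 && processed.size() == failAfter) {
                return processed; // simulated failure; checkpoint reflects committed work
            }
            processed.add(days.get(i));
            checkpoints.put(instanceKey, i + 1);
        }
        return processed;
    }
}
```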

        • #5
          I am a bit confused about what you are trying to do. As Doug points out, you can't really restart if the non-identifying parameters you mention are relevant to the job execution (and passing irrelevant parameters doesn't make sense), so why are you trying to tie the new execution to the same job instance? Why not create a new job instance for new parameter values?
          Regarding your example, I don't see why the batch framework would be interested in the userId. I don't think you really want to pass the userId to the execution, since the execution has no use for it. I guess you want to log which user launched the execution, in which case I would look at tweaking the launcher. Does that make sense?

          • #6
            Yes, of course I am aware of the job's restartability.
            But sometimes I really do need non-identifying attributes passed to the job.
            Take the requester's user ID I mentioned: within the job, I may have validations that check whether the requester has permission to run it. I don't really care who re-runs the job, but I need to ensure that person is permitted. Or, in the records I update, I need to set the requester's ID as the 'last updated user', which is not really related to restartability, but I still need that information.

            Secondly, I also want to use it as a workaround for the lack of a job context. In my situation, I only need a job-scoped context so that my steps (perhaps my first step, which does some data pre-fetching) can put data into that transient, job-scoped context and share it throughout the job.

            • #7
              This still seems like a 'Scheduling Concern'. Most good enterprise schedulers provide this functionality (ensuring a user has rights). I really can't recommend launching a job, getting to a step, and then bombing out because the user can't run that job. It seems to me that it would be much better to make that determination well before you even launch the job.

              I also don't understand the JobContext issue.
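A sketch of the alternative suggested here: reject an unauthorised user before anything is launched. `GuardedLauncher`, the permission map, and the `Function` delegate standing in for the real launcher are all hypothetical; in practice the check might equally live in the scheduler or in whatever code calls the launcher.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical wrapper that refuses to launch a job for a user without rights. */
final class GuardedLauncher {
    private final Map<String, Set<String>> allowedUsersByJob;
    // Stands in for the real launcher: job name -> new execution id.
    private final Function<String, Long> delegate;

    GuardedLauncher(Map<String, Set<String>> allowedUsersByJob, Function<String, Long> delegate) {
        this.allowedUsersByJob = allowedUsersByJob;
        this.delegate = delegate;
    }

    /** Checks rights first, so an unauthorised request never creates an execution. */
    long launch(String jobName, String userId) {
        Set<String> allowed = allowedUsersByJob.getOrDefault(jobName, Set.of());
        if (!allowed.contains(userId)) {
            throw new SecurityException(userId + " is not allowed to run " + jobName);
        }
        return delegate.apply(jobName);
    }
}
```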

              • #8
                Originally posted by lucasward (#7): "This still seems like a 'Scheduling Concern' [...] I also don't understand the JobContext issue as well."

                So what if I want some of my data updated (as in my example) according to the requesting user? I don't think it is unreasonable to consider that some non-identifying parameters may be passed that do not affect restartability.

                As for the JobContext issue: since Spring Batch currently lacks a job-scoped context, I cannot pre-fetch some data once, keep it for the whole job, and share it between steps.

                For example, say I am doing a long-running file export that generates several files about our company's clients' transactions, balances, and so on. Which clients are included in the export depends on the requesting user's rights. Instead of retrieving and analysing the 'visible clients' in each step individually, I would rather do it once so that all my steps can easily use the result for their data export. Passing such information as a JobParameter would be clumsy and meaningless. I also prefer not to put it in a utility based on ThreadLocal, because I am considering parallel steps.

                • #9
                  @Non identifying JobParameters

                  I'm not completely against them by any means. When I first coded it up, I thought long and hard about whether I should create two distinct sets; in fact, a version I wrote months and months ago did have that. However, here's the issue: if your 'updated data' is based on this parameter, isn't this a new JobInstance? If you're using this value to, say, update some data (I still don't completely understand that part), then restarting a job so that it starts at the same place it left off, but with different 'updated data', seems wrong to me. If you're using this parameter *at all* in your step, which I assume you would be if you bothered putting it in JobParameters in the first place, then restarting a job with a different parameter would be bad. Keep in mind that this doesn't prevent you from running the job; it simply causes a new instance to be created.

                  That being said, I'm still open to it, but I would need the law of 'job parameters are either used to identify or modify/control processing' to be broken, and I still can't come up with a use case that breaks it.

                  @JobContext
                  It still sounds like a caching solution would be a bit better than JobContext. I've seen a similar solution used in quite a few scenarios. Furthermore, caches are built to handle concurrent requests. This is another one I'm still open to (as, I think, is the rest of the team), but I'm still not seeing a solid use case. However, it's probably another one of those things we'll look at when creating the feature/improvement list after release 1.

                  • #10
                    Originally posted by lucasward (#9): "[...] I would need the law of 'job parameters are either used to identify or modify/control processing' to be broken, and I still can't come up with a use case that breaks it. [...] It still sounds like a caching solution would be a bit better than JobContext. [...]"

                    Thanks a lot, Lucasward.

                    I hope I am not creating too much trouble here :P but I just want to find a way to achieve what I want, as I am building a 'long-running job framework' for our new system to use.

                    @JobParam
                    To be more precise, we are building a kind of settlement system in which a day-end batch job must run once and only once every day. I need the restart feature because I don't want completed steps to run twice. However, a re-run may not be issued by the same operator. During the batch we may need to update our data, and our tables have a field recording the last user who updated each record. So, for example, if a different user re-runs a failed day-end job, the job parameter (the user ID) is different, but I need both runs to be the same job instance, because it is the day-end job for the same day (I would pass the date as the identifying job parameter).
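The day-end scenario can be modelled roughly as follows. This is a sketch with hypothetical names (`DayEndJob`, the step names, and the record IDs are illustrative, not Spring Batch API): the business date is the only identifying value, and the operator ID is passed on every launch but used only to stamp a 'last updated by' value. A re-run by a different operator for the same date reuses the completed-step state, and only the records that re-run touches carry the new operator's ID.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of a day-end job; not Spring Batch API. */
final class DayEndJob {
    private final List<String> steps;
    // Identifying parameter (business date) -> steps already completed for that instance.
    private final Map<LocalDate, Set<String>> completedSteps = new HashMap<>();
    // Stands in for the 'last updated by' column: record id -> operator id.
    private final Map<String, String> lastUpdatedByRecord = new LinkedHashMap<>();

    DayEndJob(List<String> steps) {
        this.steps = new ArrayList<>(steps);
    }

    /**
     * Runs the day-end instance for businessDate, skipping steps completed by earlier executions.
     * operatorId is non-identifying: it only stamps the records this execution writes.
     * failingStep (may be null) simulates a failure at that step. Returns the steps executed.
     */
    List<String> run(LocalDate businessDate, String operatorId, String failingStep) {
        Set<String> done = completedSteps.computeIfAbsent(businessDate, d -> new HashSet<>());
        List<String> executed = new ArrayList<>();
        for (String step : steps) {
            if (done.contains(step)) {
                continue; // completed in a previous execution of this instance
            }
            if (step.equals(failingStep)) {
                return executed; // simulated failure: this step is not marked complete
            }
            lastUpdatedByRecord.put(step + "@" + businessDate, operatorId);
            done.add(step);
            executed.add(step);
        }
        return executed;
    }

    String lastUpdatedBy(String recordId) {
        return lastUpdatedByRecord.get(recordId);
    }
}
```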

                    @JobContext
                    External caching may be a valid choice. Suppose I build some kind of "cache manager", make it a singleton, and inject it into my steps: is it reasonable to use the JobExecution ID as the key for putting things into the cache?

                    • #11
                      @JobContext

                      I thought something like that already existed in the open-source space, but I could be mistaken. The JobExecution ID should be unique to the execution, so it should be safe as a key.
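A minimal sketch of such a cache manager, assuming a singleton injected into the steps and keyed by the JobExecution ID as discussed. `JobScopedCache` and its methods are hypothetical. `ConcurrentHashMap.computeIfAbsent` applies the loader at most once per key even when parallel steps ask at the same time, which sidesteps the ThreadLocal problem with parallel steps. Because a restart creates a new JobExecution with a new ID, cached data is recomputed on restart, and entries should be evicted when an execution ends.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical job-scoped cache: one entry map per JobExecution id. */
final class JobScopedCache {
    private final ConcurrentHashMap<Long, ConcurrentHashMap<String, Object>> byExecution =
            new ConcurrentHashMap<>();

    /**
     * Returns the cached value for (executionId, key), computing it at most once.
     * The loader must not itself modify this cache (ConcurrentHashMap forbids recursive updates).
     */
    @SuppressWarnings("unchecked")
    <T> T getOrCompute(long executionId, String key, Function<String, T> loader) {
        return (T) byExecution
                .computeIfAbsent(executionId, id -> new ConcurrentHashMap<>())
                .computeIfAbsent(key, loader);
    }

    /** Call when the execution ends (successfully or not) so entries do not leak. */
    void evict(long executionId) {
        byExecution.remove(executionId);
    }

    boolean hasEntriesFor(long executionId) {
        return byExecution.containsKey(executionId);
    }
}
```

For the export example, the first step to ask for "visibleClients" would compute the list; later and parallel steps of the same execution would get the same object.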

                      @JobParams

                      Right, I have a few clients with a similar scenario, but they created their own table to record this kind of information separately from the framework. It's really scheduling metadata, not processing metadata. That said, I understand it's a common scenario, and I think long-term we'll address it, along with a few other issues, to make integration with the various ways of launching a job easier. However, for now, scheduling metadata must be stored separately from the framework.
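A sketch of keeping this kind of scheduling metadata outside the framework, along the lines of the separate table described here (and the launcher tweak suggested in #5). `AuditingLauncher` and `LaunchRecord` are hypothetical, and an in-memory list stands in for the audit table; the point is that the user ID is recorded against the execution ID rather than passed in as a job parameter.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Function;

/** Hypothetical launcher wrapper that records who launched each execution. */
final class AuditingLauncher {
    /** One row of the hypothetical launch-audit table. */
    static final class LaunchRecord {
        final long executionId;
        final String jobName;
        final String userId;
        final Instant launchedAt;

        LaunchRecord(long executionId, String jobName, String userId, Instant launchedAt) {
            this.executionId = executionId;
            this.jobName = jobName;
            this.userId = userId;
            this.launchedAt = launchedAt;
        }
    }

    // Stands in for the real launcher: job name -> new execution id.
    private final Function<String, Long> delegate;
    // Stands in for an application-owned audit table, separate from the batch framework's tables.
    private final List<LaunchRecord> auditTable = new CopyOnWriteArrayList<>();

    AuditingLauncher(Function<String, Long> delegate) {
        this.delegate = delegate;
    }

    /** Launches the job, then records the operator against the new execution id. */
    long launch(String jobName, String userId) {
        long executionId = delegate.apply(jobName);
        auditTable.add(new LaunchRecord(executionId, jobName, userId, Instant.now()));
        return executionId;
    }

    /** Snapshot copy of the audit rows. */
    List<LaunchRecord> auditRows() {
        return new ArrayList<>(auditTable);
    }
}
```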
