Which is the best solution to download/split into chunks/save chunks to cache

  • Which is the best solution to download/split into chunks/save chunks to cache

    Hi guys,

    I'm looking for advice on whether Spring Batch is a good fit for my use case. Here is the basic requirement/flow I have to implement:

    Input data: a collection that contains "file paths" to download from a local share.
    The strategy interfaces are PartitionHandler and StepExecutionSplitter (with a MultiResourcePartitioner), so each step execution takes one file/item to download and then processes it (let's say I will download 10-15 files in parallel).

    1. Download file
    - custom ItemReader will read file
    - save it to local file system
    - pass it to ItemProcessor as a reference/location link

    2. Split/break it into parts (with nio FileChannel)
    - ItemProcessor will read the file from the file system based on the reference/location link it gets from the ItemReader
    - break it into chunks (fixed size, like 64 KB)
    - create an MD5 sum for each of those parts
    - delete the original file from the file system
    - pass those parts to the ItemWriter. Here I see a problem: if the input files are large, holding all the parts in memory can cause an OutOfMemoryError. How can I solve this if I don't want to save the file parts in the ItemProcessor, since saving them is really the ItemWriter's job?
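
    One possible way to keep memory bounded in step 2 is to hand each part to a callback as soon as it is read, rather than collecting all parts before the writer runs. The following is only a minimal sketch in plain Java: `FileChunker` and `ChunkSink` are hypothetical names (not Spring Batch types), and the sink stands in for whatever component stores a part.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: streams a file in fixed-size chunks, one chunk in memory at a time. */
final class FileChunker {

    /** Hypothetical callback that receives each chunk, e.g. to put it into a cache. */
    interface ChunkSink {
        void accept(int index, byte[] data, String md5Hex) throws IOException;
    }

    /** Splits the file into chunks of at most chunkSize bytes and returns the chunk count. */
    static int split(Path file, int chunkSize, ChunkSink sink) throws IOException {
        MessageDigest md5 = newMd5();
        ByteBuffer buffer = ByteBuffer.allocate(chunkSize);
        int index = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (true) {
                buffer.clear();
                int n;
                // One read() may return fewer bytes than requested, so fill the buffer or hit EOF.
                do {
                    n = channel.read(buffer);
                } while (n != -1 && buffer.hasRemaining());
                buffer.flip();
                if (!buffer.hasRemaining()) {
                    break; // nothing left to emit
                }
                byte[] data = new byte[buffer.remaining()];
                buffer.get(data);
                sink.accept(index++, data, toHex(md5.digest(data)));
                if (n == -1) {
                    break; // last (possibly short) chunk was emitted
                }
            }
        }
        return index;
    }

    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

    A call such as `FileChunker.split(path, 64 * 1024, sink)` would then hold at most one 64 KB part at a time, provided the sink does not keep references to the parts it receives.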

    3. Store the parts in a local cache, in memory or on disk depending on how much memory is available at that moment.
    - Update/add the parts' locations to a collection/storage object that can be used outside of the job execution. Is that possible?

    Then I need a reference to that storage/collection so I can use it later in my flow, to read the parts and do some other transformations.
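
    The shared storage object from step 3 could look roughly like the sketch below, assuming a single thread-safe instance is handed both to the code that writes parts and to the code that reads them later. `ChunkRegistry` and its method names are hypothetical and not part of Spring Batch.

```java
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical thread-safe registry mapping a source file to the locations of its parts. */
final class ChunkRegistry {

    private final ConcurrentMap<String, List<Path>> chunksBySource = new ConcurrentHashMap<>();

    /** Records one stored part; safe to call from several partition threads at once. */
    void register(String sourceFile, Path chunkLocation) {
        chunksBySource
                .computeIfAbsent(sourceFile, key -> new CopyOnWriteArrayList<>())
                .add(chunkLocation);
    }

    /** Returns a read-only view of the parts recorded so far, or an empty list if none exist. */
    List<Path> chunksOf(String sourceFile) {
        List<Path> chunks = chunksBySource.get(sourceFile);
        if (chunks == null) {
            return Collections.<Path>emptyList();
        }
        return Collections.unmodifiableList(chunks);
    }
}
```

    Because the registry lives outside any single step execution, code that runs after the job could call `chunksOf(...)` on the same instance to find the parts.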

    A few other questions:
    1. How do I launch a job so that it partitions input data from a queue/collection in an infinite loop, waiting when no data is available at that moment?

    As a prototype I did it this way, but it is not dynamic and is tied to a folder. How can I read and partition the data dynamically, for example from an external object store or thread-safe collection that a different process updates on the fly with the list of files to download in step #1?

    <beans:bean name="step1:master" class="">
        <beans:property name="jobRepository" ref="jobRepository" />
        <beans:property name="stepExecutionSplitter">
            <beans:bean class="">
                <beans:constructor-arg ref="jobRepository" />
                <beans:constructor-arg ref="step1" />
                <beans:constructor-arg>
                    <beans:bean class="">
                        <beans:property name="resources" value="file:D:/Music/*.mp3" />
                    </beans:bean>
                </beans:constructor-arg>
            </beans:bean>
        </beans:property>
        <beans:property name="partitionHandler">
            <!-- The TaskExecutorPartitionHandler is quite useful for IO intensive Steps,
                 like copying large numbers of files or replicating filesystems into content
                 management systems. -->
            <beans:bean class="">
                <beans:property name="taskExecutor" ref="asyncTaskExecutor" />
                <beans:property name="step" ref="step1" />
                <!-- <beans:property name="gridSize" value="3" /> -->
            </beans:bean>
        </beans:property>
    </beans:bean>
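
    For the dynamic-input question, one option is a reader that polls a `BlockingQueue` filled by another thread. The sketch below is plain Java with a hypothetical class name, `QueueFilePathReader`. Its `read()` method mirrors the `ItemReader.read()` contract, in which returning `null` signals the end of input. The timeout is an assumption: it decides how long to wait before treating the queue as empty for the current run.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical reader that takes file paths from a queue populated by a different process/thread. */
final class QueueFilePathReader {

    private final BlockingQueue<String> queue;
    private final long timeoutMillis;

    QueueFilePathReader(BlockingQueue<String> queue, long timeoutMillis) {
        this.queue = queue;
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Waits up to the timeout for the next path.
     * Returns null if nothing arrived in time, which a caller can treat as "no more input for now".
     */
    String read() throws InterruptedException {
        return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

    Since a `null` from the reader ends the step, an "infinite loop" would still need something outside the job (a scheduler or a simple loop, for example) to launch a new run after the queue has drained.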

    I would appreciate your input. Don't hesitate to ask questions if something is unclear.

    Last edited by lingor; Apr 12th, 2012, 05:28 AM.

  • #2
    Guys, do you have any input on the questions I raised above?

    Thank you.