Why doesn't DistributedCache work?
  • Why doesn't DistributedCache work?

    Code:
    <hdp:job id="news_result_sim_job"
        input-path="${sim.output.path}/news/convert/"
        output-path="${sim.output.path}/news/result/"
        mapper="com.xxx.wap.algorithm.mapred.sim.SimResultJob.MapClass"
        reducer="com.xxx.wap.algorithm.mapred.sim.SimResultJob.Reduce"
        combiner="com.xxx.wap.algorithm.mapred.sim.SimResultJob.Combine"
        jar="file:/data/DATA/smc/whftest/newrecom/algorithmUtils/algorithmUtils-1.0-SNAPSHOT.jar"
        input-format="org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat"
        output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"
        map-key="com.sohu.wap.algorithm.model.sim.SimKeyPair"
        map-value="org.apache.hadoop.io.DoubleWritable"
        key="com.xxx.wap.algorithm.model.sim.SimKeyPair"
        value="org.apache.hadoop.io.DoubleWritable"
        number-reducers="1"
        configuration-ref="hadoopConfiguration" />

    <hdp:cache configuration-ref="hadoopConfiguration" file-system-ref="fs">
        <hdp:cache value="/tmp/wanghf/sim/output/news/length/part-r-00000.lzo_deflate" />
    </hdp:cache>
    Java code:
    Code:
    private static Map<Integer, Double> loadCache(Configuration conf) {
        Map<Integer, Double> gidLength = new HashMap<Integer, Double>();
        try {
            System.out.println("$$$$$$$$$$$$$$$$$$$$" + conf.get("mapred.cache.localFiles"));
            Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
    Why is localFiles null here? Please help.
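
    For context, the usual consumption pattern is to read the localized files once in the mapper's setup() method. Here is a minimal sketch against the classic org.apache.hadoop.filecache.DistributedCache API (the class and field names are illustrative, not from the job above):

    Code:
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CacheReadingMapper extends Mapper<Text, Text, Text, DoubleWritable> {
        private Path lengthFile;

        @Override
        protected void setup(Context context) throws IOException {
            Configuration conf = context.getConfiguration();
            // Returns null when no cache entries were registered on the job's
            // Configuration before the job was submitted.
            Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
            if (localFiles != null && localFiles.length > 0) {
                lengthFile = localFiles[0];
            }
        }
    }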

  • #2
    Is this a Spring Hadoop bug?

    • #3
      I execute the jobs in the order job1 -> job2 -> job3, but the cache file is only used by job3, and that cache file is created by job2. What's wrong? Please tell me.
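
      If the cache entry has to point at job2's output, one way to express that dependency is to register the file on job3's configuration between the two runs. A sketch using plain Hadoop job objects, outside SHDP; the path is illustrative:

      Code:
      import java.net.URI;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.filecache.DistributedCache;
      import org.apache.hadoop.mapreduce.Job;

      public class PipelineRunner {
          // Run job2 to completion first, then register its output file on
          // job3's configuration *before* job3 is submitted.
          public static void run(Job job2, Job job3) throws Exception {
              job2.waitForCompletion(true);
              Configuration conf3 = job3.getConfiguration();
              DistributedCache.addCacheFile(
                      new URI("/sim/output/news/length/part-r-00000"), conf3);
              job3.waitForCompletion(true);
          }
      }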

      • #4
        No one has replied to me?

        • #5
          Can you summarize what exactly the problem is and, specifically, how you are creating the cache in job2? Is this through SHDP, and if so, how exactly?
          Also, have you tried the other DistributedCache entries (such as classpath)?

          • #6
            Thanks for your reply.
            My configuration:
            Code:
            <hdp:configuration id="hadoopConfiguration"
                resources="classpath:/hdp/capacity-scheduler.xml,classpath:/hdp/core-site.xml,classpath:/hdp/hadoop-policy.xml,classpath:/hdp/hdfs-site.xml,classpath:/hdp/mapred-site.xml" />

            <hdp:job id="news_length_sim_job"
                input-path="${sim.input.path}/news/length/"
                output-path="${sim.output.path}/news/length/"
                mapper="com.sohu.wap.algorithm.mapred.sim.SimLengthJob.MapClass"
                reducer="com.sohu.wap.algorithm.mapred.sim.SimLengthJob.Reduce"
                jar="${jar.file.path}"
                input-format="org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
                output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"
                map-key="org.apache.hadoop.io.Text"
                map-value="com.xx.wap.algorithm.model.sim.SimKeyValue"
                key="org.apache.hadoop.io.Text"
                value="org.apache.hadoop.io.DoubleWritable"
                number-reducers="1"
                scope="prototype" />

            <hdp:job id="news_convert_sim_job"
                input-path="${sim.input.path}/news/length/"
                output-path="${sim.output.path}/news/convert/"
                mapper="com.sohu.wap.algorithm.mapred.sim.SimConvertJob.MapClass"
                reducer="com.sohu.wap.algorithm.mapred.sim.SimConvertJob.Reduce"
                jar="${jar.file.path}"
                input-format="org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
                output-format="org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat"
                map-key="org.apache.hadoop.io.Text"
                map-value="com.sohu.wap.algorithm.model.sim.SimKeyValue"
                key="org.apache.hadoop.io.Text"
                value="com.sohu.wap.algorithm.model.sim.SimKeyValueSet"
                scope="prototype" />

            <hdp:job id="news_result_sim_job"
                input-path="${sim.output.path}/news/convert/"
                output-path="${sim.output.path}/news/result/"
                mapper="com.sohu.wap.algorithm.mapred.sim.SimResultJob.MapClass"
                reducer="com.sohu.wap.algorithm.mapred.sim.SimResultJob.Reduce"
                combiner="com.sohu.wap.algorithm.mapred.sim.SimResultJob.Combine"
                jar="${jar.file.path}"
                input-format="org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat"
                output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"
                map-key="com.sohu.wap.algorithm.model.sim.SimKeyPair"
                map-value="org.apache.hadoop.io.DoubleWritable"
                key="com.sohu.wap.algorithm.model.sim.SimKeyPair"
                value="org.apache.hadoop.io.DoubleWritable"
                number-reducers="1"
                scope="prototype" />

            <hdp:cache>
                <hdp:cache value="${length.file.path}" />
            </hdp:cache>
            My problem: the news_result_sim_job job can't see the cache file, because the cache file path was never set on its configuration.

            I finally resolved it by placing <hdp:cache> between <hdp:job id="news_convert_sim_job"> and <hdp:job id="news_result_sim_job">. The cache file (value="${length.file.path}") is generated by news_convert_sim_job, and I had to give news_result_sim_job a new hadoopConfiguration to resolve this problem.
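
            One way to confirm a diagnosis like this is to dump the cache-related property of each job's configuration before submission. A sketch; the property name is the one the classic Hadoop 1.x DistributedCache writes to:

            Code:
            import org.apache.hadoop.conf.Configuration;

            public class CacheCheck {
                // Prints the cache entries registered on a job's configuration.
                // If "mapred.cache.files" is null here, then
                // DistributedCache.getLocalCacheFiles() will be null in the tasks.
                public static void dumpCacheEntries(String label, Configuration conf) {
                    System.out.println(label + ": mapred.cache.files = "
                            + conf.get("mapred.cache.files"));
                }
            }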

            • #7
              Interesting.

              hdp:cache is evaluated at startup, so I'm not sure whether you have actually solved your problem or are just seeing the side effects of a previous run.
              If I understand correctly, you want to add dynamic entries to the cache, evaluated through the property placeholder? Or is it SpEL you are looking at?
              Properties are evaluated at startup; SpEL is evaluated dynamically, but since hdp:cache is a singleton, that effectively means startup as well.

              That is, the order of the hdp:cache definition doesn't matter - it can be first or last. The same goes for hdp:job: even though the jobs are prototypes, their declaration order is irrelevant, since execution is not driven by the order of declaration.

              • #8
                I solved it by creating a new hadoopConfiguration for news_result_sim_job.
                My config:
                Code:
                <hdp:configuration id="hadoopConfiguration"
                    resources="classpath:/hdp/capacity-scheduler.xml,classpath:/hdp/core-site.xml,classpath:/hdp/hadoop-policy.xml,classpath:/hdp/hdfs-site.xml,classpath:/hdp/mapred-site.xml" />

                <hdp:configuration id="simResultConfiguration"
                    configuration-ref="hadoopConfiguration" />

                <hdp:job id="news_length_sim_job"
                    input-path="${sim.input.path}/news/length/"
                    output-path="${sim.output.path}/news/length/"
                    mapper="com.sohu.wap.algorithm.mapred.sim.SimLengthJob.MapClass"
                    reducer="com.sohu.wap.algorithm.mapred.sim.SimLengthJob.Reduce"
                    jar="${jar.file.path}"
                    input-format="org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
                    output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"
                    map-key="org.apache.hadoop.io.Text"
                    map-value="com.sohu.wap.algorithm.model.sim.SimKeyValue"
                    key="org.apache.hadoop.io.Text"
                    value="org.apache.hadoop.io.DoubleWritable"
                    number-reducers="1"
                    configuration-ref="hadoopConfiguration"
                    scope="prototype" />

                <hdp:job id="news_convert_sim_job"
                    input-path="${sim.input.path}/news/length/"
                    output-path="${sim.output.path}/news/convert/"
                    mapper="com.sohu.wap.algorithm.mapred.sim.SimConvertJob.MapClass"
                    reducer="com.sohu.wap.algorithm.mapred.sim.SimConvertJob.Reduce"
                    jar="${jar.file.path}"
                    input-format="org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
                    output-format="org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat"
                    map-key="org.apache.hadoop.io.Text"
                    map-value="com.sohu.wap.algorithm.model.sim.SimKeyValue"
                    key="org.apache.hadoop.io.Text"
                    value="com.sohu.wap.algorithm.model.sim.SimKeyValueSet"
                    configuration-ref="hadoopConfiguration"
                    scope="prototype" />

                <hdp:cache configuration-ref="simResultConfiguration">
                    <hdp:cache value="${length.file.path}" />
                </hdp:cache>

                <hdp:job id="news_result_sim_job"
                    input-path="${sim.output.path}/news/convert/"
                    output-path="${sim.output.path}/news/result/"
                    mapper="com.sohu.wap.algorithm.mapred.sim.SimResultJob.MapClass"
                    reducer="com.sohu.wap.algorithm.mapred.sim.SimResultJob.Reduce"
                    combiner="com.sohu.wap.algorithm.mapred.sim.SimResultJob.Combine"
                    jar="${jar.file.path}"
                    input-format="org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat"
                    output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"
                    map-key="com.sohu.wap.algorithm.model.sim.SimKeyPair"
                    map-value="org.apache.hadoop.io.DoubleWritable"
                    key="com.sohu.wap.algorithm.model.sim.SimKeyPair"
                    value="org.apache.hadoop.io.DoubleWritable"
                    number-reducers="1"
                    configuration-ref="simResultConfiguration"
                    scope="prototype" />

                • #9
                  What happens if you don't use a separate configuration?
                  It looks like you're just cloning the initial config and adding the cache value - this should work with the initial configuration as well. Unless the cache entry gets replaced somehow.
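
                  For reference, the separate <hdp:configuration configuration-ref="hadoopConfiguration"/> amounts to copying the parent configuration and registering the cache entry only on the copy. A sketch in plain Hadoop terms; the path is illustrative:

                  Code:
                  import java.net.URI;
                  import org.apache.hadoop.conf.Configuration;
                  import org.apache.hadoop.filecache.DistributedCache;

                  public class ConfigClone {
                      // simResultConfiguration starts as a copy of hadoopConfiguration
                      // (Configuration's copy constructor); the base is left untouched,
                      // so only the copy carries the cache entry.
                      public static Configuration cloneWithCache(Configuration base) throws Exception {
                          Configuration simResult = new Configuration(base);
                          DistributedCache.addCacheFile(
                                  new URI("/sim/output/news/length/part-r-00000"), simResult);
                          return simResult;
                      }
                  }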
