
  • PigTemplate and multithreading

    Hi guys,

    I use PigTemplate to execute Pig scripts programmatically, and I share the same instance across multiple threads. My problem is that the Hadoop cluster rejects the MapReduce jobs generated by Pig because of missing configuration properties. When I use only one thread, the jobs are executed correctly. I would like to make sure that I am using PigTemplate correctly.

    Fragment of my XML context:
    Code:
    <hdp:configuration id="hadoopConfiguration" resources="${customHadoopConfigYarn}, ${customHadoopConfigHdfs}, ${customHadoopConfigMapred}, ${customHadoopConfigCore}" />
    
    <hdp:pig-factory id="pigFactory" user="${hdfs.user}"/>
    <hdp:pig-template id="pigTemplate"/>
    Usage in Java code:
    Code:
    @Autowired
    private PigTemplate pigTemplate;
    
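    pigTemplate.execute(new PigCallback<Void>() {
                @Override
                public Void doInPig(PigServer pig) throws IOException {
                    InputStream script = ....
                    // Register the script with the PigServer supplied by the template.
                    pig.registerScript(script);
                    // A Void callback still has to return a value.
                    return null;
                }
            });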
    I dug a little deeper and found that PigServerFactoryBean, which creates a fresh PigServer instance for PigTemplate, uses a single PigContext instance. PigContext#connect, which is called by PigTemplate, might not be thread-safe: during that call the Hadoop configuration is modified (org.apache.pig.backend.hadoop.executionengine.HExecutionEngine#recomputeProperties). If I'm right, this could lead to the issue I'm observing. Or am I missing something?

    Thanks

    I use these versions:
    spring-data-hadoop-2.0.0.M4-hadoop22
    pig-0.12.0-cdh5.0.0
    cloudera 5
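
    One way I could check the thread-safety hypothesis above is to serialize all calls temporarily and see whether the rejected jobs disappear. This is only a minimal sketch in plain Java: SerializedExecutor is a hypothetical helper of my own, not part of Spring for Apache Hadoop or Pig, and it assumes the wrapped call throws only unchecked exceptions.
    Code:
    ```java
    import java.util.concurrent.locks.ReentrantLock;
    import java.util.function.Supplier;

    // Hypothetical helper (not a Spring or Pig class): lets only one
    // caller at a time run an action against a shared resource.
    final class SerializedExecutor {
        private final ReentrantLock lock = new ReentrantLock();

        // Runs the action while holding the lock and returns its result.
        <T> T run(Supplier<T> action) {
            lock.lock();
            try {
                return action.get();
            } finally {
                // Always release the lock, even if the action throws.
                lock.unlock();
            }
        }
    }
    ```
    Each thread would then wrap its pigTemplate.execute(...) call in run(...). That removes all parallelism, so it would only be a diagnostic step: if the jobs are accepted once the calls are serialized, it would support the idea that the shared PigContext is the problem.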