While running Yahoo LDA on my Hadoop cluster, I ran into the following problems:
Permission denied for executables contained in the jar package
To resolve this, I added chmod 755 $LDALibs/* to Formatter.sh and LDA.sh.
Synchronization problem with global/lda.dict.dump
I found that if other processes run the following command before process 0 has finished writing global/lda.dict.dump:
${HADOOP_CMD} dfs -get ${mapred_output_dir}/global/lda.dict.dump lda.dict.dump.global
they cannot download the file and the whole job crashes. So I added a synchronization step such as wait_for 60 ${mapred_output_dir}/global/lda.dict.dump.
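For illustration only, a polling wait of this kind could look roughly like the sketch below. This is not the wait_for helper from LDA.sh, whose implementation is not shown in this thread; wait_for_hdfs_file, its arguments, and the default value of HADOOP_CMD are all hypothetical. It relies on the Hadoop shell's -test -e check, which exits with 0 when the path exists.

```shell
# HADOOP_CMD is the variable used by the dfs -get command in this thread;
# the default value here is an assumption.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# Hypothetical helper: poll until an HDFS path exists, or give up.
#   $1 = HDFS path, $2 = maximum number of polls, $3 = seconds between polls
wait_for_hdfs_file() {
    local path="$1"
    local max_tries="${2:-60}"
    local delay="${3:-5}"
    local i=0
    while [ "$i" -lt "$max_tries" ]; do
        # `dfs -test -e` exits 0 when the path exists
        if ${HADOOP_CMD} dfs -test -e "$path"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "timed out waiting for $path" >&2
    return 1
}
```

Note that a file can exist on HDFS before the writer has finished it, so polling for a separate marker that is created only after the dump is complete would likely be more robust than polling for the dump itself.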
A critical problem with multi-machine Yahoo LDA
Finally, I hit the following error. It is not related to the run scripts, so how can I recover from it?
1020 03:57:06.626588 20423 Merge_Topic_Counts.cpp:103] Initializing global dictionary from lda.dict.dump.global
W1020 03:57:11.659412 20423 Merge_Topic_Counts.cpp:105] global dictionary Initialized
terminate called after throwing an instance of 'Ice::ConnectionLostException'
what(): TcpTransceiver.cpp:248: Ice::ConnectionLostException:
connection lost: Connection reset by peer
Should I modify the LDA.sh script to check the exit code of each module and rerun it until it succeeds?
Thank you!
On Friday 21 October 2011 03:23 AM, metacret wrote:
Hi
While running Yahoo LDA on my Hadoop cluster, I ran into the following problems:
Permission denied for executables contained in the jar package
To resolve this, I added chmod 755 $LDALibs/* to Formatter.sh and LDA.sh.
cool!
2. Synchronization problem with global/lda.dict.dump
I found that if other processes run the following command before process 0 has finished writing global/lda.dict.dump:
I dunno why you say that other processes try to get the global
dictionary before it's written. There is already a
wait_for_all 60 ${synch_dir}"/global_dict";
that takes care of it.
they cannot download the file and the whole job crashes. So I added a synchronization step such as wait_for 60 ${mapred_output_dir}/global/lda.dict.dump.
A critical problem with multi-machine Yahoo LDA
Finally, I hit the following error. It is not related to the run scripts, so how can I recover from it?
1020 03:57:06.626588 20423 Merge_Topic_Counts.cpp:103] Initializing global dictionary from lda.dict.dump.global
W1020 03:57:11.659412 20423 Merge_Topic_Counts.cpp:105] global dictionary Initialized
terminate called after throwing an instance of 'Ice::ConnectionLostException'
what(): TcpTransceiver.cpp:248: Ice::ConnectionLostException:
connection lost: Connection reset by peer
This is more of a Hadoop problem. Connection can be lost due to many
reasons beyond the control of LDA. So it's only the checkpointing &
restart mechanism that will take care of these. You need to worry about
these. LDA is configured to automatically restart from the last
checkpointed iteration.
Thanks,
--Shravan
PS: Sorry for the late response. Was really busy with some other stuff
Should I modify the LDA.sh script to check the exit code of each module and rerun it until it succeeds?
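As a rough sketch of the "check the exit code and rerun" idea in the question, a generic wrapper could look like the following. The retry function and the RETRY_DELAY variable are hypothetical and not part of LDA.sh. Since the reply above says LDA restarts from the last checkpointed iteration, a rerun of a failed step should in principle not start from scratch, but that is an assumption to verify on your own cluster.

```shell
# Hypothetical helper: rerun a command until it exits 0 or attempts run out.
#   $1 = maximum number of attempts, remaining arguments = command to run
# RETRY_DELAY (seconds between attempts, default 10) is also an assumption.
retry() {
    local max_attempts="$1"
    shift
    local attempt=1
    local status=1
    while [ "$attempt" -le "$max_attempts" ]; do
        status=0
        # Capture the exit code without aborting under `set -e`
        "$@" || status=$?
        if [ "$status" -eq 0 ]; then
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed with exit code $status" >&2
        attempt=$((attempt + 1))
        # Pause before the next try, but not after the last one
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "${RETRY_DELAY:-10}"
        fi
    done
    # Report the exit code of the last failed attempt
    return "$status"
}
```

It would be called as retry 3 some_command arg1 arg2, where some_command stands for whichever module invocation in LDA.sh you want to guard.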