the results of Y!LDA with multi machines #10

Open
yanbo68 opened this issue Apr 12, 2012 · 6 comments

Comments

@yanbo68

yanbo68 commented Apr 12, 2012

Hi,

 I am using Y!LDA on Hadoop with 3 computers.
 I got the "train mode" results and found them a little confusing. I ran the script with --topics=20 and found that the files lda.docToTop.txt, lda.topToWor.txt, and lda.worToTop.txt exist in 3 different directories. Each directory has 20 topics. Is that correct?
What should I expect when I get the "test" result from the trained model? Still 3 different directories?

Hope somebody can help me. Thanks a lot!

Yanbo

@shravanmn
Collaborator

Yes that is correct.

lda.docToTop and lda.worToTop are local to each machine. Essentially, they hold the topic assignments for the documents in the chunk assigned to that machine.

lda.topToWor is expected to be similar across the 3 machines. To interpret the topic model, you can use any one of them.

However, only one global model is built. It is stored in the global folder along with the global dictionary, and it is the one used while testing.

--Shravan
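
To check how similar the per-machine lda.topToWor outputs actually are, one option is to compare the top words of each topic across machines. The sketch below is only an illustration: it assumes the top words have already been read into one dict per machine (topic id to list of words), because the file format is not described in this thread. The function name `topic_agreement` and the sample words are hypothetical.

```python
from itertools import combinations


def topic_agreement(per_machine_topics):
    """Return, for each topic id, the lowest pairwise Jaccard overlap
    of top words across machines (1.0 means identical word sets).

    per_machine_topics: list of dicts, one per machine, mapping
    topic id -> list of top words. This input shape is an assumption;
    parsing lda.topToWor.txt itself is not shown here.
    """
    # Only compare topics that every machine reports.
    topic_ids = set.intersection(*(set(t) for t in per_machine_topics))
    agreement = {}
    for topic in sorted(topic_ids):
        scores = []
        for a, b in combinations(per_machine_topics, 2):
            words_a, words_b = set(a[topic]), set(b[topic])
            union = words_a | words_b
            scores.append(len(words_a & words_b) / len(union) if union else 1.0)
        # With a single machine there are no pairs, so report full agreement.
        agreement[topic] = min(scores) if scores else 1.0
    return agreement


if __name__ == "__main__":
    # Made-up words for illustration only, not real Y!LDA output.
    machine_a = {0: ["game", "team", "play", "season"], 1: ["space", "nasa", "orbit"]}
    machine_b = {0: ["game", "team", "play", "hockey"], 1: ["space", "nasa", "orbit"]}
    for topic, score in topic_agreement([machine_a, machine_b]).items():
        print(f"topic {topic}: overlap {score:.2f}")
```

A low score for a topic suggests the machines have not yet settled on the same words for it, which is one reason to look at the iteration count.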


@yanbo68
Author

yanbo68 commented Apr 13, 2012

Thanks a lot!

I checked the lda.topToWor file.
In the "train mode" result, almost 4 words per topic differ from one machine to another.
"Test mode" is much better: only 1 word per topic differs. I think I can interpret the topic model using the "test mode" result.

Btw, about the topic counts table: there are 3 tables after "train mode", but it seems the system merges the 3 tables together during "test mode"?
The log says: "Initializing Word-Topic counts table from 3 dumps with topic_counts/lda.ttc.dump as prefix ......"
So is each machine using the same big table?

@shravanmn
Collaborator

In line...


> Thanks a lot!
>
> I checked the lda.topToWor file.
> In the "train mode" result, almost 4 words per topic differ from one machine to another.

[shrav] How many iterations did you run?

> "Test mode" is much better: only 1 word per topic differs. I think I can interpret the topic model using the "test mode" result.
>
> Btw, about the topic counts table: there are 3 tables after "train mode", but it seems the system merges the 3 tables together during "test mode"?
> The log says: "Initializing Word-Topic counts table from 3 dumps with topic_counts/lda.ttc.dump as prefix ......"
> So is each machine using the same big table?

[shrav] Yes. A global table is created and a local table per machine is induced using the global table.

--Shravan
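
The "global table, then induced local table" idea from the reply above can be pictured with a small sketch. This is not Y!LDA's code: it assumes each dump can be viewed as a dict from word to a list of per-topic counts, that counts for a word appearing in more than one dump are added, and that a local table is the global table restricted to the words a machine actually sees. The real dump format and merge rule are not described in this thread, and the names `merge_dumps` and `induce_local_table` are hypothetical.

```python
from collections import defaultdict


def merge_dumps(dumps, num_topics):
    """Combine per-machine word -> per-topic-count tables into one table.

    Assumption: counts for the same word are summed across dumps.
    """
    global_table = defaultdict(lambda: [0] * num_topics)
    for dump in dumps:
        for word, counts in dump.items():
            if len(counts) != num_topics:
                raise ValueError(f"{word!r} has {len(counts)} counts, expected {num_topics}")
            row = global_table[word]
            for topic, count in enumerate(counts):
                row[topic] += count
    return dict(global_table)


def induce_local_table(global_table, local_vocab):
    """Keep only the rows for words in this machine's documents.

    Rows are copied so later local updates do not touch the global table.
    """
    return {w: list(global_table[w]) for w in local_vocab if w in global_table}


if __name__ == "__main__":
    # Toy counts with 2 topics, invented for illustration.
    dumps = [{"cat": [1, 0], "dog": [0, 2]}, {"cat": [2, 1]}, {"fish": [0, 3]}]
    global_table = merge_dumps(dumps, num_topics=2)
    print(global_table)
    print(induce_local_table(global_table, {"cat", "fish"}))
```

Under these assumptions every machine starts from the same global counts, but each holds only the slice it needs for its own chunk of documents.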



@yanbo68
Author

yanbo68 commented Apr 13, 2012

I ran 200 iterations.

@shravanmn
Collaborator

If you run about 500 to 600 iterations, the words will look similar in the different topToWor files. This is what we have observed.
--Shravan


@yanbo68
Author

yanbo68 commented Apr 15, 2012

Thanks a lot! I will try more iterations.

2 participants