Questions about training #28

Open
MikeDean2367 opened this issue Mar 13, 2024 · 0 comments

Comments

@MikeDean2367

Hi, this is great work!

I have three inputs, i1, i2, and i3, all of which are processed by llama-7b. From input i1 I extract hidden states at two distinct positions, labeled p11 and p12. From each of the remaining inputs, i2 and i3, I select a single hidden state, denoted n21 and n31, respectively.

In this setup, p11 paired with n21 is a positive pair, while p11 paired with n31 is a negative pair. Likewise, p12 paired with n31 is a positive pair, while p12 paired with n21 is a negative pair. My objective is to compute the InfoNCE loss over these pairs.
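For concreteness, here is a minimal sketch of the loss I have in mind, assuming p11, p12, n21, and n31 are hidden-state vectors of the same dimension (the function name, the temperature, and the random tensors below are just placeholders):

import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.05):
    # anchor, positive: (d,) tensors; negatives: list of (d,) tensors
    candidates = torch.stack([positive] + negatives)               # (1 + K, d)
    logits = F.cosine_similarity(anchor.unsqueeze(0), candidates) / temperature
    target = torch.zeros(1, dtype=torch.long)                      # the positive sits at index 0
    return F.cross_entropy(logits.unsqueeze(0), target)

d = 4096  # hidden size of llama-7b
p11, p12, n21, n31 = (torch.randn(d) for _ in range(4))

# p11 is anchored against n21 (positive) and n31 (negative);
# p12 is anchored against n31 (positive) and n21 (negative).
loss = info_nce(p11, n21, [n31]) + info_nce(p12, n31, [n21])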

To handle these two situations, I supply a custom get_rep_fn to the GradCache class. Here is a simplified snippet:

def get_rep_fn(x):
    # x is the model output; label == 2 marks the input (i1) from which
    # two hidden states (e1, e2) are kept, otherwise only one (e1).
    if x.label == 2:
        return [x.e1, x.e2]
    else:
        return [x.e1]
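
To make the two cases concrete, here is a toy check of what the function returns for each input type (SimpleNamespace stands in for my real model-output object, and 4096 is llama-7b's hidden size):

from types import SimpleNamespace
import torch

# Output of i1: two hidden states are kept (p11, p12).
out_i1 = SimpleNamespace(label=2, e1=torch.randn(4096), e2=torch.randn(4096))
# Output of i2 (or i3): only one hidden state is kept (n21 or n31).
out_i2 = SimpleNamespace(label=1, e1=torch.randn(4096), e2=None)

print(len(get_rep_fn(out_i1)))   # 2
print(len(get_rep_fn(out_i2)))   # 1

Because the return value is now a list, sometimes of length 2, the code in GradCache that collects representations has to flatten it.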

At the same time, I changed the following two calls in GradCache from append to extend:

model_reps.append(self.get_reps(y))

all_reps.append(model_reps)
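
After my edit, those call sites read (assuming I am pointing at the right lines in grad_cache.py):

model_reps.extend(self.get_reps(y))   # flatten the list returned by get_rep_fn

all_reps.extend(model_reps)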

I'd like to ask whether the gradient computation is still correct after these changes. Could you please confirm?

Thanks!
