I was working on the "05- Neuron Factors.ipynb" notebook and noticed the character Ġ before each token in the output of "nmf_1.explore()". I am not sure why this happens. Please check the screenshot below.
Your help is appreciated.
Happened to me using GPT-2 and solved this issue by adding the following line:

    if self.config['token_prefix'] is not None and token[0] == self.config['token_prefix']: token = token[1:]

right after the first loop line of nmf.explore() method:

    for idx, token in enumerate(self.tokens[input_sequence]):  # self.tokens[:-1]
        if self.config['token_prefix'] is not None and token[0] == self.config['token_prefix']:
            token = token[1:]
        type = "input" if idx < self.n_input_tokens else 'output'
        tokens.append({'token': token,
                       'token_id': int(self.token_ids[input_sequence][idx]),
                       # 'token_id': int(self.token_ids[idx]),
                       'type': type,
                       # 'value': str(components[0][comp_num][idx]),  # because json complains of floats
                       'position': idx
                       })
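For anyone who wants to try the prefix check outside the library, here is a minimal standalone sketch of the same idea. The helper name `strip_token_prefix` and the sample token list are made up for illustration and are not part of the library. It uses `str.startswith` instead of `token[0]`, so an empty token does not raise an `IndexError`.

```python
from typing import Optional


def strip_token_prefix(token: str, token_prefix: Optional[str] = "Ġ") -> str:
    """Return the token without its leading marker (for display only).

    Hypothetical helper: the name and default prefix are assumptions,
    mirroring the 'token_prefix' config entry mentioned in the comment above.
    """
    # startswith() is safe for empty tokens, unlike indexing token[0].
    if token_prefix and token.startswith(token_prefix):
        return token[len(token_prefix):]
    return token


# Illustrative token list, including the empty-string edge case.
tokens = ["Hello", "Ġworld", "", "Ġ"]
display_tokens = [strip_token_prefix(t) for t in tokens]
print(display_tokens)
```

Passing `token_prefix=None` leaves tokens unchanged, which matches the `is not None` guard in the snippet above.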
Yeah, that shouldn't happen. A bunch of tokenizers put a marker character at the beginning of a token to record how it relates to the preceding text. In GPT-2's byte-level BPE, Ġ stands for a leading space, so it marks a token that starts a new word. Which is why rendering the output needs to run in tandem with the tokenizer and its settings.
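To make the role of Ġ concrete, here is a rough sketch of turning GPT-2 style tokens back into readable text by treating the marker as a space. The function name `tokens_to_text` and the token list are illustrative assumptions, and the sketch only handles the space marker, not the other bytes that a byte-level tokenizer remaps.

```python
# In GPT-2's byte-to-unicode table the space byte (0x20) is shown as U+0120, 'Ġ'.
GPT2_SPACE_MARKER = "\u0120"


def tokens_to_text(tokens, space_marker=GPT2_SPACE_MARKER):
    """Join tokens and turn the space marker back into a real space.

    Hypothetical helper for illustration; a real tokenizer's own decode
    method should be preferred when it is available.
    """
    return "".join(tokens).replace(space_marker, " ")


# Illustrative tokens: the marker sits on the token that follows a space.
example_tokens = ["Hello", "Ġworld", "!"]
text = tokens_to_text(example_tokens)
print(text)  # Hello world!
```

For display inside a per-token widget, stripping the marker may read better than replacing it, since each token is already drawn in its own box.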