
WIP: Implementing EAT_SSL #15

Draft · wants to merge 23 commits into main from EAT_SSL_Implementation
Conversation


@PariaValizadeh commented Oct 9, 2024

This branch includes an implementation of the EAT model, extracted from its repo.

EAT_SSL Info

  • README file
  • Add Requirements

Fine-tuned model checkpoints

  • EAT_large_epoch20 (fine-tuned on AS-2M): ViT-L backbone, 309M parameters, 49.5% mAP
  • EAT_base_epoch30 (fine-tuned on AS-2M): ViT-L backbone, 309M parameters, 48.9% mAP

Model and dependencies

  • Add EAT model from EAT repo (EAT_audio_classification.py)
  • Add EAT model dependencies from EAT repo (other files in the models)
  • Add utils from EAT repo
  • Add eat.py (a wrapper around the original model, containing the input preprocessing)

Embedding model

  • Add embedding model for eat
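As a rough illustration of what the embedding model needs to do once the backbone runs: the transformer returns per-patch features, and the embedding wrapper has to reduce them to a single clip vector. A minimal sketch, assuming mean or CLS pooling; `pool_embeddings` is an illustrative name, not the actual BirdSet/EAT API:

```python
import numpy as np

def pool_embeddings(patch_features: np.ndarray, mode: str = "mean") -> np.ndarray:
    """Reduce (num_patches, dim) transformer outputs to a single (dim,) clip embedding."""
    if mode == "mean":
        # average over all patch tokens
        return patch_features.mean(axis=0)
    if mode == "cls":
        # assume the first token is a CLS token
        return patch_features[0]
    raise ValueError(f"unknown pooling mode: {mode}")

# toy example: 4 patches with 3-dim features
feats = np.arange(12, dtype=np.float32).reshape(4, 3)
clip = pool_embeddings(feats, "mean")
```

Which pooling mode matches the checkpoint (CLS token vs. mean over patches) is one of the settings worth experimenting with.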

Efforts to get fairseq to run

Reimplementing EAT

To reimplement EAT we use the model code from https://github.com/nhaH-luaP/PyEat, as it only relies on a small local copy of fairseq.

  • Test what parts are in the checkpoint
  • Download and add the fairseq and model files
  • Implement correct parameter parsing when creating the model
  • Change the eat.yaml to fit the new implementation
  • Correctly load the checkpoint, implement get_embeddings() and forward()
  • Check different experiments using EAT (Watkins, Bats...)
  • Try out different settings
  • Recreate ESC accuracies from the paper
    • implement cross-validation on ESC-50 (take another look at the EAT-SSL and BEATS papers)
    • check normalization (check paper and original implementation)

    ```python
    parser.add_argument('--norm_mean', type=float, help='mean value for normalization', default=-4.268)
    parser.add_argument('--norm_std', type=float, help='standard deviation for normalization', default=4.569)
    ```

  • Add license, more documentation
  • ...
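The normalization item in the checklist above can be sketched as follows. The argparse defaults (-4.268 / 4.569) appear to be AudioSet statistics; AST-style pipelines divide by 2·std so the result has a standard deviation of about 0.5, matching the value quoted from the paper later in this thread. Whether EAT uses exactly this form should be verified against the original implementation — this is a sketch under that assumption:

```python
import numpy as np

NORM_MEAN = -4.268  # AudioSet mean (argparse default above)
NORM_STD = 4.569    # AudioSet std (argparse default above)

def normalize_fbank(fbank: np.ndarray, mean: float = NORM_MEAN, std: float = NORM_STD) -> np.ndarray:
    """Standardize a (frames, mel_bins) log-mel spectrogram.

    Dividing by 2 * std gives outputs with std ~0.5, the convention
    assumed here (AST-style); check against EAT's own preprocessing.
    """
    return (fbank - mean) / (2 * std)
```

If ESC-50 or Watkins need their own statistics, only the two constants would change.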

@PariaValizadeh PariaValizadeh changed the title WIP: Implement Eat_ssl WIP: Implementing EAT_SSL Oct 9, 2024
@PariaValizadeh PariaValizadeh force-pushed the EAT_SSL_Implementation branch 3 times, most recently from f955b2c to 135d015 Compare October 11, 2024 10:27

@XgamerTV commented Oct 26, 2024

The repo from Lukas' students was extremely helpful and allowed me to implement the model without too much hassle. The accuracy of the AS-FT checkpoint on Watkins is ~63% at the moment, which is not great, so we should experiment with different settings and see whether the AS/ESC accuracies match the ones from the paper!

@raphaelschwinger

@XgamerTV Perfect, thanks! Good idea to test AS/ESC50 performance!

@PariaValizadeh
Author

@XgamerTV I added a new experiment on ESC-50. It has about the same accuracy, around 64%.

@XgamerTV

The dataset mean (0) and std (0.5) values seem to have been the problem. Where did you get those values, @PariaValizadeh? I used the AS values for now and the Watkins accuracy is ~86% now :)
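If the AS values turn out not to transfer to other datasets, the per-dataset statistics could be estimated directly from the spectrograms. A streaming sketch (the function name is hypothetical; `spectrograms` stands in for an iterable of log-mel arrays):

```python
import numpy as np

def estimate_norm_stats(spectrograms):
    """Accumulate mean/std over all spectrogram bins without stacking everything in memory."""
    count, total, total_sq = 0, 0.0, 0.0
    for spec in spectrograms:
        count += spec.size
        total += float(spec.sum())
        total_sq += float(np.square(spec).sum())
    mean = total / count
    # variance via E[x^2] - E[x]^2
    std = np.sqrt(total_sq / count - mean ** 2)
    return mean, std
```

Running this once over the training split would give dataset-specific values to compare against the AudioSet defaults.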

@PariaValizadeh
Author

> The dataset mean (0) and std (0.5) values seemed to have been the problem. Where did you get the values @PariaValizadeh? I used the AS values for now and the watkins accuracy is ~86 now :)

I saw those values in the article. I also had doubts about them, so I changed them to check whether the accuracy would change, but for the values I tried it didn't.

@PariaValizadeh
Author

PariaValizadeh commented Oct 28, 2024

> The dataset mean (0) and std (0.5) values seemed to have been the problem. Where did you get the values @PariaValizadeh? I used the AS values for now and the watkins accuracy is ~86 now :)

> I saw that in the article, I also have a doubt on that so I change it to check if it will change the accuracy or not but for the value I put, It haven't changed

@XgamerTV Under the training details heading, the paper says: 'the audio spectrogram patches are then normalized with a mean value of 0 and a standard deviation of 0.5, following the approach used in previous works.' But I'm not sure whether that applies when we put a linear classifier on top.

@XgamerTV

OK, I found it in the article, but I think the values are weird, and they do heavily change the accuracy. Regarding your change of values and not noticing any difference in accuracy: when using the embedding_datamodule, not every change to the model triggers a new extraction of the embeddings. The usual fingerprint method didn't work for the embedding datasets, which is why we encode the important params in the name instead. If you change a value and no ">> Extracting Embeddings for train Split" line is logged, the old set has to be deleted manually from the data_birdset folder.
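The caching behaviour described above can be illustrated roughly like this: the cache directory name is built from the "important params", so changing a value that is not part of the name silently reuses stale embeddings. The path scheme below is modeled on the log output later in the thread, but the exact BirdSet layout may differ:

```python
import shutil
from pathlib import Path

def embeddings_cache_dir(root: str, dataset: str, model: str,
                         sample_rate: int, length: int) -> Path:
    """Build the cache path from the 'important params' (hypothetical scheme).

    Parameters that are NOT part of this name (e.g. norm mean/std) do not
    invalidate the cache, which is the pitfall described above.
    """
    return Path(root) / dataset / (
        f"{dataset}_processed_embedding_model_{model}_{sample_rate}_{length}"
    )

def clear_stale_cache(cache_dir: Path) -> None:
    """Delete a cached embedding set so it gets re-extracted on the next run."""
    if cache_dir.exists():
        shutil.rmtree(cache_dir)
```

So after changing the normalization values, deleting the matching directory under data_birdset forces a fresh ">> Extracting Embeddings" run.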

@PariaValizadeh
Author

PariaValizadeh commented Oct 28, 2024

> Ok I found it in the article but I think the values are weird and it does heavily change the accuracy. Regarding your change of values and not noticing any differences in the accuracy: When using the embedding_datamodule not all changes to the model result in a new extraction of the embeddings. This is because the usual fingerprint method didn't work for the embedding datasets which is why we used the important params in the name. If you change a value and see that no ">> Extracting Embeddings for train Split" is logged the old set has to be deleted manually in the data_birdset folder.

Thank you, I will check it tomorrow and also check whether it works with the AS values.

@XgamerTV

You can check the AS values if you want, but I don't think it's necessary. The ESC-50 accuracy is 98.5%, which is better than theirs in the paper, and they do fine-tuning 🤣 On ESC the embedding datamodule is a bit weird; I'll check it next time. Can you also get the accuracies?

@PariaValizadeh
Author

> Ok I found it in the article but I think the values are weird and it does heavily change the accuracy. Regarding your change of values and not noticing any differences in the accuracy: When using the embedding_datamodule not all changes to the model result in a new extraction of the embeddings. This is because the usual fingerprint method didn't work for the embedding datasets which is why we used the important params in the name. If you change a value and see that no ">> Extracting Embeddings for train Split" is logged the old set has to be deleted manually in the data_birdset folder.

I deleted it manually and changed the mean and std, which shows no difference in the results.

@XgamerTV

Yeah, there is something wrong with ESC; if you try it with BEANS you should see the difference 🤔

@PariaValizadeh
Author

> …try it with BEANs you should see the difference

Yes, I saw that. I also asked some other questions in Mattermost: something is not completely clear to me about our discussion here. First: with how many epochs did you get 98.5% accuracy on ESC-50? I got 64% after 4 epochs. Second: by getting the accuracies, do you mean I should find out why it doesn't reproduce the paper results, or just check the accuracy on different datasets?

@PariaValizadeh
Author

@XgamerTV I also tried getting the accuracy for BEATs on ESC-50, which ran into errors: first it asked for the embedding size, and when I set it, there was an error about a mismatch between the classifier and the embedding matrix.

@XgamerTV

I couldn't recreate the accuracies myself and I believe it was a weird caching error.

@PariaValizadeh
Author

PariaValizadeh commented Nov 12, 2024

@XgamerTV I saw d04b852, but when I set that to False it still loads the data from cache until I delete it manually. And during loading it does not split the data into three parts, which is strange. I added the splitting part myself and it works! Maybe I am not on the updated version, or you have something else in your code, but for me the loaded dataset only contains the test and train splits without updating this in embedding_datamodule.py:
```python
def prepare_data(self):
    """Same as prepare_data in BaseDataModuleHF but checks if path exists and skips the rest otherwise."""
    log.info("Check if preparing has already been done.")
    if self._prepare_done:
        log.info("Skip preparing.")
        return
    # Check if the embeddings for the dataset have already been computed
    if os.path.exists(self.embeddings_save_path):
        log.info(f"Embeddings found in {self.embeddings_save_path}, loading from disk")
        dataset = load_from_disk(self.embeddings_save_path)
    else:
        log.info("Prepare Data")
        dataset = self._load_data()
        ### dataset = self._create_splits(dataset)
        ### log.info("print Data")
        dataset = self._compute_embeddings(dataset)
    dataset = self._preprocess_data(dataset)
    # set the length of the training set to be accessed by the model
    self.len_trainset = len(dataset["train"])
    self._save_dataset_to_disk(dataset)

    # set to done so that lightning does not call it again
    self._prepare_done = True
```

This is also the log that shows it has only two splits:

```
Repo card metadata block was not found. Setting CardData to empty.
[2024-11-12 11:48:41,584][huggingface_hub.repocard][WARNING] - Repo card metadata block was not found. Setting CardData to empty.
[2024-11-12 11:48:43,903][birdset.datamodule.embedding_datamodule][INFO] - >> Extracting Embeddings for train Split
[2024-11-12 11:48:43,906][birdset.datamodule.embedding_datamodule][INFO] - >> Extracting Embeddings for test Split
[2024-11-12 11:48:43,908][birdset.datamodule.embedding_datamodule][INFO] - Saving emebeddings to disk: /workspace/data_birdset/esc50/esc50_processed_embedding_model_audio_mae_True_16000_10
Saving the dataset (1/1 shards): 100%| 1600/1600 [00:00<00:00, 106791.53 examples/s]
Saving the dataset (1/1 shards): 100%| 400/400 [00:00<00:00, 75973.45 examples/s]
```
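The missing split step described above could be sketched like this. With HuggingFace datasets one would call `dataset["train"].train_test_split(...)` instead; the plain-list version below (hypothetical function, not the actual `_create_splits`) only illustrates the logic of carving a validation set out of "train":

```python
import random

def create_splits(dataset: dict, valid_fraction: float = 0.2, seed: int = 42) -> dict:
    """Turn a {train, test} dataset into {train, valid, test} by splitting off part of train."""
    if "valid" in dataset:
        return dataset  # already has three splits
    train = list(dataset["train"])
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(train)
    n_valid = int(len(train) * valid_fraction)
    return {
        "train": train[n_valid:],
        "valid": train[:n_valid],
        "test": dataset["test"],
    }
```

Whatever the real implementation looks like, the key point is that the split has to happen before `_compute_embeddings`, otherwise the cached embedding set is frozen with only two splits.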
