
Unclear how to use the code #5

Open
r-barnes opened this issue Sep 29, 2018 · 22 comments

Comments

@r-barnes

Despite looking at the recommended repo, it's still a little unclear how to use this.

An example that includes an appropriately formatted input file and a couple of example images would go a long way towards making this useful to others.

@JohannesBrand

JohannesBrand commented Sep 29, 2018

Hi @r-barnes ,

I was able to run phase 1 using

python eval.py preds.txt --num_threads 4 --architecture vgg --log_dir path/to/downloaded/phase1/weights --path_prefix path/to/preprocessed/images --data_info data_info.txt

I had to fix a few small issues, though, and used Python 3 instead of Python 2.

Furthermore, I had to resize the images first using the provided resize.py script.

In data_info.txt you have to list your pre-processed images as described in the recommended repo.

@arashno
I assume in the output [0, 1] means empty while [1, 0], means animal and accordingly label 0 means empty and label 1 means animal?

@r-barnes
Author

r-barnes commented Sep 30, 2018

Thanks @JohannesBrand : I'll give that a try, though I still think the documentation on this project should be expanded.

@jianqiu-xu

jianqiu-xu commented Nov 5, 2018

> Despite looking at the recommended repo, it's still a little unclear how to use this.
>
> An example that includes an appropriately formatted input file and a couple of example images would go a long way towards making this useful to others.

@r-barnes I also have problems running the pre-trained models on my own images. Have you figured out a clear way to input the images? Thanks!

@arashno
Collaborator

arashno commented Nov 8, 2018

Hi All,
Sorry about my late reply. I was very busy.
I would be happy to improve the documentation.
Could you please tell me what part of the documentation is unclear to you?
Thanks

@arashno
Collaborator

arashno commented Nov 8, 2018

> Hi @r-barnes ,
>
> I was able to run phase 1 using
>
> python eval.py preds.txt --num_threads 4 --architecture vgg --log_dir path/to/downloaded/phase1/weights --path_prefix path/to/preprocessed/images --data_info data_info.txt
>
> I had to fix a few small issues, though, and used Python 3 instead of Python 2.
>
> Furthermore, I had to resize the images first using the provided resize.py script.
>
> In data_info.txt you have to list your pre-processed images as described in the recommended repo.
>
> @arashno
> I assume in the output [0, 1] means empty while [1, 0], means animal and accordingly label 0 means empty and label 1 means animal?

Yes, 0 means empty and 1 means animal.
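
To make the confirmed mapping concrete, here is a minimal sketch. It assumes, as the question above does, that each prediction is a list of class indices ordered from most to least likely; the names `LABEL_NAMES` and `top_label` are hypothetical and not part of the repo.

```python
# Label mapping confirmed above: 0 = empty, 1 = animal.
LABEL_NAMES = {0: "empty", 1: "animal"}


def top_label(ranked_classes):
    """Return the name of the first (most likely) class in a ranked list.

    Assumption: the list is ordered from most to least likely,
    so [0, 1] puts class 0 first.
    """
    return LABEL_NAMES[ranked_classes[0]]


print(top_label([0, 1]))  # empty
print(top_label([1, 0]))  # animal
```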

@fischhoff

Thank you for sharing this repo. We would like to use the phase1 model to make predictions of animal vs. no animal in new images. Initially we intend to make predictions without fine-tuning, so our input is images without labels. Therefore the recommended repo (https://github.com/arashno/tensorflow_multigpu_imagenet) does not seem to fit our application. In the recommended repo, data_info.txt includes labels for each image, whereas in our case we do not have labels but are rather interested in predicting the labels using the phase1 model. We have loaded the phase1 model using the code below, but we are new to tensorflow and do not know how to use the model to make predictions on new images. Any advice (especially additional code to make predictions) would be much appreciated! Thanks!

import os
import tensorflow as tf  # written against the TensorFlow 1.x API (tf.Session, tf.ConfigProto)

cur_dir = "C:/etc/phase1/"

# script_dir = os.path.dirname(__file__)  # <-- absolute dir the script is in
rel_path_meta = "snapshot-55.meta"
abs_file_path_meta = os.path.join(cur_dir, rel_path_meta)
# abs_file_path_meta = os.path.join(script_dir, rel_path_meta)

print(abs_file_path_meta)

config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    # with tf.Session() as sess:
    # Rebuild the graph from the .meta file, then restore the weights
    # from the latest checkpoint found in cur_dir.
    new_saver = tf.train.import_meta_graph(abs_file_path_meta)
    new_saver.restore(sess, tf.train.latest_checkpoint(cur_dir))

@arashno
Collaborator

arashno commented Jan 15, 2019

There are two solutions:

1- In this repo (Evolving-AI-Lab/deep_learning_for_camera_trap_images), provide fake labels (for example, all empty, all non-empty, or even random labels) and then run the evaluation (i.e. python eval.py ...) of the phase 1 model with those labels. Then, in the output file, disregard the fake labels and keep only the model predictions.

2- The recommended repo (arashno/tensorflow_multigpu_imagenet) now supports "inference" (prediction). You will need to run a command like this:

python run.py inference preds.txt --log_dir path/to/downloaded/phase1/weights --path_prefix path/to/preprocessed/images --data_info data_info.txt ...

Please let me know if any part of the explanation is unclear or you have any trouble.
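
For solution 1, the file of fake labels can be generated rather than typed by hand. The following is a rough sketch, not code from either repo: the helper name `write_data_info` is hypothetical, the label value is an arbitrary placeholder, and the comma delimiter is assumed from the `delimiter=','` default that eval.py prints later in this thread.

```python
import os
import tempfile


def write_data_info(image_dir, out_path, fake_label=0, delimiter=","):
    """List every JPEG in image_dir with a placeholder label.

    File names are written without a directory, so image_dir is the
    value to pass as --path_prefix. The label is a dummy and should be
    ignored when reading the predictions.
    """
    names = sorted(
        name for name in os.listdir(image_dir)
        if name.lower().endswith((".jpg", ".jpeg"))
    )
    with open(out_path, "w") as handle:
        for name in names:
            handle.write("{}{}{}\n".format(name, delimiter, fake_label))
    return names


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jpg", "b.JPG", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    out_file = os.path.join(tmp, "data_info.txt")
    listed = write_data_info(tmp, out_file)
    with open(out_file) as handle:
        content = handle.read()

print(content)
```

Only the two JPEG files end up in the listing; the text file is skipped.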

@fischhoff

Thanks for the helpful reply, @arashno! We tried solution 1. We get a syntax error in eval.py. I checked that we are able to import datetime in python in the active environment, so that does not seem to be the problem. I guess I may be missing something that will seem obvious once you've pointed it out! Thanks again for troubleshooting.

In C:/Users/etc/Documents/R/bats/phase1, we are not sure whether we have the weights. We have checkpoint, snapshot-55.data-00000-of-00001, snapshot-55.index, and snapshot-55.meta.

Our data_info.txt reads:
C:/Users/etc/Documents/R/bats/jpg/Bat_licking_DPS - Copy.mov.jpg 1
C:/Users/etc/Documents/R/bats/jpg/Bat_licking_DPS.mov.jpg 1

Here is the output we get:
(r-reticulate) C:\Users\etc\Documents\R\bats>python eval.py preds.txt --num_threads 4 --architecture vgg --log_dir C:/Users/etc/Documents/R/bats/phase1 --path_prefix C:/Users/etc/Documents/R/bats/jpg --data_info data_info.txt
File "eval.py", line 7

^
SyntaxError: invalid syntax

@arashno
Collaborator

arashno commented Jan 16, 2019

snapshot-55.data-00000-of-00001 contains the weights.
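
A quick way to confirm that a download is complete is to check for the four files listed in the question. This is only a sketch: the helper name `missing_checkpoint_files` is hypothetical, and the `snapshot-55` prefix and single data shard are taken from the file names quoted in this thread, so another download may differ.

```python
import os
import tempfile


def missing_checkpoint_files(log_dir, prefix="snapshot-55"):
    """Return the expected checkpoint files that are absent from log_dir."""
    expected = [
        "checkpoint",
        prefix + ".data-00000-of-00001",  # the weights
        prefix + ".index",
        prefix + ".meta",
    ]
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(log_dir, name))
    ]


# Demonstration: a directory that lacks the weights file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("checkpoint", "snapshot-55.index", "snapshot-55.meta"):
        open(os.path.join(tmp, name), "w").close()
    missing = missing_checkpoint_files(tmp)

print(missing)
```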

Your data_info should be like this:

Bat_licking_DPS - Copy.mov.jpg 1
Bat_licking_DPS.mov.jpg 1

The code will add the value of --path_prefix argument to the path of all images.
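
The effect of the prefix can be pictured with a small sketch. It assumes the prefix is simply joined in front of each listed name; how the repo actually builds the path is not shown in this thread, and `with_prefix` is a hypothetical helper. `posixpath` is used only so the result has forward slashes on every platform.

```python
import posixpath


def with_prefix(path_prefix, names):
    """Join path_prefix in front of each name listed in data_info.txt."""
    return [posixpath.join(path_prefix, name) for name in names]


prefix = "C:/Users/etc/Documents/R/bats/jpg"

# Bare file names resolve to the intended location.
good = with_prefix(prefix, ["Bat_licking_DPS.mov.jpg"])

# A full path in data_info.txt ends up with the directory twice.
doubled = with_prefix(prefix, [prefix + "/Bat_licking_DPS.mov.jpg"])

print(good[0])
print(doubled[0])
```

Under that assumption, listing full paths and also passing --path_prefix produces a path that does not exist, which is why only the bare file names belong in data_info.txt.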

I am confused: you mentioned that you were able to fix the syntax error, so what error are you getting now? Line 7 is the import of the datetime module.

@fischhoff

Hi @arashno -- Thanks for this explanation and guidance.

The invalid syntax error occurred because we had downloaded an HTML file rather than the eval.py file itself. We have solved this issue.

Now we are getting a different error:

(r-reticulate) C:\Users\Documents\R\bats>python eval.py preds.txt --num_threads 4 --architecture vgg --log_dir C:/Users//Documents/R/bats/phase1 --path_prefix C:/Users//Documents/R/bats/jpg --data_info data_info.txt
Namespace(architecture='vgg', batch_size=512, crop_size=[224, 224], data_info='data_info.txt', delimiter=',', depth=50, load_size=[256, 256], log_dir='C:/Users//Documents/R/bats/phase1', num_batches=1, num_channels=3, num_classes=2, num_samples=2, num_threads=4, path_prefix='C:/Users//Documents/R/bats/jpg', save_predictions='preds.txt', top_n=2)
Traceback (most recent call last):
  File "eval.py", line 127, in <module>
    main()
  File "eval.py", line 123, in main
    evaluate(args)
  File "eval.py", line 25, in evaluate
    images, labels, urls = data_loader.read_inputs(False, args)
  File "C:\Users\Documents\R\bats\data_loader.py", line 24, in read_inputs
    filepaths, labels = _read_label_file(args.data_info, args.delimiter)
  File "C:\Users\Documents\R\bats\data_loader.py", line 19, in _read_label_file
    labels.append(int(tokens[1]))
IndexError: list index out of range

Having looked at _read_label_file in data_loader.py, it’s not clear what this error is about.

Again, thanks a ton for your help! We appreciate any further advice.

@arashno
Collaborator

arashno commented Jan 17, 2019

It seems to be a delimiter problem.
You set the delimiter to the comma (,), but in your input file, you have used space as the delimiter.
Your data_info should look like this:

Bat_licking_DPS - Copy.mov.jpg,1
Bat_licking_DPS.mov.jpg,1
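
The failure is easy to reproduce outside the repo. The sketch below assumes the loader splits each line on the delimiter and reads the label from the second token, which is what the `labels.append(int(tokens[1]))` line in the traceback suggests; `parse_line` is a hypothetical stand-in, not the repo's function.

```python
def parse_line(line, delimiter=","):
    """Split one data_info line into (file name, integer label)."""
    tokens = line.strip().split(delimiter)
    return tokens[0], int(tokens[1])


# Comma-separated line: two tokens, parses fine.
print(parse_line("Bat_licking_DPS.mov.jpg,1"))

# Space-separated line with a comma delimiter: only one token,
# so tokens[1] does not exist.
try:
    parse_line("Bat_licking_DPS.mov.jpg 1")
except IndexError as err:
    print("IndexError:", err)
```

A comma delimiter also avoids ambiguity for file names that themselves contain spaces, such as "Bat_licking_DPS - Copy.mov.jpg".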

@fischhoff

fischhoff commented Jan 18, 2019

Hi @arashno, thank you for pointing this out! We changed data_info.txt as you recommended. We really appreciate your help.

We are now getting a different error that we again can’t figure out:

(r-reticulate) C:\Users\Documents\R\bats>python eval.py preds.txt --num_threads 4 --architecture vgg --log_dir C:/Users//Documents/R/bats/phase1 --path_prefix C:/Users//Documents/R/bats/jpg --data_info data_info.txt
Namespace(architecture='vgg', batch_size=512, crop_size=[224, 224], data_info='data_info.txt', delimiter=',', depth=50, load_size=[256, 256], log_dir='C:/Users//Documents/R/bats/phase1', num_batches=1, num_channels=3, num_classes=2, num_samples=2, num_threads=4, path_prefix='C:/Users//Documents/R/bats/jpg', save_predictions='preds.txt', top_n=2)
WARNING:tensorflow:From C:\Users\Documents\R\bats\data_loader.py:32: slice_input_producer (from tensorflow.python.training.input) is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.from_tensor_slices(tuple(tensor_list)).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs). If shuffle=False, omit the .shuffle(...).
WARNING:tensorflow:From C:\Users\AppData\Local\conda\conda\envs\r-reticulate\lib\site-packages\tensorflow\python\training\input.py:372: range_input_producer (from tensorflow.python.training.input) is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.range(limit).shuffle(limit).repeat(num_epochs). If shuffle=False, omit the .shuffle(...).
WARNING:tensorflow:From C:\Users\AppData\Local\conda\conda\envs\r-reticulate\lib\site-packages\tensorflow\python\training\input.py:318: input_producer (from tensorflow.python.training.input) is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.from_tensor_slices(input_tensor).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs). If shuffle=False, omit the .shuffle(...).
WARNING:tensorflow:From C:\Users\AppData\Local\conda\conda\envs\r-reticulate\lib\site-packages\tensorflow\python\training\input.py:188: limit_epochs (from tensorflow.python.training.input) is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.from_tensors(tensor).repeat(num_epochs).
WARNING:tensorflow:From C:\Users\AppData\Local\conda\conda\envs\r-reticulate\lib\site-packages\tensorflow\python\training\input.py:197: QueueRunner.__init__ (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the tf.data module.
WARNING:tensorflow:From C:\Users\AppData\Local\conda\conda\envs\r-reticulate\lib\site-packages\tensorflow\python\training\input.py:197: add_queue_runner (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the tf.data module.
Filling queue with 2000 images before starting to train. This may take some times.
WARNING:tensorflow:From C:\Users\Documents\R\bats\data_loader.py:65: batch (from tensorflow.python.training.input) is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.batch(batch_size) (or padded_batch(...) if dynamic_pad=True).
Traceback (most recent call last):
  File "eval.py", line 127, in <module>
    main()
  File "eval.py", line 123, in main
    evaluate(args)
  File "eval.py", line 30, in evaluate
    logits = arch.get_model(images, 0.0, False, args)
  File "C:\Users\Documents\R\bats\arch.py", line 16, in get_model
    return architectures.vgg.inference(inputs, args.num_classes, wd, 0.5 if is_training else 1.0, is_training)
  File "C:\Users\Documents\R\bats\architectures\vgg.py", line 32, in inference
    network = common.batchNormalization(network, is_training= is_training)
  File "C:\Users\Documents\R\bats\common.py", line 63, in batchNormalization
    return tf.cond(is_training, lambda: tf.nn.batch_normalization(x, mean, variance, beta, gamma, epsilon), lambda: tf.nn.batch_normalization(x, moving_mean, moving_variance, beta, gamma, epsilon))
  File "C:\Users\AppData\Local\conda\conda\envs\r-reticulate\lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "C:\Users\AppData\Local\conda\conda\envs\r-reticulate\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2073, in cond
    raise TypeError("pred must not be a Python bool")
TypeError: pred must not be a Python bool

We found this site (https://blog.csdn.net/Felaim/article/details/84098986) that, according to a collaborator who reads Chinese, suggests a solution would involve adding a line to common.py:

is_training = tf.cast(True, tf.bool)

But we don’t know where to try adding this.

Thanks for taking a look at this and any advice on a solution!

@arashno
Collaborator

arashno commented Jan 18, 2019

It seems that there is a version incompatibility.
Which repository are you using? (this one or the recommended repo or a mix of them?)

What is your Tensorflow version?

@fischhoff

fischhoff commented Jan 18, 2019

I was using a mix of the two repos. It makes sense that a version incompatibility would result -- my mistake.

Using only this repo, I get this error:

(r-reticulate) C:\Users\Documents\R\bats\deep_learning_for_camera_trap_images-master>python eval.py preds.txt --num_threads 4 --architecture vgg --log_dir C:/Users//Documents/R/bats/deep_learning_for_camera_trap_images-master/phase1 --path_prefix C:/Users//Documents/R/bats/deep_learning_for_camera_trap_images-master/jpg --data_info data_info.txt
Traceback (most recent call last):
  File "eval.py", line 15, in <module>
    import arch
  File "C:\Users\fischhoffi\Documents\R\bats\deep_learning_for_camera_trap_images-master\arch.py", line 1, in <module>
    import architectures.alexnet
  File "C:\Users\fischhoffi\Documents\R\bats\deep_learning_for_camera_trap_images-master\architectures\alexnet.py", line 2, in <module>
    import common
ModuleNotFoundError: No module named 'common'

Here are the Tensorflow versions and other packages in the environment:

conda list

# packages in environment at C:\Users\fischhoffi\AppData\Local\conda\conda\envs\r-reticulate:
#
# Name Version Build Channel

_tflow_select 2.1.0 gpu anaconda
absl-py 0.6.1 py36_1000 conda-forge
arch 4.7.0 py36h4a00616_0 bashtage
astor 0.7.1 py_0 conda-forge
blas 1.0 mkl
ca-certificates 2018.03.07 0 anaconda
certifi 2018.10.15 py36_0 anaconda
cudatoolkit 9.0 1 anaconda
cudnn 7.1.4 cuda9.0_0 anaconda
cython 0.29.2 py36ha925a31_0
gast 0.2.0 py_0 conda-forge
grpcio 1.16.1 py36h351948d_1 anaconda
h5py 2.8.0 py36hf7173ca_2 anaconda
hdf5 1.8.20 hac2f561_1 anaconda
icc_rt 2019.0.0 h0cc432a_1
icu 58.2 ha66f8fd_1
intel-openmp 2019.1 144
jpeg 9c hfa6e2cd_1001 conda-forge
keras-applications 1.0.6 py36_0 anaconda
keras-preprocessing 1.0.5 py36_0 anaconda
libopencv 3.4.2 h20b85fd_0 anaconda
libpng 1.6.36 h7602738_1000 conda-forge
libprotobuf 3.6.1 h1a1b453_1000 conda-forge
libtiff 4.0.10 h36446d0_1001 conda-forge
libwebp 1.0.1 hfa6e2cd_1000 conda-forge
m2w64-gcc-libgfortran 5.3.0 6
m2w64-gcc-libs 5.3.0 7
m2w64-gcc-libs-core 5.3.0 7
m2w64-gmp 6.1.0 2
m2w64-libwinpthread-git 5.0.0.4634.697f757 2
markdown 2.6.11 py_0 conda-forge
mkl 2019.1 144
mkl_fft 1.0.10 py36_0 conda-forge
mkl_random 1.0.2 py36_0 conda-forge
msgpack-python 0.6.0 py36he980bc4_1000 conda-forge
msys2-conda-epoch 20160418 1
numpy 1.15.4 py36h19fb1c0_0
numpy-base 1.15.4 py36hc3f5095_0
opencv 3.4.2 py36h40b0b35_0 anaconda
openssl 1.1.1 he774522_0 anaconda
pandas 0.23.4 py36h830ac7b_0
patsy 0.5.1 py36_0
pip 18.1 py36_1000 conda-forge
protobuf 3.6.1 py36he025d50_1001 conda-forge
py-opencv 3.4.2 py36hc319ecb_0 anaconda
python 3.6.6 he025d50_0 conda-forge
python-dateutil 2.7.5 py36_0
python-editor 1.0.3 py36_0 anaconda
pytz 2018.7 py36_0
qt 5.9.7 vc14h73c81de_0
scipy 1.1.0 py36h4f6bf74_1 anaconda
setuptools 40.6.3 py36_0 conda-forge
six 1.12.0 py36_1000 conda-forge
sqlite 3.26.0 he774522_0
statsmodels 0.9.0 py36h452e1ab_0
tensorboard 1.12.0 py36he025d50_0 anaconda
tensorflow 1.12.0 gpu_py36ha5f9131_0 anaconda
tensorflow-base 1.12.0 gpu_py36h6e53903_0 anaconda
tensorflow-gpu 1.12.0 h0d30ee6_0 anaconda
termcolor 1.1.0 py_2 conda-forge
vc 14.1 h21ff451_3 anaconda
vs2015_runtime 15.5.2 3 anaconda
werkzeug 0.14.1 py_0 conda-forge
wheel 0.32.3 py36_0 conda-forge
wincertstore 0.2 py36_1002 conda-forge
zlib 1.2.11 h2fa13f4_1003 conda-forge

Would you recommend using this repo or the recommended repo? Thanks again!

@arashno
Collaborator

arashno commented Jan 18, 2019

Although the other repository is compatible with Python 3, this repository only works with Python 2.7.
The import error is because you are using Python 3.6.
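
The underlying difference is that Python 2 resolved a bare `import common` inside a package against the package's own directory (an implicit relative import), while Python 3 only searches `sys.path`. The sketch below reproduces this with a throwaway package; the names `demo_architectures`, `demo_alexnet` and `demo_common` are hypothetical stand-ins for the repo's `architectures`, `alexnet` and `common`, and the `sys.path` change is one possible workaround, not something the repo does.

```python
import importlib
import os
import shutil
import sys
import tempfile

# Throwaway layout mirroring the repo: a package directory that holds
# a helper module and a module importing that helper by bare name.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_architectures")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "demo_common.py"), "w") as handle:
    handle.write("VALUE = 42\n")
with open(os.path.join(pkg_dir, "demo_alexnet.py"), "w") as handle:
    handle.write("import demo_common\n")

sys.path.insert(0, root)

# Under Python 3 the bare import inside the package cannot be resolved.
try:
    importlib.import_module("demo_architectures.demo_alexnet")
    first_attempt = "ok"
except ImportError as err:  # ModuleNotFoundError is a subclass
    first_attempt = type(err).__name__

# Workaround: make the package directory itself importable, which has
# the same effect as placing the helper next to the entry script.
sys.path.insert(0, pkg_dir)
importlib.invalidate_caches()
module = importlib.import_module("demo_architectures.demo_alexnet")

print(first_attempt, module.demo_common.VALUE)

shutil.rmtree(root)  # clean up the throwaway files
```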

@matobler

I just spent a day figuring out how to run the pre-trained models. Here are a few things that I learned that might be useful for others:

  1. I am working on Windows in Python 3.6 (also tested 3.7). Both versions work, but the xrange() function in eval.py needs to be changed to range().

  2. The code works with Tensorflow versions 1.8 and 1.9. It also works with 1.12, but there are a lot of warnings since the data structure has changed. I have not tested 1.10 and 1.11.

  3. For Phase 2 and Phase 2 Recognition Only, the common.py file needs to be copied from the architectures folder to the main folder where eval.py is, else you get a "ModuleNotFoundError: No module named 'common'" error.

  4. For Phase 2 and Phase 2 Recognition Only, the --depth parameter needs to be set to 152 (for the Resnet 152 model). The default value is 50.

  5. For Phase 1, values in the second column of the image file (data_info.txt) need to be either 0 or 1.

  6. On my notebook with a Quadro M2000 with 4GB of RAM I ran out of GPU memory. The models worked fine on a GTX 1080 TI with 11GB of RAM. I tried smaller batch sizes but that did not help.
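
For point 1, an alternative to editing every call site is a small compatibility shim near the top of the script. This is a generic Python idiom shown as a sketch, not a change taken from the repo.

```python
# Make the name xrange available on both Python 2 and Python 3.
try:
    xrange  # Python 2: built in
except NameError:
    xrange = range  # Python 3: range is already lazy

# Code written for Python 2 can keep calling xrange().
total = sum(i for i in xrange(5))
print(total)
```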

While Phase 1 and Phase 2 Recognition Only work fine I still have not been able to run Phase 2. Will write another post with the errors I am getting.

It would be nice if the authors could provide a small test dataset with all the input files and commands to run each phase. Would probably save a lot of people a lot of time. That said, thanks for making the code and pre-trained models available!

@matobler

matobler commented Jan 20, 2019

For Phase 2 I created a data_info.txt file with the image name plus 9 extra columns, all set to 0:
image1.jpg,0,0,0,0,0,0,0,0,0
Without that I would get an error from the data_loader. Now I am getting the error below. Any suggestions are welcome.

Filling queue with 2000 images before starting to train. This may take some times.
Traceback (most recent call last):
  File "eval.py", line 204, in <module>
    main()
  File "eval.py", line 200, in main
    evaluate(args)
  File "eval.py", line 33, in evaluate
    top1acc= [None]*len(logits)
TypeError: object of type 'NoneType' has no len()
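
The placeholder row format described in this comment (image name plus nine zero columns) can be generated rather than typed by hand. This sketch only covers the input file and does not address the NoneType error above; `phase2_row` is a hypothetical helper, and the column count of 9 is the one reported in this comment.

```python
def phase2_row(image_name, num_label_columns=9, delimiter=","):
    """Build one data_info line: the image name followed by zero labels."""
    return delimiter.join([image_name] + ["0"] * num_label_columns)


row = phase2_row("image1.jpg")
print(row)
```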

@fischhoff

> Although the other repository is compatible with Python 3, this repository only works with Python 2.7.
> The import error is because you are using Python 3.6.

Thanks for letting us know @arashno!

@Mo-nasr

Mo-nasr commented Apr 12, 2020

Hi @arashno @matobler, I am new to GitHub, so I was wondering whether it's possible to run the pre-trained phase 2 model on Google Colab. If yes, how can I do it? Any help from anyone would be really appreciated.

@r-barnes
Author

r-barnes commented Jan 16, 2022

@Mo-nasr : That might do better as a separate question/issue.

@r-barnes
Author

I agree with @matobler :

> It would be nice if the authors could provide a small test dataset with all the input files and commands to run each phase. Would probably save a lot of people a lot of time.

@AlexSperka

I have been running into issues similar to those mentioned by others above. I took the fork of Mo-nasr and adapted it, thanks for that!

Link to my fork

Follow the updated read-me to get it running. New features:

  • Using docker-compose to spin up a container
  • Using python 3.9
  • Converted code from fork using tensorflow 1.14 into tensorflow 2.9.1
  • Super basic API feature to get an image classified using img_path (that is in the same directory right now)

I will try to clean this up and convert more of the code over time. Right now, though, I am not seeing very good classification results; the only images classified correctly are elephants.
