Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify model zoo implementation of SQLFlow and ElasticDL #22

Open
typhoonzero opened this issue Oct 11, 2019 · 7 comments
Open

Unify model zoo implementation of SQLFlow and ElasticDL #22

typhoonzero opened this issue Oct 11, 2019 · 7 comments

Comments

@typhoonzero
Copy link
Collaborator

typhoonzero commented Oct 11, 2019

Related design: sql-machine-learning/sqlflow#949

We should implement the model zoo using the same scheme in SQLFlow and ElasticDL so that all models can be applied to SQLFlow and ElasticDL. Below are things to do:

  1. Unify the name to define loss/optimizer/metrics, e.g. SQLFlow use default_optimizer, ElasticDL use optimizer
  2. Whether to support custom train loop in ElasticDL, for SQLFlow, can support: https://github.com/sql-machine-learning/models/blob/develop/sqlflow_models/deep_embedding_cluster.py#L194
  3. Whether to support Keras functional API in SQLFlow, ElasticDL can support this kind of model now: https://github.com/sql-machine-learning/elasticdl/blob/develop/model_zoo/cifar10_functional_api/cifar10_functional_api.py
  4. SQLFlow use codegen for generate a dataset_fn function, how to use the dataset_fun defined in model zoo: https://github.com/sql-machine-learning/elasticdl/blob/develop/model_zoo/cifar10_functional_api/cifar10_functional_api.py#L117
  5. Unify PredictionOutputsProcessor: https://github.com/sql-machine-learning/elasticdl/blob/develop/model_zoo/cifar10_functional_api/cifar10_functional_api.py#L153, https://github.com/sql-machine-learning/models/blob/develop/sqlflow_models/dnnclassifier.py#L40
  6. support custom metrics in SQLFlow (eval_metrics_fn function in the model definition).
@terrytangyuan
Copy link
Member

terrytangyuan commented Oct 11, 2019

  1. Where is the naming convention for default_optimizer from? If necessary, we can change this in ElasticDL. ElasticDL also supports passing different names for optimizer/loss/etc.
  2. We need to provide abstractions that's safe enough in order to support custom training loop. But we definitely want to support other ML tasks such as unsupervised learning, RL, GAN, etc. in the future. How's sqlflow_train_loop() in the example you referenced being used?
  3. I'd say yes.
  4. The ElasticDL codegen currently generates the necessary dataset_fn for a given table in the database automatically already. If it's already defined in the model zoo, ElasticDL can override it by using the codegen-generated dataset_fn instead.
  5. How is prepare_prediction_column() being used? PredictionOutputsProcessor is a super set of prepare_prediction_column() so it should not be too hard to support SQLFlow's use case.

@typhoonzero
Copy link
Collaborator Author

@terrytangyuan

Where is the naming convention for default_optimizer from? If necessary, we can change this in ElasticDL. ElasticDL also supports passing different names for optimizer/loss/etc.

Either to change SQLFlow or ElasticDL to use the same name should be fine.

We need to provide abstractions that's safe enough in order to support custom training loop. But we definitely want to support other ML tasks such as unsupervised learning, RL, GAN, etc. in the future. How's sqlflow_train_loop() in the example you referenced being used?

sqlflow_train_loop() is called if this function exists in the model definition class when SQLFlow doing codegen.

How is prepare_prediction_column() being used? PredictionOutputsProcessor is a super set of prepare_prediction_column() so it should not be too hard to support SQLFlow's use case.

You're right.

@typhoonzero
Copy link
Collaborator Author

ps: also need to support eval_metrics_fn

@typhoonzero
Copy link
Collaborator Author

By changing the default function name to define optimizer and loss from default_optimizer and default_loss to optimizer and loss will cause an error when predicting:

Traceback (most recent call last):
  File "<stdin>", line 90, in <module>
  File "/miniconda/envs/sqlflow-dev/lib/python3.6/site-packages/sqlflow_submitter/tensorflow/predict.py", line 112, in pred
    classifier.predict_on_batch(one_batch[0])
  File "/miniconda/envs/sqlflow-dev/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1024, in predict_on_batch
    x, extract_tensors_from_dataset=True)
  File "/miniconda/envs/sqlflow-dev/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 2385, in _standardize_user_data
    metrics=self._compile_metrics,
AttributeError: 'DNNClassifier' object has no attribute '_compile_metrics'

Change the name back and the test passes, It seems we can not use the name optimizer and loss in the Keras model definition.

To support both Keras functional API and subclass API, I propose to use the unified scheme in PR: #25, the SQL statement SELECT * FROM iris.train TO TRAIN sqlflow_models.dnnclassifier ... uses the python module name dnnclassifier as the model's name, and will call sqlflow_models.dnnclassifier.get_model to create the model instance. In this case, we can satisfy:

  1. SQLFlow use Keras subclass models and functional API models
  2. ElasticDL models can be run directly in SQLFlow
  3. The model zoo implementation can keep the same user experience.

@terrytangyuan
Copy link
Member

@typhoonzero Why is get_model() needed? In ElasticDL, we can support both subclass and functional Keras models without this additional method.

@typhoonzero
Copy link
Collaborator Author

Thanks for reminding, I'll check out how ElasticDL works today.

@typhoonzero
Copy link
Collaborator Author

typhoonzero commented Nov 12, 2019

Update: Will remove get_model(). I can expose the subclass name and function name can also work if we can check the name after TO TRAIN is a class or a function.

ps: we can get the python module object from the class by: https://stackoverflow.com/questions/7027848/getting-corresponding-module-from-function

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants