spores not serializable with spark? #36
this seems like an issue:
Oh, I think I was definitely missing something; I think this is the intended usage? This works with Spark now. This is great, and I think I'll be able to use this with Spark now.
Hmm, the JSON that comes out has something like this in it: `"className": "my.package.SomeClass$$anonfun$10$anonspore$macro$24$1"`. And it only works if you unpickle it where you have that function on the classpath; you can't ship it somewhere else and unpickle it. Is there any way to serialize a spore that can be run somewhere else, or is that a topic of future research?
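For reference, a minimal sketch of producing that JSON, assuming scala-pickling's JSON format is used and the spore picklers from the spores-pickling module are on the classpath (the exact pickler imports vary between versions):

```scala
import scala.pickling.Defaults._
import scala.pickling.json._
import scala.spores._
// implicit spore picklers from the spores-pickling module are assumed to be in scope

val s = spore { (x: Int) => x * 2 }

// The JSON embeds the name of the compiler-generated class backing the spore,
// e.g. "className": "my.package.SomeClass$$anonfun$...$anonspore$...",
// so it can only be unpickled where that class is on the classpath.
val json = s.pickle.value
println(json)
```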
Yes, the className is the prefix name of the generated class for the spore.
Hotswapping (dynamically loading new classfiles) is not future work; there have been decades of not-so-successful research involving mobile code. Security is the biggest problem on that front, as I'm sure you can understand. So if you'd like to use spores, you'd have to do what Spark does and ship new jars to the workers.
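As an example, a hypothetical Spark setup that ships the jar containing the compiled spore classes to the workers; the jar path is made up, while `SparkConf.setJars` and `SparkContext.addJar` are standard Spark API:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical artifact produced by `sbt package`; it must contain the
// compiler-generated spore classes so the workers can load them.
val appJar = "target/scala-2.11/my-spores-app_2.11-0.1.0.jar"

val conf = new SparkConf()
  .setAppName("spores-on-spark")
  .setJars(Seq(appJar))          // shipped to the executors at startup
val sc = new SparkContext(conf)

// or, for an already-running context:
sc.addJar(appJar)
```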
Could we close this and maybe add a note to the project README so that people know about this limitation from the beginning?
The conclusion of this thread isn't clear to me. In Spark, the jar does get shipped from the driver to the workers, so spores can work. In what situations is this not the case? I'm not sure what the "limitation" is, then.
So, I definitely understand the reasoning on the tangent: it would require shipping classfiles/jars, and that is definitely a hassle. I'm not sure how it would be any more of a security risk than a browser-accessible REPL (notebooks), which is one of the primary ways people are using Spark. To return to the original issue, is there a good reason why this shouldn't work?

When I found this project and the related slides, I thought this was one of the primary intended uses. I mean, Spark is explicitly mentioned here as a use for spores: https://speakerdeck.com/heathermiller/spores-distributable-functions-in-scala If this is not the intended way to use spores with Spark, I'm curious what the intended way is, or whether spores are not intended to work with Spark.

As I mentioned in an earlier reply, I did find a way to make use of spores, by manually packing and unpacking into JSON. But this seems like a lot of effort for not much reward. You get protection against closure problems within the immediate scope of the declared function. That is pretty good, and covers many cases. But this doesn't guarantee that your function is actually free of closure problems, because it could call a function which has closure problems and it won't know about it. Worse, you can get inconsistent behavior if this is going on.
Also, with this method, because I have to cast the result of unpacking, I'm just trading one set of potential bugs (closure serialization) for another (bad type casts). This makes me think I've got the wrong idea about this and there's got to be a better way.
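A sketch of the pack/unpack round trip being described, under the same scala-pickling assumptions as the earlier sketch; the type argument passed to `unpickle` is where the cast happens:

```scala
import scala.pickling.Defaults._
import scala.pickling.json._
import scala.spores._
// implicit spore picklers assumed to be in scope, as in the earlier sketch

val packed = spore { (x: Int) => x * 2 }.pickle   // JSON pickle of the spore

// The type argument is effectively a cast: if the receiving side expects the
// wrong function type, the mistake only surfaces at runtime.
val restored = packed.unpickle[Spore[Int, Int]]
println(restored(21)) // 42
```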
Hi @nimbusgo. No, spores is not a project made exclusively to support Spark; it has more use cases, all of them related to safe, reliable closures in the context of distributed systems (e.g. I'm using it in the function-passing model). While we agree that it should work, we don't provide official support here, since this is more an issue of solving compatibility problems with Spark. Nonetheless, I'd be happy to help you get it working in Spark; I agree that it's an important target.

There's an ongoing PR that considerably improves the serialization/deserialization experience with scala-pickling, which is the default mechanism to serialize and deserialize spores, and I'd encourage you to give it a try. Spark uses its own serialization infrastructure (IIRC Kryo), and serializing and deserializing spores needs some special information that normal reflection mechanisms lack.

W.r.t. the first code snippet, it should work perfectly, but, again, that depends on what Spark is doing under the hood (I'm no expert in Spark, so I can't give you good feedback on this). If it doesn't work, please let us know why with a small toy project or detailed error information.

The second example that you provide compiles, and I'm going to explain why. Spores have some limitations because they use macros. By definition, macros can't inspect the environment in which they are expanded. So, in this case, there's no way for the macro to inspect the body of the function being called inside the spore.
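To make that limitation concrete, here is an illustrative, made-up example: the spore body only refers to a top-level object, which the spore macro does not reject, but the macro cannot look inside the called method, so its hidden dependency on mutable state goes unnoticed:

```scala
import scala.spores._

object Helpers {
  var factor = 3                        // external mutable state
  def scale(x: Int): Int = x * factor   // silently depends on that state
}

// The spore checker only sees the literal body `(x: Int) => Helpers.scale(x)`.
// It cannot expand into Helpers.scale, so this compiles without complaint even
// though the spore's behavior depends on Helpers.factor.
val s = spore { (x: Int) => Helpers.scale(x) }
```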
Right now, I'm focused on improving the test infrastructure and reducing the number of potential bugs that could appear. If you feel like spores is still a useful project, I'd very much like to see error reports that I could use to improve the overall quality of the project.
This looks like just the library I was hoping for, but I can't seem to find any way to specify a spore that passes Spark's closure serializer. This throws a java.io.NotSerializableException:

```scala
sc.parallelize(1 to 5).map(spore { (x: Int) => x * 2 }).collect()
```

I thought this was the intended usage, am I missing something? This is with Spark 1.6, Scala 2.11.6.
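A hypothetical, self-contained version of the failing snippet (Spark 1.6, Scala 2.11), for anyone trying to reproduce it; the object and application names are made up:

```scala
import scala.spores._
import org.apache.spark.{SparkConf, SparkContext}

object SporesRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("spores-repro").setMaster("local[*]"))

    // Reported to fail with java.io.NotSerializableException when Spark's
    // closure serializer processes the spore.
    val result = sc.parallelize(1 to 5).map(spore { (x: Int) => x * 2 }).collect()
    println(result.mkString(", "))

    sc.stop()
  }
}
```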