Use data structures as database #73

The STM implementation originally referenced might also need some love. It looks like some of the code in the above acid-state library is using native methods. Some of the STM code looks like it could be compatible with F#+. It's quite a big thing to implement in F#+ core; perhaps start out as a separate library and then move code into F#+ as you go?

wallymathieu (Maintainer) · 2018-01-16T06:53:38Z

The performance measurements should perhaps be taken with a grain of salt. Perhaps @rofr knows more about how these things perform, since he has done a lot of in-memory database work on .NET.

gusty (Maintainer) · 2018-01-16T07:40:00Z

I've always had the idea of including an STM in F#+ in the back of my mind, although I'm not sure whether it would be used in real-world code.
But it would definitely be a nice addition.

ShalokShalom (Author) · 2018-01-16T08:45:20Z

Yeah, sounds nice 😁

wallymathieu (Maintainer) · 2018-01-16T08:47:27Z

We can look at the differences between the ancient FSharpx code and the different STM implementations in Haskell. They couldn't find a maintainer for the F# one.

rofr · 2018-01-16T09:33:52Z

Yes, you can easily do the equivalent of acid-state in F#! The proof of concept implementation would probably be < 50 lines of code :)

You don't actually need the STM bits to use data structures as a database. STM provides a mechanism to roll back, but rollbacks are only strictly necessary to undo changes due to concurrency conflicts. In OrigoDB and memstate we persist each command object to the journal (write-ahead logging) and then apply the command to the in-memory state (usually data structures).

OrigoDB will roll back by discarding all the in-memory data and rebuilding from the entire log if a command throws an unexpected exception. Memstate assumes that a failing command will not corrupt the in-memory data model.
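A minimal sketch of the pattern described above, written in Python for brevity rather than F#. The `JournaledStore` class, the JSON-lines journal format and the `set`/`delete` commands are assumptions made for illustration; this is not the OrigoDB or memstate API.

```python
import json
import os
import tempfile


class JournaledStore:
    """In-memory dict whose every change is logged before it is applied."""

    KNOWN_OPS = ("set", "delete")

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.state = {}
        self._replay()
        self._journal = open(journal_path, "a", encoding="utf-8")

    def _apply(self, command):
        # Commands are plain data: an operation name plus its arguments.
        if command["op"] == "set":
            self.state[command["key"]] = command["value"]
        elif command["op"] == "delete":
            self.state.pop(command["key"], None)

    def _replay(self):
        # Rebuild the in-memory state from the entire log.
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path, encoding="utf-8") as journal:
            for line in journal:
                self._apply(json.loads(line))

    def execute(self, command):
        # Reject unknown commands before they can reach the journal.
        if command["op"] not in self.KNOWN_OPS:
            raise ValueError(f"unknown op: {command['op']}")
        # Write-ahead: the command is journaled before the state changes.
        self._journal.write(json.dumps(command) + "\n")
        self._journal.flush()
        os.fsync(self._journal.fileno())
        self._apply(command)

    def close(self):
        self._journal.close()


path = os.path.join(tempfile.mkdtemp(), "journal.log")
store = JournaledStore(path)
store.execute({"op": "set", "key": "a", "value": 1})
store.execute({"op": "set", "key": "b", "value": 2})
store.execute({"op": "delete", "key": "a"})
store.close()

# A fresh instance rebuilds its state purely by replaying the journal.
recovered = JournaledStore(path)
```

Discarding `store.state` and constructing a new instance is the "rebuild from the entire log" style of rollback; a real system would also need snapshots, concurrency control and a more careful serialisation format.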

This persistence pattern goes by many names: Memory Image (Martin Fowler), system prevalence (Klaus Wuestefeld of prevayler.org), event sourcing (kind of), command sourcing, op-logging (MongoDB), and the Redis append-only file (AOF).

Write performance is I/O-bound, constrained by how fast you can log commands to durable storage. OrigoDB can write 3k commands per second using the local file system. Memstate does about 100k commands per second using Event Store.

ShalokShalom (Author) · 2018-01-16T09:37:10Z

> Yes, you can easily do the equivalent of acid-state in F#! The proof of concept implementation would probably be < 50 lines of code :)

Wonderful. ^-^

> You don't actually need the STM bits to use data structures as a database

Can you link us some useful tutorials?

Thanks a lot for mentioning OrigoDB.

@gusty Are you interested yet? ;)

ShalokShalom (Author) · 2018-01-16T09:47:50Z

I'm quoting Gerard here, who responded in the Slack channel to the snippet I already posted above:

> I am a fan of the snippet, as MBP has given me latency issues in the past, and in .NET, Interlocked methods are usually the fastest safe way to control shared state. This would only be one piece of the job; you then need to build in the disk I/O persistence (serialisation, file structure and state builder). That's why I was saying that if you look at some of the posts on F# + Event Store, they will bring you most of the way there (combine its persistence API for local purposes).
> It is a nice idea to have the types all seem like they are in memory while being persisted in the background.
> One interesting reason why event/log messages are logged instead of taking snapshots of state is the fire-and-forget writing of the log versus needing to wait for the I/O to finish for a state snapshot. This is kind of how SQL Server etc. work as well.

rofr · 2018-01-16T09:56:14Z

Here is a POC implementation in Java and a small discussion in the comments. https://gist.github.com/klauswuestefeld/1103582

Regarding the quote from Gerard: If you don't wait for I/O to complete, durability is not guaranteed. It doesn't matter whether it's a snapshot or a log entry.
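To make the distinction concrete, here is a small Python sketch contrasting a write that waits for the operating system to report the data flushed with one that does not. The function names and the file name are hypothetical; `os.fsync` is the standard-library call that requests the flush.

```python
import os
import tempfile


def append_durably(path, record):
    """Append one record and return only after the OS reports it flushed."""
    with open(path, "ab") as log:
        log.write(record + b"\n")
        log.flush()             # push Python's buffer to the OS
        os.fsync(log.fileno())  # ask the OS to push its cache to the device


def append_fire_and_forget(path, record):
    """Append one record without waiting; it may still sit in the OS cache."""
    with open(path, "ab") as log:
        log.write(record + b"\n")


path = os.path.join(tempfile.mkdtemp(), "commands.log")
append_durably(path, b"cmd-1")
append_fire_and_forget(path, b"cmd-2")
```

Both records are visible to readers straight away, but only the first call has waited for the flush before returning, so a power loss right after the second call could lose `cmd-2`.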

gerardtoconnor · 2018-01-16T10:44:05Z

> Regarding the quote from Gerard: If you don't wait for I/O to complete, durability is not guaranteed. Doesn't matter if it's a snapshot or a log entry.

Although that is somewhat true, I/O is unfortunately extremely slow, so waiting for I/O completion before every new state commit is not practical if you want decent throughput. To overcome this, there are tricks for falling back and recovering from a late-reported I/O failure, given that it is a rare occurrence that only happens on a disk fault or when out of memory. For example, an in-memory circular buffer can hold the last 50 mutation messages, so that if an I/O failure comes back, you can freeze the incoming commits and re-run from the failed commit, either from the circular buffer or from the logs.
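The circular-buffer idea can be sketched roughly as follows, in Python. The `RecentCommands` class and its sequence numbering are assumptions for illustration only; `collections.deque` with `maxlen` provides the fixed-size buffer.

```python
from collections import deque


class RecentCommands:
    """Keeps the last `capacity` commands so they can be re-run after a failure."""

    def __init__(self, capacity=50):
        self._buffer = deque(maxlen=capacity)  # oldest entries fall off the front
        self._next_seq = 0

    def record(self, command):
        # Tag each command with a sequence number so a failure can refer to it.
        seq = self._next_seq
        self._buffer.append((seq, command))
        self._next_seq += 1
        return seq

    def since(self, failed_seq):
        """Commands from failed_seq onward, or None if already evicted."""
        if not self._buffer or self._buffer[0][0] > failed_seq:
            return None  # too old: the caller has to re-run from the logs
        return [cmd for seq, cmd in self._buffer if seq >= failed_seq]


recent = RecentCommands(capacity=3)
for i in range(5):
    recent.record(f"cmd-{i}")

to_rerun = recent.since(3)  # still buffered
too_old = recent.since(1)   # evicted, so None
```

When `since` returns `None`, the failed commit has already been pushed out of the buffer and recovery has to come from the logs instead.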

You can wait for the log or state write to be confirmed, but the speed will be terrible and it creates a massive bottleneck. That is the compromise needed to get the best of both worlds: have a smart fallback mechanism for the possibility that an I/O error comes back after a few more state mutations have occurred. Such errors are rare, given that threads are writing to new memory all the time rather than modifying existing files. If throughput is not a requirement, then by all means await every I/O write.

The ACID/atomic nature of the transactions is usually more important than the assurance that every single transaction is written; that is, missing the last few is acceptable provided everything up to the point of the initial failure is written in the correct order.

A few keys to getting good write performance are using specialised OS memory-dump APIs, batching save operations if possible (maybe 5 messages at a time?), and tricks that you can look up on blogs for SQL Server, Lucene, Event Store and other persistence systems. If throughput is not an issue then, once again, all of this would be overbaking it and you can just await all I/O.
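As a rough illustration of batching save operations, the Python sketch below collects records and pays for one flush per batch instead of one per record. The `BatchingJournal` class and the batch size of 5 are assumptions taken from the suggestion above, not a measured optimum.

```python
import os
import tempfile


class BatchingJournal:
    """Buffers records in memory and writes them to disk in groups."""

    def __init__(self, path, batch_size=5):
        self._file = open(path, "ab")
        self._batch_size = batch_size
        self._pending = []

    def append(self, record):
        self._pending.append(record)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        if not self._pending:
            return
        # One write and one fsync cover the whole batch.
        self._file.write(b"".join(r + b"\n" for r in self._pending))
        self._file.flush()
        os.fsync(self._file.fileno())
        self._pending.clear()

    def close(self):
        self.flush()  # do not lose a partial batch on shutdown
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "batched.log")
journal = BatchingJournal(path, batch_size=5)
for i in range(7):
    journal.append(f"cmd-{i}".encode())

with open(path, "rb") as f:
    lines_before_close = len(f.read().splitlines())

journal.close()
with open(path, "rb") as f:
    lines_after_close = len(f.read().splitlines())
```

The trade-off is the one discussed in this thread: records still sitting in `_pending` are lost if the process dies before the next flush.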

ShalokShalom (Author) · 2018-01-16T12:54:33Z

What about a graph-based approach?

rofr · 2018-01-16T13:26:19Z

> Well as you know your database theory, you know that it is impossible to have all 4 instantaneously so each system

Not impossible in theory, as far as I am aware; are you perhaps confusing it with CAP? But in practice every traditional (B-tree, disk-based) RDBMS implementation that I've worked with sacrifices isolation (I) for performance. The default isolation level for SQL Server is READ COMMITTED; if you crank it up to SERIALIZABLE you get perfect isolation, while throughput drops significantly.

@gerardtoconnor your default mode makes total sense in the kind of high-throughput architecture that you mentioned. OrigoDB and memstate both target complex domains where the contention between reads and writes in the RDBMS and the complexity of moving data back and forth between disk and memory hurt performance, correctness and developer productivity.

@ShalokShalom can you elaborate on "graph based"?

voronoipotato · 2018-01-16T14:54:20Z

I thought this was interesting to read:

> Acid-state does not write your data types to disk every time you change it. It instead keeps a history of all the functions (along with their arguments) that have modified the state. Thus, recreating the state after an unforeseen error is as simple as rerunning the functions in the history log.

In a way this is like an event store architecture, but storing the actual functions instead of the domain events.

gerardtoconnor · 2018-01-16T15:10:11Z

@rofr Yeah, the impossibility is more relevant to CAP, but that's why I said instantaneous, to the CPU cycle, just to highlight that there is usually a tiny bit of flexibility on timing, as long as it's controlled, for a system to still be considered ACID. Consistency is another factor that can be traded for performance, as in Cassandra vs SQL Server. Luckily, given this is a local store and not distributed, it's a far simpler problem than full-blown distributed systems.

@voronoipotato This was in line with my comments. I think it's worth pointing out that persisting a function instance to I/O is not really possible or practical; that's why in similar systems there are messages that map to functions, via a DU or some other mapping technique. In F#, functions are compiled to classes with Invoke methods, which are easy to represent in memory, but to persist them they need to be mapped in some way. It's generally the same thing though: record which function and arguments to apply in order to rebuild the state.
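A small sketch of messages that map to functions, in Python, with frozen dataclasses standing in for the cases of an F# discriminated union. The `Deposit`/`Withdraw` messages, the JSON encoding and the account example are all hypothetical and chosen only to show the round trip.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Deposit:
    account: str
    amount: int


@dataclass(frozen=True)
class Withdraw:
    account: str
    amount: int


# Tag -> message type; this plays the role of a discriminated union's cases.
MESSAGE_TYPES = {"Deposit": Deposit, "Withdraw": Withdraw}


def to_json(message):
    # Persist the case name and its fields, never the function itself.
    return json.dumps({"type": type(message).__name__, **asdict(message)})


def from_json(text):
    fields = json.loads(text)
    return MESSAGE_TYPES[fields.pop("type")](**fields)


def apply(state, message):
    """Return a new state; the function to run is chosen by message type."""
    balance = state.get(message.account, 0)
    if isinstance(message, Deposit):
        return {**state, message.account: balance + message.amount}
    if isinstance(message, Withdraw):
        return {**state, message.account: balance - message.amount}
    raise TypeError(f"unknown message: {message!r}")


# The persisted log holds only data; replaying it rebuilds the state.
log = [to_json(Deposit("alice", 100)), to_json(Withdraw("alice", 30))]
state = {}
for line in log:
    state = apply(state, from_json(line))
```

In F# the same shape would typically be a union type plus a `match` expression, with the serialiser of your choice handling the tag.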

rofr · 2018-01-16T20:12:05Z

@ShalokShalom I'd be happy to help out a bit and learn some more F# and functional patterns.

ShalokShalom (Author) · 2018-01-17T10:28:10Z

By graph-based, I mean something like GunDB and Neo4j. Gun is, by the way, available as a plugin for Cassandra, so it is available from F#.

My own experience is very limited, I am a complete beginner so all I know is theory. :)

So forgive me this potentially stupid question: Would type providers help to interact with such a database in the code?

As far as I understand, this technique offers something relevant in that respect, especially when I look at this paper by Don:

rofr · 2018-01-17T10:51:29Z

An external database such as Neo4j or GunDB doesn't really make sense in the context of this thread (Use data structures as database), because with this approach your data lives in RAM in the same process as the code that operates on it.

But how you model the data is entirely up to you. If you want a graph representation you could use a library such as https://github.com/Rickasaurus/Edgy or https://github.com/CSBiology/FSharp.FGL
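For a sense of what a graph held as a plain in-memory data structure can look like, here is a minimal Python sketch using an adjacency dictionary. It does not use the API of either library linked above; `add_edge` and `reachable` are hypothetical helpers.

```python
from collections import deque


def add_edge(graph, source, target):
    """Add a directed edge, creating either node if it is missing."""
    graph.setdefault(source, set()).add(target)
    graph.setdefault(target, set())


def reachable(graph, start):
    """All nodes reachable from start, found by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


graph = {}
add_edge(graph, "alice", "bob")
add_edge(graph, "bob", "carol")
add_edge(graph, "dave", "alice")

from_alice = reachable(graph, "alice")
```

Queries are ordinary function calls over the structure, and persistence would come from logging the `add_edge` commands rather than from an external graph database.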

Does that make sense?

ShalokShalom (Author) · 2018-01-17T12:01:45Z

Yes, I mean the model of the data.

Thanks a lot for the links, although Edgy seems unmaintained and FSharp.FGL has had no commits for at least half a year. Anyway, thanks for linking these. :)

Graphs just make sense, since that is how our brains work, and I think nature has thought about such concepts for a long time. ;)

I thought it might be possible to use such a graph based approach for our project idea here?
I guess that plays a role for the implementation?

wallymathieu (Maintainer) · 2018-02-04T12:36:38Z

Well, it's easier in F# than in C# to get a library to a finished state, so the above libraries might be mature enough.

ShalokShalom (Author) · 2018-02-04T19:05:57Z

@wallymathieu Well, the question is how long they will keep working. Of course, they may be mature enough.

What happens if they become incompatible?

wallymathieu (Maintainer) · 2018-02-04T19:19:21Z

How do you mean?

ShalokShalom (Author) · 2018-02-04T20:10:43Z

Do you think that they will still work in a few years?

The thing is: if I invest time in studying this software, I hope that I will still be able to use it in a few years.

People whom I trust and who are quite experienced with software would give me a questioning look if I told them that I use software that has been untouched for a couple of years.

wallymathieu (Maintainer) · 2018-02-04T20:24:17Z

That's always something that you have to deal with as a developer I guess.

Many of the Unix tools on Mac OS X and the base tools on Windows might not have been touched for decades. Some of the GNU versions of the Unix tools are maintained, but see very few commits per year.

ShalokShalom (Author) · 2018-02-04T21:11:54Z

Well, as long as they are maintained, it is fine.

ShalokShalom (Author) · 2019-06-09T05:54:09Z

Hi 🤗

Would this be able to substitute a database in a PWA?

I'd really like to avoid JavaScript here.
