Writing an HDFS clone in Go to learn more about Go and the nitty-gritty of distributed systems.
- MetaDataNode/DataNode handle uploads
- MetaDataNode/DataNode handle downloads
- DataNode dynamically registers with MetaDataNode
- DataNode tells MetaDataNode its blocks on startup
- MetaDataNode persists file->blocklist map
- DataNode pipelines uploads to other DataNodes (see the pipeline sketch after this list)
- MetaDataNode can restart and DataNode will re-register (heartbeats)
- Tell DataNodes to re-register if MetaDataNode doesn't recognize them
- Drop DataNodes when they go down (heartbeats)
- DataNode sends size of data directory (heartbeat; see the heartbeat sketch after this list)
- MetaDataNode obeys replication factor
- MetaDataNode balances based on current reported space
- MetaDataNode balances based on expected new blocks
- MetaDataNode handles not having enough DataNodes to meet the replication factor
- Have MetaDataNode manage the block size (in HDFS, clients can change this per-file)
- Re-replicate blocks when a DataNode disappears
- Drop over-replicated blocks when a DataNode comes up
- Looking at DataNode utilization should take DeletionIntents and ReplicationIntents into account (see the utilization sketch after this list)
- Grace period for replicating new blocks
- MetaDataNode balances blocks as it runs!
- Record a hash digest of each block; reject an upload if the hash doesn't match (see the checksum sketch after this list)
- DataNode needs to keep track of blocks it's receiving / deleting / checking so that the integrity checker can run only on real blocks
- Remove blocks if checksum doesn't match
- Run a cluster in a single process for testing
- Structure things better
- Resilience to malformed or unexpected protocol messages (run the RPC loop manually?)
- Command line parser doesn't work that well (try "main datanode -help")
- Events from servers for testing
- Better configuration handling (defaults)
- Allow decommissioning nodes
- Better logging, so warnings can be made fatal when running tests (two levels: warn that this process broke, and warn that a node we're communicating with broke)
- No need to wait around to delete blocks; just prevent any new reads and we'll come back to them
- DataNode should finish its startup work and only then spawn workers, not just spawn everything at once (race conditions with the address and data directories)
- Support multiple MetaDataNodes somehow (DHT? Raft? Get rid of MetaDataNodes and use Gossip?)
- Keep track of MoveIntents (subtract them from a node's predicted utilization); this might fix the volatility when re-balancing
- HashiCorp claims heartbeats are inefficient (linear work as a function of the number of nodes). Use Gossip?
- Don't force a long-running connection for creating a file; give the client a lease and let them re-connect (see the lease sketch after this list)
- If a client tries to upload a block and every DataNode in its list is down, it needs to get more from the MetaDataNode.
- Keep track of blocks as we're creating a file; if the client bails before committing, delete the blocks.
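
A few rough Go sketches of the ideas above follow. None of this is the actual code in this repo; the names, wire formats, and struct shapes are made up for illustration.

Pipelined uploads: the client only talks to the first DataNode and hands it the rest of the replica list; each node stores the data locally while streaming it on to the next node in the list. The copy step might look roughly like this (framing of the block ID, digest, and remaining replica list is omitted):

```go
package main

import (
	"io"
	"log"
	"net"
	"os"
	"strings"
)

// forwardBlock stores an incoming block at path and, if more replicas are
// wanted, streams the same bytes on to the next DataNode in the pipeline.
func forwardBlock(path string, src io.Reader, rest []string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	w := io.Writer(f)
	if len(rest) > 0 {
		conn, err := net.Dial("tcp", rest[0])
		if err != nil {
			return err // the real thing could skip to the next node instead
		}
		defer conn.Close()
		w = io.MultiWriter(f, conn) // write locally and downstream at once
	}
	_, err = io.Copy(w, src)
	return err
}

func main() {
	// Last node in the pipeline: nothing left to forward to.
	if err := forwardBlock("block-0001", strings.NewReader("block bytes"), nil); err != nil {
		log.Fatal(err)
	}
}
```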
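Heartbeats: the DataNode periodically reports its address and how big its data directory is; the MetaDataNode remembers when it last heard from each node, tells nodes it doesn't recognize to re-register (which covers the MetaDataNode-restart case), and drops nodes that go quiet. A minimal sketch, with an assumed Heartbeat shape and timeout:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Heartbeat is what a DataNode reports periodically (hypothetical shape).
type Heartbeat struct {
	Addr      string // address the DataNode serves blocks on
	UsedBytes int64  // size of the DataNode's data directory
}

// MetaDataNode tracks when each DataNode was last heard from and how full it is.
type MetaDataNode struct {
	mu       sync.Mutex
	lastSeen map[string]time.Time
	used     map[string]int64
}

// HeartbeatFrom records a heartbeat; unknown nodes are told to re-register,
// i.e. send their full block list again.
func (m *MetaDataNode) HeartbeatFrom(hb Heartbeat) (reRegister bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	_, known := m.lastSeen[hb.Addr]
	m.lastSeen[hb.Addr] = time.Now()
	m.used[hb.Addr] = hb.UsedBytes
	return !known
}

// dropStale forgets DataNodes that haven't heartbeated within staleAfter;
// their blocks then become candidates for re-replication.
func (m *MetaDataNode) dropStale(staleAfter time.Duration) {
	m.mu.Lock()
	defer m.mu.Unlock()
	for addr, t := range m.lastSeen {
		if time.Since(t) > staleAfter {
			delete(m.lastSeen, addr)
			delete(m.used, addr)
		}
	}
}

func main() {
	m := &MetaDataNode{lastSeen: map[string]time.Time{}, used: map[string]int64{}}
	fmt.Println("re-register?", m.HeartbeatFrom(Heartbeat{Addr: "localhost:9000", UsedBytes: 1 << 20}))
	m.dropStale(10 * time.Second)
}
```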
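Balancing with intents: instead of trusting only the space a DataNode last reported, the MetaDataNode can fold DeletionIntents, ReplicationIntents, and MoveIntents into a predicted utilization before picking a target for a new or re-replicated block. Something like this, where NodeState and the block size are assumptions:

```go
package main

import "fmt"

// NodeState is a hypothetical per-DataNode summary kept by the MetaDataNode.
type NodeState struct {
	ReportedUsed   int64 // bytes used, from the last heartbeat
	PendingIn      int   // blocks scheduled to land here (ReplicationIntents, moves in)
	PendingDeletes int   // blocks scheduled for deletion here (DeletionIntents)
	PendingOut     int   // blocks scheduled to move off this node (MoveIntents)
}

// predictedUsed estimates how full a node will be once all intents settle,
// assuming each block is at most blockSize bytes. Using this instead of the
// raw reported size keeps the balancer from piling new blocks onto a node
// whose heartbeat hasn't caught up yet.
func predictedUsed(n NodeState, blockSize int64) int64 {
	used := n.ReportedUsed + int64(n.PendingIn)*blockSize
	used -= int64(n.PendingDeletes+n.PendingOut) * blockSize
	if used < 0 {
		used = 0
	}
	return used
}

func main() {
	const blockSize = 64 << 20 // 64 MiB, an assumed block size
	a := NodeState{ReportedUsed: 10 * blockSize, PendingDeletes: 5}
	b := NodeState{ReportedUsed: 6 * blockSize, PendingIn: 4}
	fmt.Println(predictedUsed(a, blockSize), predictedUsed(b, blockSize))
}
```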
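Block checksums: hash the block as it streams in, reject the upload on a mismatch, and let the integrity checker re-hash stored blocks and drop corrupt ones. A sketch using SHA-256 (the actual hash and on-disk layout may differ):

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
	"log"
	"os"
)

// receiveBlock writes a block to disk while hashing it, and rejects the
// transfer (removing the partial file) if the digest doesn't match what the
// sender claimed.
func receiveBlock(path string, src io.Reader, wantDigest string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(io.MultiWriter(f, h), src); err != nil {
		return err
	}
	if hex.EncodeToString(h.Sum(nil)) != wantDigest {
		os.Remove(path)
		return errors.New("block digest mismatch, rejecting upload")
	}
	return nil
}

// checkBlock re-hashes a stored block; the integrity checker deletes it on mismatch.
func checkBlock(path, wantDigest string) (bool, error) {
	f, err := os.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return false, err
	}
	return hex.EncodeToString(h.Sum(nil)) == wantDigest, nil
}

func main() {
	data := []byte("some block data")
	sum := sha256.Sum256(data)
	digest := hex.EncodeToString(sum[:])

	if err := receiveBlock("block-0002", bytes.NewReader(data), digest); err != nil {
		log.Fatal(err)
	}
	ok, err := checkBlock("block-0002", digest)
	fmt.Println(ok, err)
}
```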
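Leases: instead of a long-lived connection while a file is being created, the MetaDataNode could hand out a lease and remember which blocks it allocated for that file; if the lease expires before the client commits, those blocks get queued for deletion. A hypothetical shape (all names made up):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Lease is what the MetaDataNode might hand a client that is creating a file.
// The client renews or commits before it expires instead of holding a
// connection open the whole time.
type Lease struct {
	ID      string
	File    string
	Blocks  []string // blocks handed out for this file so far
	Expires time.Time
}

type LeaseTable struct {
	mu     sync.Mutex
	leases map[string]*Lease
}

// Expire drops leases the client never committed and returns their blocks so
// the caller can queue them for deletion on the DataNodes.
func (t *LeaseTable) Expire(now time.Time) (orphaned []string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for id, l := range t.leases {
		if now.After(l.Expires) {
			orphaned = append(orphaned, l.Blocks...)
			delete(t.leases, id)
		}
	}
	return orphaned
}

func main() {
	t := &LeaseTable{leases: map[string]*Lease{
		"abc": {ID: "abc", File: "/tmp/foo", Blocks: []string{"blk-1", "blk-2"},
			Expires: time.Now().Add(-time.Minute)}, // already expired
	}}
	fmt.Println("orphaned blocks to delete:", t.Expire(time.Now()))
}
```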