Resumable hashing #94

Open
DannyZB opened this issue Aug 21, 2019 · 4 comments

DannyZB commented Aug 21, 2019

Have any of you considered resumable hashing for rhash?

When hashing extremely large files, 20GB and up, being able to resume hashing from a previous position would help a ton.

Is this something you've considered?

Owner

rhash commented Sep 27, 2019

It's an interesting feature request. It can be implemented by serializing internal librhash state into a "partly hashed" file.

But for now it's a low-priority FR, so I'm not sure when I'll get to it.

Author

DannyZB commented Sep 27, 2019

You have knowledge of the library.

Could you spend 15 minutes and give a rundown of where that code is and what essentially should happen?

I might look into implementing it, but I'd rather know where to look without learning the entire code base.

It's very useful for download automation where you need hashing: the download stream can be piped into rhash. Partial hashing is necessary for recovering from crashes (long downloads tend to have issues).

I.e. a way to pass in the "partially hashed" file, or to load it after a crash.
The same code can be reused to make hashing more robust against crashes.

When you hash a 50 GB file and it breaks in the middle, that's a little nightmare scenario.

rhash added the FR label Jul 30, 2020

milahu commented Oct 23, 2023

see also https://stackoverflow.com/questions/2130892/persisting-hashlib-state

you really just have to save and load

  • the file size
  • the internal state of the hasher functions
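
A minimal sketch of those two pieces, assuming CRC32 via Python's `zlib.crc32` as a stand-in hasher (its entire internal state is the running 32-bit value, so it is easy to persist). The `resumable_crc32` helper, the checkpoint file and its JSON layout are hypothetical and not part of RHash:

```python
import json
import os
import zlib

CHUNK = 1 << 20  # read 1 MiB at a time


def resumable_crc32(path, checkpoint, chunk=CHUNK, max_chunks=None):
    """Hash `path` with CRC32, saving (offset, state) to `checkpoint` after each chunk.

    Returns the final CRC32, or None when stopped early after `max_chunks`
    chunks (that parameter exists only to simulate an interruption).
    """
    offset, state = 0, 0
    if os.path.exists(checkpoint):
        with open(checkpoint) as f:
            saved = json.load(f)
        offset, state = saved["offset"], saved["state"]

    done = 0
    with open(path, "rb") as f:
        f.seek(offset)  # skip the bytes that were already hashed
        while True:
            if max_chunks is not None and done >= max_chunks:
                return None  # simulated crash; the checkpoint stays on disk
            data = f.read(chunk)
            if not data:
                break
            state = zlib.crc32(data, state)  # continue from the saved state
            offset += len(data)
            done += 1
            # Write to a temp file first, then rename, so a crash cannot
            # leave a half-written checkpoint behind.
            tmp = checkpoint + ".tmp"
            with open(tmp, "w") as out:
                json.dump({"offset": offset, "state": state}, out)
            os.replace(tmp, checkpoint)

    if os.path.exists(checkpoint):
        os.remove(checkpoint)  # finished, the checkpoint is no longer needed
    return state
```

For cryptographic hashes the saved state is larger (chaining values plus a partially filled block buffer), but the shape of the loop is the same. A real tool would probably also want to confirm that the file has not changed since the checkpoint was written before trusting it.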

Owner

rhash commented Nov 5, 2023

Since bbbe1be, librhash supports the rhash_export() and rhash_import() functions to save and load its internal state. Now it's not hard to support resumable hashing of a single file.

Some things are not clear:

  • What should RHash do when many files or directory trees are processed? How and where should it store the information about which files were already hashed? Note that RHash usually writes hashing results to STDOUT, not to a file.
  • What to do if RHash is recursively hashing a directory tree, but, after resuming, the filesystem returns files in a different order?
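
One possible answer to both questions, sketched under assumptions: keep a side journal keyed by relative path rather than by position in the traversal, so the order in which the filesystem returns files stops mattering and STDOUT can remain the results channel. The JSON-lines journal format and the `hash_tree` helper are hypothetical, not something RHash provides, and CRC32 via `zlib` stands in for the real hash functions:

```python
import json
import os
import zlib


def crc32_file(path, chunk=1 << 20):
    """CRC32 of a whole file, read in chunks."""
    state = 0
    with open(path, "rb") as f:
        while data := f.read(chunk):
            state = zlib.crc32(data, state)
    return state


def load_journal(journal):
    """Return {relative_path: crc32} for every complete journal entry."""
    done = {}
    if os.path.exists(journal):
        with open(journal) as f:
            for line in f:
                try:
                    entry = json.loads(line)
                    done[entry["path"]] = entry["crc32"]
                except (ValueError, KeyError, TypeError):
                    continue  # skip a line that is not a complete entry
    return done


def hash_tree(root, journal):
    """Hash every file under `root`, skipping paths already in the journal.

    The journal must live outside `root`, otherwise it would be hashed too.
    Returns (all_results, paths_hashed_in_this_run).
    """
    done = load_journal(journal)
    hashed_now = []
    with open(journal, "a") as log:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root)
                if rel in done:  # lookup by path, so walk order is irrelevant
                    continue
                done[rel] = crc32_file(full)
                log.write(json.dumps({"path": rel, "crc32": done[rel]}) + "\n")
                log.flush()
                hashed_now.append(rel)
    return done, hashed_now
```

This sketch does not detect files that were modified between runs; storing size and modification time next to each entry would be one way to handle that.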
