Limit Size of Subversion Commits With this Hook

We have experienced some abuse of our subversion repository at work recently. Someone committed 400 MB of data all at once, including many object files, libraries, and executables. I did not get very harsh with the person who did this, because a) I have no objection to binaries in subversion in the first place, b) I don't really know what he's working on, c) disk space is cheap and we are nowhere near capacity, and d) his commit was still smaller than a few commits we had a long time ago (which were legit). Still, if you just allow people to commit whatever they want to your subversion repository, in the worst case you could run out of disk space, necessitating an svn dump-and-load onto a new, larger drive (a pain). It would suck to have to do that just because some people were committing large binaries (without any legitimate reason to). There are other annoying consequences. Our tarball backups of svn currently fit on a DVD, which is cheap and easy; if we allowed this abuse to continue, it would complicate our backup process.

What I wanted was a way to limit the commit size for certain users automatically. There did not seem to be any hooks out there to do this, so I wrote one.

Just paste the following into your pre-commit hook:

 

and paste the following into a check_txn_size.py file in your hooks directory and make it executable.

 

Comments

Hi,

Nice hook, but it seems that the size found in getTransactionSize, or maybe the size of the files themselves, is not equal to the size of the file committed. It always seems smaller. Is it maybe (g)zipped? Do you have any information about that?

thanks,

Gérald

Yes, it's measuring the size of the actual transaction. The transaction, if I remember correctly, is basically what will become the next revision in the db. It's called a transaction because it hasn't become part of the db yet. If you add 10kB of text to a text file that is 10MB and commit the change, the transaction size should be 10kB, not 10MB + 10 kB.

Additionally, if you are adding files, yes I think the transaction itself is compressed. Try putting a pause in the hook or something and look at the transaction and see for yourself. :-)

Thanks for info. Yes, I've seen what you describe. It's the size of transaction. Good idea to put a pause in the hook! :) I'll try.

We are using SVN 1.5.0. I don't see the .../rev files being created (anymore). I don't know where this information goes (now). Did something change with going to 1.5.0 or so? Assistance would be greatly appreciated!

We're running Redhat Enterprise Linux 5, Subversion 1.5.0.

Error: exceptions.OSError: [Errno 2] No such file or directory: '/usr/.../svnroot/sandbox/db/transactions/1256-f.txn/rev'

Patches welcome :-)

> Patches welcome :-)

Well yes, that is fair enough. I'm just a bit stuck, though. A little googling did not bring me information about the repository structure. And while I was trying a certain test commit, I was not able to recognise my changes in the transactions/ directory, so I was at a loss. And given that this article is referenced a lot, I considered it authoritative enough to be a little modest ;-)
So I'm not necessarily looking for a ready-cut solution, but I'd welcome an exchange of ideas and approaches!

Regards, Sander.

In getTransactionSize try putting a raw_input("") statement which will cause the script to hang. Then look at what the transaction directory actually looks like on the server. Then you might be able to cancel the transaction on the client at that point. I can't remember exactly how I debugged it at the time.

Or maybe you can just do print statements and the output of the print statement might actually show up at the client? I can't remember. At the very least you could write to a log file from within the python script.
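For anyone debugging along these lines, here is a minimal sketch of the two options mentioned above. It relies on the fact that Subversion passes a hook's stderr back to the client when the pre-commit hook exits with a non-zero status (stdout is discarded). The function names and the log file path are hypothetical and not part of the original script.

```python
import sys
import time


def log_debug(message, log_path="/tmp/check_txn_size.log"):
    """Append a timestamped line to a log file (the default path is a placeholder)."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log_file:
        log_file.write("%s %s\n" % (stamp, message))


def reject(message, stream=sys.stderr):
    """Write the reason to stderr and return a non-zero exit status.

    The client only sees this text if the hook actually exits non-zero.
    """
    stream.write(message + "\n")
    return 1
```

Inside the hook you could call log_debug() wherever you want to inspect a value, and finish with sys.exit(reject("...")) when you want the message to reach the committer.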

I can have a look but I'm a bit busy right now.

I copied the transaction directory to my home.

[root@ENH-SF-XX ~]# cd 1256-e.txn/
[root@ENH-SF-XX 1256-e.txn]# ll
total 96
-rw-r--r-- 1 root root 71 Mar 18 18:28 changes
-rw-r--r-- 1 root root 4 Mar 18 18:28 next-ids
-rw-r--r-- 1 root root 144 Mar 18 18:28 node.0.0
-rw-r--r-- 1 root root 396 Mar 18 18:28 node.0.0.children
-rw-r--r-- 1 root root 98 Mar 18 18:28 node.qg.0
-rw-r--r-- 1 root root 678 Mar 18 18:28 node.qg.0.children
-rw-r--r-- 1 root root 111 Mar 18 18:28 node.uq.0
-rw-r--r-- 1 root root 76 Mar 18 18:28 node.uq.0.children
-rw-r--r-- 1 root root 120 Mar 18 18:28 node.ur.0
-rw-r--r-- 1 root root 187 Mar 18 18:28 node.ur.0.children
-rw-r--r-- 1 root root 171 Mar 18 18:28 node.v7.0
-rw-r--r-- 1 root root 141 Mar 18 18:28 props

This was trying to commit a pom.xml file in which I changed a single line from:

To:

I did similar commits in which I added a whole file, equally to no avail.

This is actually all from my initial testing. Could've included that before :$

Thanks for your attentiveness.

Try removing the +'/rev' from the end of "txnRevPath = repos+'/db/transactions'+'/'+txn+'.txn'+'/rev'".

I'll happily try this change; will do this in the down hours, later. Looks as if it could actually run exception-free. I don't see, though, how the size of the whole transaction directory is a measure for the transaction size. Are you just accepting a certain overhead size here?

Well in the version of subversion that I developed this for, the transaction was essentially the same as the revision. If the operation proceeded successfully, then it copied the transaction to a new revision. I'm not sure why you are implying that the "size of the whole transaction directory" might not be "a measure for the transaction size".

It seems that /rev has just been moved somewhere else. The /db/transactions/TXN.txn directory appears to hold only transaction metadata now (its size is always reported as 4096). Without looking at any docs (saying that as a sort of disclaimer) I've stumbled upon a txn-protorevs directory, which led to the following essential change to your hook script:

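def getTransactionSize(repos, txn):
    # The proto-revision file for the transaction lives in db/txn-protorevs
    txnRevPath = repos+'/db/txn-protorevs/'+txn+'.rev'
    # Index 6 of the stat result is the file size in bytes (st_size)
    return os.stat(txnRevPath)[6]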

Now the script appears to know somewhat the file size, which may be good enough. For adding a 1708876 bytes file it reports 1709254 bytes. For a 2 byte change to a file it reports 50 bytes.
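To put the two layouts discussed in this thread side by side, here is a rough sketch of a size check that tries the newer db/txn-protorevs location first and falls back to the older db/transactions/TXN.txn/rev path. This is not the original check_txn_size.py; the function names and the 10 MB limit are assumptions for illustration only.

```python
import os
import sys

MAX_BYTES = 10 * 1024 * 1024  # hypothetical limit: 10 MB


def get_transaction_size(repos, txn):
    """Return the size in bytes of the transaction's proto-revision file."""
    candidates = [
        # Layout reported in this thread for Subversion 1.5
        os.path.join(repos, "db", "txn-protorevs", txn + ".rev"),
        # Older layout that the original hook was written against
        os.path.join(repos, "db", "transactions", txn + ".txn", "rev"),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return os.path.getsize(path)
    raise OSError("no proto-revision file found for transaction %s" % txn)


def check(repos, txn, limit=MAX_BYTES):
    """Return 0 to allow the commit, 1 to reject it."""
    size = get_transaction_size(repos, txn)
    if size > limit:
        sys.stderr.write(
            "Commit rejected: transaction is %d bytes, limit is %d bytes.\n"
            % (size, limit)
        )
        return 1
    return 0
```

Subversion calls pre-commit with the repository path and the transaction name as its two arguments, so a script built on this would end with something like sys.exit(check(sys.argv[1], sys.argv[2])).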

Thanks for your assistance. I'm taking this to production and letting our developers stumble over it in the morning. Well, let's hope they don't stumble too hard, because that would mean one of two things: either my changes to the hook don't work well --or-- someone again tries to upload too big a file, which may not end up in the repo then, due to this hook, but it'll still leave its footprint by adding the pre-commit revision to my server's file system. Oh, well, we'll see!

What do you mean by "but it'll still leave its footprint by adding the pre-commit revision to my server's file system"? If the pre-commit hook fails, the transaction will be removed. There will be no footprint.

The transaction stayed behind when someone brought our server to its knees by making a 7GB commit last week. I had to take some corrective measures quickly, so I'm not sure exactly how this came about. Sorry for the confusion.

FWIW, the documentation for the FSFS file structure is here:
http://svn.collab.net/repos/svn/trunk/subversion/libsvn_fs_fs/structure

Your change looks correct, and is the same as what I ended up implementing on my server. Here's the relevant bit from the FSFS document:

 

> It would suck to have to do that just because some people were committing large binaries (without any legitimate reason to).

Dave,

Funny part - I had to update your original script today. We recently upgraded to 1.6.

Thanks,

Tomas

Tomas, glad to see it's still in use.

Won't you share your updates?

Nice script and documentation! It's almost what I am looking for.

I'd like to limit the size of an entire repository. On each commit, check if the entire repository is over a pre-set size. If the commit would make the repository too large, reject the commit.

I'll start studying your script for possible adaptability. If you have any hints or tips, I'd appreciate it!
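One possible starting point for a whole-repository quota, offered as a sketch rather than a tested hook: walk the repository directory and add up the file sizes. Since the in-progress transaction's data appears to sit under the repository's db/ directory at pre-commit time (see the paths discussed above), the total should already include the pending commit. The function names and the quota value are assumptions.

```python
import os


def directory_size(root):
    """Sum the sizes, in bytes, of all files found under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.path.getsize(path)
            except OSError:
                # A file can vanish mid-walk, e.g. another transaction finishing
                pass
    return total


def repository_over_quota(repos, quota_bytes):
    """Return True if the repository directory is larger than quota_bytes."""
    return directory_size(repos) > quota_bytes
```

Walking a large repository on every commit may be slow, so caching the total between commits could be worth considering.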

Did you find how to do this?

Thanks

This script is just great! Thanks a lot for the work, I hope it will help me to lead the people on my repository in the right direction.

Hi,
I've modified the hook because it didn't work with SVN 1.6.9.

Here is the code:

 

Regards,
Sebastián

Hi,

Nice script, but what I want is for each repository to have a size limit, and to block a user from uploading a file larger than the space the repository has available.

If you know of something in python3, I would appreciate it.

Thanks

Hey, can you give me this script as a .bat file so that I can run it in a Windows environment?

Any suggestions regarding a Windows-based implementation of this code? I have already converted it with some modifications and it's running fine, but it fails in some scenarios.

The SVN 1.6.9 compatible script above fails if you have any 'stuck' transactions in your db/transactions directory.

It's better to change the start of the getMetaData function to:

 
