Limit Size of Subversion Commits With this Hook

We have experienced some abuse of our Subversion repository at work recently. Someone committed 400 MB of data all at once, including many object files, libraries, and executables. I did not get very harsh with the person who did this, because a) I have no objection to binaries in Subversion in the first place, b) I don't really know what he's working on, c) disk space is cheap and we are nowhere near capacity, and d) his commit was still smaller than a few commits we had a long time ago (which were legit). Still, if you just allow people to commit whatever they want to your Subversion repository, in the worst case you could run out of disk space, necessitating an svn dump-and-load onto a new, larger drive (a pain). It would suck to have to do that just because some people were committing large binaries without any legitimate reason to. There are other annoying consequences, too. Our tarball backups of svn currently fit on a DVD, which is cheap and easy; if we allowed this abuse to continue, it would complicate our backup process.

What I wanted was a way to limit the commit size for certain users automatically. There did not seem to be any hooks out there to do this, so I wrote one.

Just paste the following into your pre-commit hook:

/svn/repos/hooks/check_txn_size.py "$REPOS" "$TXN" || exit 1

and paste the following into a check_txn_size.py file in your hooks directory and make it executable.

#!/usr/bin/env python
import os
import subprocess
import sys

MAX_BYTES = 1024000
DEBUG = False
SVNLOOK = '/usr/bin/svnlook'
ALLOWED_USERS = ['david', 'ctang', 'vjain', 'mike', 'sbridges', 'tcirip']
ADMIN_EMAIL = 'admin@company.com'

def printUsage():
  sys.stderr.write('Usage: %s "$REPOS" "$TXN"\n' % sys.argv[0])

def getTransactionSize(repos, txn):
  # Size of the in-progress revision file; index 6 of the stat tuple is st_size.
  txnRevPath = repos+'/db/transactions'+'/'+txn+'.txn'+'/rev'
  return os.stat(txnRevPath)[6]

def printDebugInfo(repos, txn):
  # Dump every pending transaction directory and file along with its size.
  for root, dirs, files in os.walk(repos+'/db/transactions', topdown=False):
    sys.stderr.write(root+", filesize="+str(os.stat(root)[6])+"\n\n")
    for name in files:
      sys.stderr.write(name+", filesize="+str(os.stat(root+'/'+name)[6])+"\n")

def checkTransactionSize(repos, txn):
  # Reject the commit (non-zero exit) when the transaction exceeds MAX_BYTES.
  size = getTransactionSize(repos, txn)
  if size > MAX_BYTES:
    sys.stderr.write("Sorry, you are trying to commit %d bytes, which is larger than the limit of %d.\n" % (size, MAX_BYTES))
    sys.stderr.write("If you think you have a good reason to, email %s and ask for permission.\n" % ADMIN_EMAIL)
    sys.exit(1)

def getUser(repos, txn):
  # Ask svnlook for the author of the pending transaction.
  out = subprocess.check_output([SVNLOOK, 'author', repos, '-t', txn])
  return out.decode().strip()
 
if __name__ == "__main__":
  #Check that we got a repos and transaction with this script
  if len(sys.argv) != 3:
    printUsage()
    sys.exit(2)
  else:
    repos = sys.argv[1]
    txn = sys.argv[2]
 
  if DEBUG: printDebugInfo(repos, txn)
  user=getUser(repos, txn)
  if DEBUG: sys.stderr.write("User:"+user)
  if DEBUG:
    if (user in ALLOWED_USERS):
      sys.stderr.write(user+" in allowed users")
    else:
      sys.stderr.write(user+" not in allowed users")
  if (user not in ALLOWED_USERS):
    checkTransactionSize(repos, txn)

Comments

Hi,

Nice hook, but it seems that the size found in getTransactionSize, or maybe the size of the files themselves, is not equal to the size of the file committed. It always seems smaller. Is it maybe (g)zipped? Do you have any information about that?

thanks,

Gérald

Yes, it's measuring the size of the actual transaction. The transaction, if I remember correctly, is basically what will become the next revision in the db. It's called a transaction because it hasn't become part of the db yet. If you add 10kB of text to a text file that is 10MB and commit the change, the transaction size should be 10kB, not 10MB + 10 kB.

Additionally, if you are adding files, yes I think the transaction itself is compressed. Try putting a pause in the hook or something and look at the transaction and see for yourself. :-)

Thanks for info. Yes, I've seen what you describe. It's the size of transaction. Good idea to put a pause in the hook! :) I'll try.

We are using SVN 1.5.0. I don't see the .../rev files being created (anymore). I don't know where this information goes (now). Did something change with going to 1.5.0 or so? Assistance would be greatly appreciated!

We're running Redhat Enterprise Linux 5, Subversion 1.5.0.

Error: exceptions.OSError: [Errno 2] No such file or directory: '/usr/.../svnroot/sandbox/db/transactions/1256-f.txn/rev'

Patches welcome :-)

> Patches welcome :-)

Well yes, that is fair enough. I'm just a bit stuck, though. A little googling did not bring me any information about the repository structure. And while I was trying a certain test commit, I was not able to recognise my changes in the transaction/ directory, so I was at a loss. And given that this article is referenced a lot, all over the place, I took it to be authoritative enough that I should be a little modest ;-)
So I'm not necessarily looking for a ready-cut solution, but I'd welcome an exchange of ideas and approaches!

Regards, Sander.

In getTransactionSize try putting a raw_input("") statement which will cause the script to hang. Then look at what the transaction directory actually looks like on the server. Then you might be able to cancel the transaction on the client at that point. I can't remember exactly how I debugged it at the time.

Or maybe you can just do print statements and the output of the print statement might actually show up at the client? I can't remember. At the very least you could write to a log file from within the python script.

I can have a look but I'm a bit busy right now.
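
The log-file idea above could look roughly like the sketch below. It is only an illustration: the log path and the function names are made up for this example, and the transaction directory layout is assumed to be the db/transactions/TXN.txn one used in the hook.

```python
import logging
import os
import tempfile

# Hypothetical log location; pick a path the svn server user can write to.
LOG_PATH = os.path.join(tempfile.gettempdir(), 'check_txn_size.log')

def get_hook_logger(path=LOG_PATH):
    """Return a logger that appends to `path`, configured only once."""
    logger = logging.getLogger('check_txn_size')
    if not logger.handlers:
        handler = logging.FileHandler(path)
        handler.setFormatter(logging.Formatter('%(asctime)s %(message)s'))
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
    return logger

def log_transaction_dir(repos, txn, logger):
    """Log every file under db/transactions/<txn>.txn with its size."""
    txn_dir = os.path.join(repos, 'db', 'transactions', txn + '.txn')
    for root, dirs, files in os.walk(txn_dir):
        for name in files:
            full = os.path.join(root, name)
            logger.debug('%s size=%d', full, os.path.getsize(full))
```

Calling log_transaction_dir(repos, txn, get_hook_logger()) near the top of the hook should leave a record of what the transaction looked like, without having to pause the commit.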

I copied the transaction directory to my home.

[root@ENH-SF-XX ~]# cd 1256-e.txn/
[root@ENH-SF-XX 1256-e.txn]# ll
total 96
-rw-r--r-- 1 root root 71 Mar 18 18:28 changes
-rw-r--r-- 1 root root 4 Mar 18 18:28 next-ids
-rw-r--r-- 1 root root 144 Mar 18 18:28 node.0.0
-rw-r--r-- 1 root root 396 Mar 18 18:28 node.0.0.children
-rw-r--r-- 1 root root 98 Mar 18 18:28 node.qg.0
-rw-r--r-- 1 root root 678 Mar 18 18:28 node.qg.0.children
-rw-r--r-- 1 root root 111 Mar 18 18:28 node.uq.0
-rw-r--r-- 1 root root 76 Mar 18 18:28 node.uq.0.children
-rw-r--r-- 1 root root 120 Mar 18 18:28 node.ur.0
-rw-r--r-- 1 root root 187 Mar 18 18:28 node.ur.0.children
-rw-r--r-- 1 root root 171 Mar 18 18:28 node.v7.0
-rw-r--r-- 1 root root 141 Mar 18 18:28 props

This was trying to commit a pom.xml file in which I changed a single line from:

To:

I did similar commits in which I added a whole file, to equal no avail.

This is actually all from my initial testing. Could've included that before :$

Thanks for your attentiveness.

Try removing the +'/rev' from the end of txnRevPath = repos+'/db/transactions'+'/'+txn+'.txn'+'/rev'

I'll happily try this change; will do this in the down hours, later. Looks as if it could actually run exception-free. I don't see, though, how the size of the whole transaction directory is a measure for the transaction size. Are you just accepting a certain overhead size here?

Well in the version of subversion that I developed this for, the transaction was essentially the same as the revision. If the operation proceeded successfully, then it copied the transaction to a new revision. I'm not sure why you are implying that the "size of the whole transaction directory" might not be "a measure for the transaction size".

It seems that /rev has just been moved somewhere else. The /db/transactions/TXN.txn directory appears to hold just transaction metadata now (its size always comes back as 4096). Without looking at any docs (saying that as a sort of disclaimer), I stumbled upon a txn-protorevs directory, which led to the following essential change to your hook script:

def getTransactionSize(repos, txn):
  txnRevPath = repos+'/db/txn-protorevs/'+txn+'.rev'
  return os.stat(txnRevPath)[6]

Now the script appears to know somewhat the file size, which may be good enough. For adding a 1708876 bytes file it reports 1709254 bytes. For a 2 byte change to a file it reports 50 bytes.

Thanks for your assistance. I'm taking this to production and letting our developers stumble over it in the morning. Well, let's hope they don't stumble too hard, because that would mean one of two things: either my changes to the hook don't work well, or someone is again trying to upload too big a file, which may not end up in the repo then, thanks to this hook, but will still leave its footprint by adding the pre-commit revision to my server's file system. Oh well, we'll see!

What do you mean by "but it'll still leave its footprint by adding the pre-commit revision to my server's file system"? If the pre-commit hook fails, the transaction will be removed. There will be no footprint.

The transaction stayed behind when someone brought our server to its knees with a 7GB commit last week. I had to take some corrective measures quickly, so I'm not sure exactly how this came about. Sorry for the confusion.

FWIW, the documentation for the FSFS file structure is here:
http://svn.collab.net/repos/svn/trunk/subversion/libsvn_fs_fs/structure

Your change looks correct, and is the same as what I ended up implementing on my server. Here's the relevant bit from the FSFS document:

Location of proto-rev file and its lock
  Formats 1-2: transactions/<txnid>/rev and
    transactions/<txnid>/rev-lock.
  Format 3-4: txn-protorevs/<txnid>.rev and
    txn-protorevs/<txnid>.rev-lock.
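
Given the two locations quoted above, one way to keep a single hook working across both layouts is to try each path in turn. This is a minimal sketch, not a tested drop-in: the function names are invented for this example, and it assumes the on-disk names match what the thread observed (TXN.rev under txn-protorevs, and TXN.txn/rev under transactions).

```python
import os

def get_proto_rev_path(repos, txn):
    """Return the proto-rev file for `txn`, trying the newer layout first."""
    candidates = [
        # FSFS formats 3-4: the location found in this thread on 1.5.0
        os.path.join(repos, 'db', 'txn-protorevs', txn + '.rev'),
        # FSFS formats 1-2: the layout the original hook was written for
        os.path.join(repos, 'db', 'transactions', txn + '.txn', 'rev'),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    raise OSError('no proto-rev file found for transaction %s' % txn)

def get_transaction_size(repos, txn):
    """Size in bytes of the transaction's proto-rev file."""
    return os.path.getsize(get_proto_rev_path(repos, txn))
```

Checking for the file rather than parsing the repository's format number keeps the sketch short, at the cost of one extra filesystem lookup on old repositories.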

Dave,

Funny part - I had to update your original script today. We recently upgraded to 1.6.

Thanks,

Tomas

Tomas, glad to see it's still in use.

Won't you share your updates?

Nice script and documentation! It's almost what I am looking for.

I'd like to limit the size of an entire repository. On each commit, check if the entire repository is over a pre-set size. If the commit would make the repository too large, reject the commit.

I'll start studying your script for possible adaptability. If you have any hints or tips, I'd appreciate it!

Did you find how to do this?

Thanks
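
For the whole-repository limit asked about above, a rough starting point might be to add up the files under db/ at pre-commit time. This is only a sketch under assumptions: the 4 GiB cap and the function names are made up, and it relies on the pending transaction's data already sitting under db/ when the hook runs (as the txn-protorevs discussion suggests), so other in-flight transactions get counted too.

```python
import os

MAX_REPOS_BYTES = 4 * 1024 ** 3  # hypothetical 4 GiB cap

def directory_size(path):
    """Total size in bytes of all regular files below `path`."""
    total = 0
    for root, dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished, e.g. another transaction finished
    return total

def repository_over_limit(repos, limit=MAX_REPOS_BYTES):
    """True if everything under <repos>/db exceeds `limit` bytes."""
    return directory_size(os.path.join(repos, 'db')) > limit
```

Walking the whole tree on every commit could get slow on a large repository, so caching the total between commits may be worth considering.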

This script is just great! Thanks a lot for the work, I hope it will help me to lead the people on my repository in the right direction.

Hi,
I've modified the hook because it didn't work with SVN 1.6.9.

Here is the code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
 
import sys,os,popen2
import codecs
 
MAX_BYTES = 1024
DEBUG = False
SVNLOOK = '/usr/bin/svnlook'
ALLOWED_USERS = ['']
 
def printUsage():
  sys.stderr.write('Usage: %s "$REPOS" "$TXN" ' % sys.argv[0])
 
def getMetadata(repos, txn):
  byTx = list()
  for root, dirs, files in os.walk(os.path.join(repos , 'db/transactions'), topdown=False):
    meta = dict()
    byTx.append(meta)
    for name in files:
      if not name.startswith('node') or name.endswith('children') or name.endswith('props'):
        continue
      file_name = os.path.join(root, name)
      meta[name] = dict()
      # Read the node file's header lines as "key: value" pairs.
      with open(file_name) as node_file:
        for line in node_file:
          line = line.replace("\n", "").split(":", 1)
          if len(line) > 1:
            key = line[0].strip()
            value = line[1].strip()
            if key == 'type' and value != 'file':
              # Only file nodes carry a size worth checking.
              meta.pop(name)
              break
            meta[name][key] = value
  return byTx 
 
def getFileSize(value):
  return int(value.split(' ')[3])
 
def printDebugInfo(repos, txn):
  meta = getMetadata(repos, txn)[0]
  sys.stderr.write("\n")
  for entry in meta.values():
     sys.stderr.write("\t%s = %s bytes\n" % (entry['cpath'], getFileSize(entry['text'])))
  sys.stderr.write("\n")
 
def checkTransactionSize(repos, txn):
  meta = getMetadata(repos, txn)[0]
  fail = False
  for entry in meta.values():
    size = getFileSize(entry['text'])
    if size > MAX_BYTES:
      sys.stderr.write("File %s has %d bytes and is larger than the limit (%d bytes)\n" % (entry['cpath'], size, MAX_BYTES)) 
      fail = True
 
  if fail:
    sys.exit(1)
 
def getUser(repos, txn):
  cmd = SVNLOOK + " author " + repos + " -t " + txn
  out, x, y = popen2.popen3(cmd)
  cmd_out = out.readlines()
  return cmd_out[0][:-1]
 
if __name__ == "__main__":
  #Check that we got a repos and transaction with this script
  if len(sys.argv) != 3:
    printUsage()
    sys.exit(2)
  else:
    repos = sys.argv[1]
    txn = sys.argv[2]
 
  if DEBUG:
    printDebugInfo(repos, txn)
 
  user = getUser(repos, txn)
 
  if DEBUG:
    if (user in ALLOWED_USERS):
      sys.stderr.write("User %s in allowed users\n" % user)
    else:
      sys.stderr.write("User %s not in allowed users\n" % user)
 
  if (user not in ALLOWED_USERS):
    checkTransactionSize(repos, txn)

Regards,
Sebastián

Hi,

Nice script, but what I want is for each repository to have a size limit, and to block a user from uploading a file larger than the space the repository has left.

If you know of something in Python 3, I would appreciate it.

Thanks

Hey, can you give me this script as a .bat file so that I can run it in a Windows environment?

Any suggestions regarding a Windows-based implementation of this code? I have already converted it with some modifications and it's running fine, but it fails in some scenarios.

The SVN 1.6.9 compatible script above fails if you have any 'stuck' transactions in your db/transactions directory.

It's better to change the start of the getMetadata function to:

def getMetadata(repos, txn):
  byTx = list()
  txnDir = txn + '.txn'
  for root, dirs, files in os.walk(os.path.join(repos , 'db/transactions', txnDir), topdown=False):
    meta = dict()
 etc.
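
Put together, a per-file size lookup restricted to a single transaction directory might look like the sketch below. It is not a tested replacement: the function names are invented here, and it simply follows the 1.6.9 script above in assuming that the fourth space-separated value of the text field is the file size and that cpath holds the path.

```python
import os

def parse_node_file(path):
    """Read "key: value" header lines from one node.* file into a dict."""
    fields = {}
    with open(path) as f:
        for line in f:
            key, sep, value = line.rstrip('\n').partition(':')
            if sep:
                fields[key.strip()] = value.strip()
    return fields

def file_sizes_in_txn(repos, txn):
    """Map each file path in transaction `txn` to its reported size."""
    txn_dir = os.path.join(repos, 'db', 'transactions', txn + '.txn')
    sizes = {}
    for name in os.listdir(txn_dir):
        # Same filter as the script above: only plain node.* files.
        if not name.startswith('node') or name.endswith(('children', 'props')):
            continue
        fields = parse_node_file(os.path.join(txn_dir, name))
        if fields.get('type') != 'file' or 'text' not in fields:
            continue
        # Assumption carried over from getFileSize: index 3 is the size.
        sizes[fields.get('cpath', name)] = int(fields['text'].split(' ')[3])
    return sizes
```

Because only TXN.txn is listed, leftover directories from stuck transactions are never read.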
