Why Subversion for me:

My interest in Subversion stems from document management issues. Please don’t get me wrong; I believe Subversion is great for software version control. But you can’t ignore its general usefulness. I spent a month playing around with Oracle IFS. After deciding to give extending it a whirl, I went down the licensing road. Oh my. When you combine the massive amounts of hardware needed to run it with the rather large licensing tab, your stomach will turn. At least mine did. So I decided to reconsider adapting CVS to my needs one last time. “Hmmm… no renames, can’t keep track of directory moves, and I really need versioned properties too.” “Dang, IFS is expensive and very complicated to set up, plus I have to make changes for my needs anyway.” Hey, what about that version control thing that I stumbled across a while ago? What was it called ... Subversion. Let me go check that thing out one more time. Oh shit, they self-host now. And the rest, as they say, is history.

Why SQL based DB:

Pervasive

Enough said.

Large repository support

I have several applications for Subversion which will require repositories larger than 50 GB. I feel that databases supporting “table-spaces” and/or LOBs help with this a great deal, not to mention the performance gain that sometimes comes along for the ride. Table-spaces usually allow their pieces to live in different file systems as well. Got big 10 GB DB files? Some backup software can’t read large files, or simply can’t split files across tapes. How about five 2 GB files instead?

Extensibility

·       SQL databases by their very nature are extensible

·       SQL provides a very robust ad hoc query capability. This capability can be used both internally and externally to Subversion

Multi-tiering

·       I like to have any process exposed to the Internet on separate boxes from the data they access.  SQL easily allows queries to be passed through internal/bastion firewalls and networks.

·       This may also provide some performance opportunities.

Design drivers and goals:

Support multiple DBs

·       I hate religious wars. Databases all have strengths and weaknesses

·       Many organizations require use of specific Databases. Site licenses and such.

·       May help in legal arguments when using open-source Databases.

Support multiple APIs

·       No real standard C interface. The closest thing I found was ODBC

·       No JDBC/DBI-like de facto standard

Be extensible

·       When using a specific DB make it possible to use “nice” features

·       Decided against an “all-knowing SQL generator” which attempts to hide SQL details.  This is difficult to do well.  It is also hard to beat the expressiveness of SQL, so why hide it?

·       Allow use of DB-specific syntax to some extent.  (Stored procedures are ignored for right now, although I’m sure they are in my near future.)

·       Decided against an embedded pre-processor based solution. It’s more work and makes runtime configuration harder to accomplish.

Avoid big ugly embedded SQL constants.

·       Why keep in-line constants for rarely accessed SQL?  Instead, separate the SQL and reference it by name.

·       IMO it better enables code to use two different queries that return the same results in different ways.

·       May make development quicker and easier IMO.

·       Keeping SQL separate makes a good reference point for low-level logic discussions: first it executes …, then this …, then this …
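A rough sketch of the "reference SQL by name" idea, in Python with the stdlib sqlite3 module rather than the project's C, just to keep it short and runnable. The statement names, the nodes table, and the run() helper are all made up for illustration; none of them come from the actual code.

```python
import sqlite3

# Hypothetical statement names and SQL. The real schema is not described
# in this document, so "nodes" is only a stand-in.
STATEMENTS = {
    "create_nodes": "CREATE TABLE nodes (id INTEGER PRIMARY KEY, path TEXT NOT NULL)",
    "insert_node": "INSERT INTO nodes (path) VALUES (?)",
    "node_by_path": "SELECT id, path FROM nodes WHERE path = ?",
}

def run(conn, name, params=()):
    """Look a statement up by name and execute it with the given parameters."""
    return conn.execute(STATEMENTS[name], params)

conn = sqlite3.connect(":memory:")
run(conn, "create_nodes")
run(conn, "insert_node", ("/trunk/README",))
row = run(conn, "node_by_path", ("/trunk/README",)).fetchone()
print(row)
```

The calling code never contains SQL text, so swapping in a second query that returns the same rows a different way is a change to the table only.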

Design (kinda sorta:-)

 

sql_fs_impl layer (in blue)

This layer is responsible for directing function traffic based on configuration information, which will be stored in the existing repository tree for now.  It must ensure that the svn_fs_vtable represents a cohesive set of function calls, i.e. load impl_packs and make sure they play together.  This function will be weak at first.

impl_packs (in purple)

These are chunks of code identified to perform a specific file system function or functions.  The sql_fs_impl layer uses configuration information and impl_packs to construct valid vtables.  The sql_statement content is managed by the sql_stmt and packs_mgmt support code.  Struct_mgmt’s macros and optionally generated structure access functions aid in writing code not cluttered with DB access code.  This makes no sense, I’m sure, but it’s not quite done yet.  I have to use examples to explain.
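One way to picture "load impl_packs and make sure they play together" is sketched below in Python (the real layer would be C filling in an svn_fs_vtable). The slot names, the pack names, and the rule that a slot supplied twice is an error are assumptions made for the sake of the example, not the actual design.

```python
# Hypothetical slot names; the real svn_fs_vtable has its own set.
REQUIRED_SLOTS = ("begin_txn", "commit_txn", "node_prop")

def build_vtable(packs):
    """Merge the functions offered by each pack into one table.

    Raises ValueError if two packs supply the same slot or if a required
    slot is left empty (both rules are assumptions of this sketch).
    """
    vtable = {}
    for pack_name, funcs in packs.items():
        for slot, func in funcs.items():
            if slot in vtable:
                raise ValueError(f"slot {slot!r} supplied twice (second by {pack_name})")
            vtable[slot] = func
    missing = [slot for slot in REQUIRED_SLOTS if slot not in vtable]
    if missing:
        raise ValueError(f"incomplete vtable, missing: {missing}")
    return vtable

# Two made-up packs that together cover every required slot.
packs = {
    "txn_pack": {"begin_txn": lambda: "begin", "commit_txn": lambda: "commit"},
    "props_pack": {"node_prop": lambda path, name: None},
}
vtable = build_vtable(packs)
print(sorted(vtable))
```

Loading only txn_pack would raise ValueError here, which is the kind of early failure a cohesion check is meant to give.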

SQL statements support code (in yellow)

They allow SQL to be written in DB-specific dialects (if needed).  This is often needed for DDL.  Statements have conversion tokens for parameters and return columns.  The names and types are defaulted if not provided.  The parameters are put in DB-specific format by the VAPI layer through functions registered by the specific adaptor.  I have simple, prepared, and chunked statements.  Chunked statements look like prepared ones, only they allow the entire SQL statement to be built on the fly.  That means you don’t have to worry about whether the underlying DB/API handles a list bound to a single placeholder (“IN (?)” where ? = 1,7,9,11, for instance), and it also helps if you want to write semi-custom SQL on the fly.  I also have sequenced statements.  I use them in DDL and plan on using them for other things as well.
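The IN-list case can be sketched as follows, again in Python with sqlite3 instead of C. The chunked_in() helper and the revs table are hypothetical; the point is only that the statement text is assembled at run time with one placeholder per value, while the values themselves are still bound as parameters.

```python
import sqlite3

def chunked_in(head, values, tail=""):
    """Return (sql, params) with one '?' marker per value in the IN list."""
    values = list(values)
    if not values:
        raise ValueError("an IN list needs at least one value")
    markers = ", ".join("?" for _ in values)
    return f"{head} IN ({markers}){tail}", values

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revs (id INTEGER PRIMARY KEY)")  # hypothetical table
conn.executemany("INSERT INTO revs (id) VALUES (?)", [(i,) for i in range(1, 13)])

sql, params = chunked_in("SELECT id FROM revs WHERE id", [1, 7, 9, 11], " ORDER BY id")
ids = [row[0] for row in conn.execute(sql, params)]
print(sql)
print(ids)
```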

Packs management support code (in yellow)

Provides functions for working with groups of statements, as well as (I hope) dynamic binding to the C code processing the statements. I haven’t played with the apr_dso support yet.

db_vapi code (in green)

Provides a way of hiding much of the indirection introduced by a system able to work with multiple DBs and access APIs.  “A->B->C->func()” can get a little unreadable at times.  Defaulted pool management binds sub-pools to their logical access construct.

API adaptors (in gray)

These provide specific implementations of the VAPI layer.
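As an illustration of adaptors registering functions that put parameters into a DB-specific format, here is a small Python sketch. The %{name} token syntax, the adaptor names, and the register_adaptor()/render() helpers are invented for this example; only the two marker styles are real (ODBC uses "?" markers, PostgreSQL's libpq uses $1, $2, ...).

```python
import re

ADAPTORS = {}  # adaptor name -> function(index) returning a parameter marker

def register_adaptor(name, marker):
    """Register a marker function; index counts parameters from 1."""
    ADAPTORS[name] = marker

register_adaptor("qmark", lambda i: "?")          # ODBC-style markers
register_adaptor("numbered", lambda i: f"${i}")   # libpq-style markers

# Hypothetical conversion token syntax: %{name}
TOKEN = re.compile(r"%\{(\w+)\}")

def render(adaptor, template):
    """Replace each token with the adaptor's marker; return (sql, names)."""
    names = []

    def repl(match):
        names.append(match.group(1))
        return ADAPTORS[adaptor](len(names))

    return TOKEN.sub(repl, template), names

template = "SELECT path FROM nodes WHERE rev = %{rev} AND kind = %{kind}"
print(render("qmark", template))
print(render("numbered", template))
```

The same statement text serves both adaptors; only the registered marker function differs.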

Current code status.

General comments

I consider all code to be in a pliable state. It’s never done.

Build system dependencies are way up in the air.  I did buy a book on autoconf, however :-)  I currently have minimal changes to build.conf, which still require me to set EXTRA_CPPFLAGS, EXTRA_CFLAGS, and EXTRA_LDFLAGS to build my chit.

I have also spent some time looking at Python. It’s not currently in my bag-o-tricks.

Definitions

Vapor

No code slated for final use is underway.  However, conceptual code may have been written, say in the form of SQL scripts or, in some cases, C.  One such area has to do with alternate forms of storing directory entries and properties.  The delta-fied storage of properties makes them difficult to query against.  I don’t want to debate this.  I want to implement both the current stuff and something I’ve yet to figure out :-)

Underway

Enough coding has been done that it’s not fair to call it vapor

Almost usable

Significant code has been written but more work is needed

Usable

Code is close to complete for now :-)

Sql_fs_impl (Vapor)

Impl_packs (Underway) Long road however.

Vapi (Usable)

SQL_stmts (Usable)

Struct_mgmt (Almost usable)

Packs_mgmt (Underway-Almost usable)

Long-term

Query/reporting extensibility

I really want to be able to answer questions like “list all files/directories where property x = blah” and make the answer revision-aware.  I would like to expose a function or two that handles searching and/or reporting based on revisions and metadata.  I would like for this to work in a plug-in or “impl_pack” way. I want to be able to ask a repos what it can do for me, then ask it to do it. Like: “svn list_rpts <-v>; svn exec <rpt>”. Perhaps XML is a way to implement this sort of thing. I haven’t looked at ra_dav, but I’m sure it will provide some ideas.
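To make the "property x = blah as of revision N" question concrete, here is a Python/sqlite3 sketch. It assumes a flat, non-delta-fied props table with one row per property change, which is purely hypothetical: the document says the alternate storage form is not figured out yet. Deletions are not modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flat layout: one row each time a property value is set.
conn.execute("CREATE TABLE props (path TEXT, rev INTEGER, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO props VALUES (?, ?, ?, ?)",
    [
        ("/a", 1, "x", "blah"),
        ("/a", 3, "x", "other"),
        ("/b", 2, "x", "blah"),
        ("/c", 1, "x", "nope"),
    ],
)

# For each path, take the newest row at or before the requested revision
# and keep it only if the value matches.
PROP_AS_OF = """
SELECT p.path FROM props p
WHERE p.name = ? AND p.value = ?
  AND p.rev = (SELECT MAX(q.rev) FROM props q
               WHERE q.path = p.path AND q.name = p.name AND q.rev <= ?)
ORDER BY p.path
"""

def paths_with_prop(conn, name, value, rev):
    """List paths whose property `name` equals `value` as of revision `rev`."""
    return [row[0] for row in conn.execute(PROP_AS_OF, (name, value, rev))]

print(paths_with_prop(conn, "x", "blah", 2))
print(paths_with_prop(conn, "x", "blah", 3))
```

A query like this could be one of the named reports a repository advertises, but that wiring is left open here.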

Archive

I’ve seen mentioned on the list the need to archive off old files.  I’m really going to need this, so just let me say that it is real high on my list of things to work on.  I have a fair amount of Java code wrapped around C via JNI to handle tapes and tape libraries.  I can’t release anything yet for legal reasons.  Currently it only runs on Solaris due to some of the C tape code.  I have extensive plans for rewriting the library portion.  Some threading issues have bitten me hard.  Anyhow, the highest-level stuff is RMI based.  It does some cute stuff I can’t discuss just yet.  I’m considering creating an XML-RPC interface for this stuff.