OpenAFS 1.4.12 pre-release is available


January 13, 2010 by Michael Meffie

Update: OpenAFS 1.4.12 release candidate 4 is now available.

Release candidate 1 of OpenAFS 1.4.12 has been announced, and the source code is available for download. This pre-release contains a large number of fixes since OpenAFS 1.4.11, including several critical fixes to the fileserver and the Unix cache manager.

Testing this pre-release in your environment will help improve the quality of the release. We can’t fix problems we don’t know about, so please report any errors to openafs-bugs@openafs.org and any positive results to the openafs-info mailing list.

Thanks,
Mike

AFSVol Tag-Length-Value Remote Procedure Call Extensions


December 14, 2009 by Tom Keiser

Several weeks ago I submitted an Internet Draft (I-D) to the AFS-3 protocol working group. This draft covers a number of interesting changes to the volume server RPC interface. The key proposals made in this draft are to:

  1. Introduce a GetCapabilities RPC, similar to the ones previously defined for the file server and cache manager services,
  2. Develop a solution to the XDR discriminated union discriminator evolution problem,
  3. Introduce a suite of RPCs to manage volume metadata via Tag-Length-Value (TLV) semantics,
  4. Export existing volume and volume transaction metadata via the new TLV interface, and
  5. Export Demand Attach File Server (DAFS) state metadata via the new TLV interface.

The primary motivation for this draft is the desire to introspect DAFS state via the standard remote procedure call interface. At the moment, the vos command can only report a boolean volume state of online or offline. For DAFS deployments, this is inadequate to properly manage a file server. Existing DAFS deployments use the fssync-debug command to determine the exact state of a volume. We have recognized for quite some time the need for a better introspection mechanism, one that works remotely and is aimed at administrators rather than developers. With this draft, a considerably more descriptive set of states can be reported back to the caller.

Secondarily, this draft will pave the way for future protocol changes which permit vos to set advanced forms of volume policy, such as RxOSD-specific quotas, volume ACLs, etc.
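To make the motivation concrete, here is a minimal sketch of how a client might use a capability bit to choose between the new TLV interface and the old boolean state. Everything named here is hypothetical: the capability bit, the server classes, the method names, and the example state string are assumptions for illustration and are not taken from the draft or from the OpenAFS source.

```python
# Hypothetical capability bit; the draft's real bit assignments are not shown here.
CAP_TLV_METADATA = 1 << 0


class LegacyVolserver:
    """Stand-in for a volume server that predates the TLV interface."""

    def get_capabilities(self):
        return 0

    def is_online(self, volume_id):
        return True


class TlvVolserver(LegacyVolserver):
    """Stand-in for a volume server that advertises TLV metadata support."""

    def get_capabilities(self):
        return CAP_TLV_METADATA

    def get_state_tlv(self, volume_id):
        return "salvaging"  # invented example of a more descriptive state


def describe_volume(server, volume_id):
    """Prefer the descriptive TLV state; fall back to online/offline."""
    if server.get_capabilities() & CAP_TLV_METADATA:
        return server.get_state_tlv(volume_id)
    return "online" if server.is_online(volume_id) else "offline"
```

The point of the sketch is only the shape of the decision: a caller asks what the server can do before issuing an enhancement-specific call, so older servers keep working unchanged.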

The abstract for this draft is as follows:

AFS-3 heavily leverages Remote Procedure Calls (RPCs). This proposal adds a new mechanism to better manage the addition of new, enhancement-specific RPCs through the use of both capability bits via the GetCapabilities RPC, and via standardization of backwards-compatibility behaviors for enhancement-specific RPCs. These goals are accomplished through standardization of Tag-Length-Value (TLV) get/set/enumerate RPCs with value payloads encoded using an XDR discriminated union. The XDR union decode problem is circumvented by specifying an opaque default leg. Tags are allocated for existing volume and transaction metadata, and implementation-private tags are allocated for metadata related to the OpenAFS Demand Attach File Server.

Full text is available in the following formats: TXT, HTML, and XML.
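As a rough illustration of the "opaque default leg" idea from the abstract, the sketch below decodes a stream of tag-length-value records and hands back the raw bytes for any tag it does not recognize, instead of failing. The tag numbers, the value types, and the function names are assumptions made up for this example; the draft defines its own tags and XDR types, and this is not the draft's wire format.

```python
import struct

# Hypothetical tags; the draft's real tag assignments are not reproduced here.
TAG_VOL_NAME = 1   # assumed: value is a UTF-8 string
TAG_VOL_STATE = 2  # assumed: value is a 32-bit unsigned integer


def encode_tlv(tag, value):
    """Pack a 4-byte tag, a 4-byte length, and the value padded to 4 bytes."""
    pad = (-len(value)) % 4
    return struct.pack(">II", tag, len(value)) + value + b"\x00" * pad


def decode_tlvs(buf):
    """Yield (tag, value) pairs; unknown tags yield their raw bytes."""
    offset = 0
    while offset < len(buf):
        tag, length = struct.unpack_from(">II", buf, offset)
        offset += 8
        raw = buf[offset:offset + length]
        if len(raw) != length:
            raise ValueError("truncated TLV record")
        offset += length + (-length) % 4  # skip the value and its padding
        if tag == TAG_VOL_NAME:
            yield tag, raw.decode("utf-8")
        elif tag == TAG_VOL_STATE:
            yield tag, struct.unpack(">I", raw)[0]
        else:
            # Default leg: keep the payload opaque so that an older decoder
            # can skip or forward tags it does not understand.
            yield tag, raw
```

Because the length travels with every record, a decoder that has never heard of a tag can still step over it, which is what lets new metadata be added without breaking existing clients.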

Sine Nomine welcomes discussion of and feedback on this proposed Internet Draft over the afs3-standardization mailing list.

A new Windows AFS build script


December 4, 2009 by Mickey

If you’re new to building the OpenAFS Windows client from source, getting your build machine properly configured can be a daunting task. The current process involves editing lines in a file named ntbuild.bat with things like the type of client to be built (32-bit, 64-bit, debug, etc.) and the ‘8.3’ names of the paths to various installed products. If you have multiple build machines, you have multiple versions of this file.

Winafsbld is a set of batch files that replaces ntbuild.bat. Once all is said and done, it uses the same build mechanism as ntbuild.bat does but wraps it in a much more user-friendly environment. It does away with the need for the ‘8.3’ names and does a lot of parameter checking before it starts the build.

Winafsbld with instructions is available here as a zip file.

Here is a sample from the instructions:

Winafsbld.bat is executed from the same location as ntbuild.bat. It sets various configuration environment variables and calls batch files in the winafsbld directory.

No changes should be required in any of the files in the winafsbld directory.

Use Notepad (or another text editor) to review winafsbld.bat. The options should be clear. (Note that a signing certificate is required for this release despite the options given.)

Open a console window and type ‘winafsbld’. If on Vista, open a console window with ‘Run as Administrator’ and navigate from the system directory to your build directory.

As currently implemented, the console window will turn blue to indicate a configuration error, red to indicate an error, or green to indicate success.

Failures will spawn Notepad with the log file.

It is not a good idea to re-run Winafsbld.bat from the same console window, because various environment variables are modified each time the batch file is run. Close the console window and open a new one.

An old cache manager bug squashed


December 3, 2009 by Michael Meffie

An old but critical bug in the Unix version of the OpenAFS cache manager kernel module was recently fixed by Sine Nomine and committed to the upstream stable code tree for inclusion in the next release of OpenAFS. This was quite an old bug. In fact, it has been present since OpenAFS 1.0, which makes it about ten years old.

The site reporting the bug had several hosts crash after removing a bogus IP address from their VLDB, which initially was quite baffling. As it turned out, a rare combination of events led to a code path that exposed a race condition in the cache manager. In this case, the cache manager would crash when trying to use a pointer to memory which had been freed and then reused by another thread.

This was triggered when the client noticed that one of the fileserver network interfaces had a new address. At that point the cache manager invalidates the old address in all the cache entries for that server. The memory holding the server information is freed and becomes available for other uses in the cache manager.

The cache manager code which flushes vcache entries also accesses the server data members when flushing cache entries for read-only volumes. This is done to save the volume level callback information, since read-only volumes have callbacks for the entire volume, and not per individual files.

Now, there are a series of locks in the cache manager to prevent threads from walking over each other’s memory, but in this case, the locks were not used correctly in the code which was flushing the read-only cache entry. This code took a pointer to the memory holding the server information before the lock was held, a classic race condition. The fix was to make sure the pointer to the shared data member is used only after the mutual exclusion lock is held.

The patch is available in the OpenAFS git repository:

cm: address race condition in afs_QueueVCB

This is a conservative fix for the stable series. No new locks or changes to the locking order are introduced. However, longer term, we may want to revisit this part of the cache manager.
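The pattern behind the bug and the fix can be shown in a few lines. This is only a sketch in Python, not the kernel code: the class and method names are invented for illustration, and "freeing" the server record is simulated by clearing its address.

```python
import threading


class Server:
    def __init__(self, address):
        self.address = address


class Cache:
    def __init__(self, server):
        self._lock = threading.Lock()
        self._server = server  # shared; another thread may replace it

    def replace_server(self, new_server):
        """Swap in a new server record and 'free' the old one under the lock."""
        with self._lock:
            old = self._server
            self._server = new_server
            old.address = None  # stands in for freeing the old record

    def flush_racy(self):
        server = self._server  # BUG: pointer taken before the lock is held
        with self._lock:
            return server.address  # may be a record that was already freed

    def flush_fixed(self):
        with self._lock:
            server = self._server  # pointer taken only while holding the lock
            return server.address
```

In flush_racy, another thread can run replace_server between the first line and the lock acquisition, leaving the local variable pointing at a stale record. In flush_fixed there is no such window, and no new lock or lock ordering is needed.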

Restricting AFS ACLs


December 1, 2009 by Andrew Deason

If you’ve ever administered a sufficiently large and public AFS cell, you have probably at least once had a user assign rlidwka rights to system:anyuser on a directory. This can be a real security headache, particularly when web-accessible data is pulled directly from AFS. Currently, the only way to make sure that doesn’t happen is to revoke users’ admin rights, but then you lose the convenience and flexibility of users maintaining permissions themselves.

Arguably, this problem can be solved by user education and performing audits of ACL rights, but that isn’t always enough. Sometimes there are simply too many users, and/or the cost of them making a mistake with ACLs is just too high. Clearly, we need a way to specify an enforceable security policy, so too-permissive ACLs cannot ever be set.

To solve this, Mike Meffie, Tom Keiser, and I have been writing an Internet Draft that proposes three different ways to specify such security policies. It is currently available at
http://bm1vsrv05.sinenomine.net/~adeason/draft-deason-afs3-acl-restrictions-00.html.

Here is the Abstract:

The AFS-3 ACL ‘a’ bit gives users unfettered power to grant, or revoke, privileges, with no provision for enforcing site policy. This memo provides several alternative mechanisms for creating restrictions on what powers the ‘a’ bit denotes. Three alternative mechanisms for restricting the power of the ‘a’ bit are proposed: a method for overlaying the ACL with a site-controlled ACL; a method for masking the ACL with a site-controlled privilege mask; and a finely granular meta-acl mechanism for restricting to whom privileges may be delegated, and which privileges may be given to different classes of principals. This memo will serve as a basis for the ACL restriction discussion with the AFS-3 protocol working group. The intended goal of this discussion is to reach consensus on standardization of one or more solutions, and then publish a BCP status memo.
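To give a feel for the masking approach mentioned in the abstract, here is a small sketch of how a site-controlled privilege mask might limit what an ACL actually grants. The data layout, the function name, and the rule that unmasked principals are unrestricted are all assumptions for illustration; the draft specifies its own semantics.

```python
RIGHTS = "rlidwka"  # the seven AFS directory rights, in conventional order


def apply_mask(acl, mask):
    """Return the effective ACL after applying a site-controlled mask.

    acl and mask map a principal name to a string of rights. A principal
    with no mask entry is left unrestricted (an assumption of this sketch).
    """
    effective = {}
    for principal, rights in acl.items():
        allowed = mask.get(principal, RIGHTS)
        effective[principal] = "".join(
            r for r in RIGHTS if r in rights and r in allowed
        )
    return effective
```

For example, with a mask of {"system:anyuser": "rl"}, a user who sets rlidwka for system:anyuser ends up granting only rl, while entries for other principals are untouched.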

If one of these methods in particular sounds best, or this just sounds useful to you or your organization in general, we encourage you to let us know. We welcome any feedback or discussion on the openafs-info mailing list.

Two important OpenAFS fileserver fixes


November 30, 2009 by Michael Meffie

Two important fileserver fixes are available for OpenAFS 1.4.11, both of which address intermittent fileserver crashes. Source code patches are available in the OpenAFS git source code repository and are in the pipeline for the next release of OpenAFS.

The first patch fixes an error in the handling of multi-homed client hosts. An AFS client host may have multiple interfaces, and hence multiple IP addresses. The fileserver attempts to associate these IP addresses with the host in memory. This multi-home tracking has been improved in recent releases of OpenAFS; however, a subtle error was introduced around OpenAFS 1.4.8. When the last address associated with a host was removed, the callback connection for that host was also removed. In some cases that connection object was still in use by other threads, and its premature removal led to a server crash when the fileserver attempted to access a null pointer.

The second fix is for an insidious and long-standing bug in the host package of the fileserver. Several cases were found where the fileserver could be using a host object that had been freed. This bug could manifest in a number of terrible ways. Sometimes it led to a situation where the internal list of client hosts was corrupted, in which case the fileserver could crash or even hang as it tried to traverse a linked list that looped on itself. In other cases, the fileserver heap could be corrupted and the fileserver would crash when calling malloc, or the fileserver would crash when attempting to free an object which had already been freed.

The fixes are available in the OpenAFS git repository, and are mirrored on bm1vsrv05.sinenomine.net.
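Both bugs belong to the same family: an object is destroyed while some other thread still depends on it. One common defense is reference counting, where an object marked for deletion is only freed once the last user lets go. The sketch below illustrates that general technique only; it is not a description of what the patches do, and the class and method names are invented.

```python
import threading


class Host:
    """Illustrative host record that is freed only when unused."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.deleted = False  # marked for removal
        self.freed = False    # stands in for returning memory to the heap
        self._lock = threading.Lock()

    def hold(self):
        """Take a reference before using the host."""
        with self._lock:
            if self.freed:
                raise RuntimeError("host already freed")
            self.refcount += 1

    def release(self):
        """Drop a reference; free the host if it was marked deleted."""
        with self._lock:
            self.refcount -= 1
            self._free_if_unused()

    def mark_deleted(self):
        """Request removal; the actual free waits for the last reference."""
        with self._lock:
            self.deleted = True
            self._free_if_unused()

    def _free_if_unused(self):
        # Caller must hold self._lock.
        if self.deleted and self.refcount == 0:
            self.freed = True
```

With this discipline, a thread that is still using a host keeps it alive across a concurrent removal, and the object can never be freed twice.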

Sine Nomine OpenAFS Blog - Hello World!


November 23, 2009 by Evan

Welcome to the inaugural post of a pilot project here at Sine Nomine Associates - an OpenAFS Department Blog. We are doing a lot to work with customers and colleagues to make OpenAFS as useful and beneficial as possible, but it occurred to us that we could do a better job of communicating that to the wider OpenAFS community. This blog will be our attempt to do that.

Let me take a moment to explain what this blog is and is not. It is a place for Sine Nomine engineers to provide information about projects, tools, patches and interesting observations about the workings of OpenAFS in general. If we isolate and fix a problem that we expect may be generally experienced, we will post here as well as providing the solution through the appropriate OpenAFS.org channels. If we answer a question about how OpenAFS works that we think would be interesting for the general community to know, we’ll post here. If we develop tools that might be handy when administering (or developing) AFS, we’ll post here.

This blog is not intended to be an advertising or marketing channel. Yes, we will probably make references to SNA products and services, because the technical things we talk about here will be the results of SNA products and services. On occasion, I may post about opportunities SNA is going to make available for community involvement in a project we are doing. That being said, the majority of the content, and the entirety of the intent, is to provide useful and interesting information about OpenAFS at SNA to our customers, friends and the wider community of OpenAFS users.

With that, here are some of the things we’ll be posting about in the near future.

  • ALERT Posts - In the past six months or so, SNA has issued patches to cover issues discovered in recent versions of OpenAFS. We’ll reprise the issues and the patches here, so they can be found easily.
  • Tools Posts - SNA has been developing a few tools which may be of use to users of OpenAFS for Windows. We’ll be posting information about them here.
  • Answers Posts - As we get interesting questions from customers, we will post our answers here if we think the answer is something others would like to know.

As this blog progresses, we’ll be posting other things as well, like architectural summaries of operating AFS in specific environments and discussions of proposed features that have reached Internet Draft status.

If you have any questions, please contact us and we’ll be happy to answer them. For now we have commenting turned off to avoid the spam links that plague so many blog comments these days.

Thanks for reading!

Evan Macbeth
Director, OpenAFS
Sine Nomine Associates