underlap

I am Glyn Normington, a retired software developer, interested in helping others improve their programming skills.

A recent thread on Mastodon got me thinking about my experience with concurrent programming. Here's a thumbnail sketch of the approaches I've tried.

Compare-and-swappery

Some of the first concurrent code I wrote involved managing multiple threads of execution using compare and swap instructions. This was hard work, but I wasn't tempted to try anything particularly complex because it was just too hard to reason about. This was before the days of unit testing¹, so developers were used to spending a lot of time thinking about the correctness of code before attempting to run it.
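
That early code long predates Go, but here is a rough sketch, in Go, of the kind of compare-and-swap loop involved; the shared counter and the use of goroutines are purely illustrative:

package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

func main() {
    var counter int64
    var wg sync.WaitGroup

    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // Classic compare-and-swap loop: read the current value, compute
            // the new value, and store it only if nothing changed in between.
            // If another thread got there first, retry.
            for {
                old := atomic.LoadInt64(&counter)
                if atomic.CompareAndSwapInt64(&counter, old, old+1) {
                    return
                }
            }
        }()
    }

    wg.Wait()
    fmt.Println(counter) // always 10, despite the concurrent updates
}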

Model checking

One way of reasoning about concurrent code was to model the behaviour in CSP and then use a model checker like FDR to check various properties. Unfortunately, even relatively simple concurrent code took quite a bit of effort to model in CSP. Also, model checking, even with FDR's amazing “compressions”, tended to take too long unless the state space could be kept manageable. So with this approach I again tended to spend a lot of time thinking, this time about how to structure the CSP model to keep model-checking tractable. The result was I only produced one or two limited CSP models.

I would say the main benefit of CSP modelling is that it makes you aware of the main types of concurrency bugs: deadlock (where all or part of the system seizes up permanently), livelock (where the system gets into some kind of unending, repetitive behaviour), and more general kinds of divergence (e.g. where the system spends its time “chattering” internally without making useful progress).
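
For illustration (in Go rather than CSP), here is about the smallest deadlock I can write: two goroutines each trying to send before the other receives, so both block forever and the program seizes up. The channels and values are made up purely to show the shape of the problem:

package main

func main() {
    ch1 := make(chan int)
    ch2 := make(chan int)

    go func() {
        ch1 <- 1 // blocks: main isn't receiving on ch1 yet
        <-ch2
    }()

    ch2 <- 2 // blocks: the goroutine above is stuck sending on ch1
    <-ch1
    // Neither send can complete, so the Go runtime reports
    // "all goroutines are asleep - deadlock!".
}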

Memory models

Java has various low-level locking mechanisms for managing concurrency. The Java memory model gives a good framework for reasoning about concurrent code in Java. Again, the emphasis was on reasoning, and it was hard work, but at least there was the sense that it was well founded.

Channels and goroutines

I've used Go a lot and would say goroutines (similar to lightweight threads, sometimes called “green threads”) and channels are deceptively simple. The principle is that you can safely write to, and read from, a channel in distinct goroutines. It's easy to build concurrent systems that work most of the time, although it's hard to be sure they are bug free. But at least you're better off than in a language which provides only low-level mutexes and the like.
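
As a minimal sketch of that principle, here is a producer goroutine sending values over a channel to the main goroutine, with no explicit locking needed (the squaring is just a stand-in for real work):

package main

import "fmt"

func main() {
    results := make(chan int)

    // The producer goroutine writes to the channel...
    go func() {
        for i := 1; i <= 5; i++ {
            results <- i * i
        }
        close(results) // ...and closes it to signal there is nothing more to come.
    }()

    // ...while the main goroutine safely reads from it.
    for v := range results {
        fmt.Println(v)
    }
}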

Language support

Rust guarantees safe access to shared data at compile time. The main difficulty is getting used to the constraints imposed by this model and then designing your code appropriately. That said, I haven't written much concurrent code in Rust, so I'll simply defer to the book.

Footnote 1: When I did write the occasional unit test, I deleted it afterwards to avoid having to maintain it!

You can avoid checking in certain files in a git project directory by using a .gitignore file.

Then there are files which you never want to check in, whatever the project, such as editor/IDE configuration files. Instead of “polluting” each project's .gitignore file with such entries, it's better to set up a global .gitignore file:

$ touch ~/.gitignore
$ git config --global core.excludesfile ~/.gitignore

Here's the contents of mine:

*~
.DS_Store
.idea
*.iml
\#*#
*.hsp
*.sav
*.scpt
/scratch/
.vscode/
coverage.out
.Guardfile
.config.ru

(You can see some of my history there: macOS, IntelliJ, VS Code, Go, etc.)

But notice this entry:

/scratch/

This means that any directory in the project named scratch will be ignored, along with its contents.

Having such a git scratch directory turns out to be really handy:

  • The files are visible to your editor/IDE.
  • It's easier to remember where you put such files compared to storing them outside the project directory.
  • You can even nest directories in the scratch directory.
  • If you finish with the project and delete the project directory, the files are cleaned up too.

A recent example is from a Rust project:

$ cargo expand > scratch/generated.rs

This puts the generated code in a file which my editor will recognise as Rust code and display with syntax highlighting. I definitely didn't want to check that file in!

Other files which are suitable for the scratch directory are:

  • Hacky tests or fixtures I'm too embarrassed to check in.
  • Dependencies I don't control, but which I need to modify, e.g. for debugging.
  • TODO lists and other rough notes.
  • Output files from static analysis or code coverage.
  • Old versions of code files from the project which I want to refer to quickly.
  • Old project executables for comparing with the current behaviour.
  • Downloaded PDF manuals relating to the project.

I'm sure you'll find many other uses for git scratch directories.

A micro-commit is a commit of a small/incremental code change, which ideally also passes the tests.

Micro-commits combine nicely with TDD as follows:

  1. Write a failing test
  2. Make all the tests pass
  3. Commit
  4. Refactor to make the code clean
  5. Commit

Why commit so often?

Mark Seemann's recent Stack Overflow blog post “Use Git tactically” uses one of my favourite analogies for coding: rock climbing. The basic idea of micro-commits is to work with small, safe changes and commit after each one, much as rock climbers regularly secure their ropes to the rock face.

If something goes wrong, there's a smaller distance to fall. In coding, if something goes wrong, such as a regression creeping into the code, you can recover by returning to an earlier commit without losing much work in the process.

Other benefits

Using micro-commits reduces the stress of coding. Not only do you know you're not going to lose much work if you mess things up, but by focussing on one small step at a time, you don't have to keep a lot of information in your head.

Micro-commits also help you stay focussed on making just one change at a time. To avoid falling into the “While You’re At It” trap, described in Tim Ottinger's blog “What's this about Micro-commits?”, keep a TODO list, either in an issue or, for short-lived items, on a piece of paper.

Squash or merge?

After completing a fix, feature, or other piece of work, it's important to decide what to do with all the micro-commits.

One approach is to squash them into one or more substantial commits. A downside of squashing is that there is then no record of the sequence of micro-commits, which could come in handy (e.g. for bisecting out a bad change) if some problem has crept in which didn't show up while running the unit tests. An advantage of squashing is that people reading the commit history can then see the wood for the trees.

Another approach is to merge the changes with a commit message that summarises the changes. It's hard to see the wood for the trees in this approach when reading a linear series of commit messages (such as produced by git log), but tools which show the commit structure make it easy to pick out the merge commits.

Commit messages

Commit messages for micro-commits are likely to be brief, but don't forget to reference any associated issue(s). You may want to treat the commit messages like a discussion thread and drop in the occasional comment on the overall design when it occurs to you rather than forget to include it when you finally squash or merge.

My favourite “morning paper” is “On formalism in specification”. Studying Bertrand Meyer's original paper, trying to avoid its “seven sins of specification”, and reading Strunk and White's “The Elements of Style” improved my technical writing enormously.

The first comment on the morning paper (by a certain David Parnas) raises an interesting question of whether there can be a “truly readable mathematical specification”. Sadly, I believe the answer is “no”, given many developers' difficulties with mathematics.

However, that hasn't stopped me writing mathematical specifications from time to time, often as a way of getting a basic understanding of an area of software before starting development. Two I am proud of are “Image Registries” (download PDF), which nails down some of the basic terminology surrounding Docker and OCI registries, and “OCI Image Format” (download PDF). The style of interspersing English and mathematics in these specifications might even make them readable by those who find mathematics off-putting.

The so-called “Dirty Pipe” CVE-2022-0847, detailed in Max Kellermann's article, was published on 7 March 2022. I recently upgraded my kernel to the stable version 5.16.11. So is there a new stable version with a fix?

According to a post on the kernel mailing list, the fix is in 9d2231c5d74e (lib/iov_iter: initialize "flags" in new pipe_buffer).

However, the change logs for 5.16.12 and 5.16.13 do not mention 9d2231c5d74e or lib/iov_iter. So has the fix still not made it into a stable kernel?

A Unix Stack Exchange answer to the question “Given a git commit hash, how to find out which kernel release contains it?” helped here. The GitHub page for commit 9d2231c5d74e shows that, at the time of writing, the fix is part of v5.17-rc7 and v5.17-rc6. So it seems the fix isn't yet available in a stable kernel.

Postscript: 21 March 2022

According to the releases page at kernel.org:

After each mainline kernel is released, it is considered “stable.”

v5.17, containing the fix, was released yesterday, so I upgraded to that.

This morning I followed these instructions to upgrade Ubuntu to a development release of 22.04 and the kernel to 5.16.11. I was hoping that a bug would be fixed, but it turned out not to be.

I was previously on 21.10:

$ lsb_release -a
LSB Version:	core-11.1.0ubuntu3-noarch:printing-11.1.0ubuntu3-noarch:security-11.1.0ubuntu3-noarch
Distributor ID:	Ubuntu
Description:	Ubuntu 21.10
Release:	21.10
Codename:	impish

with kernel 5.15.0-rc7:

$ uname -r
5.15.0-051500rc7-generic

I upgraded to a development branch of Ubuntu 22.04:

$ lsb_release -a
LSB Version:	core-11.1.0ubuntu3-noarch:printing-11.1.0ubuntu3-noarch:security-11.1.0ubuntu3-noarch
Distributor ID:	Ubuntu
Description:	Ubuntu Jammy Jellyfish (development branch)
Release:	22.04
Codename:	jammy

and to a stable kernel version 5.16.11:

$ uname -r
5.16.11-051611-generic

If I find any issues, I'll mention them here.

Notes

The following are some notes on errors and interesting observations from the upgrade process.

Third party repository error

During the upgrade process, I noticed the following error:

E: The repository 'http://ppa.launchpad.net/gezakovacs/ppa/ubuntu impish Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

This was because I had installed UNetbootin from the above third-party repository, which doesn't support all releases.

Failure upgrading installed packages

While sudo apt-get upgrade was running, the system rebooted. I had to re-run the command and then follow some recovery instructions, running:

$ sudo dpkg --configure -a
$ sudo apt-get upgrade
$ sudo apt --fix-broken install

Re-running sudo apt-get upgrade then succeeded.

Upgrading the distribution

Towards the end of sudo apt-get dist-upgrade, I noticed the following output:

update-initramfs: Generating /boot/initrd.img-5.15.0-051500rc7-generic
update-initramfs: Generating /boot/initrd.img-5.13.0-30-generic
update-initramfs: Generating /boot/initrd.img-5.13.0-28-generic

So it seems the distribution upgrade is preserving my current kernel version as well as a couple of previously installed kernel versions.

Later, the following output:

* dkms: running auto installation service for kernel 5.15.0-18-generic

showed the upgrade was installing a later 5.15 kernel.

Here's how I got involved in the IETF JSONPath Working Group.

A couple of years ago I found myself needing to use JSONPath in a work project, but I couldn't find a spec other than Stefan Gössner's original article. Since the project was in Go, I looked for a Go implementation of JSONPath which documented the syntax and semantics clearly. Since the project was aimed at Kubernetes, I also needed it to accommodate YAML. I ended up specifying and implementing a new Go implementation: yaml-jsonpath.

Towards the end of this effort, I decided there was a need for a standard¹. I settled on the IETF and gathered together a small group of JSONPath implementers (found via Christoph Burgmer's excellent JSONPath comparison project). Together we cooked up an initial spec and published it as an internet draft.

Around the same time, Stefan Gössner and IETF veteran Carsten Bormann submitted their own draft based closely on Stefan's original article. We joined forces in a Working Group with the help of James Gruessing, Tim Bray, and others. After agreeing a charter, we merged our drafts, and have been iterating on the spec for about a year.

There are a few loose ends, but we hope to begin the submission process for an RFC before very long. This may be the end of the beginning, rather than the beginning of the end.

Footnote

¹ I'd worked in software standards in the past, such as OSGi and JSR 291, which brought OSGi into the Java Community Process.

You craft a code change with tests, comments, and maybe even some documentation. But when you commit the change, it's tempting to rush. Instead, spare a thought for the humble commit message.

A well-written commit message can be like gold dust later — possibly much later — in the project. It's a chance to record why you made the change, any compromises you made, stuff that still needs to be done, and perhaps alternative approaches you rejected.

Commit messages, sometimes called commit logs, are like documentation which never goes out of date because they relate to a specific commit. They don't clutter up your code. They are there for the asking when you need them.

If you've ever been faced with working out what a piece of code is doing and why it is the way it is, good commit messages can give a unique insight into the coder's mind.

The following links give some ideas and examples for writing better commit messages:

  • How to Write a Git Commit Message
  • Conventional Commits
  • My longest ever commit message

Testing your code is a great way of avoiding bugs, but it's tempting to put off writing tests until you have something which works. At that point, it may be very difficult to test your code, especially to write unit tests, because you may not have written the code with testing in mind. One way to avoid this situation is to use Test Driven Development (TDD).

What is TDD?

TDD is a way of developing code by writing tests before the code to be tested. This way, you can be sure that your code can be tested. Also, you'll know that your tests are valid because they'll start off failing (a test which never fails is useless!).

How do I start doing TDD?

The basic process is easy:

  1. Write a failing test
  2. Make the test pass, together with any other tests
  3. Refactor to make the code clean (described below)
  4. Repeat as necessary

A failing test might not even compile if it's calling some code which hasn't been written yet. That's ok.

This process is sometimes called “red, green, refactor” after the colours that some tools use to flag failing and passing tests.
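
As a minimal sketch of steps 1 and 2 in Go (the calc package and Add function are invented for illustration), first the failing test:

package calc

import "testing"

func TestAdd(t *testing.T) {
    // Step 1: this test fails (indeed, doesn't compile) until Add exists.
    if got := Add(2, 3); got != 5 {
        t.Errorf("Add(2, 3) = %d, want 5", got)
    }
}

and then the simplest code which makes it pass:

package calc

// Add returns the sum of its arguments: just enough code to pass the test.
func Add(a, b int) int {
    return a + b
}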

Unit testing

A unit test tests some code in isolation (from other code, the internet, a database, etc.). Code composed of small modules is easier to test, if each module can be tested in isolation. It's then possible to drive unusual and error paths as well as the happy paths through the code.

The trick is deciding the size of “unit” to test, e.g. it could be a single module or a group of modules. If the unit size is larger, there's more scope for refactoring without needing to change the tests. If it's smaller, then moving code between modules and changing interfaces tends to require test rework.
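
One common way to achieve that isolation in Go is to put an awkward dependency behind a small interface and substitute a test double in the tests. Here is a sketch; the Clock interface and Greeting function are invented for illustration:

package greet

import "time"

// Clock abstracts the system clock so that Greeting can be tested in
// isolation from the real time of day.
type Clock interface {
    Now() time.Time
}

func Greeting(c Clock) string {
    if c.Now().Hour() < 12 {
        return "Good morning"
    }
    return "Good afternoon"
}

The test then supplies a fake clock:

package greet

import (
    "testing"
    "time"
)

// fixedClock is a test double which always reports the same time.
type fixedClock struct{ t time.Time }

func (f fixedClock) Now() time.Time { return f.t }

func TestGreetingInTheMorning(t *testing.T) {
    nine := fixedClock{time.Date(2022, 3, 1, 9, 0, 0, 0, time.UTC)}
    if got := Greeting(nine); got != "Good morning" {
        t.Errorf("Greeting = %q, want %q", got, "Good morning")
    }
}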

Refactoring

The term refactoring is used in two different ways. The first kind of refactoring is to start with a piece of code and all its tests passing and then to make essentially arbitrary changes to the code, often in small steps, ensuring that the tests continue to pass.

So, starting with all the tests passing, the process is:

  1. Change the code
  2. Run the tests again
  3. If some fail, fix the code or undo the change until all the tests pass
  4. Repeat as necessary

The second kind of refactoring is to make specific changes to the code which are known to preserve the behaviour. Sometimes this can be assisted by IDEs or editors with automatic “refactorings”, such as “extract method”, “rename variable”, and so forth. After doing this kind of refactoring, it's still worth checking all the tests still pass.
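
For example, here is a sketch of “extract method” (in Go, extracting a function); the Item type and the pricing rule are made up:

package pricing

type Item struct {
    Price    float64
    Quantity int
}

// Before: the discount rule is buried inside the loop.
func TotalPrice(items []Item) float64 {
    total := 0.0
    for _, it := range items {
        price := it.Price
        if it.Quantity >= 10 {
            price *= 0.9 // bulk discount
        }
        total += price * float64(it.Quantity)
    }
    return total
}

After the refactoring, the behaviour is unchanged, but the discount rule has a name and can be read (and tested) on its own:

// After: the discount calculation is extracted into its own function.
func TotalPrice(items []Item) float64 {
    total := 0.0
    for _, it := range items {
        total += discountedPrice(it) * float64(it.Quantity)
    }
    return total
}

func discountedPrice(it Item) float64 {
    if it.Quantity >= 10 {
        return it.Price * 0.9 // bulk discount
    }
    return it.Price
}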

Modifying tests

Strictly speaking, if you modify tests, you should check they still catch the failures they were originally written to catch, but that's a real pain as you would have to temporarily break the code under test to provoke each modified test to fail. But without this, it's theoretically possible to mess up a test change and end up with a test which passes when it shouldn't. An extreme example of this would be to delete the code inside a test, which would obviously then pass, but be useless. The issue is that people might be tempted to hack their test code around on the assumption that running all the tests and seeing them pass is some sort of safety net when, in fact, it isn't.

The best way to avoid modifying tests excessively is by testing larger units. If you do need to modify tests, then doing a series of correctness-preserving refactorings reduces the risk of invalidating a test. But the rule of thumb is to be extra careful when modifying tests.

Do I need to follow TDD strictly?

It's quite a good discipline to follow the strict TDD approach for a while to get the hang of it. But after that, I think it's fine to be a bit more relaxed.

For instance, if you're trying to get a piece of code working, it may be more appropriate to code up a prototype as a “proof of concept” and then go back and develop some code using TDD now that you know roughly how the code will work. This approach is sometimes called “fire, aim, ready” (reversing the well-known phrase “ready, aim, fire”) meaning get something working first, then understand the problem better, then start development proper.

I have also used code coverage tools to make sure most, if not all, of my code is tested. You don't need to hit 100% coverage, but something like 80-90% is clearly a better sign of a good test suite than 20-30%.

Does TDD guarantee my code will be well designed?

No! Don't be caught by the trap of thinking TDD will necessarily give you a great design. It's a good way of avoiding untestable code, but it doesn't guarantee clean, understandable interfaces etc. Using TDD to get a good design is, as Rich Hickey once described it, like driving a car along a road and bashing against the crash barriers (“guard rails” if you're American) as a way of getting to where you want to go.

Is TDD worth it on small, personal projects?

I've worked in teams, one of which did pretty strict TDD, but on personal projects I tend to unit test the code I'm most likely to get wrong and which has the most conditional logic. (I must confess I don't tend to invest in integration tests for personal projects as it's usually too much bother.)

More information