underlap

I am Glyn Normington, a retired software developer, interested in helping others improve their programming skills.

A recent thread on Mastodon got me thinking about my experience with concurrent programming. Here's a thumbnail sketch of the approaches I've tried.

Compare-and-swappery

Some of the first concurrent code I wrote involved managing multiple threads of execution using compare and swap instructions. This was hard work, but I wasn't tempted to try anything particularly complex because it was just too hard to reason about. This was before the days of unit testing¹, so developers were used to spending a lot of time thinking about the correctness of code before attempting to run it.
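
That early code long predates Go, but here is a rough sketch, in Go, of the kind of compare-and-swap loop involved; the shared counter and the use of goroutines are purely illustrative:

package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

func main() {
    var counter int64
    var wg sync.WaitGroup

    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // Classic compare-and-swap loop: read the current value, compute
            // the new value, and store it only if nothing changed in between.
            // If another thread got there first, retry.
            for {
                old := atomic.LoadInt64(&counter)
                if atomic.CompareAndSwapInt64(&counter, old, old+1) {
                    return
                }
            }
        }()
    }

    wg.Wait()
    fmt.Println(counter) // always 10, despite the concurrent updates
}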

Model checking

One way of reasoning about concurrent code was to model the behaviour in CSP and then use a model checker like FDR to check various properties. Unfortunately, even relatively simple concurrent code took quite a bit of effort to model in CSP. Also, model checking, even with FDR's amazing “compressions”, tended to take too long unless the state space could be kept manageable. So with this approach I again tended to spend a lot of time thinking, this time about how to structure the CSP model to keep model-checking tractable. The result was I only produced one or two limited CSP models.

I would say the main benefit of CSP modelling is that it makes you aware of the main types of concurrency bugs: deadlock (where all or part of the system seizes up permanently), livelock (where the system gets into some kind of unending, repetitive behaviour), and more general kinds of divergence (e.g. where the system spends its time “chattering” internally without making useful progress).
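
For illustration (in Go rather than CSP), here is about the smallest deadlock I can write: two goroutines each trying to send before the other receives, so both block forever and the program seizes up. The channels and values are made up purely to show the shape of the problem:

package main

func main() {
    ch1 := make(chan int)
    ch2 := make(chan int)

    go func() {
        ch1 <- 1 // blocks: main isn't receiving on ch1 yet
        <-ch2
    }()

    ch2 <- 2 // blocks: the goroutine above is stuck sending on ch1
    <-ch1
    // Neither send can complete, so the Go runtime reports
    // "all goroutines are asleep - deadlock!".
}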

Memory models

Java has various low-level locking mechanisms for managing concurrency. The Java memory model gives a good framework for reasoning about concurrent code in Java. Again, the emphasis was on reasoning, and it was hard work, but at least there was the sense that it was well founded.

Channels and goroutines

I've used Go a lot and would say goroutines (similar to lightweight threads, sometimes called “green threads”) and channels are deceptively simple. The principle is that you can safely write to, and read from, a channel in distinct goroutines. It's easy to build concurrent systems that work most of the time, although it's hard to be sure they are bug free. But at least you're better off than in a language which provides only low-level mutexes and the like.
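
As a minimal sketch of that principle, here is a producer goroutine sending values over a channel to the main goroutine, with no explicit locking needed (the squaring is just a stand-in for real work):

package main

import "fmt"

func main() {
    results := make(chan int)

    // The producer goroutine writes to the channel...
    go func() {
        for i := 1; i <= 5; i++ {
            results <- i * i
        }
        close(results) // ...and closes it to signal there is nothing more to come.
    }()

    // ...while the main goroutine safely reads from it.
    for v := range results {
        fmt.Println(v)
    }
}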

Language support

Rust guarantees safe access to shared data at compile time. The main difficulty is getting used to the constraints imposed by this model and then designing your code appropriately. That said, I haven't written much concurrent code in Rust, so I'll simply defer to the book.

Footnote 1: When I did write the occasional unit test, I deleted it afterwards to avoid having to maintain it!

You can avoid checking in certain files in a git project directory by using a .gitignore file.

Then there are files which you never want to check in, whatever the project, such as editor/IDE configuration files. Instead of “polluting” each project's .gitignore file with such entries, it's better to set up a global .gitignore file:

$ touch ~/.gitignore
$ git config --global core.excludesfile ~/.gitignore

Here's the contents of mine:

*~
.DS_Store
.idea
*.iml
\#*#
*.hsp
*.sav
*.scpt
/scratch/
.vscode/
coverage.out
.Guardfile
.config.ru

(You can see some of my history there: macOS, IntelliJ, VS Code, Go, etc.)

But notice this entry:

/scratch/

This means that any directory in the project named scratch will be ignored, along with its contents.

Having such a git scratch directory turns out to be really handy:

  • The files are visible to your editor/IDE.
  • It's easier to remember where you put such files compared to storing them outside the project directory.
  • You can even nest directories in the scratch directory.
  • If you finish with the project and delete the project directory, the files are cleaned up too.

A recent example is from a Rust project:

$ cargo expand > scratch/generated.rs

This puts the generated code in a file which my editor will recognise as Rust code and display with syntax highlighting. I definitely didn't want to check that file in!

Other files which are suitable for the scratch directory are:

  • Hacky tests or fixtures I'm too embarrassed to check in.
  • Dependencies I don't control, but which I need to modify, e.g. for debugging.
  • TODO lists and other rough notes.
  • Output files from static analysis or code coverage.
  • Old versions of code files from the project which I want to refer to quickly.
  • Old project executables for comparing with the current behaviour.
  • Downloaded PDF manuals relating to the project.

I'm sure you'll find many other uses for git scratch directories.

A micro-commit is a commit of a small/incremental code change, which ideally also passes the tests.

Micro-commits combine nicely with TDD as follows:

  1. Write a failing test
  2. Make all the tests pass
  3. Commit
  4. Refactor to make the code clean
  5. Commit

Why commit so often?

Mark Seemann's recent Stack Overflow blog post “Use Git tactically” uses one of my favourite analogies for coding: rock climbing. The basic idea of micro-commits is to work with small, safe changes and commit after each one, much as rock climbers regularly secure their ropes to the rock face.

If something goes wrong, there's a smaller distance to fall. In coding, if something goes wrong, such as a regression creeping into the code, you can recover by returning to an earlier commit without losing much work in the process.

Other benefits

Using micro-commits reduces the stress of coding. Not only do you know you're not going to lose much work if you mess things up, but by focussing on one small step at a time, you don't have to keep a lot of information in your head.

Micro-commits also help you stay focussed on making just one change at a time. To avoid falling into the “While You’re At It” trap, described in Tim Ottinger's blog “What's this about Micro-commits?”, keep a TODO list, either in an issue or, for short-lived items, on a piece of paper.

Squash or merge?

After completing a fix, feature, or other piece of work, it's important to decide what to do with all the micro-commits.

One approach is to squash them into one or more substantial commits. A downside of squashing is that there is then no record of the sequence of micro-commits, which could come in handy (e.g. for bisecting out a bad change) if some problem has crept in which didn't show up while running the unit tests. An advantage of squashing is that people reading the commit history can then see the wood for the trees.

Another approach is to merge the changes with a commit message that summarises the changes. It's hard to see the wood for the trees in this approach when reading a linear series of commit messages (such as produced by git log), but tools which show the commit structure make it easy to pick out the merge commits.

Commit messages

Commit messages for micro-commits are likely to be brief, but don't forget to reference any associated issue(s). You may want to treat the commit messages like a discussion thread and drop in the occasional comment on the overall design when it occurs to you rather than forget to include it when you finally squash or merge.

My favourite “morning paper” is “On formalism in specification”. Studying Bertrand Meyer's original paper, trying to avoid its “seven sins of specification”, and reading Strunk and White's “The Elements of Style” improved my technical writing enormously.

The first comment on the morning paper (by a certain David Parnas) raises an interesting question of whether there can be a “truly readable mathematical specification”. Sadly, I believe the answer is “no”, given many developers' difficulties with mathematics.

However, that hasn't stopped me writing mathematical specifications from time to time, often as a way of getting a basic understanding of an area of software before starting development. Two I am proud of are “Image Registries” (download PDF), which nails down some of the basic terminology surrounding Docker and OCI registries, and “OCI Image Format” (download PDF). The style of interspersing English and mathematics in these specifications might even make them readable by those who find mathematics off-putting.

The so-called “Dirty Pipe” CVE-2022-0847, detailed in Max Kellermann's article, was published on 7 March 2022. I recently upgraded my kernel to the stable version 5.16.11. So is there a new stable version with a fix?

According to a post on the kernel mailing list, the fix is in 9d2231c5d74e (lib/iov_iter: initialize "flags" in new pipe_buffer).

However, the change logs for 5.16.12 and 5.16.13 do not mention 9d2231c5d74e or lib/iov_iter. So has the fix still not made it into a stable kernel?

A Unix Stack Exchange answer to the question “Given a git commit hash, how to find out which kernel release contains it?” helped here. The GitHub page for commit 9d2231c5d74e shows that, at the time of writing, the fix is part of v5.17-rc7 and v5.17-rc6. So it seems the fix isn't yet available in a stable kernel.

Postscript: 21 March 2022

According to the releases page at kernel.org:

After each mainline kernel is released, it is considered “stable.”

v5.17, containing the fix, was released yesterday, so I upgraded to that.

This morning I followed these instructions to upgrade Ubuntu to a development release of 22.04 and the kernel to 5.16.11. I was hoping that a bug would be fixed, but it turned out not to be.

I was previously on 21.10:

$ lsb_release -a
LSB Version:	core-11.1.0ubuntu3-noarch:printing-11.1.0ubuntu3-noarch:security-11.1.0ubuntu3-noarch
Distributor ID:	Ubuntu
Description:	Ubuntu 21.10
Release:	21.10
Codename:	impish

with kernel 5.15.0-rc7:

$ uname -r
5.15.0-051500rc7-generic

I upgraded to a development branch of Ubuntu 22.04:

$ lsb_release -a
LSB Version:	core-11.1.0ubuntu3-noarch:printing-11.1.0ubuntu3-noarch:security-11.1.0ubuntu3-noarch
Distributor ID:	Ubuntu
Description:	Ubuntu Jammy Jellyfish (development branch)
Release:	22.04
Codename:	jammy

and to a stable kernel version 5.16.11:

$ uname -r
5.16.11-051611-generic

If I find any issues, I'll mention them here.

Notes

The following are some notes on errors and interesting observations from the upgrade process.

Third party repository error

During the upgrade process, I noticed the following error:

E: The repository 'http://ppa.launchpad.net/gezakovacs/ppa/ubuntu impish Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

This was because I had installed UNetbootin from the above third-party repository, which doesn't support all releases.

Failure upgrading installed packages

While sudo apt-get upgrade was running, the system rebooted. I had to re-run the command and then follow some recovery instructions, running:

$ sudo dpkg --configure -a
$ sudo apt-get upgrade
$ sudo apt --fix-broken install

Re-running sudo apt-get upgrade then succeeded.

Upgrading the distribution

Towards the end of sudo apt-get dist-upgrade, I noticed the following output:

update-initramfs: Generating /boot/initrd.img-5.15.0-051500rc7-generic
update-initramfs: Generating /boot/initrd.img-5.13.0-30-generic
update-initramfs: Generating /boot/initrd.img-5.13.0-28-generic

So it seems the distribution upgrade is preserving my current kernel version as well as a couple of previously installed kernel versions.

Later, the following output:

* dkms: running auto installation service for kernel 5.15.0-18-generic

showed the upgrade was installing a later 5.15 kernel.

Here's how I got involved in the IETF JSONPath Working Group.

A couple of years ago I found myself needing to use JSONPath in a work project, but I couldn't find a spec other than Stefan Gössner's original article. Since the project was in Go, I looked for a Go implementation of JSONPath which documented the syntax and semantics clearly. Since the project was aimed at Kubernetes, I also needed it to accommodate YAML. I ended up specifying and implementing a new Go implementation: yaml-jsonpath.

Towards the end of this effort, I decided there was a need for a standard¹. I settled on the IETF and gathered together a small group of JSONPath implementers (found via Christoph Burgmer's excellent JSONPath comparison project). Together we cooked up an initial spec and published it as an internet draft.

Around the same time, Stefan Gössner and IETF veteran Carsten Bormann submitted their own draft based closely on Stefan's original article. We joined forces in a Working Group with the help of James Gruessing, Tim Bray, and others. After agreeing a charter, we merged our drafts, and have been iterating on the spec for about a year.

There are a few loose ends, but we hope to begin the submission process for an RFC before very long. This may be the end of the beginning, rather than the beginning of the end.

Footnote

¹ I'd worked in software standards in the past, such as OSGi and JSR 291, which brought OSGi into the Java Community Process.

You craft a code change with tests, comments, and maybe even some documentation. But when you commit the change, it's tempting to rush. Instead, spare a thought for the humble commit message.

A well-written commit message can be like gold dust later — possibly much later — in the project. It's a chance to record why you made the change, any compromises you made, stuff that still needs to be done, and perhaps alternative approaches you rejected.

Commit messages, sometimes called commit logs, are like documentation which never goes out of date because they relate to a specific commit. They don't clutter up your code. They are there for the asking when you need them.

If you've ever been faced with working out what a piece of code is doing and why it is the way it is, good commit messages can give a unique insight into the coder's mind.

The following links give some ideas and examples for writing better commit messages:

  • How to Write a Git Commit Message
  • Conventional Commits
  • My longest ever commit message

Testing your code is a great way of avoiding bugs, but it's tempting to put off writing tests until you have something which works. At that point, it may be very difficult to test your code, especially to write unit tests, because you may not have written the code with testing in mind. One way to avoid this situation is to use Test Driven Development (TDD).

What is TDD?

TDD is a way of developing code by writing tests before the code to be tested. This way, you can be sure that your code can be tested. Also, you'll know that your tests are valid because they'll start off failing (a test which never fails is useless!).

How do I start doing TDD?

The basic process is easy:

  1. Write a failing test
  2. Make the test pass, together with any other tests
  3. Refactor to make the code clean (described below)
  4. Repeat as necessary

A failing test might not even compile if it's calling some code which hasn't been written yet. That's ok.

This process is sometimes called “red, green, refactor” after the colours that some tools use to flag failing and passing tests.
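
As a minimal sketch of steps 1 and 2 in Go (the calc package and Add function are invented for illustration), first the failing test:

package calc

import "testing"

func TestAdd(t *testing.T) {
    // Step 1: this test fails (indeed, doesn't compile) until Add exists.
    if got := Add(2, 3); got != 5 {
        t.Errorf("Add(2, 3) = %d, want 5", got)
    }
}

and then the simplest code which makes it pass:

package calc

// Add returns the sum of its arguments: just enough code to pass the test.
func Add(a, b int) int {
    return a + b
}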

Unit testing

A unit test tests some code in isolation (from other code, the internet, a database, etc.). Code composed of small modules is easier to test, if each module can be tested in isolation. It's then possible to drive unusual and error paths as well as the happy paths through the code.

The trick is deciding the size of “unit” to test, e.g. it could be a single module or a group of modules. If the unit size is larger, there's more scope for refactoring without needing to change the tests. If it's smaller, then moving code between modules and changing interfaces tends to require test rework.
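
One common way to achieve that isolation in Go is to put an awkward dependency behind a small interface and substitute a test double in the tests. Here is a sketch; the Clock interface and Greeting function are invented for illustration:

package greet

import "time"

// Clock abstracts the system clock so that Greeting can be tested in
// isolation from the real time of day.
type Clock interface {
    Now() time.Time
}

func Greeting(c Clock) string {
    if c.Now().Hour() < 12 {
        return "Good morning"
    }
    return "Good afternoon"
}

The test then supplies a fake clock:

package greet

import (
    "testing"
    "time"
)

// fixedClock is a test double which always reports the same time.
type fixedClock struct{ t time.Time }

func (f fixedClock) Now() time.Time { return f.t }

func TestGreetingInTheMorning(t *testing.T) {
    nine := fixedClock{time.Date(2022, 3, 1, 9, 0, 0, 0, time.UTC)}
    if got := Greeting(nine); got != "Good morning" {
        t.Errorf("Greeting = %q, want %q", got, "Good morning")
    }
}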

Refactoring

The term refactoring is used in two different ways. The first kind of refactoring is to start with a piece of code and all its tests passing and then to make essentially arbitrary changes to the code, often in small steps, ensuring that the tests continue to pass.

So, starting with all the tests passing, the process is:

  1. Change the code
  2. Run the tests again
  3. If some fail, fix the code or undo the change until all the tests pass
  4. Repeat as necessary

The second kind of refactoring is to make specific changes to the code which are known to preserve the behaviour. Sometimes this can be assisted by IDEs or editors with automatic “refactorings”, such as “extract method”, “rename variable”, and so forth. After doing this kind of refactoring, it's still worth checking all the tests still pass.
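
For example, here is a sketch of “extract method” (in Go, extracting a function); the Item type and the pricing rule are made up:

package pricing

type Item struct {
    Price    float64
    Quantity int
}

// Before: the discount rule is buried inside the loop.
func TotalPrice(items []Item) float64 {
    total := 0.0
    for _, it := range items {
        price := it.Price
        if it.Quantity >= 10 {
            price *= 0.9 // bulk discount
        }
        total += price * float64(it.Quantity)
    }
    return total
}

After the refactoring, the behaviour is unchanged, but the discount rule has a name and can be read (and tested) on its own:

// After: the discount calculation is extracted into its own function.
func TotalPrice(items []Item) float64 {
    total := 0.0
    for _, it := range items {
        total += discountedPrice(it) * float64(it.Quantity)
    }
    return total
}

func discountedPrice(it Item) float64 {
    if it.Quantity >= 10 {
        return it.Price * 0.9 // bulk discount
    }
    return it.Price
}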

Modifying tests

Strictly speaking, if you modify tests, you should check they still catch the failures they were originally written to catch, but that's a real pain as you would have to temporarily break the code under test to provoke each modified test to fail. But without this, it's theoretically possible to mess up a test change and end up with a test which passes when it shouldn't. An extreme example of this would be to delete the code inside a test, which would obviously then pass, but be useless. The issue is that people might be tempted to hack their test code around on the assumption that running all the tests and seeing them pass is some sort of safety net when, in fact, it isn't.

The best way to avoid modifying tests excessively is by testing larger units. If you do need to modify tests, then doing a series of correctness-preserving refactorings reduces the risk of invalidating a test. But the rule of thumb is to be extra careful when modifying tests.

Do I need to follow TDD strictly?

It's quite a good discipline to follow the strict TDD approach for a while to get the hang of it. But after that, I think it's fine to be a bit more relaxed.

For instance, if you're trying to get a piece of code working, it may be more appropriate to code up a prototype as a “proof of concept” and then go back and develop some code using TDD now that you know roughly how the code will work. This approach is sometimes called “fire, aim, ready” (reversing the well-known phrase “ready, aim, fire”) meaning get something working first, then understand the problem better, then start development proper.

I have also used code coverage tools to make sure most, if not all, of my code is tested. You don't need to hit 100% coverage, but something like 80-90% is clearly a better sign of a good test suite than 20-30%.

Does TDD guarantee my code will be well designed?

No! Don't be caught by the trap of thinking TDD will necessarily give you a great design. It's a good way of avoiding untestable code, but it doesn't guarantee clean, understandable interfaces etc. Using TDD to get a good design is, as Rich Hickey once described it, like driving a car along a road and bashing against the crash barriers (“guard rails” if you're American) as a way of getting to where you want to go.

Is TDD worth it on small, personal projects?

I've worked in teams, one of which did pretty strict TDD, but on personal projects I tend to unit test the code I'm most likely to get wrong and which has the most conditional logic. (I must confess I don't tend to invest in integration tests for personal projects as it's usually too much bother.)

More information