http://invisible-island.net/personal/
Copyright © 2009–2020,2023 by Thomas E. Dickey


Change-Logs...

Introduction

This is an overview of the guidelines which I use in maintaining change-logs and similar information for computer programs.

Background

One of the things that the maintainer does (or used to) is to keep the change-log up-to-date. Though I have been developing software for some time, it wasn't until 1992 that a combination of circumstances (declining in-house development opportunities, and the Internet) prompted me to provide fixes for "free" software.

By 1994, I had contributed changes to about 65 programs. In that process, I had of course encountered various personalities. But the worst of those were simply slow to incorporate the changes.

Starting in 1994, I arranged to have the programs which I had been developing for my personal use excluded from my employee agreement. These included ded (the motivation for the resizeterm function), vile and tin as well as related programs. One of the related programs was ncurses.

The case with ncurses was ... different. Rather than a single developer, there were two. And they used a mailing list, unlike most. The nominal maintainer was Zeyd Ben-Halim, who was rather nonresponsive. The result of submitting patches was not good—it seems that they intended to copyright everything for themselves. That's a workable situation if they wrote everything themselves. They did not.

For instance, they incorporated Juergen Pfeifer's libraries in 1995, which greatly increased the size of ncurses. Incorporating them added 11,183 lines of code (pcurses was just under 10,000 lines of code before it became ncurses). In 1.9.7a, Juergen's name appeared in 3 places in those libraries (two pro-forma README's and one comment in a makefile noting that optimization did not work properly). Zeyd and Eric's copyright notice appeared in 36 places in the same files.

The NEWS file notes:

* integrated Juergen Pfeifer's forms library.
* integrated Juergen Pfeifer's menu code into the distribution.

I noticed that patches were sent to the mailing list (including my own) and that the NEWS file would include the change, but not mention the contributor. My name appears in the NEWS file twice, as well, for that time period, though—as I pointed out later—I had done about half of the work. Not all of my changes were mentioned, and most of them were unattributed. The casual reader would assume that Eric and Zeyd did almost all of the work.

Zeyd, being the nominal maintainer, appears to have done most of the edits to NEWS. However, ESR also sent changes to the mailing list which incorporated changes from others, without mentioning this in his announcements.

After I stopped sending patches to Eric and Zeyd in April 1996 (and began providing ncurses myself), I resolved to maintain the NEWS file with attribution for each contributor. That's the way we were doing it in vile and tin, for example. Philippe De Muyter suggested that I also note who reported the problem to be fixed. I began doing that a few weeks later, in late April.

Keep it Simple

Of course you're keeping your project in some type of revision control system. You can extract that information with various tools and render it as a change-log. Any idiot can do that.

Unfortunately, many change-logs are automatically generated, and indeed appear to have been generated by "any idiot".

Just the Facts

What is missing in many automatically-generated change-logs is the information which is typically not supplied by developers:

One advantage of automatically-generated change-logs is that it is possible to get the dates on which changes were made. Not all automatically-generated logs show this, but it is a strong possibility.

Whether or not the change-logs are automatically-generated, there is an additional problem if changes are collected and applied by a project maintainer—recording the contributors consistently.

Dates, of course

Change-logs should have dates, to establish when a change was made.

Developers who do not supply dates on their changelogs have been known to “fix” problems with a release without noting the fact. Besides that nuisance, developers who omit dates tend to be sloppy about facts in other ways.

Contribution Categories

There are of course changes by primary contributors.

patch by

The patch is usable without rework required.

Often, for conciseness, the "patch by" is left out and only the name of the contributor given. They are equivalent.

As a rule, if I am applying a contributor's patch which (aside from formatting details) works properly, I use the rcs "-w" option to mark that revision as originating from that person. It is rare that patches good enough for this come from completely anonymous developers, so an appropriate string is seldom lacking.

Most patches require rework or adaptation.

integrated patch

The patch requires work, e.g., it is not ifdef'd as required for all optional features.

adapted from patch

The patch has some logic flaw, or requires modification to build and work.

analysis by...

Someone explained how to go about fixing the problem, or else they provided a detailed enough report that the solution was apparent to the developer. This may or may not be the same person who reported the problem.

discussion with...

A discussion with someone brought out an idea, but it is unclear who was the source.

prompted by discussion with...

Talking to someone prompted me to realize a bug or solution. Without their input, the idea/fix would not have been apparent.

Occasionally their report and discussion is completely incorrect, but the "prompt" was useful. This does not apply to hostile or untruthful contributors of course.

In some cases, someone provides a suggested patch, but if it is unsuitable beyond illustrating the problem which was being discussed, then the changelog may read “prompted by patch...” while the actual implementation is different.

reported by...

Someone reported the problem, but did not provide the solution. That is, most people would not regard these as contributors, but a source of information which has to be investigated. When computing metrics, I do not count these, nor the closely related "prompted by", etc.

These categories are oriented toward direct communication with the program's maintainers. Accounting for indirect contributions is not as straightforward.

Problems in Categorization

There are a few basic problems to address:

Bug-tracking systems

Bug-tracking systems are a major source of indirect contributions.

If all of the report is within the bug-tracking system, and there is no analysis by other people, nor proposed (useful) fixes, then I will cite only the bug-tracking system and its number for the bug.

On the other hand, if there are useful direct contributions toward the solution (reports without analysis are indirect), then I will cite those individuals in addition to the bug-tracking information.

Updates of Bundled Sources

A few files (such as config.guess and config.sub) are maintained by other developers. The changelog for these says "updated", and if the origin is volatile (the config.* scripts are a good example of this) or relatively obscure, says where it was found. Read their changelog for credits.

Hostile/untruthful contributors

Bear in mind that I am not a public service.

I get some reports indirectly, via web-searches in various forums. Some of the comments are useful, others partly (because they point out details for an issue). However, it is not uncommon for those to be mixed in with secondhand comments. As is usual with hearsay, much of it is inaccurate, and much of the repetition in public forums is not intended to be constructive commentary.

Still, an occasional comment is useful.

Of course, in this case, I will categorize it as "adapted from", etc., noting that this automatically makes it an indirect contribution rather than a direct one.

If the information is from a discussion between different individuals, none of whom appears to be knowledgeable about the issue, I will simply cite the group where the information was given.

People who attempt to use bug-reporting systems as a soapbox fall into this category, of course. For those unfamiliar with the term, this refers to a variety of misbehavior, including:

As a caveat, not all “bug-tracking” systems are equal. Granted, bug-reports are not always welcome. But the bug-tracking system has to be reliable:

The issue-tracking systems provided with GitHub and GitLab (writing this in May 2019) are not reliable because changes to comments are not visible to others. In some cases, the project maintainers can (and some do) readily delete and modify comments to adjust a story to their advantage.

Anonymous contributors

Anonymous reports are not uncommon. Useful fixes from anonymous people are much less common (see discussion). When considering these, there are several factors to consider:

For the pen names, I cite the actual (or apparent) name.

On occasion I get suggested fixes which neither come from a readily identified person nor fit into the design of whatever program is being discussed. For those, I may adapt the change.

Anonymous or “not”, I do not use bug reports containing information from Wikipedia:

Other problematic contributors

In general, we would assume that developers submit their own work. This is not always true.

When reviewing a change, I do take the time to scrutinize it and attempt to determine a proper attribution for the change. It happens that I may notice (or recall, if I am subscribed to a given mailing list) that the change was originally developed by a different individual. In that case, I will amend the description to cite the actual developer. If the code has a comment citing the developer, that suffices, though even that has been a matter of dispute on occasion, when the intermediary insists on sharing the credit.

Individuals who do this repeatedly (there are a few) will either be banned, or subject to scrutiny on every change. In either case, they generally go away and provide their services to a different project. Rather than leave, some of these use the public bug-tracking systems as a forum.

Other forums are problematic. Although the site technically has a policy (which confuses copyright infringement with plagiarism), StackExchange for instance promotes plagiarism, with people copying answers from each other as well as from unspecified sources, just to get points in its “reputation” ranking (some high-ranking individuals have copied their answers from my FAQs or documentation). Still others are known to cheat by various schemes of voting for themselves. The answers are very rarely useful for development, but some questions are essentially bug reports. I cite those according to the other guidelines mentioned above.

Examples

Not all of the change-logs are in the same textual format. I wrote a script which handles the most common cases, and have massaged some change-logs to follow the format which it recognizes, to collect information about contributors. Essentially, it reads the text, looking for the markers which I use to denote direct- and indirect-contributions, and gives totals and names for the direct contributions.
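
The kind of marker scan described above can be sketched as follows. This is a minimal illustration, not the script itself: it assumes that each credit appears in parentheses after a change-log entry, that the marker phrases match the category names used earlier, and that "patch by", "integrated patch by", "adapted from patch by" and "analysis by" count as direct contributions while the rest are indirect. The sample entries and the names in them are hypothetical.

```python
import re
from collections import Counter

# Assumption: marker phrases follow the categories described earlier; the
# split into "direct" and "indirect" is a guess made for illustration.
DIRECT = ("patch by", "integrated patch by", "adapted from patch by", "analysis by")
INDIRECT = ("reported by", "prompted by discussion with", "discussion with")

# Assumption: a credit looks like "(marker Name)" at the end of an entry.
PATTERN = re.compile(
    r"\((" + "|".join(re.escape(m) for m in DIRECT + INDIRECT) + r")\s+([^()]+?)\)",
    re.IGNORECASE,
)

def tally(text):
    """Return (Counter of direct contributors, count of indirect credits)."""
    direct = Counter()
    indirect = 0
    for marker, name in PATTERN.findall(text):
        if marker.lower() in DIRECT:
            # Collapse whitespace so a name wrapped across lines counts once.
            direct[" ".join(name.split())] += 1
        else:
            indirect += 1
    return direct, indirect

# Hypothetical change-log entries, used only to exercise the sketch.
sample = """\
+ fix a typo in the manual page (patch by Jane Doe).
+ correct an off-by-one in the resize logic (reported by John Roe).
+ add ifdef's for an optional feature (integrated patch by Jane Doe).
+ improve the configure check (adapted from patch by Alex Poe).
"""

direct, indirect = tally(sample)
for name, count in sorted(direct.items()):
    print(name, count)
print("direct total:", sum(direct.values()), "indirect total:", indirect)
```

Only direct contributions are listed by name, mirroring the description above; indirect credits are counted but not itemized.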

For some (lynx and vile) I have not reformatted the older change-logs. In those cases, the dates below correspond to the beginning of the change-logs that I have reformatted.

Here is a list (as of May 2010) of the change-logs for which I have useful metrics, noting the percentage for my own contributions, and the number of other contributors (disregarding "external", since there is no active involvement).

Program   Percent  Other  Date
diffstat  81       12     June 1994
xterm     83       150    January 1996
ncurses   76       176    April 1996
vttest    96       3      June 1996
lynx      45       136    February 1997
vile      76       36     November 1999
dialog    78       64     December 1997
cdk       85       24     May 1999
byacc     97       4      February 2002
luit      91       0      August 2006
mawk      73       6      September 2008

Other changelogs

I use rcs2log for a few programs (ded, byacc, autoconf macros, etc.) which did not have a history of other contributors, and/or which are very stable.

The number of changes shown by rcs2log is different from the conventional change-logs:

Here is a more complete table, from May 2017, which shows both sets of data where applicable (and excluding programs such as ded which have no other contributors):

                       Manually edited                  rcs2log generated
Program   Log started  Changes  By me  Percent  Others  Changes  By me  Percent  Others
diffstat  1994/06      135      101    74.8%    17      584      566    96.9%    10
cproto    1994/08      153      139    90.8%    3       935      889    95.1%    4
xterm     1996/01      2271     1897   83.5%    185     12106    11848  97.9%    89
ncurses   1996/04      5045     3926   77.8%    235     19483    18365  94.3%    175
vttest    1996/06      215      203    94.4%    5       1338     1333   99.6%    3
bcpp      1996/10      175      163    93.1%    9       496      495    99.8%    1
lynx      1997/01      3565     1874   52.6%    135     4323     4185   96.8%    48
dialog    1997/12      830      645    77.7%    81      4068     3974   97.7%    42
cdk       1999/05      470      372    79.1%    38      2334     2261   96.9%    24
vile      1999/11      2889     2717   94.0%    48      15140    14008  92.5%    51
cdk-perl  2001/01      39       36     92.3%    2       189      185    97.9%    3
byacc     2002/02      N/A      N/A    N/A      N/A     826      737    89.2%    13
luit      2006/08      114      103    90.4%    2       960      957    99.7%    3
mawk      2008/09      234      179    76.5%    10      1534     1485   96.8%    7
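
The "Percent" columns appear to be the "By me" count divided by the "Changes" count, shown to one decimal place. A small sketch of that arithmetic, using the diffstat row as input (the function name is hypothetical, and the rounding rule is inferred from the figures rather than stated):

```python
def percent_by_me(by_me: int, changes: int) -> str:
    """Return the share of changes made by the maintainer, to one decimal."""
    if changes <= 0:
        raise ValueError("changes must be positive")
    return f"{100.0 * by_me / changes:.1f}%"

# Figures copied from the diffstat row of the table above.
manual = percent_by_me(101, 135)     # manually edited change-log
generated = percent_by_me(566, 584)  # rcs2log output
print(manual, generated)             # prints "74.8% 96.9%"
```

The gap between the two figures reflects the different counting units: the manually edited log counts described changes, while rcs2log counts individual check-ins.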

Other metrics

There are other ways to measure contributions. Not all of them work as well as inspecting the change-log.

For instance, the Orbiten survey several years ago ignored the change-logs and RCS identifiers in my projects, and credited virtually all of my work to other people. Some of those credited were never contributors. Rather, Orbiten noted the mention of various individuals and organizations in README's and comments, and credited them with the entire work.

Other people have pointed out that Orbiten also did not factor out programs such as libtool, which are bundled with other programs.

Any metric requires inspection and tuning to validate the results. Lacking that step, the metric is worthless.

Reciprocity

Unsurprisingly enough, my change-logs cite contributions from people who also maintain change-logs. They do not necessarily reciprocate (for example, some developers who borrow from my work do not). I don't work with those people.