Proposal: Use Git and Github better
I think it's great that we're using Git and GitHub as the experimental / unofficial repository at https://github.com/eggert/tz. It's much easier to track change history by looking through the commit log and seeing the changes than by reading through emails with patch attachments. However, we're not currently taking advantage of all that this environment has to offer.

-- Item 1 --

We should be making better use of branches. We currently have a single "master" branch that everything gets committed to. This is problematic, because it doesn't separate things that are certain to be released from things that are proposed changes. For example, the recent time.tab file, and the other large-scale proposed changes that are currently being debated, could have been created on feature branches. This would have given the tz list members a place to look at the proposed changes and make additional suggestions (via pull requests) before things are finalized.

As it sits today, since everything is in master, if the proposal is ultimately defeated then new commits will have to be made to master to revert these changes. The danger comes if, say, we needed to issue an emergency release sometime in between. Since master isn't in a state of positive agreement, one would have to branch from an earlier point in history to build a hotfix release, then merge that hotfix back to master later. It's much easier if we can just trust that master always consists of things that are certain to be released.

See also:
https://www.atlassian.com/git/workflows#!workflow-feature-branch
http://www.git-scm.com/book/en/Git-Branching-Basic-Branching-and-Merging

-- Item 2 --

I think that we should all make better use of forking and pull-requests for submitting proposed changes. Instead of submitting a patch file to the mailing list, one should fork the GitHub repo, make their changes, then create a pull request. This provides a place for discussion of proposals where the code can be referenced much more easily. It also ensures that the author of each and every change is tracked in the commit log. And finally, it makes it much clearer which proposals were adopted and which were not. Presently, looking through the mailing list archives, it's quite difficult to tell if any given patch was actually applied or not.

-- Item 3 --

We should decide how the GitHub issue tracker fits into the ecosystem. I see that there have been a few issues reported via the issue tracker in the past, but most things have come through the mailing list. If we adopt the conventions used by other modern projects, then we should be reporting bugs through the issue tracker so their history can be more easily found. Another benefit is that you can reference issue numbers in commits, and you can reference commits in the comments of an issue. This linking makes it quite easy to find the code or data that was changed in response to an issue. The mailing list should probably be used for extended discussion, rather than as a place to report issues. Though there may be some blend of both, I personally think that an issue tracker is much more palatable than a mailing list for many of these kinds of things. There should probably be some guidance document on the iana tz page about what goes where.

-- Item 4 --

While Paul Eggert is the tz maintainer, and I appreciate his efforts greatly, I personally don't feel that it's appropriate for the GitHub repo to be in his personal "eggert" account. There should instead be a common "organizational account" for the project, such as github.com/tzdb or similar. ("iana" is taken, but appears to be unused or abandoned. Someone may want to inquire about obtaining it, as "github.com/iana/tz" would be quite appropriate IMHO). Though Paul would be the administrator of this account, his own personal account would no longer be authoritative.

That also ties back to the idea of pull-requests. Since Paul makes the majority of changes, he would first make them in his own account, and then send a pull-request to the main account. Then a link could be sent to the mailing list for discussion on the pull request before it was merged in.

As a side note - I've found that several third-party projects are linking to the unofficial sources using git submodules. While this isn't officially sanctioned, it would be much better if they could link to iana/tz instead of eggert/tz.

-- Item 5 --

While code and data often go hand-in-hand, there are quite a lot of projects these days that only rely on the tz data. There are also a lot of releases of code changes that don't require data changes. Having both code and data in a single project seems rather inefficient. I propose that they be split back to separate projects, and maintained in separate GitHub repos (tzdata / tzcode).

Also, consider that perhaps there are too many merged projects just within the code. For example, tzselect, zic, zdump, etc. might be broken out for better visibility of changes and for clarity of dependent files.

I look forward to feedback on these items. I'm sure not all will be in agreement, but I think it's important that we look forward to new and better ways to manage this project - rather than just sticking with the ways of the past.

-Matt
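For readers less familiar with branches, the workflow described in Item 1 might look roughly like the following. This is only a minimal sketch in a throwaway repository, not how the tz repository is actually managed; the file contents, commit messages, and the branch names "time-tab" and "hotfix" are assumptions made up for illustration.

```shell
#!/bin/sh
# Sketch of the Item 1 branch workflow. The repository, file contents, and
# branch names ("time-tab", "hotfix") are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch "master"
git config user.name "Example Author"
git config user.email "author@example.com"

# master holds only agreed-upon content.
echo "agreed data" > data.txt
git add data.txt
git commit -q -m "Initial agreed data"

# A proposal still under debate lives on its own feature branch.
git checkout -q -b time-tab
echo "proposed table" > time.tab
git add time.tab
git commit -q -m "Propose time.tab"

# An emergency fix branches from master, which is still releasable.
git checkout -q master
git checkout -q -b hotfix
echo "urgent fix" >> data.txt
git commit -q -a -m "Urgent data fix"
git checkout -q master
git merge -q --ff-only hotfix

# master now has the fix but not the undecided proposal.
git log --format=%s master
```

Because the proposal never touched master, abandoning it would only mean deleting the feature branch; no revert commits on master would be needed.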
Were these principles to be adopted, it would be hugely beneficial to the project. There is no reason that I can see why tzdb could not be managed using the best practice git techniques described above.

Stephen

On 26 July 2014 19:38, Matt Johnson <mj1856@hotmail.com> wrote:
[snip]
Hi Matt,

At 11:38 26-07-2014, Matt Johnson wrote:
-- Item 4 -- While Paul Eggert is the tz maintainer, and I appreciate his efforts greatly, I personally don't feel that it's appropriate for the github repo to be in his personal "eggert" account. There should instead be a common "organizational account" for the project, such as github.com/tzdb or similar. ("iana" is taken, but appears to be unused or abandoned. Someone may want to inquire about obtaining it, as "github.com/iana/tz" would be quite appropriate IMHO). Though Paul would be the administrator of this account, his own personal account would no longer be authoritative.
[snip]
I look forward to feedback on these items. I'm sure not all will be in agreement, but I think it's important that we look forward to new and better ways to manage this project - rather than just sticking with the ways of the past.
The messages at http://mm.icann.org/pipermail/ianatransition/2014/ might be informative for anyone with an interest in IANA.

Item 4 is about formalizing the maintenance. The project loses flexibility when things are too formal. The tz maintainer might have to request approval from some higher authority for changes even though there is agreement on this mailing list.

I don't think that Paul Eggert has been sticking with the ways of the past ( http://mm.icann.org/pipermail/tz/2012-July/018126.html ). The next tz maintainer might find some better way to do the work. I think that it is better to avoid single points of failure.

Regards,
-sm
Thanks, Matt, for getting the ball rolling on this. I had been wrestling for a few days over how to write something similar, but you seem to have touched on most of the major points.

No matter how much we say it is merely “experimental,” simply having a public repository available to act as a central and timely source for changes has apparently caused many to see it as “blessed.” This is not inherently a bad thing, but we need to adapt our behavior accordingly.

*With regard to Items 1 and 4,* we absolutely should be making use of branches, for all the reasons you mentioned. Currently, most patches are submitted to the list, reviewed, and then applied to master; however, patches authored by Paul are applied directly to master, and those changes are reviewed and defended afterwards. Clearly, this disparity has been an issue recently, and I sense that it is the source of much of the shock some are experiencing over the proposed major changes. Though I personally believe Paul has made his edits in good faith, I can also easily see how the idea that the repository is simply “Paul’s playground” may have developed amongst others.

I would add that if the master branch is always in a state of “positive agreement,” as it should be, then it is also always “ready for release.” Every push to master should effectively be a release candidate. For us, this may still mean that obvious and urgent hotfixes to data go straight to master, but this should not be the norm, especially for maintenance tasks. In particular, it means that the proposed zone-linking and introduction of time.tab should be done on separate branches to allow for debate and further refinement of those ideas before this project commits itself to their use.

To this effect, I agree that it would be better that the repository not be associated with Paul’s personal account; however, as long as the master branch is given due deference, this concern becomes somewhat less pressing.

*With regard to Items 2 and 3,* while I’m all for using what Git has to offer, I’m extremely wary of locking ourselves into GitHub or any other similar service. Others have written, far better than I could, about how the GitHub pull request system subverts or otherwise “breaks” <http://laurent.bachelier.name/2012/05/github-kinda-sucks/> the core functionality of Git. If it were up to me, I’d disable pull requests entirely and exclusively use the mailing list, which keeps more complete (albeit messier) archives. Alas, GitHub does not allow pull requests to be disabled. This may mean that GitHub is not the right place for our repository.

Further, GitHub’s communication tools on pull requests and issues are simply far less flexible than email. I’d also rather keep our barrier to entry low, so that even someone who knows nothing about Git can still submit well-thought changes to the list. For the simplest patches we see, a pull request is simply too much overhead.

Even without using pull requests, though, forks can be very useful for sharing branches. For all but the simplest patches, we can each alert the list when we have a branch ready for consideration, copying the proposed patches to the list for archival, while also providing a link to the branch for easy review. For large changes, one can use this method in conjunction with the flexibility of email to get input from a few more people before presenting a more refined set of changes to the full list. We generally have a low enough throughput that it may still be okay to copy commits over into master once this process is complete, to avoid messy merges.

In any case, given how central this system has recently become to our project, we should definitely add clear guidance to both our own documentation and the IANA pages, conveying how we collectively choose to use (or not use) the features of GitHub or any similar service. Right now, no such guidance exists. Hopefully this discussion acts as a starting point for that.

*With regard to Item 5,* I would point out that, often, code changes are necessitated by corresponding changes in data. Breaking apart into separate projects would make these connections far less obvious, and so I would prefer that our work remain unified under a single repository.

*I would also like to add Item 6:* Proper use of commits as individual units for review. For us, commits don’t necessarily need to be absolutely minimal; my personal goal is merely that each contains only highly-related changes and is understandable on its own. I tried to obey this principle while developing my recent contributions regarding Russia’s changes. (Believe me, I didn’t write them as four neat little patches from the start!) It is inappropriate to batch together several unrelated changes and push them as a single commit, as this makes reviewing more difficult. A recent example is commit f1ddf32f059c17fa5a1ec24f549d70db36dc5fa9 <https://github.com/eggert/tz/commit/f1ddf32f059c17fa5a1ec24f549d70db36dc5fa9> of 2014-07-15, in which Paul partially reverts his earlier zone-to-link changes, but also adds several fixes and bits of commentary which he discovered while researching the changes. In the case that further reversions become necessary, this makes it very difficult to tease out the “good” from the “bad.” (This is only exacerbated by our non-use of branches.)

I will try to model these best practices, especially with regard to Items 2, 3, and 6, in a separate email I will send shortly, proposing reversions to the current state of the repository so that we can hopefully adopt these (or similar) best practices from there.

--
Tim Parenti

On 26 July 2014 14:38, Matt Johnson <mj1856@hotmail.com> wrote:
[snip]
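To make the point of Item 6 concrete, here is a minimal sketch of why separate commits help when a reversion becomes necessary. It assumes a throwaway repository; the file names and commit messages are hypothetical and do not correspond to any actual tz commit.

```shell
#!/bin/sh
# Sketch of Item 6: unrelated changes go in separate commits, so one can be
# reverted later without disturbing the other. File names and commit
# messages are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch "master"
git config user.name "Example Author"
git config user.email "author@example.com"

echo "zone data" > zones.txt
echo "notes" > commentary.txt
git add zones.txt commentary.txt
git commit -q -m "Initial files"

# Two unrelated edits sit in the working tree at the same time...
echo "zone-to-link change" >> zones.txt
echo "new commentary" >> commentary.txt

# ...but each is staged and committed on its own.
git add zones.txt
git commit -q -m "Convert a zone to a link"
git add commentary.txt
git commit -q -m "Add commentary from research"

# If the zone change is later rejected, revert only that commit.
git revert --no-edit HEAD~1 >/dev/null

git log --format=%s
```

Had both edits gone into a single commit, the revert would have removed the commentary as well, and someone would have had to separate the two by hand.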
Matt Johnson <mj1856@hotmail.com> wrote:
I think that we should all make better use of forking and pull-requests for submitting proposed changes. Instead of submitting a patch file to the mailing list, one should fork the GitHub repo, make their changes, then create a pull request.
No, I think all patches should be reviewed and discussed on the mailing list, since it is archived by IANA, and it is where RFC 6557 says discussion should occur.
It also ensures that the author of each and every change is tracked in the commit log.
This is also true for patches posted to the mailing list. Pull requests have the disadvantage that they can make revising proposals harder, and they can make it more inconvenient for a maintainer to do small fix-ups when required.

http://blog.spreedly.com/2014/06/24/merge-pull-request-considered-harmful/
We should decide how the GitHub issue tracker fits into the ecosystem. I see that there have been a few issues reported via the issue tracker in the past, but most things have come through the mailing list.
The mailing list is the correct place to report problems.
Also, consider that perhaps there are too many merged projects just within the code. For example, tzselect, zic, zdump, etc. might be broken out for better visibility of changes and for clarity of dependent files.
I think this would create more work than it saves. The tz repo is pretty small.

Tony.
--
f.anthony.n.finch <dot@dotat.at> http://dotat.at/
Fisher, North German Bight: Northerly or northwesterly 3 increasing 4 or 5. Slight, becoming slight or moderate. Fair. Moderate or good.
On Mon, Jul 28, 2014 at 10:53 AM, Tony Finch <dot@dotat.at> wrote:
I think that we should all make better use of forking and pull-requests for submitting proposed changes. Instead of submitting a patch file to the mailing list, one should fork the GitHub repo, make their changes, then create a pull request.
No, I think all patches should be reviewed and discussed on the mailing list, since it is archived by IANA, and it is where RFC 6557 says discussion should occur.
I know some projects have their pull request stuff sent to the mailing list email, so it might be possible to use pull requests this way and still be able to follow discussion via the mailing list. I guess posting back via email should also be possible, though I'm not sure how much harder that would be to realize.

Cheers,

Dirkjan
Dirkjan Ochtman wrote:
I know some projects have their pull request stuff sent to the mailing list email, so it might be possible to use pull requests this way and still be able to follow discussion via the mailing list.
I don't encourage pull requests on Github, as the mailing list is the primary way of discussing proposed changes. On the few occasions where people have made pull requests anyway, I've tried to migrate discussion of the nontrivial changes to the mailing list. Nowadays it's a bit more convenient for me if emailed patches are generated via "git format-patch" or "git send-email" but this is not required.

More generally, I'd rather not formally require a lot of Git- or Github-specific features in tz maintenance. There are advantages to having multiple branches, pull requests, etc., but there are also disadvantages and it's not clear that the benefits would outweigh the costs. Although Github is a convenient repository, other repositories are also convenient and Github itself may be superseded some day. And although I prefer Git, the next maintainer may prefer something else. As long as we can talk about changes via patches, pretty much any version-control system will do.

Finally, I'd rather keep the experimental repository informal. It's just my personal list of changes that I'm thinking of putting into the next release. It's intended to be releasable at any time (though a release is usually not urgent and we like to put things off :-). Obviously I'll make mistakes sometimes, just as releases themselves sometimes contain mistakes. But these can be fixed.
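For anyone who has not used "git format-patch", a round trip of the kind mentioned above might look roughly like this. It is only a sketch under stated assumptions: the two local repositories stand in for the maintainer and a contributor, the names and addresses are hypothetical, and the patch file is copied between directories rather than actually emailed.

```shell
#!/bin/sh
# Sketch of an emailed-patch round trip using "git format-patch" and "git am".
# Repository paths, names, and email addresses are hypothetical, and the
# patch is passed via a local directory instead of real email.
set -e

work=$(mktemp -d)
cd "$work"

# The maintainer's repository.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master   # name the initial branch "master"
git config user.name "Example Maintainer"
git config user.email "maintainer@example.com"
echo "zone data" > zones.txt
git add zones.txt
git commit -q -m "Initial data"
cd ..

# A contributor clones it, commits a change, and exports it as a patch file.
git clone -q upstream contributor
cd contributor
git config user.name "Example Contributor"
git config user.email "contributor@example.com"
echo "corrected rule" >> zones.txt
git commit -q -a -m "Correct a rule"
git format-patch -1 -o "$work/patches" >/dev/null
cd ..

# The maintainer applies the patch much as it would arrive by email.
cd upstream
git am -q "$work"/patches/*.patch

# Show who is recorded as the author of the applied commit.
git log -1 --format='%an <%ae>: %s'
```

The applied commit keeps the contributor as its author while the maintainer is recorded as committer, which is the sense in which authorship is also tracked for patches posted to the list.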
participants (7)
- Dirkjan Ochtman
- Matt Johnson
- Paul Eggert
- SM
- Stephen Colebourne
- Tim Parenti
- Tony Finch