Government Datasets or APIs

Commonwealth Legislation API

It would be great if Australian legislation were available in a machine-readable format.

There are many projects available for scraping and parsing what's out there in HTML and PDF format, but it is insufficient for a modern, vibrant, digital democracy.

I propose an API that would provide access to Bills and Acts in a machine-readable format such as XML, so that we could, for example:

* Compare different versions of Acts side by side in a diff-like manner

* Overlay proposed Bills on the relevant Acts, in a manner similar to diff, to see the result prior to debate and acceptance

* Allow for timely dissemination of legislation in a manner befitting the web, such as being able to link to specific sections or paragraphs

* Create a git repository that allows anyone to fork and modify legislation and make actual, tangible proposals to fix legislation

These are just a few of the benefits that could be provided by an API for Commonwealth legislation. I'm sure others can think of even better things they could do with it.
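As a rough illustration of the first bullet, here is a minimal sketch using Python's standard `difflib`. It assumes two versions of a section are already available as lists of text lines; the section text and the labels `old` and `new` are invented for the example and do not come from any real Act.

```python
import difflib

# Invented text standing in for two versions of the same section.
old_version = [
    "5  Carrying widgets",
    "(1) A person must not carry a widget in public.",
    "(2) Penalty: 10 penalty units.",
]
new_version = [
    "5  Carrying widgets",
    "(1) A person must not carry a widget in a public place.",
    "(2) Penalty: 20 penalty units.",
]


def diff_versions(old, new, old_label="old", new_label="new"):
    """Return a unified diff between two versions of a provision."""
    return list(
        difflib.unified_diff(
            old, new, fromfile=old_label, tofile=new_label, lineterm=""
        )
    )


if __name__ == "__main__":
    for line in diff_versions(old_version, new_version):
        print(line)
```

With properly structured data the comparison could be done per section or subsection instead of per line of scraped text.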



Stage: In Review

Feedback Score

15 votes



  1. Status Changed from Active to In Review
  2. The idea was posted


  1. Comment
    Garry Brooke

    OPC may have already done something on this. There are good facilities for finding legislation on Comlaw. OPC also have to ensure that everything is 100% accurate, so they may have good, tight procedures.

    A facility to apply proposed amendments to existing legislation would be useful as the full import of changes is often not clear until seen in context.

    Comments on this comment

    1. Comment
      Andrew Donnellan

      I'm not sure what it's like right now, but I had to use ComLaw data for a project back in 2012, and the best I could do was scrape the Microsoft Word-generated HTML version of the legislation. Some hand inspection showed me that you could figure out where headings/titles are by looking at class attributes that corresponded to MS Word styles. It was rather painful.
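The class-attribute approach described above might look roughly like the sketch below, which uses only Python's standard `html.parser`. The class names (`HypotheticalPartHead` and so on) are placeholders; the real names would have to be read off the exported HTML or OPC's style documentation.

```python
from html.parser import HTMLParser

# Hypothetical mapping from Word-style CSS class names to heading kinds.
HEADING_CLASSES = {
    "HypotheticalPartHead": "part",
    "HypotheticalSectionHead": "section",
}

SAMPLE_HTML = """
<p class="HypotheticalPartHead">Part 2 Offences</p>
<p class="HypotheticalSectionHead"><span>5</span>  Carrying widgets</p>
<p class="HypotheticalBodyText">(1) A person must not carry a widget in public.</p>
"""


class HeadingExtractor(HTMLParser):
    """Collect the text of <p> elements whose class marks them as headings."""

    def __init__(self, heading_classes):
        super().__init__()
        self.heading_classes = heading_classes
        self.headings = []
        self._current_kind = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and self._current_kind is None:
            css_class = dict(attrs).get("class")
            if css_class in self.heading_classes:
                self._current_kind = self.heading_classes[css_class]
                self._buffer = []

    def handle_data(self, data):
        if self._current_kind is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._current_kind is not None:
            # Collapse the irregular whitespace that Word exports tend to contain.
            text = " ".join("".join(self._buffer).split())
            self.headings.append((self._current_kind, text))
            self._current_kind = None


def extract_headings(html, heading_classes=HEADING_CLASSES):
    parser = HeadingExtractor(heading_classes)
    parser.feed(html)
    parser.close()
    return parser.headings


if __name__ == "__main__":
    for kind, text in extract_headings(SAMPLE_HTML):
        print(kind, "|", text)
```

This only works as long as the drafters apply the styles consistently, which is part of why it is painful.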

  2. Comment
    Community Member

    I think New Zealand has already done this:

    See example use of NZ data at:

    (using the semantic markup "def-term" to extract all definitions in legislation).

    The US Government Printing Office (GPO) appears to also have legislation in XML form, including semantically-marked up references (at least in some of the places where the text of a provision refers to the US Code). E.g.

    The US GPO system also comes with an XML stylesheet (XSLT) to generate human-readable HTML from each bill.

    When it comes to Commonwealth legislation, Comlaw and appear to currently provide access to the original data (i.e. the .doc files that the drafters draft).

    Austlii has done a lot of work on scraping the .doc files that the Commonwealth and state/territory drafters produce, some of which is described in their Technical Library. The Austlii computer understands the legislation to the extent that it can insert hyperlinks to section references (e.g. the text "as per section 97(2) of the B Act" in the A Act will be a hyperlink to section 97 of the B Act) when displaying legislation. If I understand correctly, this is achieved via computerised guessing/natural language processing.

    The Commonwealth Office of Parliamentary Counsel puts out "Word Notes" that describe the styles they use to mark up their .doc files. The benefits of an idealised XML-based system over .doc in this context are that:

    (1) an XML schema would be machine-readable and more-or-less generate its own scraper, whereas Word Notes are human-readable and need a manual scraper, and

    (2) an idealised XML-based system has semantic markup rather than structural markup, so we are told explicitly that sub-section (a) is a sub-section of section (1), not having to guess on the basis that one follows immediately after the other in the file.
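To make point (2) concrete, here is a small sketch with Python's standard `xml.etree.ElementTree`. The element and attribute names (`section`, `subsection`, `paragraph`, `label`) are a made-up schema for illustration only, not the New Zealand, US GPO, or any OPC format.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic markup: the nesting itself states what belongs to what.
SAMPLE_XML = """
<section id="s5" label="5">
  <heading>Carrying widgets</heading>
  <subsection id="s5-1" label="(1)">
    <text>A person must not carry:</text>
    <paragraph id="s5-1-a" label="(a)"><text>a widget; or</text></paragraph>
    <paragraph id="s5-1-b" label="(b)"><text>a gadget.</text></paragraph>
  </subsection>
</section>
"""

STRUCTURAL_TAGS = {"section", "subsection", "paragraph"}


def provision_paths(element, ancestors=()):
    """Yield the full label path of every structural element, depth first."""
    if element.tag in STRUCTURAL_TAGS:
        path = ancestors + (element.get("label"),)
        yield path
    else:
        path = ancestors
    for child in element:
        yield from provision_paths(child, path)


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_XML.strip())
    for path in provision_paths(root):
        print("".join(path))
```

No guessing from document order is needed here: the parent of each provision is whatever element encloses it.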

    Legislation is fundamentally free text. A member can read out whatever they want the law to be and, if both chambers agree, that's it, tags or no tags. However, all members of Federal Parliament seem to rely on OPC to do their drafting for them, and OPC has a very standardised way of writing things and marking them up with MS Word styles. Therefore you can scrape it.

    You could get the OPC to generate structurally- or semantically-marked up XML instead of .doc. It puts the burden on the OPC drafters who are writing legislation every day, and takes the burden off the scrapers who have to write a scraper once.

    OPC and other drafters could assist greatly by semantically marking up all references. For example, instead of their current practice of writing "Repeal section 5" as text, put in some tags (whether .doc styles to make it easier for the drafters, or XML to make it easier for the scrapers) to refer unambiguously to the section 5 they mean to refer to. New Zealand legislation XML does this (e.g. "<extref>" tag).

    It would be useful to be able to distinguish:

    (a) "repeal section 3 and replace it with amended wordings" from

    (b) "repeal section 3 and replace it with a totally unrelated provision but re-use the section number".

    These two actions (a) and (b) appear the same to me. I could be missing something in the standardised wording that drafters use, which would allow me to scrape this information from the amending Act; I would be interested to hear if that is the case. (In an idealised XML-based system, you would have something like separate persistent unique identifiers for the subsection and for each version of the subsection, rather than the scraper having to interpret the amending words used. Perhaps the New Zealand system does this; I would be interested to hear if anyone knows.)
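A minimal sketch of what I mean by separate identifiers, in Python. The field names and identifier strings are entirely hypothetical and are not taken from any existing system; the point is only that cases (a) and (b) become distinguishable without reading the amending words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvisionVersion:
    provision_id: str  # hypothetical persistent identity of the provision
    version: int       # hypothetical version counter for that provision
    number: str        # the section number as displayed in the Act
    text: str


def classify_change(old: ProvisionVersion, new: ProvisionVersion) -> str:
    """Tell an amended provision from an unrelated one reusing the number."""
    if old.number != new.number:
        raise ValueError("provisions do not occupy the same section number")
    if old.provision_id == new.provision_id:
        return "amended"   # case (a): same provision, new wording
    return "replaced"      # case (b): different provision, number re-used


original = ProvisionVersion("prov-0003", 1, "3", "A person must not carry a widget.")
reworded = ProvisionVersion("prov-0003", 2, "3", "A person must not carry a widget in public.")
unrelated = ProvisionVersion("prov-0412", 1, "3", "The Minister may make regulations.")

if __name__ == "__main__":
    print(classify_change(original, reworded))
    print(classify_change(original, unrelated))
```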

    Of course, asking the government to publish existing data is one thing; asking the government to acquire a new computer system and change its drafters' workflow to generate additional metadata is another thing, so I understand if there is a line the government wants to draw somewhere there. (If I understand correctly, New Zealand does write all its legislation in XML.)

    For example, OPC could start adding semantic markup to the references in their text without a whole new system. They can do it in MS Word. Yes, you have to click some more to add hyperlinks every time you write "as per subsection (3)" or use a defined term, but you are adding useful machine-readable information for today and for posterity.

    Comlaw has some very useful features in linking an Act to its originating Bill. Recent Bills stored on Comlaw have the explanatory memorandum stored on Comlaw too. The ideal case would have the Comlaw data unambiguously linked to the data (Bills Digest, debates, committee reports, etc). The data is important context to the existing law, and the existing law is important context to a proposed law. I suspect the current data exposed by the government would allow any member of the public to programmatically trace Acts on to scraped debates (XML) on OpenAustralia. Of course, the government could help that along by publishing more explicit metadata like semantic markup (and data in a form described by XML schemas rather than human-readable "Word Notes").

    The next step after marked-up legislation (i.e. machine-readable metadata for legislation) is machine-readable legislation. This means instead of writing "It is an offence to carry a widget in public", you have a schema to explain "offence", "carry", "widget" etc to the computer. Then the computer can fill out your tax return automatically, by reading your accounting records and the legislation.

  3. Comment
    Allan Barger
    ( Moderator )

    Hello Brendan,

    This request has been received and we have passed it on to the relevant agency. We will keep you updated on any progress and on the outcome of the request.

