Software Heritage

From Wikipedia, the free encyclopedia
Formation: June 30, 2016
Founders: Roberto Di Cosmo, Stefano Zacchiroli
Type: Non-profit
Headquarters: Inria
Scientific advisors: Gérard Berry, Jean-François Abramatic, Julia Lawall, Serge Abiteboul
Affiliations: Inria
Staff: 13
Website: softwareheritage.org

Software Heritage is a non-profit organization that provides a service for archiving and referencing historical and contemporary software, with a focus on human-readable source code. The site was unveiled in 2016 by Inria[1] and is supported by UNESCO.[2][3][4] The project is structured as a non-profit, multi-stakeholder initiative.

Overview

The stated mission of Software Heritage is to collect, preserve and share all software that is publicly available in source code form, with the goal of building a common, shared infrastructure at the service of industry, research, culture and society as a whole.[5]

Software source code is collected by crawling code hosting platforms, such as GitHub, GitLab.com and Bitbucket, and package archives, such as npm and PyPI. It is then ingested into a Merkle directed acyclic graph (Merkle DAG), the data structure at the core of the archive.[6] Each artifact in the archive is associated with an identifier called a SWHID.[7] In 2023, the expansion of the acronym SWHID was changed from "Software Heritage identifier" to "software hash identifier".
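The link between identifiers and the Merkle DAG can be illustrated with a short sketch. SWHIDs for file contents and directories reuse Git's intrinsic object hashing: a SHA-1 computed over a type header plus the serialized object. Because a directory's serialization embeds the hashes of its children, its identifier depends on everything beneath it. The helper names below (`git_hash`, `content_id`, `directory_id`, `swhid`) are hypothetical and not part of any Software Heritage library, and the sketch assumes only regular non-executable files and subdirectories. Symlinks, executable bits, and the revision, release and snapshot object types are left out.

```python
import hashlib


def git_hash(obj_type: str, payload: bytes) -> str:
    # Git-style intrinsic id: SHA-1 over "<type> <length>\0" followed by the payload.
    header = f"{obj_type} {len(payload)}\0".encode()
    return hashlib.sha1(header + payload).hexdigest()


def content_id(data: bytes) -> str:
    # A file's identifier depends only on its bytes (a Git "blob" hash).
    return git_hash("blob", data)


def directory_id(entries: list[tuple[str, str, str]]) -> str:
    # entries: (name, kind, child_hex_id), where kind is "file" or "dir".
    # Assumption: only regular non-executable files and subdirectories.
    def sort_key(entry: tuple[str, str, str]) -> bytes:
        name, kind, _ = entry
        # Git orders a directory name as if it ended with "/".
        return name.encode() + (b"/" if kind == "dir" else b"")

    payload = b""
    for name, kind, child_id in sorted(entries, key=sort_key):
        if kind not in ("file", "dir"):
            raise ValueError(f"unsupported entry kind: {kind}")
        mode = b"40000" if kind == "dir" else b"100644"
        # Each entry embeds the child's raw 20-byte hash: this is the Merkle link.
        payload += mode + b" " + name.encode() + b"\0" + bytes.fromhex(child_id)
    return git_hash("tree", payload)


def swhid(object_type: str, object_id: str) -> str:
    # Core SWHID syntax: swh:<scheme version>:<object type>:<hex object id>
    return f"swh:1:{object_type}:{object_id}"


readme = content_id(b"hello world\n")
src = directory_id([("main.py", "file", content_id(b"print('hi')\n"))])
root = directory_id([("README", "file", readme), ("src", "dir", src)])
print(swhid("cnt", readme))  # swh:1:cnt:3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(swhid("dir", root))
```

Changing a single byte of `main.py` changes its content identifier. That change propagates to the `src` directory identifier and then to the root's. Identical files and subtrees, in contrast, always hash to the same identifier, so they can be stored once and shared across projects.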

To improve the chances of preserving the Software Heritage archive over the long term, a mirror program was established in 2018. As of October 2020, ENEA[8] and FossID[9] had joined it.

History

Development of Software Heritage began at Inria under the direction of computer scientists Roberto Di Cosmo and Stefano Zacchiroli in early 2015,[10] and the project was officially announced to the public on June 30, 2016.[1][11]

In 2017 Inria signed an agreement with UNESCO for the long-term preservation of software source code and for making it widely available, in particular through the Software Heritage initiative.[12]

In June 2018, the Software Heritage Archive[6] was opened at UNESCO headquarters.[2]

On July 4, 2018, Software Heritage was included in the French National Plan for Open Science.[13]

In October 2018, the strategy and vision underlying the mission of Software Heritage were published in Communications of the ACM.[5]

In November 2018, a group of forty international experts met at the invitation of Inria and UNESCO,[14] which led to the publication in February 2019 of Paris Call: Software Source Code as Heritage for Sustainable Development.[15]

In November 2019, Inria signed an agreement with GitHub to improve the archival process for GitHub-hosted projects in the Software Heritage archive.[16]

As of October 2020, Software Heritage’s repository held over 143 million software projects in an archive of over 9.1 billion unique source files.[6]

Funding

Software Heritage is a non-profit organization funded largely by donations from sponsors, which include private companies, public bodies and academic institutions.[17]

Software Heritage also seeks financial support to fund third parties interested in contributing to its mission. A grant from NLNet[18] funded work by Octobus[19] and Tweag[20] that rescued 250,000 Mercurial repositories when Bitbucket phased out Mercurial support.[21]

A grant from the Alfred P. Sloan Foundation funds experts to develop new connectors that expand the coverage of the Software Heritage archive.[22]

Development and community

The Software Heritage infrastructure is developed openly and collaboratively, and all software produced in the process is released as free and open-source software.[23] An ambassador program was announced in December 2020 with the stated goal of growing the community of users and contributors.[24]

Awards

In 2016, Software Heritage received the best community project award at the Paris Open Source Summit.[25][26]

In 2019, Software Heritage received the Academic Initiative award from Pôle Systematic.[27]

References

  1. "Collect, organise, preserve and share the Software Heritage of mankind" (PDF). Software Heritage. 30 June 2016. Retrieved 26 July 2016.
  2. UNESCO (14 November 2019). "Software Heritage". Retrieved 2 November 2020.
  3. Brown, Paul (30 June 2016). "Software Heritage: Creating a safe haven for software". Boing Boing. Retrieved 26 July 2016.
  4. Jost, Clémence (1 July 2016). "Open source: lancement de Software Heritage, la plus grande bibliothèque de codes source de la planète" [Open source: launch of Software Heritage, the largest library of source code on the planet]. Archimag (in French). Retrieved 27 July 2016.
  5. Abramatic, Jean-François; Di Cosmo, Roberto; Zacchiroli, Stefano (1 October 2018). "Building the Universal Archive of Source Code". Communications of the ACM. Retrieved 2 November 2020.
  6. "Software Heritage Archive". Retrieved 2 November 2020.
  7. "Software Heritage Persistent Identifiers". Software Heritage. Retrieved 2 November 2020.
  8. "At ENEA the first institutional mirror of Software Heritage". ENEA. Archived from the original on 16 November 2020. Retrieved 2 November 2020.
  9. "FossID establishes first independent mirror of world's largest source code archive". FossID. 6 December 2018. Archived from the original on 23 September 2020. Retrieved 2 November 2020.
  10. Moody, Glyn (30 June 2016). "Software Heritage, the 'Library of Alexandria of software,' launches today". Ars Technica. Retrieved 26 July 2016.
  11. Brogan, Jacob (30 June 2016). "Introducing Software Heritage, the Library of Alexandria for Code". Slate. Retrieved 26 July 2016.
  12. UNESCO (3 April 2020). "Discours de la Directrice générale de l'UNESCO, Irina Bokova, à l'occasion de la signature de l'accord entre l'UNESCO et INRIA portant sur la préservation et le partage du patrimoine logiciel" [Speech by the Director-General of UNESCO, Irina Bokova, on the occasion of the signing of the agreement between UNESCO and INRIA on the preservation and sharing of software heritage] (Press release) (in French). Paris, France: UNESCO. Retrieved 3 November 2020. Bokova, IG, Director-General, 2009–2017.
  13. "National Plan for Open Science" (PDF). Ouvrir La Science. Archived from the original (PDF) on 1 July 2021. Retrieved 2 November 2020.
  14. "Experts call for greater recognition of software source code as heritage for sustainable development" (Press release). Paris, France: UNESCO. 16 November 2018. Retrieved 2 November 2020.
  15. "Paris Call on software source code as heritage for sustainable development". Paris: UNESCO. February 2019. Retrieved 2 November 2020.
  16. "GitHub Archive Program". November 2019. Retrieved 2 November 2020.
  17. "Software Heritage Sponsors". Retrieved 2 November 2020.
  18. "NLNet Software Heritage grant". Retrieved 2 November 2020.
  19. "Augmenting Software Heritage archiving capabilities". Retrieved 2 November 2020.
  20. "Long-term reproducibility with Nix and Software Heritage". Retrieved 2 November 2020.
  21. "Announcing the Mercurial public Bitbucket archive". Retrieved 2 November 2020.
  22. Sloan Foundation. "Excited to support Software Heritage". Retrieved 2 November 2020.
  23. "Software Heritage licensing". Retrieved 25 February 2021.
  24. "Software Heritage Ambassadors". Retrieved 25 February 2021.
  25. "Les Acteurs du Libre - Précédents Lauréats" [Acteurs du Libre: previous winners] (in French). Archived from the original on 18 January 2019. Retrieved 8 May 2020.
  26. "Paris Open Source Summit 2016 : Prix Acteurs du Libre : et les gagnants sont..." [Paris Open Source Summit 2016: Acteurs du Libre awards: and the winners are...]. Programmez! (in French). 17 November 2016. Retrieved 28 June 2019.
  27. @Pole_Systematic (27 June 2019). "Convention @Pole_Systematic le Trophée Prix Initiative académique est remis @SWHeritage" [Convention @Pole_Systematic: the Academic Initiative Prize trophy is awarded to @SWHeritage] (Tweet) (in French) – via Twitter.