"This suggests that once a package is archived maintainers do not make the effort to put it back on CRAN except on very few cases were there are multiple attempts. To check we can see the current available packages and see how many of those are still present on CRAN:
CRAN  Packages  Proportion
no    3869      64%
yes   2183      36%
Many packages are currently on CRAN despite their past archivation but close to 64% are currently not on CRAN." Conversely, this means that about 36% of archived packages returned to CRAN.
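As a quick sanity check, the two proportions can be recomputed from the counts in the table. This is only a sketch of the arithmetic; the variable names are made up for illustration.

```python
# Counts taken from the table quoted above (2022 study).
not_on_cran = 3869
on_cran = 2183

total = not_on_cran + on_cran          # packages that were archived at least once
share_returned = on_cran / total       # fraction currently back on CRAN
share_gone = not_on_cran / total       # fraction currently not on CRAN

print(total)                           # 6052
print(f"{share_returned:.0%}")         # 36%
print(f"{share_gone:.0%}")             # 64%
```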
"Yes, 36% of all packages archived returned to CRAN (when I created the post). As time goes this % will lower, and also it could mean that a package was archived, then returned and then was archived for good. The time they were archived could be calculated comparing the archive and current dates and the date when they were archived. This is relatively trivial to do and could provide some estimation for CRANhaven."
It would be interesting to get the raw data for how long "returning" packages are archived. This information should be possible to retrieve from https://cran.r-project.org/src/contrib/PACKAGES.in because its entries carry information on the type of event and when it took place. Two examples are:
Package: jlmerclusterperm
X-CRAN-History: Archived on 2024-02-29 for policy violation.
  .
  Does not clean up use of cache.
  Unarchived on 2024-03-04.
and
Package: BFS
X-CRAN-History: Archived on 2022-06-14 as check problems were not corrected in time.
  Unarchived on 2022-09-07.
  Archived on 2024-01-24 as requires archived package 'pxweb'.
  Unarchived on 2024-02-02.
  Archived on 2024-02-17 for policy violation.
  .
  On Internet access (429 error).
  Unarchived on 2024-02-24.
With this raw data, we can estimate the distribution of how long packages stay off CRAN before returning.
We could also annotate each archived package with information on why it was archived. For instance, CRANhaven could also serve as a dashboard that gives an overview of why packages are no longer available, as an alternative to visiting each CRAN package page.
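To make the idea concrete, here is a minimal sketch in Python of how archive and unarchive events could be paired up to get the number of days each package spent off CRAN. It runs on the two example records from this issue, embedded as a string, rather than on the live file. The function name `archive_spells`, the assumption that continuation lines are indented, and the assumption that records are separated by blank lines are illustrative and not taken from CRAN documentation.

```python
import re
from datetime import date

# The two example records from this issue. Assumption: continuation lines
# are indented and records are separated by a blank line.
SAMPLE = """\
Package: jlmerclusterperm
X-CRAN-History: Archived on 2024-02-29 for policy violation.
  .
  Does not clean up use of cache.
  Unarchived on 2024-03-04.

Package: BFS
X-CRAN-History: Archived on 2022-06-14 as check problems were not corrected in time.
  Unarchived on 2022-09-07.
  Archived on 2024-01-24 as requires archived package 'pxweb'.
  Unarchived on 2024-02-02.
  Archived on 2024-02-17 for policy violation.
  .
  On Internet access (429 error).
  Unarchived on 2024-02-24.
"""

# Matches "Archived on YYYY-MM-DD" and "Unarchived on YYYY-MM-DD".
EVENT = re.compile(r"\b(Archived|Unarchived) on (\d{4}-\d{2}-\d{2})")


def archive_spells(text):
    """Return {package: [days off CRAN, ...]}, one entry per archive/unarchive pair."""
    spells = {}
    for record in re.split(r"\n[ \t]*\n", text.strip()):
        name = re.search(r"^Package:\s*(\S+)", record, flags=re.M)
        if name is None:
            continue
        archived_on = None
        durations = []
        for kind, day in EVENT.findall(record):
            when = date.fromisoformat(day)
            if kind == "Archived":
                archived_on = when
            elif archived_on is not None:
                # An "Unarchived" event closes the currently open spell.
                durations.append((when - archived_on).days)
                archived_on = None
        spells[name.group(1)] = durations
    return spells


print(archive_spells(SAMPLE))
# {'jlmerclusterperm': [4], 'BFS': [85, 9, 7]}
```

A package whose last event is "Archived" simply contributes no duration for that open spell, so only packages that actually returned are counted.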
HenrikBengtsson changed the title from "STATS: ~36% of archived packages are unarchived again (2022 study)" to "STATS: ~36% of archived packages are unarchived later (2022 study)" on Mar 5, 2024
I think we could show a comment next to the package name, something like "Archived on 2024-02-17 for policy violation." Getting the detail "On Internet access (429 error)." onto the same line is trickier, so I would avoid it.
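A rough sketch of that single-line annotation in Python: it keeps only the most recent "Archived on ..." line of a history field and ignores the detail lines that follow it. The helper name `latest_archive_note` and the input layout (the value of `X-CRAN-History` as one multi-line string) are assumptions for illustration.

```python
import re

# One "Archived on <date> ..." line; detail lines such as
# "On Internet access (429 error)." are deliberately not captured.
ARCHIVED_LINE = re.compile(r"^[ \t]*(Archived on \d{4}-\d{2}-\d{2}.*)$", flags=re.M)


def latest_archive_note(history):
    """Return the most recent 'Archived on ...' line, or None if there is none."""
    notes = ARCHIVED_LINE.findall(history)
    return notes[-1].strip() if notes else None


# X-CRAN-History value of the BFS example from this issue.
BFS_HISTORY = """\
Archived on 2022-06-14 as check problems were not corrected in time.
  Unarchived on 2022-09-07.
  Archived on 2024-01-24 as requires archived package 'pxweb'.
  Unarchived on 2024-02-02.
  Archived on 2024-02-17 for policy violation.
  .
  On Internet access (429 error).
  Unarchived on 2024-02-24.
"""

print(latest_archive_note(BFS_HISTORY))
# Archived on 2024-02-17 for policy violation.
```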
I have some numbers from the raw data:
There are some inconsistencies, but there are at least 4704 cases in which a package was archived and later returned to CRAN.
These come from 3649 unique packages out of 8837 with at least one event registered in the file.
It seems that the number of archived packages has been rising (in line with what other ongoing research would suggest).
It would be nice to cross-reference this with the number of attempts it takes for a package to be accepted.
Overall, median time 30 days:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    1.0     8.0    30.0   113.2   109.0  3292.0
Split by times the package was archived:
attempt  packages  min      q1       mean      median   q3        max
1        3649      1 days   9 days   121 days  33 days  120 days  3292 days
2        769       1 days   8 days   91 days   27 days  87 days   1949 days
3        203       1 days   6 days   80 days   22 days  76 days   882 days
4        65        1 days   9 days   66 days   23 days  54 days   652 days
5        16        1 days   3 days   24 days   13 days  31 days   93 days
6        2         17 days  20 days  22 days   22 days  24 days   27 days
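For readers who want to reproduce this kind of split, here is a small Python sketch of the grouping step. The original numbers were presumably computed in R. The input is toy data derived from the two `PACKAGES.in` examples in this issue, and the names `SPELLS` and `summarize_by_attempt` are made up for illustration. The k-th archive spell of a package is counted as attempt k.

```python
from collections import defaultdict
from statistics import median

# Days off CRAN per package, in chronological order (toy data from the
# jlmerclusterperm and BFS examples, not the full data set).
SPELLS = {"jlmerclusterperm": [4], "BFS": [85, 9, 7]}


def summarize_by_attempt(spells):
    """Group durations by attempt number and summarize each group."""
    by_attempt = defaultdict(list)
    for durations in spells.values():
        for attempt, days in enumerate(durations, start=1):
            by_attempt[attempt].append(days)
    return {
        attempt: {
            "packages": len(days),
            "min": min(days),
            "median": median(days),
            "max": max(days),
        }
        for attempt, days in sorted(by_attempt.items())
    }


for attempt, stats in summarize_by_attempt(SPELLS).items():
    print(attempt, stats)
```

With this grouping, the `packages` column sums to the total number of archive-and-return cases, which matches the table above: 3649 + 769 + 203 + 65 + 16 + 2 = 4704.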
I haven't taken #6 into account yet, but we could deduct 2 weeks if a package was archived close to 20XX/12/31.
Context: in the 'Reasons why packages are archived on CRAN' blog post of 2022-05-10, @llrs shows how to get metadata on CRAN package events, including the archiving and unarchiving of packages, directly from CRAN. Specifically, this data is available in https://cran.r-project.org/src/contrib/PACKAGES.in.
The first quote at the top of this issue is one of the results of that 2022 study. The second quote is what @llrs added in a Bioconductor Slack thread on 2024-03-05 (https://community-bioc.slack.com/archives/CLF37V6C8/p1709643793615939?thread_ts=1709600683.154139&cid=CLF37V6C8).