Semantic Scholar Citation Graph Builder

Semantic Scholar citation graph builder: builds a graph that includes all citations and, if specified, also the references of all papers by an author, given the author's Semantic Scholar author ID.


NOTE 1 Semantic Scholar: 1) may not have complete information about an author's works, and 2) tends to have noisy data: an author's data may be scattered across multiple author entries, and multiple different authors may be merged under one author ID. So be transparent about these points when comparing data!


NOTE 2 Be polite! Semantic Scholar allows 100 requests per 5 minutes. If you do not want to get banned, do not increase the request frequency (and do not execute the requests in parallel)!
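
As a rough illustration of the limit quoted above, 100 requests per 5 minutes works out to one request every 3 seconds when requests are sent sequentially. The following Python sketch is not part of the tool (which is written in C#); the `Throttle` class and its parameter names are hypothetical and only show one way to space out calls.

```python
import time

MAX_REQUESTS = 100                             # limit quoted in NOTE 2
WINDOW_SECONDS = 5 * 60                        # per 5 minutes
MIN_INTERVAL = WINDOW_SECONDS / MAX_REQUESTS   # 3.0 seconds between requests


class Throttle:
    """Hypothetical helper that spaces out sequential calls."""

    def __init__(self, min_interval=MIN_INTERVAL,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock      # injectable so the logic can be tested
        self.sleep = sleep
        self.last_call = None   # time of the previous call, if any

    def wait(self):
        """Block until min_interval has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
```

Calling `wait()` before every request, from a single thread, keeps the requests sequential and about 3 seconds apart.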


NOTE 3 I differentiate between self-citations, co-author citations, and other citations! I define a co-author as anyone who has co-authored at least one paper with the author, not only as a co-author of the specific paper whose citations are being analysed.
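
The three categories can be sketched as follows. This is a minimal Python illustration, not the tool's actual C# code: the function names are hypothetical, and the precedence (a citing paper that includes the author counts as a self-citation even if co-authors are also on it) is an assumption.

```python
def collect_coauthors(papers, author_id):
    """Return everyone who shares at least one paper with the author.

    `papers` is an iterable of author-ID lists, one list per paper.
    """
    coauthors = set()
    for authors in papers:
        if author_id in authors:
            coauthors.update(authors)
    coauthors.discard(author_id)  # the author is not their own co-author
    return coauthors


def classify_citation(citing_authors, author_id, coauthor_ids):
    """Label a citing paper as 'self', 'co-author', or 'other'."""
    citing = set(citing_authors)
    if author_id in citing:
        return "self"        # assumption: self takes precedence
    if citing & coauthor_ids:
        return "co-author"   # any co-author of any of the author's papers
    return "other"
```

Note that the co-author set is built once from all of the author's papers, so a citation by a co-author of one paper counts as a co-author citation for every other paper as well.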


Build Instructions

The code is written in C#. To compile and use it, you will need Visual Studio or an alternative C#-capable development environment. If you use Visual Studio, you may need version 2019 or newer.

Usage Instructions

To use the citation-graph-builder, you first need to acquire an author ID. Go to https://www.semanticscholar.org/, find the author you are interested in, open their profile, and copy the multi-digit number that comes after the author's name in the URL.
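
If you prefer to extract the ID programmatically, the step above can be sketched in Python. The helper name is hypothetical, and the sketch assumes only that the ID is the trailing run of digits in the URL path; the example URL in the comment is made up.

```python
import re
from urllib.parse import urlparse


def author_id_from_url(url):
    """Return the trailing multi-digit number of a profile URL, or None.

    Assumes a URL shaped like
    https://www.semanticscholar.org/author/Some-Name/1234567 (made-up ID).
    """
    path = urlparse(url).path.rstrip("/")  # drops any query string or fragment
    match = re.search(r"/(\d+)$", path)
    return match.group(1) if match else None
```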

Then, execute the GetSemanticScholarAuthorCitationGraph.exe tool with the following command line:

.\GetSemanticScholarAuthorCitationGraph.exe -o [OutFile] -id [AuthorID] -y [Year]

Replace:

  • [OutFile] with a path to the Graph Exchange XML Format (GEXF) output file. This is where the graph will be stored.
  • [AuthorID] with the multi-digit author ID.
  • [Year] with the earliest publication year of the papers to include in the graph. This allows you to analyse, for instance, only the papers from the last five years. The parameter -y is optional.

Two further optional parameters are available (they override each other, so only one should be specified):

  • --include-relevant-references - also include references to co-author papers.
  • --include-all-references - include all references.

After the analysis, the tool also outputs statistics for each author. Each line of the output has the following tab-separated format:

[AuthorID]\t[AuthorName]\t[PaperCount]\t[SelfCitationCount]\t[CoAuthorCitationCount]\t[OtherCitationCount]
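
A line in this format can be parsed with a few lines of Python. This is a sketch, not part of the tool; the `AuthorStats` type and its field names are hypothetical, and the counts are assumed to be integers.

```python
from typing import NamedTuple


class AuthorStats(NamedTuple):
    author_id: str
    author_name: str
    paper_count: int
    self_citations: int
    coauthor_citations: int
    other_citations: int


def parse_stats_line(line):
    """Parse one tab-separated statistics line into an AuthorStats."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-separated fields, got {len(fields)}")
    author_id, name, *counts = fields
    # The four remaining fields are assumed to be integer counts.
    return AuthorStats(author_id, name, *map(int, counts))
```

From the parsed values you can, for example, compute the share of self-citations as `self_citations / (self_citations + coauthor_citations + other_citations)`.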

The GEXF graph can be visualised, for instance, using Gephi.
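
For a quick sanity check without opening Gephi, the node and edge counts of a GEXF file can be read with Python's standard library. This sketch assumes only the general GEXF layout (`node` and `edge` elements); the sample document below is hand-written for illustration and is not actual output of the tool.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only, not real tool output.
SAMPLE_GEXF = """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph defaultedgetype="directed">
    <nodes>
      <node id="p1" label="Paper 1"/>
      <node id="p2" label="Paper 2"/>
    </nodes>
    <edges>
      <edge id="e1" source="p2" target="p1"/>
    </edges>
  </graph>
</gexf>"""


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}node' becomes 'node'."""
    return tag.rsplit("}", 1)[-1]


def count_nodes_and_edges(gexf_text):
    """Return (node_count, edge_count) for a GEXF document given as text."""
    root = ET.fromstring(gexf_text)
    names = [local_name(el.tag) for el in root.iter()]
    return names.count("node"), names.count("edge")
```

To check a file written by the tool, read it first, for instance with `open(path, encoding="utf-8").read()`, and pass the text to `count_nodes_and_edges`.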

Reference

If you use the code for scientific purposes, please cite it using:

@ebook{ss-citation-graph-builder,
  author = {Pinnis, Mārcis},
  title  = {{Semantic Scholar Citation Graph Builder}},
  url    = {https://github.com/pmarcis/citation-graph-builder},
  year   = {2020},
  note   = {Accessed Oct. 25, 2020}
}