I have a lot of alerts configured with Google Scholar for various research interests. It’s a very cool concept: you set up a keyword search like "blast fragmentation shockwave" and Google sends you a summary email of new research that matches.
However, this can generate a lot of email each week that needs to be sifted through (as of this writing, about 60 emails a week for me). I developed a simple tool to help. It reads emails from Google Scholar and ResearchGate and looks for links to articles and PDFs. You have to save the emails in .eml format somewhere on your disk and point the script to that folder. The script reads every file, searches for all href attributes, deduplicates the links, and lists them along with their descriptions. Optionally, it can load PDF links directly in your browser or open a CSV list of the links in your favorite spreadsheet.
This script makes dealing with large volumes of alerts much more efficient.
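To make the extraction step concrete, here is a minimal sketch of the idea using only the Python standard library. It is not the code from the repository: the names LinkCollector, links_from_eml, and unique_links are hypothetical, and it assumes each alert carries an HTML body whose links are ordinary anchor elements.

```python
from email import message_from_bytes, policy
from html.parser import HTMLParser
from pathlib import Path


class LinkCollector(HTMLParser):
    """Collect [url, text] pairs from <a href="..."> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []          # [url, text] pairs in document order
        self._in_link = False    # True while inside an <a> that has an href

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append([href, ""])
                self._in_link = True

    def handle_data(self, data):
        # Text inside the anchor becomes the link's description.
        if self._in_link:
            self.links[-1][1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False


def links_from_eml(raw: bytes):
    """Return (url, description) pairs found in the HTML body of one email."""
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("html",))
    if body is None:             # no HTML part, so nothing to parse
        return []
    parser = LinkCollector()
    parser.feed(body.get_content())
    parser.close()
    # Collapse runs of whitespace in the descriptions.
    return [(url, " ".join(text.split())) for url, text in parser.links]


def unique_links(folder):
    """Map each distinct URL in a folder of .eml files to its first description."""
    seen = {}
    for path in sorted(Path(folder).glob("*.eml")):
        for url, text in links_from_eml(path.read_bytes()):
            seen.setdefault(url, text)   # keep the first description seen
    return seen
```

Using a dict keyed by URL deduplicates the links while preserving the order in which they were first encountered.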
NOTE: In LibreOffice Calc, use the HYPERLINK function on the URL column to create clickable links that will open automatically in your browser.
To work with the tool you will need Python 3.9 and virtualenv installed. You will also need to clone the git repository:
$ cd ~/repositories
$ git clone https://github.com/TroyWilliams3687/extract_email
Once the git repository has been cloned, run the make file to construct the virtual environment:
NOTE: You will need an environment variable called python that points to your local Python bin folder. It should look something like:
$ echo $python
~/opt/python_3.9.5/bin
Activate the virtual environment:
$ . .venv/bin/activate
Or you can use make:
$ make shell
Execute the script:
$ extract "$HOME/tmp/extract tbird email" --verbose --launch-pdf
$ extract "$HOME/tmp/extract tbird email" --verbose --launch-csv
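As a rough illustration of what the two launch options do with the collected links, the sketch below picks out PDF-looking URLs, opens them in the browser, and writes a two-column CSV. This is an assumption about the behavior, not the repository's implementation: is_pdf_link, launch_pdfs, and write_csv are hypothetical names, and the suffix check is a simplification, since alert links are often wrapped in redirect URLs that a real tool would need to unwrap first.

```python
import csv
import webbrowser
from urllib.parse import urlparse


def is_pdf_link(url: str) -> bool:
    """Guess whether a URL points at a PDF from its path suffix (an assumption)."""
    return urlparse(url).path.lower().endswith(".pdf")


def launch_pdfs(links: dict) -> list:
    """Open every PDF-looking link in a new browser tab and return those URLs."""
    pdfs = [url for url in links if is_pdf_link(url)]
    for url in pdfs:
        webbrowser.open_new_tab(url)
    return pdfs


def write_csv(links: dict, path: str) -> None:
    """Write the URL -> description mapping as a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["URL", "Description"])
        writer.writerows(links.items())
```

Here links is assumed to be a dict mapping each URL to its description. The resulting file can then be opened in a spreadsheet, where the URL column is available for the HYPERLINK function.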