Extract Email
Problem
I have a lot of alerts configured with Google Scholar for various research interests. It’s a very cool concept: you set up a keyword search such as blast fragmentation shockwave, and Google sends you a summary email of new research that matches.
However, this can generate a lot of email each week that needs to be sifted through (as of this writing, about 60 emails a week for me). I developed a simple tool to help. It reads emails from Google Scholar and ResearchGate and extracts links to articles and PDFs. Save the emails in .eml format somewhere on your disk and point the script to that folder. The script reads every email, searches for all href attributes, deduplicates the links, and lists them along with their descriptions. Optionally, it can load PDF links directly in your browser or open a CSV list of links in your favorite spreadsheet.
This script makes dealing with large volumes of alerts much more efficient.
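The core idea (read every .eml file in a folder, collect the href links with their link text, and deduplicate them) can be sketched with the Python standard library alone. This is a minimal illustration, not the tool's actual code: the names LinkCollector, links_from_html, links_from_eml and collect_links are hypothetical, and it assumes each alert email carries an HTML part.

```python
from email import policy
from email.parser import BytesParser
from html.parser import HTMLParser
from pathlib import Path


class LinkCollector(HTMLParser):
    """Collect (url, description) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (url, text) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def links_from_html(html):
    """Return every (url, description) pair found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def links_from_eml(path):
    """Parse one .eml file and return the links in its HTML part."""
    with open(path, "rb") as handle:
        message = BytesParser(policy=policy.default).parse(handle)
    body = message.get_body(preferencelist=("html",))
    if body is None:  # assumption: emails without an HTML part are skipped
        return []
    return links_from_html(body.get_content())


def collect_links(folder):
    """Map each unique URL to the first description seen for it."""
    unique = {}
    for eml in sorted(Path(folder).expanduser().glob("*.eml")):
        for url, text in links_from_eml(eml):
            unique.setdefault(url, text)
    return unique
```

A dict keyed by URL keeps the first description seen and preserves insertion order, so duplicates across several alert emails collapse to a single entry.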

Figure 1 - A sample screen capture of the CSV output.
Note
In LibreOffice Calc, use the HYPERLINK function on the URL column to create clickable links that open in your browser (for example, =HYPERLINK(B2) if the URLs happen to be in column B).
Installation and Usage
To work with the tool you will need Python 3.9 and virtualenv installed. You will also need to clone the git repository:
$ cd ~/repositories
$ git clone https://github.com/TroyWilliams3687/extract_email
Once the git repository has been cloned, change into it and run make to construct the virtual environment:
$ cd extract_email
$ make
Note
You will need an environment variable called python that points to your local Python bin folder. It should look something like:
$ echo $python
~/opt/python_3.9.5/bin
Activate the virtual environment:
$ . .venv/bin/activate
Or you can use make:
$ make shell
Execute the script:
$ extract "$HOME/tmp/extract tbird email" --verbose --launch-pdf
OR
$ extract "$HOME/tmp/extract tbird email" --verbose --launch-csv
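For a rough idea of what options like these might do with the collected links, here is a small sketch. It is not the tool's implementation: the helper names is_pdf, write_csv and launch_pdfs are hypothetical, the input is assumed to be a dict mapping each URL to its description, and a link counts as a PDF only when its path ends in .pdf (redirect-style links would not match).

```python
import csv
import webbrowser
from urllib.parse import urlparse


def is_pdf(url):
    """Assumption: a PDF link is one whose URL path ends in .pdf."""
    return urlparse(url).path.lower().endswith(".pdf")


def write_csv(links, csv_path):
    """Write one row per link, with a header row, to csv_path."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["description", "url"])
        for url, description in links.items():
            writer.writerow([description, url])


def launch_pdfs(links, opener=webbrowser.open):
    """Open every PDF link with opener and return the URLs opened."""
    opened = [url for url in links if is_pdf(url)]
    for url in opened:
        opener(url)
    return opened
```

Passing the opener as a parameter keeps the function easy to try out without actually launching a browser, for instance by handing it a list's append method.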