Extract Email

Problem

I have a lot of alerts configured in Google Scholar for various research interests. It’s a very cool concept: you set up a keyword search, such as "blast fragmentation shockwave", and Google sends you a summary email of new research that matches.

However, these alerts can generate a lot of email each week that needs to be sifted through (as of this writing, about 60 emails a week for me). I developed a simple tool to help. It reads emails from Google Scholar and ResearchGate and extracts the links to articles and PDFs. First, save the emails in .eml format to a folder on your disk, then point the script at that folder. The script reads every email, collects the link (href) from each anchor tag, removes duplicate links, and lists each link along with its description. Optionally, it can open the PDF links directly in your browser or open a CSV list of the links in your favorite spreadsheet.

This script makes dealing with large volumes of alerts much more efficient.
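The core steps described above (read the saved .eml files, pull the href out of every anchor tag, drop duplicates) can be sketched with only the Python standard library. This is a minimal illustration under stated assumptions, not the code in the repository: the names `extract_links`, `html_parts` and `LinkCollector` are hypothetical, only the text/html parts of each message are scanned, and duplicates are removed by exact URL match, keeping the first description seen.

```python
import email
import email.policy
from html.parser import HTMLParser
from pathlib import Path


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []    # (href, text) pairs in document order
        self._href = None  # href of the <a> tag we are currently inside
        self._text = []    # text fragments seen inside that <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a":
            return
        if self._href:
            # Collapse whitespace runs in the anchor text into single spaces.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
        self._href = None


def html_parts(eml_path):
    """Yield the decoded text/html parts of one saved .eml message."""
    with open(eml_path, "rb") as fh:
        msg = email.message_from_binary_file(fh, policy=email.policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            yield part.get_content()


def extract_links(folder):
    """Return deduplicated (url, description) pairs from every .eml in folder."""
    seen = {}  # dicts keep insertion order, so output follows first appearance
    for eml_path in sorted(Path(folder).expanduser().glob("*.eml")):
        for html in html_parts(eml_path):
            collector = LinkCollector()
            collector.feed(html)
            collector.close()
            for url, text in collector.links:
                if url not in seen:  # exact-match deduplication
                    seen[url] = text
    return list(seen.items())
```

Calling `extract_links("~/tmp/extract tbird email")` returns a list of `(url, description)` pairs in the order they were first seen. If the alert emails wrap article links in redirect or tracking URLs, exact-match deduplication will treat them as distinct; normalizing the URLs first would be one way to handle that.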

Figure 1 - A sample screen capture of the CSV output.

Note

In LibreOffice Calc, use the HYPERLINK function on the URL column to create clickable links that open in your browser.
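For reference, a list of `(url, description)` pairs could be written to a CSV file that Calc opens cleanly along the following lines. This is a minimal sketch assuming a two-column layout with a `URL` column; the `write_links_csv` helper and the header names are hypothetical, and the layout of the actual output shown in Figure 1 may differ.

```python
import csv
from pathlib import Path


def write_links_csv(links, csv_path):
    """Write (url, description) pairs to a CSV file with a header row.

    The column names are assumptions; the real tool's layout may differ.
    """
    csv_path = Path(csv_path)
    # newline="" lets the csv module control line endings itself.
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Description", "URL"])
        for url, description in links:
            writer.writerow([description, url])
    return csv_path
```

After opening the file in Calc, a formula such as `=HYPERLINK(B2)` in an empty column turns the URL in cell B2 into a clickable link, and it can be filled down for the remaining rows.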

Installation and Usage

To work with the tool, you will need Python 3.9 and virtualenv installed. You will also need to clone the git repository:

$ cd ~/repositories

$ git clone https://github.com/TroyWilliams3687/extract_email

Once the git repository has been cloned, change into the repository folder and run make to construct the virtual environment:

$ cd extract_email

$ make

Note

You will need an environment variable called python that points to your local Python bin folder. It should look something like this:

$ echo $python
~/opt/python_3.9.5/bin
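If make cannot find Python, a quick check like the following can confirm that `$python` points at a folder containing an interpreter. It is a hypothetical helper, not part of the repository, and the executable names it tries (`python3.9`, `python3`, `python`) are assumptions about what the folder contains.

```python
import os
from pathlib import Path


def interpreter_from_env(var="python"):
    """Return the interpreter inside the folder named by $python, or None.

    The candidate names below are assumptions about the folder's contents.
    """
    bin_dir = os.environ.get(var)
    if not bin_dir:
        return None  # variable unset or empty
    bin_path = Path(bin_dir).expanduser()  # handle a literal '~' in the value
    for name in ("python3.9", "python3", "python"):
        candidate = bin_path / name
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Running `print(interpreter_from_env())` in any Python session prints the interpreter path it found, or `None` if the variable is unset or the folder holds no executable with one of those names.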

Activate the virtual environment:

$ . .venv/bin/activate

Or you can use make:

$ make shell

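Execute the script, passing the folder that contains your saved .eml files. The --launch-pdf option opens the PDF links in your browser: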

$ extract "~/tmp/extract tbird email" --verbose --launch-pdf

Or use --launch-csv to open the CSV list of links in your spreadsheet:

$ extract "~/tmp/extract tbird email" --verbose --launch-csv
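For readers adapting the tool, the command-line options used above could be wired up roughly as follows. This is a hypothetical sketch built on `argparse` and `webbrowser`: the option names come from the commands above, but the parser, the helper names and the `.pdf` suffix heuristic are assumptions, not the repository's implementation. Because the folder argument is wrapped in double quotes in those commands, the shell does not expand the `~`, so a script along these lines has to call `Path.expanduser()` itself.

```python
import argparse
import os
import subprocess
import sys
import webbrowser
from pathlib import Path
from urllib.parse import urlparse


def build_parser():
    """Hypothetical parser mirroring the options shown in the examples."""
    parser = argparse.ArgumentParser(prog="extract")
    parser.add_argument("folder", help="folder containing the saved .eml files")
    parser.add_argument("--verbose", action="store_true", help="print extra detail")
    parser.add_argument("--launch-pdf", action="store_true",
                        help="open the PDF links in the web browser")
    parser.add_argument("--launch-csv", action="store_true",
                        help="open the CSV of links in the default spreadsheet")
    return parser


def resolve_folder(folder):
    """Expand a literal '~' that the shell left alone inside double quotes."""
    return Path(folder).expanduser()


def pdf_urls(links):
    """Keep URLs whose path ends in .pdf (a simple heuristic)."""
    return [url for url, _ in links
            if urlparse(url).path.lower().endswith(".pdf")]


def open_with_default_app(path):
    """Open a file with the desktop's default application."""
    if sys.platform.startswith("win"):
        os.startfile(path)  # only exists on Windows
    elif sys.platform == "darwin":
        subprocess.run(["open", str(path)], check=False)
    else:
        subprocess.run(["xdg-open", str(path)], check=False)


def launch(args, links, csv_path):
    """Act on the --launch-* flags once the links and the CSV file exist."""
    if args.launch_pdf:
        for url in pdf_urls(links):
            webbrowser.open_new_tab(url)
    if args.launch_csv:
        open_with_default_app(csv_path)
```

The two launch options are kept independent here because the commands above do not say whether they can be combined.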