Automatically create Trello cards through Python webscraping

Content:

  1. Motivation
  2. Idea
  3. Basic implementation
  4. Optimization and automation
    1. Identifying the target list within my Trello account
    2. Scrape data from different sources and convert it to Trello cards
    3. Plugging it all together
  5. Conclusion
  6. Next steps
  7. Additional information

Motivation

As part of my M.Sc. studies, I have to attend 45 talks given by professionals in my field. Of these 45 talks, I have to summarize 30 in about 1-2 pages each. It is therefore vital to take notes during each talk so that I can still remember it later on. As you might know, I am a big fan of the versatile project management tool Trello, so it felt natural to let Trello help me with this task. As Trello boards are great for collaborating with multiple people (more on that in a different article), I even invited my fellow classmates to participate in this Trello-aided pursuit.

Idea

While identifying and attending 45 talks can be a challenging task on its own, summarizing them properly requires thorough planning. I therefore came up with a basic workflow and then looked at which parts of it could be optimized. My ideal workflow can be described as follows:

  1. Identify talks and keep track of them
  2. Attend talk and take notes
  3. Store notes in a way that makes it easy to remember to which talk they belong
  4. Summarize talk based on the notes

As the attendance, note-taking and summary parts are hard to automate or optimize, I’ll focus on the identification and organisation of talks.

Basic implementation

First, I wanted a central “hub” for the organisation of all talks and the corresponding notes. I therefore created a Trello board that is centered around two lists: “Upcoming talks” and “Past talks”.

A Trello board containing the two lists 'Upcoming talks' and 'Past talks'.

As you can see, each of these lists contains a card (and can potentially contain hundreds, if not thousands). Cards are basically the heart of project management with Trello, as you can extend and interact with them in multiple ways. I decided to create one card per talk and put the title and the location of the talk into the card title so that both are displayed in the “board view”. Additionally, you can see a date underneath the title; more on that later.

A Trello card that represents a talk. The card contains a title, a description, a due date and an attachment.

The card features a title, which is also displayed in the “board view”, a description, attachments and several other features. In this project I’ll be using the title, the description, the due date and the option to add attachments. The due date is especially important to my approach, as the Trello calendar plugin provides you with an .ical link that can then be integrated into your personal calendar.

A snapshot of my Google calendar that features an automatically created event from Trello.

As you can see, I’ve integrated the calendar representing the board into my Google calendar which therefore lists a talk on the 18th. Up until here, the proposed basic workflow is fully functional and can be described in detail as follows:

  1. Identify talk
  2. Create Trello card with the respective information on that talk (including the due date)
  3. Automatically be notified by my calendar to attend the talk
  4. Attend the talk and take notes
  5. Upload the notes to Trello once the talk is over
  6. Access my notes on Trello once I’m in the mood for writing a summary

Optimization and automation

While attendance and writing the summary will obviously be the most time-consuming, the part that I am the least excited about is searching for talks that I could attend. As most accessible talks are hosted by institutions, the search would consist of browsing their websites and then adding suitable talks to my Trello board. This screams for an automated webscraping solution. And as I’m the most fluent in Python, I chose this particular language for the project.

Part 1: Identifying the target list within my Trello account

The interaction with Trello will be handled through their API using POST and GET requests. Two different Python packages for interacting with Trello exist (trello and py-trello), but I didn’t see any benefit in using them.

First, we need to acquire an API key and an authentication token (click on “Token” in the first paragraph) for our Trello account. I chose to create an additional account specific to this task, as this makes it easier to separate “human actions” from “computer actions”. To keep the script easy to modify, those two account-specific values as well as the Trello board and list names are stored in variables instead of being hard-coded into the respective links.

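import requests
from icalendar import Calendar

try:
  import urllib2  # Python 2
except ImportError:
  import urllib.request as urllib2  # Python 3: offers the same Request and urlopen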

API_KEY = "INSERT YOUR API KEY HERE"
OAUTH_TOKEN = "INSERT YOUR OAUTH TOKEN HERE"
BOARD_NAME = "INSERT YOUR BOARD NAME HERE"
LIST_NAME = "INSERT YOUR LIST NAME HERE"

Next, we try to find the unique id of the Trello board with the name specified in “BOARD_NAME” so that we can use it later on. As you can see, we build our GET request using the previously acquired API key and token. This request returns a list of all boards associated with our account, which we then search for the board with the matching name. Once found, its unique id is returned.

def findBoard():
  """Return the id of the board named BOARD_NAME, or False if there is none."""

  get_boards_url = "https://api.trello.com/1/members/me/boards?key=" + \
                   API_KEY + "&token=" + OAUTH_TOKEN + "&response_type=token"

  r = requests.get(get_boards_url)

  # The response is a list of boards; each board is a dict with "id" and "name".
  for board in r.json():

    if board.get("name") == BOARD_NAME:

      print("Found board.")

      return board.get("id")

  # Only give up after every board has been checked, not after the first one.
  print("Didn't find board.")

  return False

Next, we use this unique board id to identify the “Upcoming talks” list within this particular board. To do so, we once again construct a GET request and search the returned result for the list specified in “LIST_NAME”. If found, its id is returned.


def findList(board_id):
  """Return the id of the list named LIST_NAME on the given board, or False."""

  get_lists_url = "https://api.trello.com/1/boards/" + board_id + \
                  "/lists?key=" + API_KEY + "&token=" + OAUTH_TOKEN + \
                  "&response_type=token"

  r = requests.get(get_lists_url)

  # The response is a list of Trello lists; each one is a dict with "id" and "name".
  for trello_list in r.json():

    if trello_list.get("name") == LIST_NAME:

      print("Found list.")

      return trello_list.get("id")

  # Only give up after every list has been checked, not after the first one.
  print("Didn't find list.")

  return False
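
Both lookups above follow the same pattern: walk through a list of JSON objects and return the id of the one whose name matches. As a minimal sketch, that pattern could be pulled into a single helper. Note that find_id_by_name and the sample data are hypothetical; they are neither part of the script above nor of the Trello API.

```python
def find_id_by_name(items, wanted_name):
    """Return the "id" of the first dict whose "name" equals wanted_name, else None."""
    for item in items:
        if item.get("name") == wanted_name:
            return item.get("id")
    # Reached only after every item has been compared.
    return None


# Made-up sample, shaped like the two fields of Trello's JSON that are used here.
sample_boards = [
    {"id": "b1", "name": "Groceries"},
    {"id": "b2", "name": "Talks"},
]

print(find_id_by_name(sample_boards, "Talks"))
print(find_id_by_name(sample_boards, "Missing"))
```

The important detail is that the "not found" result sits after the loop, so a board or list that is not the first entry in the response can still be found.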

This id is then used to get a list of all cards already present. We’ll later check possible new candidates against this list so that no duplicates are added. For each card, a unique id, the title (stored in “name”), its due date and its description are stored in the returned list.

def findCards(list_id):
  """Return [id, name, due, desc] for every card in the list (may be empty)."""

  get_cards_url = "https://api.trello.com/1/lists/" + list_id + \
                  "/cards?key=" + API_KEY + "&token=" + OAUTH_TOKEN + \
                  "&response_type=token"

  list_of_cards = []

  r = requests.get(get_cards_url)

  for card in r.json():

    # Missing fields fall back to an empty string, as in the original defaults.
    list_of_cards.append([card.get("id", ""), card.get("name", ""),
                          card.get("due", ""), card.get("desc", "")])

  # An empty list simply means that no cards exist yet.
  return list_of_cards

Part 2: Scrape data from different sources and convert it to Trello cards

Currently, this script only acquires data from the homepage of the EMBL Heidelberg. Conveniently, they provide both a website that lists all future talks and an .ical feed of these talks, which could theoretically be integrated directly into your calendar. But this would not meet the requirement of being able to attach notes to a talk. Therefore we scrape this .ical calendar and convert the data into Trello cards. As I don’t see a point in listing talks without knowing their topic, I’m excluding talks for which the topic is still set to “To be announced”.

The list of extracted talks is then also checked against the talks already present in the list so that no duplicates are created.
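
The two filters described above (skip unannounced topics, skip talks that already have a card) can be sketched without any network access. This is only an illustration under assumptions: select_new_talks is a hypothetical helper, the events are made-up (summary, start) pairs, and the cards use the [id, name, due, desc] layout returned by findCards.

```python
def select_new_talks(events, existing_cards):
    """Keep events with an announced topic that have no card with the same title."""
    # Card titles live at index 1 of each [id, name, due, desc] entry.
    existing_names = {card[1] for card in existing_cards}
    new_talks = []
    for summary, start in events:
        card_name = summary + " @ EMBL"
        if "To be announced" in summary:
            continue
        if card_name in existing_names:
            continue
        # Remember the name so a talk listed twice in the feed is added once.
        existing_names.add(card_name)
        new_talks.append((card_name, start))
    return new_talks


# Made-up sample data for illustration only.
events = [
    ("Protein folding", "2017-05-18"),
    ("To be announced", "2017-05-20"),
    ("CRISPR screens", "2017-05-22"),
]
cards = [["c1", "Protein folding @ EMBL", None, ""]]

print(select_new_talks(events, cards))
```

Collecting the existing titles in a set keeps the duplicate check cheap even if the list grows to hundreds of cards.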


def getEMBLTalks(list_id, list_of_cards):
  """Create a Trello card for every announced EMBL talk that has no card yet."""

  get_embl_events = "https://www-db.embl.de/jss/servlet/de.embl.bk.seminarlist.ServeSeminarAsICal?dutystationID=1&seminarTypeID=0&timeFrame=0"

  response = urllib2.urlopen(get_embl_events)
  data = response.read()

  cal = Calendar.from_ical(data)

  for event in cal.walk('vevent'):

    start = event.get('dtstart')
    summary = event.get('summary')

    # event.get() returns None for a missing field, so skip incomplete events.
    if start is None or summary is None:
      continue

    description = event.get('description') or ""
    event_name = summary + " @ EMBL"

    if "To be announced" in event_name:
      continue

    # Skip talks that already have a card (index 1 holds the card title).
    if any(card[1] == event_name for card in list_of_cards):
      continue

    # Let requests URL-encode the values: titles and descriptions often contain
    # spaces, "&" or other characters that would break a hand-built query string.
    r = requests.post("https://api.trello.com/1/cards",
                      params={"key": API_KEY, "token": OAUTH_TOKEN,
                              "name": event_name, "idList": list_id,
                              "due": start.dt.isoformat(),
                              "desc": description})

    if r.ok:
      print("Added card.")
    else:
      print("Could not add card.")
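
Talk titles and descriptions regularly contain spaces, ampersands or colons, all of which have a special meaning inside a URL. As a small sketch of why the values should be percent-encoded before they end up in the query string, the following uses only the standard library. build_card_query and the sample values are hypothetical; the parameter names are the ones used in the request above.

```python
from urllib.parse import parse_qs, urlencode


def build_card_query(name, list_id, due, desc):
    """Percent-encode the card fields so they survive inside a query string."""
    return urlencode({"name": name, "idList": list_id, "due": due, "desc": desc})


# Made-up values containing characters that are unsafe in a URL.
query = build_card_query("Q&A: RNA biology @ EMBL", "abc123",
                         "2017-05-18T14:00:00", "Room 1 & 2")
print(query)

# Decoding the query again recovers the original values unchanged.
decoded = parse_qs(query)
print(decoded["name"][0])
print(decoded["desc"][0])
```

Without the encoding, the "&" in the title would be read as the start of a new parameter and the card name would be cut off. Passing a dictionary via the params argument of requests.post performs the same encoding automatically.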

Part 3: Plugging it all together

Now that all functions are ready, we just plug them together and run the script. To sum it up: it first finds your board, then your list, and then generates a list of the cards already present. The talks scraped from the EMBL are then checked against this list, and new talks are added as cards with the correct data.

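if __name__ == '__main__':

  board_id = findBoard()

  if board_id:

    list_id = findList(board_id)

    if list_id:

      # Treat "no cards yet" as an empty list so that the very first run
      # on an empty target list can still add talks.
      list_of_cards = findCards(list_id) or []

      getEMBLTalks(list_id, list_of_cards)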

The output of the script tells me that it successfully identified both the board and the list and has added two new talks to it. Nice.

Output of the script showing that it found both the board and the list and has added two cards to it.

Conclusion

Next steps

  • Add more sources that will be scraped for talks
  • Automate the scraping process using my Raspberry Pi
  • Automatically mark past talks as completed and move them to the “Past talks” list
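
For the last item, a possible starting point is to pick out the cards whose due date has already passed. The sketch below is an assumption-laden illustration, not a finished feature: past_card_ids is a hypothetical helper, the cards use the [id, name, due, desc] layout from findCards, and the due dates are assumed to be UTC strings of the form "2017-05-18T12:00:00.000Z".

```python
from datetime import datetime, timezone

# Assumed format of the "due" field; adjust if the API returns something else.
DUE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def past_card_ids(cards, now):
    """Return the ids of all cards whose due date lies before `now`."""
    past = []
    for card_id, name, due, desc in cards:
        if not due:
            continue  # cards without a due date cannot be classified
        due_time = datetime.strptime(due, DUE_FORMAT).replace(tzinfo=timezone.utc)
        if due_time < now:
            past.append(card_id)
    return past


# Made-up sample cards for illustration only.
cards = [
    ["c1", "Protein folding @ EMBL", "2017-05-18T12:00:00.000Z", ""],
    ["c2", "CRISPR screens @ EMBL", "2017-06-01T09:00:00.000Z", ""],
    ["c3", "Undated talk @ EMBL", None, ""],
]
now = datetime(2017, 5, 19, tzinfo=timezone.utc)

print(past_card_ids(cards, now))
```

Each returned id could then be moved by updating the card's idList to the id of the “Past talks” list; the exact request is best checked against the Trello API reference before relying on it.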

Additional information