Published: 2017-05-01

Publishing Org Documents to Google Drive

I recently had to write a couple of company policies and playbooks at work, where we use Google Drive extensively.

I started writing the documents directly in Google Docs, but quickly found it painful, at least compared with writing in Orgmode.

So I switched to writing them in Orgmode, assuming I could simply publish them as HTML and upload the result to Google Drive. Unfortunately, it wasn’t quite that simple.

Publishing

My first attempt was to publish the Org files with the HTML backend, upload the results to Google Drive and convert them to Google Docs. The converted documents were poor: the links in the table of contents did not work at all, and the formatting was off, among other problems.

My next attempt was to export to ODT. This time the conversion to Google Docs was much better and had none of the issues the HTML files had. Satisfied with the result, I set about configuring my Orgmode project to publish with the ODT backend, only to find that the ox-odt.el package does not provide a publishing function usable in org-publish-project-alist.

With a little digging and a lot of trial and error, I ended up with the following publishing function for ODT:

;;;###autoload
(defun org-odt-publish-to-odt (plist filename pub-dir)
  "Publish an org file to ODT.

FILENAME is the filename of the Org file to be published.  PLIST
is the property list of the given project.  PUB-DIR is the publishing
directory.

Return output file name."
  (unless (or (not pub-dir) (file-exists-p pub-dir)) (make-directory pub-dir t))
  ;; Check if a buffer visiting FILENAME is already open.
  (let* ((org-inhibit-startup t)
         (visiting (find-buffer-visiting filename))
         (work-buffer (or visiting (find-file-noselect filename))))
    (unwind-protect
        (with-current-buffer work-buffer
          (let ((outfile (org-export-output-file-name ".odt" nil pub-dir)))
            ;; As in `org-odt-export-to-odt': the wrapper creates a temporary
            ;; package directory, runs the body to fill content.xml, then
            ;; adds the remaining package files and zips them into OUTFILE.
            (org-odt--export-wrap
             outfile
             (let* ((org-odt-embedded-images-count 0)
                    (org-odt-embedded-formulas-count 0)
                    (org-odt-object-counters nil)
                    (hfy-user-sheet-assoc nil))
               ;; As in `org-publish-org-to': export with the project
               ;; plist plus the cross-reference and index filters.
               (let ((output (org-export-as 'odt nil nil nil
                                            (org-combine-plists
                                             plist
                                             `(:crossrefs
                                               ,(org-publish-cache-get-file-property
                                                 (expand-file-name filename) :crossrefs nil t)
                                               :filter-final-output
                                               (org-publish--store-crossrefs
                                                org-publish-collect-index
                                                ,@(plist-get plist :filter-final-output))))))
                     (out-buf (progn (require 'nxml-mode)
                                     (let ((nxml-auto-insert-xml-declaration-flag nil))
                                       (find-file-noselect
                                        (concat org-odt-zip-dir "content.xml") t)))))
                 ;; Write the exported XML into the package's content.xml.
                 (with-current-buffer out-buf (erase-buffer) (insert output)))))))
      ;; Kill the buffer if it was opened here, even if the export failed.
      (unless visiting (kill-buffer work-buffer)))))

It combines org-publish-org-to, which ox-html and ox-latex use to publish HTML and LaTeX/PDF respectively, with org-odt-export-to-odt, the function that exports a single Org file to ODT. To use it, set the project’s :publishing-function to org-odt-publish-to-odt in org-publish-project-alist.
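
Why can’t org-publish-org-to be used on its own? It hands the exporter’s output to org-export-to-file, which writes that string straight to the target file, but an ODT file is a ZIP package (content.xml, styles.xml, a manifest and so on), and org-odt--export-wrap is what assembles the package around the generated content.xml. As an illustration only, the Python sketch below builds a bare-bones package with the standard zipfile module to show that layout. The file name and placeholder text are assumptions, and a real ox-odt document contains more parts (styles, metadata, images). Following the OpenDocument packaging rules, the `mimetype` entry comes first and is stored uncompressed.

```python
import zipfile

ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"

# Minimal content.xml with one placeholder paragraph.
CONTENT_XML = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.2">
  <office:body>
    <office:text>
      <text:p>Hello from Org</text:p>
    </office:text>
  </office:body>
</office:document-content>
"""

# The manifest lists the package parts and their media types.
MANIFEST_XML = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest
    xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
    manifest:version="1.2">
  <manifest:file-entry manifest:full-path="/"
      manifest:media-type="application/vnd.oasis.opendocument.text"/>
  <manifest:file-entry manifest:full-path="content.xml"
      manifest:media-type="text/xml"/>
</manifest:manifest>
"""


def write_minimal_odt(path):
    """Write a bare-bones ODT package to path (illustrative only)."""
    with zipfile.ZipFile(path, "w") as odt:
        # "mimetype" must be the first entry and must not be compressed.
        odt.writestr(zipfile.ZipInfo("mimetype"), ODT_MIMETYPE,
                     compress_type=zipfile.ZIP_STORED)
        odt.writestr("content.xml", CONTENT_XML,
                     compress_type=zipfile.ZIP_DEFLATED)
        odt.writestr("META-INF/manifest.xml", MANIFEST_XML,
                     compress_type=zipfile.ZIP_DEFLATED)


if __name__ == "__main__":
    write_minimal_odt("minimal.odt")
    with zipfile.ZipFile("minimal.odt") as odt:
        for info in odt.infolist():
            print(info.filename, info.compress_type)
```

Listing the archive shows `mimetype` first with compression type 0 (ZIP_STORED) and the XML parts deflated (type 8).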

Uploading

The next step is to upload the ODT files to Google Drive and convert them to Google Docs.

For this I decided to use Python, since Google provides a great client library for its APIs.

Below is the entire script. It is quite simple and not very flexible, but it works well for my purposes. One prerequisite is that the directory structure in Google Drive must be pre-created manually to match the directory structure of the publish directory.
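
Since those folders have to exist before the script runs, it can help to list which ones the publishing directory implies. The helper below is a hypothetical addition, not part of the script: it walks the local directory and returns every sub-folder path (together with its ancestors) that contains ODT files, assuming, as the script does, that Drive folder names match local directory names exactly.

```python
import os


def required_drive_folders(publish_dir):
    """Return sub-folder paths (relative to publish_dir, "/"-separated)
    that hold .odt files, plus their ancestors: the folders that would
    need to exist under the Drive root folder."""
    required = set()
    for root, _dirs, files in os.walk(publish_dir):
        if not any(name.endswith(".odt") for name in files):
            continue
        rel = os.path.relpath(root, publish_dir)
        if rel == os.curdir:
            continue  # top-level files go straight into the root folder
        parts = rel.split(os.sep)
        # "a/b" needs both "a" and "a/b" to exist.
        for i in range(1, len(parts) + 1):
            required.add("/".join(parts[:i]))
    return sorted(required)


if __name__ == "__main__":
    for folder in required_drive_folders("published"):
        print(folder)
```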

The script expects four settings at the top:

  1. CREDENTIALS_PATH: path to where the credentials data will be stored
  2. SECRETS_PATH: path to the client secrets file downloaded from the Google Cloud Console project
  3. PUBLISH_DIR: the project’s :publishing-directory, i.e. where the published ODT files end up
  4. DRIVE_FOLDER_ROOT: the ID of the root Google Drive folder where the documents should be uploaded

import logging
import os
import webbrowser

import httplib2
import googleapiclient.discovery
import oauth2client.client
import oauth2client.file
from googleapiclient.http import MediaFileUpload


CREDENTIALS_PATH = 'credentials.json'
SECRETS_PATH = 'secrets.json'
PUBLISH_DIR = 'published'
DRIVE_FOLDER_ROOT = '0000AAAABBBBCCCC'

FOLDER_MIME = 'application/vnd.google-apps.folder'
DOC_MIME = 'application/vnd.google-apps.document'
ODT_MIME = 'application/vnd.oasis.opendocument.text'

logging.getLogger('googleapiclient').setLevel(logging.WARNING)
logging.basicConfig(level=logging.INFO, format='%(message)s')


def get_auth():
    """
    Load credentials from file or otherwise authorize for new credentials.
    """
    storage = oauth2client.file.Storage(CREDENTIALS_PATH)
    credentials = storage.get()
    if credentials is None or credentials.invalid:
        flow = oauth2client.client.flow_from_clientsecrets(
            SECRETS_PATH,
            scope='https://www.googleapis.com/auth/drive',
            redirect_uri='urn:ietf:wg:oauth:2.0:oob')
        webbrowser.open(flow.step1_get_authorize_url())
        # Python 3; reads from stdin, so this needs an interactive terminal.
        auth_code = input('Enter the auth code: ')
        credentials = flow.step2_exchange(auth_code)
        storage.put(credentials)
    return credentials


def escape_query(value):
    """Escape single quotes for use inside a Drive query string literal."""
    return value.replace("'", "\\'")


def find_folder_id(client, odt_file):
    """
    Find the Google Drive folder that mirrors odt_file's local directory.
    """
    relative_dir = os.path.relpath(os.path.dirname(odt_file), PUBLISH_DIR)
    folder_id = DRIVE_FOLDER_ROOT
    if relative_dir == os.curdir:
        return folder_id  # top-level files go straight into the root folder
    q = ("mimeType='{mime}' and '{parent}' in parents "
         "and name='{name}' and trashed=false")
    # Walk down one folder level at a time, starting from the root folder.
    for name in relative_dir.split(os.sep):
        resp = client.files().list(
            corpora='user',
            q=q.format(mime=FOLDER_MIME, parent=folder_id, name=escape_query(name)),
            fields='files(id, name)').execute(num_retries=2)
        files = resp.get('files', [])
        if not files:
            raise ValueError('Failed to find folder "%s" in Google Drive. '
                             'Make sure it already exists.' % name)
        folder_id = files[0]['id']
    return folder_id


def find_existing_file_id(client, folder_id, drive_file_name):
    """
    Find an existing Google Doc named drive_file_name in folder_id, if any.
    """
    q = ("mimeType='{mime}' and '{parent}' in parents "
         "and name='{name}' and trashed=false")
    resp = client.files().list(
        corpora='user',
        q=q.format(mime=DOC_MIME, parent=folder_id,
                   name=escape_query(drive_file_name)),
        fields='files(id)').execute(num_retries=2)
    files = resp.get('files', [])
    return files[0]['id'] if files else None


def drive_name(odt_file):
    """Turn e.g. 'leave_policy.odt' into the Doc title 'Leave Policy'."""
    base = os.path.splitext(os.path.basename(odt_file))[0]
    return ' '.join(s.title() for s in base.split('_'))


def upload(client, odt_file):
    """
    Upload an individual odt_file to Google Drive as a Google Doc, or update
    the existing Doc with the same name.
    """
    if not odt_file.endswith('.odt'):
        return
    folder_id = find_folder_id(client, odt_file)
    drive_file_name = drive_name(odt_file)
    existing_file_id = find_existing_file_id(client, folder_id, drive_file_name)

    # Requesting the Google Docs MIME type makes Drive convert the ODT upload.
    file_metadata = {
        'name': drive_file_name,
        'mimeType': DOC_MIME,
    }
    media = MediaFileUpload(odt_file, mimetype=ODT_MIME, resumable=True)
    if not existing_file_id:
        file_metadata['parents'] = [folder_id]
        logging.info('Creating "%s" from %s...', drive_file_name, odt_file)
        client.files().create(body=file_metadata,
                              media_body=media,
                              fields='id').execute()
    else:
        logging.info('Updating "%s" from %s...', drive_file_name, odt_file)
        client.files().update(fileId=existing_file_id,
                              body=file_metadata,
                              media_body=media,
                              fields='id').execute()


def upload_directory(client, directory):
    """
    Recursively upload all odt files in directory to Google Drive.
    """
    # os.walk already descends into every subdirectory, so no explicit
    # recursion is needed.
    for root, _dirs, files in os.walk(directory):
        for f in files:
            upload(client, os.path.join(root, f))


if __name__ == '__main__':
    # Authorize and build the Drive client once rather than once per file.
    http = get_auth().authorize(httplib2.Http())
    client = googleapiclient.discovery.build('drive', 'v3', http=http,
                                             cache_discovery=False)
    upload_directory(client, PUBLISH_DIR)

The script recursively uploads all ODT documents in the publishing directory to the specified folder in Google Drive, mirroring the local folder structure. One possible improvement would be to upload only the files that have changed since the last run.
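
One way to implement that improvement, sketched here as a hypothetical addition rather than something the script already does: record each ODT file’s modification time in a small JSON state file (the `.upload-state.json` name is an assumption) and upload only files that are new or whose time has changed.

```python
import json
import os


def load_state(state_path):
    """Return the {path: mtime} map saved by the previous run, or {}."""
    try:
        with open(state_path) as f:
            return json.load(f)
    except (FileNotFoundError, ValueError):
        return {}  # first run, or a corrupt state file: upload everything


def scan_odt_files(publish_dir):
    """Map every .odt file under publish_dir to its modification time."""
    mtimes = {}
    for root, _dirs, files in os.walk(publish_dir):
        for name in files:
            if name.endswith(".odt"):
                path = os.path.join(root, name)
                mtimes[path] = os.path.getmtime(path)
    return mtimes


def changed_files(current, previous):
    """Paths that are new or whose mtime differs from the previous run."""
    return sorted(p for p, mtime in current.items() if previous.get(p) != mtime)


def save_state(state_path, current):
    """Persist the mtimes of the files that were just uploaded."""
    with open(state_path, "w") as f:
        json.dump(current, f, indent=2)


if __name__ == "__main__":
    state_file = ".upload-state.json"  # hypothetical location
    current = scan_odt_files("published")
    for path in changed_files(current, load_state(state_file)):
        print("would upload", path)
    # In the real script, call save_state(state_file, current) only after
    # every upload succeeded, so a failed upload is retried next time.
```

Saving the state only after all uploads succeed is the key design choice: a run that fails halfway leaves the old state in place, so the next run picks up the same files again.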

So with these two pieces of code in place, I can now publish and upload to Google Drive with the following sequence of commands from Emacs:

  1. C-c C-e P p to publish
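  2. M-! python upload2drive.py to upload to Google Drive. The script’s paths are relative to the directory it runs in, and M-! gives the command no interactive input, so the very first run, which prompts for an authorization code, is best done from a terminal.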
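
The two steps could also be chained into a single command run from a terminal, where the upload script’s authorization prompt can read its input. A rough sketch, assuming a hypothetical publish-config.el that requires ox-publish, defines org-odt-publish-to-odt and sets org-publish-project-alist with a project named docs; it relies only on Emacs’s standard --batch, -l and --eval options and on org-publish-project.

```python
import subprocess
import sys

# Hypothetical names: adjust to match your own setup.
PUBLISH_CONFIG = "publish-config.el"  # loads ox-publish and defines the project
PROJECT_NAME = "docs"                 # project name in org-publish-project-alist
UPLOAD_SCRIPT = "upload2drive.py"


def build_commands(config, project, upload_script):
    """Return the argv lists for the publish step and the upload step."""
    publish = [
        "emacs", "--batch",
        "-l", config,
        "--eval", '(org-publish-project "%s")' % project,
    ]
    # Reuse the current interpreter for the upload script.
    upload = [sys.executable, upload_script]
    return [publish, upload]


def publish_and_upload():
    """Run both steps in order, stopping if either one fails."""
    for argv in build_commands(PUBLISH_CONFIG, PROJECT_NAME, UPLOAD_SCRIPT):
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure
```

Calling publish_and_upload() (for example from an `if __name__ == '__main__':` guard) runs the export and then the upload; check=True keeps a failed export from being followed by an upload of stale files.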

Keywords: emacs,orgmode,python

Modified: 2018-10-20 18:29:24 +08

Copyright (c) 2017 John Del Rosario

Emacs 26.1 (Org mode 9.1.9)