Maris Elsins – Official Pythian Blog

Automating password rotation for Oracle databases

Password rotation is not the most exciting task in the world, and that’s exactly why it’s a perfect candidate for automation. Automating routine tasks like this is good for everyone – DBAs can work on something more exciting, companies save costs as less time is spent on changing the passwords, and there’s no room for human error, either. At Pythian, we typically use Ansible for task automation, and I like it mainly because of its non-intrusive configuration (no agents need to be installed on the target servers) and its scalability (tasks are executed in parallel on the target servers). This post briefly describes how I automated password rotation for Oracle database users using Ansible.

Overview

This blog post is not an intro to what Ansible is and how to use it; rather, it’s an example of how a simple task can be automated with Ansible in a way that’s scalable, flexible and easily reusable, and that also provides the ability for other tasks to pick up the new passwords from a secure password store.

  • Scalability – I’d like to take advantage of Ansible’s ability to execute tasks on multiple servers at the same time. In a large environment of tens or hundreds of machines, a solution that changes passwords serially would not be suitable. Here’s an example of such a “serial” task (it’s not a real thing, just an illustration): it hardcodes a few attributes (the environment file, the username and the hostname), so a separate task would be required for every user/database whose password you’d want to change:
    - hosts: ora-serv01
      remote_user: oracle
      tasks:
      - name: change password for SYS
        shell: |
          . TEST1.env && \
          sqlplus / as sysdba @change_password.sql SYS \
          \"{{lookup('password','/dev/null length=8')}}\"
    
  • Flexible – I want to be able to adjust the list of users for which the passwords are changed, and the list of servers/databases the user passwords are changed on, in a simple way that doesn’t involve changing the main task list.
  • Reusable – this comes together with flexibility. The idea is that the playbook would be so generic that it wouldn’t require any changes when it’s implemented in a completely separate environment (i.e. for another client of Pythian).
  • Secure password store – the new passwords are to be generated by the automated password rotation tool, and a method of storing them securely is required so that they can be picked up by the DBAs, application owners or the next automated task that reconfigures the application.

The implementation

Prerequisites

I chose to do the implementation using Ansible 2.3 because it introduces the passwordstore lookup, which enables interaction with the pass utility (read more about it at Passwordstore.org). pass is very cool. It stores passwords in gpg-encrypted files, and it can also be configured to automatically push the changes to a git repository, which relieves us of the headache of password distribution. The passwords can then be retrieved from git on the servers that need access to them.
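For illustration, here’s roughly how such a password store could be set up on the Ansible control host (the GPG key ID and the git remote below are made-up examples, not values from this setup):

# initialize the store with an existing GPG key and turn it into a git repository
pass init "DBA Team GPG Key"
pass git init
pass git remote add origin git@gitserver:oracle-passwords.git   # assumed remote
# generate, store and retrieve an entry manually
pass generate orcl1/TEST1/system 12
pass show orcl1/TEST1/system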

Ansible 2.3 runs on Python 2.6; unfortunately, the passwordstore lookup requires Python 2.7, which can be an issue if the control host for Ansible runs on Oracle Linux 6 or RHEL 6, as they don’t provide Python 2.7 in the official yum repositories. Still, there are ways of getting it done, and I’ll write another blog post about it.

So, what we’ll need is:

  • Ansible 2.3
  • jmespath library on the Ansible control host (pip install jmespath)
  • jinja2 library on the Ansible control host (I had to update it using pip install -U jinja2 in a few cases)
  • Python 2.7 (or Python 3.5)
  • pass utility
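Preparing the control host could look roughly like this (package names and repositories vary between distributions; on Enterprise Linux the pass utility typically comes from EPEL):

pip install ansible==2.3.0.0
pip install jmespath
pip install -U jinja2
yum install -y pass    # or: apt-get install -y pass, depending on the distribution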

The Playbook

This is the whole list of files that are included in the playbook:

./chpwd.yml
./inventory/hosts
./inventory/orcl1-vagrant-private_key
./inventory/orcl2-vagrant-private_key
./roles/db_users/files/change_password.sql
./roles/db_users/files/exists_user.sql
./roles/db_users/defaults/main.yml
./roles/db_users/tasks/main.yml

Let’s take a quick look at all of them:

  • ./chpwd.yml – is the playbook and (in this case) it’s extremely simple as I want to run the password change against all defined hosts:
    $ cat ./chpwd.yml
    ---
      - name: password change automation
        hosts: all
        roles:
          - db_users
    
  • ./inventory/hosts, ./inventory/orcl1-vagrant-private_key, ./inventory/orcl2-vagrant-private_key – these files define the hosts and the connectivity. In this case we have 2 hosts – orcl1 and orcl2 – and we’ll connect as the vagrant user using the private keys.
    $ cat ./inventory/hosts
    [orahosts]
    orcl1 ansible_host=127.0.0.1 ansible_port=2201 ansible_ssh_private_key_file=inventory/orcl1-vagrant-private_key ansible_user=vagrant
    orcl2 ansible_host=127.0.0.1 ansible_port=2202 ansible_ssh_private_key_file=inventory/orcl2-vagrant-private_key ansible_user=vagrant
  • ./roles/db_users/files/change_password.sql – a SQL script that I’ll execute on the database to change the passwords. It takes 2 parameters: the username and the password:
    $ cat ./roles/db_users/files/change_password.sql
    set ver off pages 0
    alter user &1 identified by "&2";
    exit;
  • ./roles/db_users/files/exists_user.sql – a SQL script that verifies the existence of a user. It takes 1 argument – the username. It outputs “User exists.” when the user is there, and “User {username} does not exist.” when it’s not.
    $ cat ./roles/db_users/files/exists_user.sql
    set ver off pages 0
    select 'User exists.' from all_users where username=upper('&1')
    union all
    select 'User '||upper('&1')||' does not exist.' from (select upper('&1') from dual minus select username from all_users);
    exit;
  • ./roles/db_users/defaults/main.yml – is the default file for the db_users role. I use this file to define the users for each host and database for which the passwords need to be changed:
    $ cat ./roles/db_users/defaults/main.yml
    ---
      db_users:
        - name: TEST1
          host: orcl1
          env: ". ~/.bash_profile && . ~/TEST1.env > /dev/null"
          pwdstore: "orcl1/TEST1/"
          os_user: oracle
          become_os_user: yes
          users:
            - dbsnmp
            - system
        - name: TEST2
          host: orcl2
          env: ". ~/.bash_profile && . ~/TEST2.env > /dev/null"
          pwdstore: "orcl2/TEST2/"
          os_user: oracle
          become_os_user: yes
          users:
            - sys
            - system
            - ctxsys
        - name: TEST3
          host: orcl2
          env: ". ~/.bash_profile && . ~/TEST3.env > /dev/null"
          pwdstore: "orcl2/TEST3/"
          os_user: oracle
          become_os_user: yes
          users:
            - dbsnmp

    In this data structure, we define everything that needs to be known to connect to the databases and change the passwords. Each entry in the list contains the following data:

    • name – just a descriptive name of the entry in this list; normally it would be the name of the database described below.
    • host – the host on which the database resides. It should match one of the hosts defined in ./inventory/hosts.
    • env – how to set the correct environment to be able to connect to the DB (currently it requires sysdba connectivity).
    • pwdstore – the path to the folder in the passwordstore where the new passwords will be stored.
    • os_user and become_os_user – these are used in case sudo to another user on the target host is required. In a typical configuration, I connect to the target host using a dedicated user for Ansible, and then sudo to the DB owner (a sudoers sketch follows below). If Ansible connects as the DB owner directly, then become_os_user should be set to “no”.
    • users – this is the list of all users for which the passwords need to be changed.

    As you see, this structure greatly enhances the flexibility and reusability, because adding new databases, hosts or users to the list is just a simple change to the “db_users:” structure in this defaults file. In this example, the dbsnmp and system passwords are rotated for TEST1@orcl1, the sys, system and ctxsys passwords are rotated for TEST2@orcl2, and the dbsnmp password is rotated for TEST3@orcl2.
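    A note on the sudo setup (this is an assumption about the target hosts, not something the playbook configures): for “become” to work, the user Ansible connects as needs passwordless sudo to the DB owner. A typical way of granting that would be:

    # the "ansible" and "oracle" usernames below are examples - adjust them to your environment
    echo 'ansible ALL=(oracle) NOPASSWD: ALL' | sudo tee /etc/sudoers.d/ansible-oracle
    sudo chmod 0440 /etc/sudoers.d/ansible-oracle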

  • ./roles/db_users/tasks/main.yml – this is the task file of the db_users role: the soul of the playbook and the main part that does the password change based on the contents of the defaults file described above. Instead of pasting the whole file at once, I’ll break it up task by task and provide some comments about what’s being done.
    • populate host_db_users – This task simply filters the whole db_users data structure that’s defined in the defaults file, and creates a host_db_users fact with only the DBs that belong to the host the task is currently run on. It would also be possible to filter the list using Ansible’s “when” conditional, but in that case a lot of “skipped” entries are displayed when the task is executed, so I prefer filtering the list before it’s even passed to the Ansible task.
      ---
        - name: populate host_db_users
          set_fact: host_db_users="{{ db_users | selectattr('host','equalto',ansible_hostname) | list }}"
      
    • create directory for target on db hosts – an “ansible” directory is created on the target host for each unique combination of os_user and become_os_user. A json_query is used here to keep just the os_user and become_os_user attributes that are needed. It would also work with with_items: "{{ host_db_users }}", but in that case the output becomes cluttered, as all the attributes are displayed during the execution.
        - name: create directory for target on db hosts
          file:
            path: "ansible"
            state: directory
          become_user: "{{ item.os_user }}"
          become: "{{ item.become_os_user }}"
          with_items: "{{ host_db_users | json_query('[*].{os_user: os_user, become_os_user: become_os_user }') | unique | list }}"
      
    • copy sql scripts to db_hosts – the scripts are copied from the Ansible control host to the target “ansible” directories (the copy module only transfers them if they are missing or different). “with_nested” is the way to create a nested loop in Ansible.
        - name: copy sql scripts to db_hosts
          copy:
            src="{{ item[1] }}"
            dest=ansible/
            mode=0644
          become_user: "{{ item[0].os_user }}"
          become: "{{ item[0].become_os_user }}"
          with_nested:
            - "{{ host_db_users | json_query('[*].{os_user: os_user, become_os_user: become_os_user }') | unique | list }}"
            - ['files/change_password.sql','files/exists_user.sql']
      
    • verify user existence – I’m using the shell module to execute the SQL script after setting the environment. The outputs are collected in the “exists_output” variable. This task will never fail and will never show as “changed” because failed_when and changed_when are set to “false”.
        - name: verify user existence
          shell: |
             {{ item[0].env }} && \
             sqlplus -S / as sysdba \
             @ansible/exists_user.sql {{ item[1] }}
          register: exists_output
          become_user: "{{ item[0].os_user }}"
          become: "{{ item[0].become_os_user }}"
          with_subelements:
            - "{{ host_db_users |json_query('[*].{env: env, os_user: os_user, users: users, become_os_user: become_os_user }') }}"
            - users
          failed_when: false
          changed_when: false
      
    • User existence results – this task fails when any of the users didn’t exist, and displays which user it was. This is done in a separate task to produce cleaner output. If failing is not desired when some users don’t exist (i.e. you want to continue changing passwords for the existing users), this task can simply be commented out, or the “failed_when: false” line can be uncommented.
        - name: User existence results
          fail: msg="{{ item }}"
          with_items: "{{ exists_output.results|rejectattr('stdout','equalto','User exists.')|map(attribute='stdout')|list }}"
          #failed_when: false
      
    • generate and change the user passwords – finally, this is the task that actually changes the passwords. A successful password change is detected by checking the output from the SQL script, which should produce “User altered.” The rather complex use of lookups is there for a reason: the passwordstore lookup can also generate passwords, but it doesn’t allow defining the character classes the new password should contain, while the “password” lookup does. Additionally, the 1st character is generated from “ascii_letters” only, as there are usually some applications that “don’t like” passwords starting with a digit (this is why generating the 1st character is separated from generating the remaining 11 characters). Lastly, the “passwordstore” lookup is used with the “userpass=” parameter to pass the generated password to pass and store it in the passwordstore (which also keeps the previous passwords). This part could use some improvement, as in some cases different rules for the generated password complexity may be required. The password change outputs are recorded in “change_output”, which is checked in the last task.
        - name: generate and change the user passwords
          shell: |
             {{ item[0].env }} && \
             sqlplus -S / as sysdba \
             @ansible/change_password.sql \
             {{ item[1] }} \"{{ lookup('passwordstore',item[0].pwdstore + item[1] + ' create=true overwrite=true userpass=' +
                                       lookup('password','/dev/null chars=ascii_letters length=1') +
                                       lookup('password','/dev/null chars=ascii_letters,digits,hexdigits length=11')) }}\"
          register: change_output
          become_user: "{{ item[0].os_user }}"
          become: "{{ item[0].become_os_user }}"
          with_subelements:
            - "{{ host_db_users |json_query('[*].{env: env, os_user: os_user, users: users, pwdstore: pwdstore, become_os_user: become_os_user}') }}"
            - users
          failed_when: false
          changed_when: "'User altered.' in change_output.stdout"
      
    • Password change errors – The “change_output” data are verified here, and failed password changes are reported.
         # fail if the password change failed.
        - name: Password change errors
          fail: msg="{{ item }}"
          with_items: "{{ change_output.results|rejectattr('stdout','equalto','\nUser altered.')|map(attribute='stdout')|list }}"
      

It really works!

Now that you know how it’s built, it’s time to show how it works!
Please pay attention to the following:

  • The password store is empty at first
  • The whole password change playbook completes in 12 seconds
  • The tasks on both hosts are executed in parallel (see the order of execution feedback for each task)
  • The passwordstore contains the password entries after the playbook completes, and they can be retrieved by using the pass command
$ pass
Password Store
$ time ansible-playbook -i inventory/hosts chpwd.yml
PLAY [password change automation] ******************************************************
TASK [Gathering Facts] *****************************************************************
ok: [orcl1]
ok: [orcl2]
TASK [db_users : populate host_db_users] ***********************************************
ok: [orcl1]
ok: [orcl2]
TASK [db_users : create directory for target on db hosts] ******************************
changed: [orcl1] => (item={'become_os_user': True, 'os_user': u'oracle'})
changed: [orcl2] => (item={'become_os_user': True, 'os_user': u'oracle'})
TASK [db_users : copy sql scripts to db_hosts] *****************************************
changed: [orcl1] => (item=[{'become_os_user': True, 'os_user': u'oracle'}, u'files/change_password.sql'])
changed: [orcl2] => (item=[{'become_os_user': True, 'os_user': u'oracle'}, u'files/change_password.sql'])
changed: [orcl1] => (item=[{'become_os_user': True, 'os_user': u'oracle'}, u'files/exists_user.sql'])
changed: [orcl2] => (item=[{'become_os_user': True, 'os_user': u'oracle'}, u'files/exists_user.sql'])
TASK [db_users : verify user existence] ************************************************
ok: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'sys'))
ok: [orcl1] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST1.env > /dev/null'}, u'dbsnmp'))
ok: [orcl1] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST1.env > /dev/null'}, u'system'))
ok: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'system'))
ok: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'ctxsys'))
ok: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST3.env > /dev/null'}, u'dbsnmp'))
TASK [db_users : User existence results] ***********************************************
TASK [db_users : generate and change the user passwords] *******************************
changed: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl2/TEST2/', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'sys'))
changed: [orcl1] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl1/TEST1/', 'env': u'. ~/.bash_profile && . ~/TEST1.env > /dev/null'}, u'dbsnmp'))
changed: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl2/TEST2/', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'system'))
changed: [orcl1] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl1/TEST1/', 'env': u'. ~/.bash_profile && . ~/TEST1.env > /dev/null'}, u'system'))
changed: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl2/TEST2/', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'ctxsys'))
changed: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl2/TEST3/', 'env': u'. ~/.bash_profile && . ~/TEST3.env > /dev/null'}, u'dbsnmp'))
TASK [db_users : Password change errors] ***********************************************
PLAY RECAP *****************************************************************************
orcl1                      : ok=6    changed=3    unreachable=0    failed=0
orcl2                      : ok=6    changed=3    unreachable=0    failed=0
real    0m12.418s
user    0m8.590s
sys     0m3.900s
$ pass
Password Store
|-- orcl1
|   |-- TEST1
|       |-- dbsnmp
|       |-- system
|-- orcl2
    |-- TEST2
    |   |-- ctxsys
    |   |-- sys
    |   |-- system
    |-- TEST3
        |-- dbsnmp
$ pass orcl1/TEST1/system
HDecEbjc6xoO
lookup_pass: First generated by ansible on 26/05/2017 14:28:50
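As a side note, the stored entries can be picked up by whoever or whatever comes next. For example, on any machine that has the password store synced over git (the commands below assume such a setup and are just an illustration):

pass git pull                            # fetch the latest encrypted entries
pass show orcl1/TEST1/system | head -1   # the first line holds the current password

The same passwordstore lookup can also read an entry back from another playbook or an ad-hoc call on the Ansible control host:

ansible localhost -m debug -a "msg={{lookup('passwordstore','orcl1/TEST1/system')}}"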

Conclusions

For the past 2 months I’ve been learning Ansible and trying it out on various DBA tasks. It hasn’t always been a smooth ride, as I had to learn quite a lot: I hadn’t been exposed much to beasts like Jinja2, json_query, YAML, Python (very handy for troubleshooting) and Ansible itself before. I feel that my former PL/SQL coder’s experience had created some expectations of Ansible that turned out not to be true. The biggest challenges for me were getting used to the linear execution of the playbook (with PL/SQL I can call packages, functions, etc. to process the data “outside” the main linear code line), and the need to write Ansible tasks so that they either succeed or fail, with no middle states like “this is a special case – process it differently”. The amount of visual feedback is also close to none, which does make sense to some degree; it’s “automation” after all, right? Nobody should be watching :)
A separate struggle for me was working with the complex data structure that I created for storing the host/database/user information. It’s a mix of YAML “dictionaries” and “lists”, and it turned out to be difficult to process it the way I wanted – this is why I used json_query at times (although not in a very complex way in this case). There are probably simpler ways that I didn’t know of (or didn’t manage to find), and I’d be glad if you’d let me know of possible improvements, or even other approaches to such tasks that you have worked on and implemented.
Despite all the complaining above, I think it’s really worth investing time in automating tasks like this. It really works, and once done it doesn’t require much attention. Happy automating!


Redo volume optimization in 12c R2

I was using SLOB to compare the throughput between 12.1 and 12.2 databases, and was surprised to see that the average redo size per transaction was ~18.5KB on 12cR2 versus ~339KB on 12cR1. Understanding this difference was important for the assessment and interpretation of the test results.

This is going to be a short blog post about an apparently new redo volume optimization that’s introduced in Oracle Database 12c R2.

Before I explain how I found it, here’s the conclusion for the impatient ones: There’s a new feature in 12cR2 that reduces the redo volume by removing the new column values from the redo records in cases when UPDATES set the column to its already existing value.

I spent some time checking the AWR report differences and tracing the sessions to rule out the obvious candidates, but found nothing there:

  • the redo block size was the same – 512
  • the “redo wastage” (statistic) was very low
  • no other workload was running at the same time, so the transactions were coming purely from SLOB.
  • in both cases, SLOB was configured with the same parameters, and the workload characteristics were the same (i.e. the same number of updates per transaction, the same SQL in the top, etc)

I also used a query I wrote a while ago to extract the redo size (by object and by transaction) directly from the redo logs:

exec DBMS_LOGMNR.ADD_LOGFILE('&FILENAME');
exec DBMS_LOGMNR.START_LOGMNR(OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);

set pages 50000 lines 500 tab OFF TIME ON serverout ON
col owner FOR a30
col object_name FOR a30
col subobject_name FOR a30
col object_type FOR a30
select * from (
SELECT data_obj#,
       owner,
       object_name,
       subobject_name,
       XIDUSN,XIDSLT,XIDSQN,XID,
       COUNT(*)                    REDO_CNT,
       SUM(r_size_b) / 1024 / 1024 REDO_MB
FROM   (SELECT data_obj#,XIDUSN,XIDSLT,XIDSQN,XID,
               rbablk,
               rbabyte,
               rbablk * 512 + rbabyte
                      b_offset,
               ( LEAD(rbablk * 512 + rbabyte)
                   over (
                     PARTITION BY rbasqn
                     ORDER BY rbablk*512+rbabyte) ) - ( rbablk * 512 + rbabyte )
                      R_SIZE_B
        FROM   v$logmnr_contents) lc,
       dba_objects o
WHERE  o.object_id(+) = lc.data_obj#
       and owner='USER1' and object_name='CF1'
GROUP  BY data_obj#,
          owner,
          object_name,
          subobject_name, XIDUSN,XIDSLT,XIDSQN,XID
) where rownum<=5;

exec DBMS_LOGMNR.END_LOGMNR;

And I confirmed the redo size per transaction was a lot different between the two versions.
This is on 12cR1 (pay attention to the REDO_MB column, I moved it to the front for visibility):

   REDO_MB  DATA_OBJ# OWNER                          OBJECT_NAME                    SUBOBJECT_NAME                     XIDUSN     XIDSLT     XIDSQN XID                REDO_CNT
---------- ---------- ------------------------------ ------------------------------ ------------------------------ ---------- ---------- ---------- ---------------- ----------
.333778381   19726234 USER1                          CF1                                                                    1          0    5582960 0100000070305500         63
.334209442   19726234 USER1                          CF1                                                                    1          0    5582961 0100000071305500         63
.333782196   19726234 USER1                          CF1                                                                    1          1    5583160 0100010038315500         63
 .33380127   19726234 USER1                          CF1                                                                    1          1    5583161 0100010039315500         63
.328651428   19726234 USER1                          CF1                                                                    1          2    5582217 01000200892D5500         63

This is on 12cR2:

   REDO_MB  DATA_OBJ# OWNER                          OBJECT_NAME                    SUBOBJECT_NAME                     XIDUSN     XIDSLT     XIDSQN XID                REDO_CNT
---------- ---------- ------------------------------ ------------------------------ ------------------------------ ---------- ---------- ---------- ---------------- ----------
.018199921     100475 USER1                          CF1                                                                    1          0     148400 01000000B0430200         63
.018596649     100475 USER1                          CF1                                                                    1          0     148351 010000007F430200         63
.018745422     100475 USER1                          CF1                                                                    1          0     148352 0100000080430200         63
.018726349     100475 USER1                          CF1                                                                    1          0     148353 0100000081430200         63
.018543243     100475 USER1                          CF1                                                                    1          0     148354 0100000082430200         63

As I hadn’t observed any workload differences up to this point, I decided to take some logfile dumps and look at them. SLOB uses one SQL statement to update the data, so it was easy to identify the redo records for the CF1:

UPDATE CF1 SET C2 = 'AAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBB', 
               ... skipped 18 more columns to improve readability ..., 
               C20 = 'AAAAAAAABBBBBBBBAAAAAAAA0BBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBB0BBBAAAAAAAABBBBBBBBAAAAAAAABBBBB3BBAAAAAAAABBBBBBBB' 
WHERE CUSTID > ( :B1 - :B2 ) AND ( CUSTID < :B1 )

And this is what I found in the redo log dumps on 12cR1 – we see two changes per row: the redo for the undo blocks first, and then the actual change to CF1, including the new values that the columns are updated to:

REDO RECORD - Thread:1 RBA: 0x00354d.000002ae.010c LEN: 0x15b0 VLD: 0x01 CON_UID: 0
SCN: 0x0b28.3f9903c6 SUBSCN:  1 06/20/2018 05:41:48
CHANGE #1 CON_ID:0 TYP:0 CLS:83 AFN:7 DBA:0x01c00290 OBJ:4294967295 SCN:0x0b28.3f9903bb SEQ:1 OP:5.2 ENC:0 RBL:0 FLG:0x0000
ktudh redo: slt: 0x0005 sqn: 0x005391cc flg: 0x0012 siz: 2684 fbi: 0
            uba: 0x022173ea.3dc2.02    pxid:  0x0000.000.00000000
CHANGE #2 CON_ID:0 TYP:0 CLS:84 AFN:8 DBA:0x022173ea OBJ:4294967295 SCN:0x0b28.3f9903ba SEQ:2 OP:5.1 ENC:0 RBL:0 FLG:0x0000
ktudb redo: siz: 2684 spc: 5516 flg: 0x0012 seq: 0x3dc2 rec: 0x02
            xid:  0x0022.005.005391cc
ktubl redo: slt: 5 rci: 0 opc: 11.1 [objn: 19726234 objd: 19726236 tsn: 463]
Undo type:  Regular undo        Begin trans    Last buffer split:  No
Temp Object:  No
Tablespace Undo:  No
             0x00000000  prev ctl uba: 0x022173d5.3dc2.02
prev ctl max cmt scn:  0x0b28.3f990377  prev tx cmt scn:  0x0b28.3f990378
txn start scn:  0xffff.ffffffff  logon user: 3124  prev brb: 35746101  prev bcl: 0 BuExt idx: 0 flg2: 0
KDO undo record:
KTB Redo
op: 0x04  ver: 0x01
compat bit: 4 (post-11) padding: 1
op: L  itl: xid:  0x0019.016.0055d79e uba: 0x01e5c11c.7646.03
                      flg: C---    lkc:  0     scn: 0x0b28.3f99032b
KDO Op code: URP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x0315b558  hdba: 0x02a40003
itli: 2  ispac: 0  maxfr: 4858
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 0 ckix: 6
ncol: 20 nnew: 19 size: 0
col  1: [128]
 41 41 41 41 41 41 41 41 42 42 42 42 42 42 42 42 41 41 41 41 41 41 41 41 42
 42 42 42 42 42 42 42 41 41 41 41 41 41 41 41 42 42 42 42 42 42 42 42 41 41
 41 41 41 41 41 41 42 42 42 42 42 42 42 42 41 41 41 41 41 41 41 41 42 42 42
 42 42 42 42 42 41 41 41 41 41 41 41 41 42 42 42 42 42 42 42 42 41 41 41 41
 41 41 41 41 42 42 42 42 42 42 42 42 41 41 41 41 41 41 41 41 42 42 42 42 42
 42 42 42
...
col 19: [128]
 41 41 41 41 41 41 41 41 42 42 42 42 42 42 42 42 41 41 41 41 41 41 41 41 30
 42 42 42 42 42 42 42 41 41 41 41 41 41 41 41 42 42 42 42 42 42 42 42 41 41
 41 41 41 41 41 41 42 42 42 42 42 42 42 42 41 41 41 41 41 41 41 41 42 42 42
 42 30 42 42 42 41 41 41 41 41 41 41 41 42 42 42 42 42 42 42 42 41 41 41 41
 41 41 41 41 42 42 42 42 42 33 42 42 41 41 41 41 41 41 41 41 42 42 42 42 42
 42 42 42
CHANGE #3 CON_ID:0 TYP:2 CLS:1 AFN:301 DBA:0x0315b558 OBJ:19726236 SCN:0x0b28.3f990368 SEQ:1 OP:11.5 ENC:0 RBL:0 FLG:0x0000
KTB Redo
op: 0x11  ver: 0x01
compat bit: 4 (post-11) padding: 1
op: F  xid:  0x0022.005.005391cc    uba: 0x022173ea.3dc2.02
Block cleanout record, scn:  0x0b28.3f9903c6 ver: 0x01 opt: 0x02, entries follow...
  itli: 1  flg: (opt=2 whr=1)  scn: 0x0b28.3f990368
KDO Op code: URP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x0315b558  hdba: 0x02a40003
itli: 2  ispac: 0  maxfr: 4858
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 2 ckix: 6
ncol: 20 nnew: 19 size: 0
col  1: [128]
 41 41 41 41 41 41 41 41 42 42 42 42 42 42 42 42 41 41 41 41 41 41 41 41 42
 42 42 42 42 42 42 42 41 41 41 41 41 41 41 41 42 42 42 42 42 42 42 42 41 41
 41 41 41 41 41 41 42 42 42 42 42 42 42 42 41 41 41 41 41 41 41 41 42 42 42
 42 42 42 42 42 41 41 41 41 41 41 41 41 42 42 42 42 42 42 42 42 41 41 41 41
 41 41 41 41 42 42 42 42 42 42 42 42 41 41 41 41 41 41 41 41 42 42 42 42 42
 42 42 42
...
col 19: [128]
 41 41 41 41 41 41 41 41 42 42 42 42 42 42 42 42 41 41 41 41 41 41 41 41 30
 42 42 42 42 42 42 42 41 41 41 41 41 41 41 41 42 42 42 42 42 42 42 42 41 41
 41 41 41 41 41 41 42 42 42 42 42 42 42 42 41 41 41 41 41 41 41 41 42 42 42
 42 30 42 42 42 41 41 41 41 41 41 41 41 42 42 42 42 42 42 42 42 41 41 41 41
 41 41 41 41 42 42 42 42 42 33 42 42 41 41 41 41 41 41 41 41 42 42 42 42 42
 42 42 42
CHANGE #4 MEDIA RECOVERY MARKER CON_ID:0 SCN:0x0000.00000000 SEQ:0 OP:5.20 ENC:0 FLG:0x0000
session number   = 1327
serial  number   = 20241
transaction name =
version 202375680
audit sessionid 931488888
Client Id =
login   username = USER1

However, on 12cR2 the records differ – the new column values are no longer recorded, and I assume that’s because the values didn’t actually change. Both redo entries are still there, but the column values are not:

REDO RECORD - Thread:1 RBA: 0x00059b.002cd649.016c LEN: 0x0128 VLD: 0x01 CON_UID: 0
SCN: 0x0000000012f55a0f SUBSCN: 59 06/21/2018 07:39:08
CHANGE #1 CON_ID:0 TYP:0 CLS:18 AFN:4 DBA:0x0101a3f6 OBJ:4294967295 SCN:0x0000000012f55a0f SEQ:11 OP:5.1 ENC:0 RBL:0 FLG:0x0000
ktudb redo: siz: 84 spc: 7290 flg: 0x0022 seq: 0x374e rec: 0x0b
        xid:  0x0001.000.000243b0
ktubu redo: slt: 0 rci: 10 opc: 11.1 objn: 100475 objd: 100477 tsn: 7
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
         0x00000000
KDO undo record:
KTB Redo
op: 0x04  ver: 0x01
compat bit: 4 (post-11) padding: 1
op: L  itl: xid:  0x0008.000.0002aff4 uba: 0x0101aed8.30cb.4b
                  flg: C---    lkc:  0     scn:  0x0000000012f557ff
KDO Op code: LKR row dependencies Disabled
xtype: XA flags: 0x00000000  bdba: 0x006ee8f5  hdba: 0x00500003
itli: 1  ispac: 0  maxfr: 4858
tabn: 0 slot: 0 to: 0
CHANGE #2 CON_ID:0 TYP:2 CLS:1 AFN:8 DBA:0x006ee8f5 OBJ:100477 SCN:0x0000000012f559d5 SEQ:1 OP:11.4 ENC:0 RBL:0 FLG:0x0000
KTB Redo
op: 0x11  ver: 0x01
compat bit: 4 (post-11) padding: 1
op: F  xid:  0x0001.000.000243b0    uba: 0x0101a3f6.374e.0b
Block cleanout record, scn:  0x0000000012f55a0f ver: 0x01 opt: 0x02 bigscn: Y compact: Y spare: 00000000, entries follow...
itli: 2  flg: (opt=2 whr=1)  scn:  0x0000000012f559d5
KDO Op code: LKR row dependencies Disabled
xtype: XA flags: 0x00000000  bdba: 0x006ee8f5  hdba: 0x00500003
itli: 1  ispac: 0  maxfr: 4858
tabn: 0 slot: 0 to: 1

I haven’t done a very thorough testing, so there may be a situation when it works and when it doesn’t.

Using Docker to provide self-service database patching

I’ve been looking into how Vagrant and/or Docker can be used to improve the life of DBAs and maybe others, too. I did this mostly to practice using these tools – especially Docker – that are becoming more and more important in the modern IT landscape. I decided to set up a configuration that allowed for patching the database by starting a Docker container from an image that provides a patched version of Oracle software, and I was surprised by how simple it was.

The vision

It’s important to remember that the changes made inside a Docker container are ephemeral, so it’s expected to lose them when the container is destroyed. Obviously, we wouldn’t like to lose our database if the container was destroyed, thus, the data files have to be stored outside the container and should be attached to the container using a Docker volume. Additionally, the plan is to stop the old container and start a new one with a different version of DB software in it – here, too, the data files need to be located outside the container, otherwise, they would not be accessible to the other container we’d start.

In this proof-of-concept implementation, I want to be able to detect whether the “datapatch” command needs to be run in the database upon startup. This would need to happen only when the patch level of the RDBMS Oracle Home changes or, in our case, when the DB is started from a container created from a different version of the Docker image. Of course, we could simply run datapatch every time the database is started, but it takes quite a bit of time to complete. It’s also a little bit tricky to figure out whether datapatch needs to be run when you only have access to the data files of the database, so I’ll do it the following way:

  1. Each version of the Docker image will also hold a “tagfile” that keeps a version number of the Docker image.
  2. At the startup of the container, the local tagfile is compared to the tagfile stored along with the data files on an external volume. If they are not the same, the datapatch needs to be executed. The local tagfile is copied to the external volume after completing the datapatch run.

I’m going to use the Dockerfiles provided by Oracle – mainly written by Gerald Venzl – available on GitHub as a starting point for this little project. The reason for this choice is that the Docker image is built from scratch by starting with an empty Oracle Linux image on top of which the database specific layers are added. The provided Dockerfile is easy to understand and follow, therefore it’s also simple to make the required changes to it. Alternatively, I could have used the Docker images from the Oracle Container Registry as a starting point. But in this case, the learning purpose and the visibility of the Dockerfile itself were the decision makers.

Vagrant

I will be building this on my Windows 10 Home laptop (which doesn’t support Hyper-V), so I can’t run Docker on it natively. To work around this limitation, I’ll use Vagrant to build a VirtualBox VM that will serve as my Docker “machine”, and all further work will happen on that VM.

I’m using MobaXterm as my terminal software, so the commands you’ll see here will look Linux-like (they could even work if you copy 1-to-1 and execute on your Mac OS), although they are executed on Windows.

After installing Vagrant and Virtualbox, the first thing we’ll want to do is download the latest Oracle Linux Vagrant box that will be used to create the Docker “machine” VM:

vagrant box add --name ol-latest https://yum.oracle.com/boxes/oraclelinux/latest/ol7-latest.box
mkdir -p ~/96_VAGRANT/OL-docker
cd ~/96_VAGRANT/OL-docker

The “~/96_VAGRANT/OL-docker” directory will serve as the working directory for this whole project. Please continue by uploading the following files to it. They will be required to create the initial Docker image and to patch the Oracle RDBMS software for the second version of the container image.

  1. LINUX.X64_180000_db_home.zip – Obtain it from Oracle Software Downloads page. Pick the Oracle Database 18c (18.3) ZIP file for Linux x86-64.
  2. p28689117_180000_Linux-x86-64.zip – Combo of OJVM Component Release Update 18.4.0.0.181016 + Database Oct 2018 Release Update 18.4.0.0.181016. Obtain it from My Oracle Support.
  3. p6880880_180000_Linux-x86-64.zip – Latest version of 18c OPatch. Obtain it from My Oracle Support.

Off-topic warning! If you’re also a fan of getMOSPatch, you can get both of the patches in one go like this (this does not work from MobaXterm, do it in CMD):

java -jar getMOSPatch.jar patch=28689117,6880880 platform=226P regexp=.*_180000_Linux-x86-64.* download=all

Once you have the files in place, create the “Vagrantfile” by running this code block:

cat << EOF > Vagrantfile
disk_size = 100*1024
disk = "ORCL.vdi"
Vagrant.configure("2") do |config|
config.vm.box = "ol-latest"
config.vm.network "private_network", ip: "192.168.33.13"
 config.vm.provider "virtualbox" do |vb|
   vb.memory = "8196"
   unless File.exist?(disk)
      vb.customize ['createhd', '--filename', disk, '--size', disk_size]
   end
   vb.customize ['storageattach', :id,  '--storagectl', 'SATA Controller', '--port', 2, '--device', 0, '--type', 'hdd', '--medium', disk]
 end
 config.vm.provision "shell", inline: <<-SHELL
   parted /dev/sdc mklabel msdos
   parted /dev/sdc mkpart primary btrfs 1 100000
   mkfs.btrfs /dev/sdc1
   echo "/dev/sdc1 /ORCL btrfs defaults 0 0" >> /etc/fstab
   mkdir /ORCL
   mount /ORCL
   mkdir /ORCL/docker /ORCL/oradb
   chown -R vagrant:vagrant /ORCL
   chmod -R 777 /ORCL
   yum install -y docker git unzip
   sed -i.bck -e "s,selinux-enabled,selinux-enabled --graph /ORCL/docker,g" /etc/sysconfig/docker
   systemctl start docker.service
   systemctl enable docker.service
   usermod -a -G docker vagrant
 SHELL
end
EOF

The Vagrantfile defines the following settings:

  • 192.168.33.13 – IP address will be assigned to the private network.
  • ORCL.vdi – A 100G disk will be added as /ORCL to the VM. I’ll use this disk to store all the Docker files as well as the data files of the database. The space will be allocated dynamically, and based on my testing, it will consume 41G by the end of the process.
  • The new disk will be formatted with BTRFS – I had some issues with EXT4 and EXT3 during testing: some of the files in the Docker container weirdly disappeared (I haven’t found the reason for that yet, but BTRFS worked okay; it might have something to do with the layered FS that Docker depends on, but I’m not sure).
  • Docker, git and unzip will be installed, and Docker is reconfigured to keep its files on the new disk. The Docker service is then started and enabled.

Let’s move on to create the VM by running “vagrant up” and then connect to it!

vagrant up
ssh -i ~/96_VAGRANT/OL-docker/.vagrant/machines/default/virtualbox/private_key vagrant@192.168.33.13

Building the first Docker image

There’s nothing really specific about the way I build the initial Docker image for the 18.3 database. I’m basically running the “buildDockerImage.sh” script as Gerald Venzl suggests in the documentation. The only considerable difference is modification of the installDBBinaries.sh script to remove lines that contain “rm -rf $ORACLE_HOME/..”. Apparently some directories from Oracle Home are being removed to reduce the size of the container, but it creates a problem, as Opatch fails to apply patches. I’m also copying / extracting the required patches to the /ORCL disk, as this location will be made available to the running containers.

cd
git clone https://github.com/oracle/docker-images.git
mkdir /ORCL/oradb/patch/
cp /vagrant/p6880880*.zip /ORCL/oradb/patch/
unzip -q -d /ORCL/oradb/patch /vagrant/p28689117_180000_Linux-x86-64.zip
sed -i.back -e '/rm -rf $ORACLE_HOME/d' ~/docker-images/OracleDatabase/SingleInstance/dockerfiles/18.3.0/installDBBinaries.sh
cd ~/docker-images/OracleDatabase/SingleInstance/dockerfiles
cp /vagrant/LINUX.X64_180000_db_home.zip 18.3.0/LINUX.X64_180000_db_home.zip
./buildDockerImage.sh -v 18.3.0 -e
rm 18.3.0/LINUX.X64_180000_db_home.zip

The completion of the build script leaves us with a prepared “oracle/database:18.3.0-ee” Docker image. There is no database yet; it will be created the first time a Docker container is started from this image. Let’s start it up by mapping /ORCL/oradb as /opt/oracle/oradata in the container, so that all data files stored in /opt/oracle/oradata are actually kept outside of the container, in the /ORCL/oradb directory of the VM.

docker run -d -it --name d183-tmp -v /ORCL/oradb:/opt/oracle/oradata --privileged oracle/database:18.3.0-ee
docker logs d183-tmp -f
  # Ctrl-C when done
docker stop d183-tmp
docker commit -m "created from 18.3.0-ee export" d183-tmp db:18.3-tmp

The first command triggers the startup of the container named d183-tmp and, as the database does not exist yet, it will also take some time to create the DB. You can follow the progress in the logs by running the second command. Once the DB is ready, the Docker container is stopped, and a new image named “db:18.3-tmp” is created from the current state of the container.

Adding the “datapatch” run to the 18.3 image

To facilitate database patching or de-patching, the “datapatch” command needs to be executed. I’ll build a new container image based on “db:18.3-tmp” for that. Remember, all these activities still happen inside the Vagrant VM we created earlier. The “docker build” command will be used to create the new image. Let’s prepare the required files in a separate directory – db-18.3:

  1. Create the directory, and copy the startDB.sh
    mkdir ~/db-18.3 && cd ~/db-18.3
    cp ~/docker-images/OracleDatabase/SingleInstance/dockerfiles/18.3.0/startDB.sh .
    
  2. Create the Dockerfile. The Dockerfile contains the build instructions for the new image; in this case they are very simple – use image “db:18.3-tmp” as the source and copy the “tagfile” and “startDB.sh” files to the ${ORACLE_BASE} directory (the variable is defined in one of the earlier layers of the image, and its value is preserved). These are the contents of the “Dockerfile”:
    echo "
    FROM db:18.3-tmp as base
    COPY tagfile \${ORACLE_BASE}/
    COPY startDB.sh \${ORACLE_BASE}/" > Dockerfile
    
  3. Create the “tagfile” with the following contents:
    echo "db:18.3" > tagfile
  4. Adjust the “startDB.sh” to add the following lines right above the “# Start Listener” line. This additional logic ensures that datapatch is executed when the container is started and the external tagfile does not match the tagfile of the container. Note that datapatch is executed twice for the 18.3 container to work around a known issue that causes an incomplete patch rollback if it’s executed just once. The code block below adjusts the file as per this description.
    LISTENER_LINE=$(grep -n "# Start Listener" ~/docker-images/OracleDatabase/SingleInstance/dockerfiles/18.3.0/startDB.sh | cut -d: -f1)
    head -$(expr ${LISTENER_LINE} - 1) ~/docker-images/OracleDatabase/SingleInstance/dockerfiles/18.3.0/startDB.sh > startDB.sh
    
    echo "
    touch \$ORACLE_BASE/oradata/tagfile
    cmp --silent \$ORACLE_BASE/oradata/tagfile \$ORACLE_BASE/tagfile || {
      sqlplus / as sysdba << EOF
        STARTUP;
        ALTER PLUGGABLE DATABASE ALL OPEN;
        exit;
    EOF
      echo Running datapatch
      \$ORACLE_HOME/OPatch/datapatch -verbose
      \$ORACLE_HOME/OPatch/datapatch -verbose # remove for non-18.3 #
      cp \$ORACLE_BASE/tagfile \$ORACLE_BASE/oradata/tagfile
      sqlplus / as sysdba << EOF
        SHUTDOWN IMMEDIATE;
        exit;
    EOF
    } " >> startDB.sh
    
    tail -n+${LISTENER_LINE} ~/docker-images/OracleDatabase/SingleInstance/dockerfiles/18.3.0/startDB.sh >> startDB.sh
    chmod u+x startDB.sh
    

We’re now ready to proceed with the build of the new Docker image. The following commands will build the new image “db:18.3”, remove the current container “d183-tmp”, start a new container named “d183” from the new image, and tail the container log. The log will reveal a datapatch run (it’s technically not required at this point, but, the external data file location does not have a tagfile present yet, thus the startupDB.sh condition to run the datapatch is met.):

docker build -t db:18.3 .
docker rm d183-tmp
docker run -d -it --name d183 -v /ORCL/oradb:/opt/oracle/oradata --privileged db:18.3
docker logs d183 -f
  # Ctrl-C when done

Creating the 18.4 image from the 18.3 container

In order to provide a Docker image for the 18.4 database, we’ll patch the Oracle Home in the current container “d183”, modify the tagfile and startDB.sh, and commit the container into a new image, db:18.4.

The first thing to do is the database software patching. We already have the patch files available on /ORCL, but they’re still owned by root. We need to connect to the Docker container as root (-u 0), and change the ownership of the files:

docker exec -u 0 -it d183 bash
  chown -R oracle:oinstall /opt/oracle/oradata/patch

Additionally, the tagfile and startDB.sh need to be modified. We’re changing the version tag in the tagfile and removing the second datapatch run from the startDB.sh file:

  echo "db:18.4" > /opt/oracle/tagfile
  sed -i.back -e '/# remove for non-18.3 #/d' /opt/oracle/startDB.sh
  exit

Next, we connect to the same container as the default user (oracle) and perform the usual patching activities for the database software – replace OPatch with the new version, stop the DB and the listener, and apply the 18.4.0.0.181016 DB and OJVM release update patches. There’s no need to start the DB at this point, as that, along with the datapatch run, will happen during the startup of the container:

docker exec -it d183 bash
  cd $ORACLE_HOME
  rm -rf OPatch
  unzip -q /opt/oracle/oradata/patch/p6880880_180000_Linux-x86-64.zip
 
  export ORACLE_SID=ORCLCDB
  ORAENV_ASK=NO
  . oraenv
  sqlplus / as sysdba << EOF
    shut immediate;
    exit;
EOF
  lsnrctl stop
 
  cd /opt/oracle/oradata/patch/28689117/28655784/
  opatch apply -silent
  cd /opt/oracle/oradata/patch/28689117/28502229/
  opatch apply -silent
  exit

The container “d183” now contains exactly what we want in our 18.4 version of the Docker image – db:18.4. Luckily, it’s easy to create an image from a running container. Let’s do that, and let’s remove the d183 container (it now contains 18.4 software, so the name would be confusing). We’ll also remove the old 18.3-tmp image, because it is no longer needed.

docker commit -m "created by patching db:18.3" d183 db:18.4
docker stop d183
docker rm d183
docker image rm db:18.3-tmp
docker image ls
  REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
  db                  18.4                320eeb4c0b96        6 seconds ago       14.3GB
  db                  18.3                f0b291b882cb        2 hours ago         10.4GB
  oracle/database     18.3.0-ee           b761c85e8061        3 hours ago         10.3GB
  oraclelinux         7-slim              b8b00d5b0a75        2 weeks ago         117MB

Does it really work?

It’s easy to test it, we just need to create the containers for each version of the database image – “db:18.3” and “db:18.4”, then we can start and stop them interchangeably to observe how the datapatch upgrades or downgrades the database. You can keep both containers around, but only one of them is allowed to be started at the same time.

docker run -d -it --name d184 -v /ORCL/oradb:/opt/oracle/oradata --privileged db:18.4
docker logs d184 -f
  # Datapatch is expected
  # Ctrl-C when done
docker stop d184 && docker logs d184
docker run -d -it --name d183 -v /ORCL/oradb:/opt/oracle/oradata --privileged db:18.3
docker logs d183 -f
  # Datapatch is expected to remove 18.4 patches
  # Ctrl-C when done
docker stop d183 && docker logs d183
docker start d184 && docker logs d184 -f
  # Datapatch is expected to apply the 18.4 patches again
  # Ctrl-C when done
docker stop d184 && docker logs d184
docker start d184 && docker logs d184 -f
  # Datapatch is NOT expected as we started from 18.4 container the last time too
  # Ctrl-C when done
docker stop d184
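While one of the containers is up, you can also double-check what the database itself reports about its patch level by querying the SQL patch inventory inside it (a rough example; it reuses the ORCLCDB SID created by the image):

docker exec -it d184 bash -c '
export ORACLE_SID=ORCLCDB ORAENV_ASK=NO; . oraenv
sqlplus -S / as sysdba << EOF
set lines 200 pages 100
select patch_id, status, description from dba_registry_sqlpatch;
EOF'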

The full outputs of all commands executed in this blog are available here.

Summary

This blog post turned out to be longer than I expected, and it might make you think the process is more complicated than it really is. Please take a look at this flowchart describing the main tasks that were performed – there’s really not too much going on!

Overview of creating the db:18.3 and db:18.4 images.

After completing the whole process, you’ll have two Docker containers – d183 and d184 – that you can start interchangeably to patch or de-patch the database between versions 18.3 and 18.4. This approach should also work with the majority of other patches, not only with the release update patches shown in this post.

Additionally, you should be able to create more databases by following these steps (a rough command sketch follows the list):

  1. Start a new container X from “oracle/database:18.3.0-ee”, mapping a different location for the data files.
  2. Remove the new container X – it will still leave the data files in place.
  3. Create containers from images db:18.3 and db:18.4 to allow the automated patching that we implemented here.
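For example, something like this (the directory and container names below are made up, following the same pattern as before):

mkdir /ORCL/oradb2
docker run -d -it --name dtmp2 -v /ORCL/oradb2:/opt/oracle/oradata --privileged oracle/database:18.3.0-ee
docker logs dtmp2 -f      # wait until the new database is created, then Ctrl-C
docker stop dtmp2 && docker rm dtmp2      # the data files remain in /ORCL/oradb2
docker run -d -it --name d183b -v /ORCL/oradb2:/opt/oracle/oradata --privileged db:18.3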

Conclusions

Obviously, this is far from a “production-ready” solution, but the purpose was to show what’s possible. If you’re working in an organization where developers have their own databases and want to be able to roll forward and back between different patch levels, they might prefer pulling an image from the container registry and being on a different patch level a few moments later, rather than performing the more tedious manual patch installation or rollback themselves.

This blog shows a conceptual implementation of a self-service patching feature for Oracle databases. It’s very likely your experience will be different when you give this idea a go: you’ll probably use a different version of the RDBMS software, and the patches you install will probably differ too. Additionally, some extra steps specific to your environment will be required, and you’ll also want to think about the ability to run several databases on the same host at the same time.

Despite that, I hope this blog post gives you a few fresh ideas on how to improve and simplify the database development in your company.

Achieving rock-solid maintenance with template-based action plans

Pythian has always been serious about reducing human mistakes.

Our consultants have always been required to log all terminal outputs for the work we execute so that the information is available to our clients, and to make sure the same work is done the same way the next time. Later, FIT-ACER was invented – a set of checks to be followed by the DBA working on the maintenance to make sure the person is ready to perform the work, and is actively aware of what, why, where and when exactly it needs to be executed.

Most recently, we have started adopting an automation approach, for example by building Ansible playbooks that perform the maintenance work, which almost completely eliminates human mistakes. Automation is the ideal, but the path to get there is neither simple nor quick. I was looking for a simple, intuitive process that could be used before the automation is implemented. This blog post presents my solution: a template-based Action Plan Generator using Google Docs that I created to reliably advance the maintenance through all testing cycles and into production.

The current approach

Of course, with our organizational emphasis on reliable, repeatable work, we have been creating action plans for our maintenance activities for a long time. Usually, this is a multi-step process:

  1. The preliminary action plan for the maintenance is created by a DBA who takes into consideration the current state of the system and the change that needs to be applied. The DBA studies documentation and relevant My Oracle Support notes to determine the right sequence of actions.
  2. The action plan is executed in a test/dev system. At this step, the DBA records the exact commands that need to be run, addresses issues, etc.
  3. The final action plan is created by traversing the terminal logs and reviewing the notes from the previous step. Some steps may need to be reordered in this case, and if any issues were observed, the fixes may need to be added to avoid the issues in the next iterations. The outcome of this step is a document that lists the exact commands that need to be executed to get from the current state of the system to the patched state (upgraded, migrated, etc.) successfully.

This approach has worked really well for us, and it has been one of the best ways to make sure the action plan is reliably repeatable and no steps are missed. However, it has a potential issue – hard-coded values. It’s clear that some things differ between systems – port values, host names, usernames, paths and so on – which made it necessary to add one more step:

  4. Run a find/replace to adjust the hard-coded values in the action plan, and generate separate action plans for every other environment. Review the action plans to spot errors introduced by the text replacements.

I personally have had many problems with this step. I’m an Oracle Apps DBA, and when dealing with Oracle E-Business Suite maintenance, it’s not uncommon to have an action plan of 50+ pages in front of you. The largest action plans I’ve worked with were longer than 200 pages. Imagine how difficult it is to review it all and make sure the replace didn’t break anything. Is the find/replace/review process error-prone? Absolutely! We’ve had situations where some values were mis-replaced, or where the search pattern matched too many strings in the document, resulting in undesirable changes.

Template-based action plans

The idea is very simple:

  • The final action plan is created as a template (during Step 3 above), so that the outcome won’t contain any hard-coded values. For example, we may use a “##DB_PORT##” placeholder instead of port 1521. We also include “toggles” to be able to enable/disable complete sections of the document.
  • All the values for placeholders for every environment that needs to be maintained are defined in a separate spreadsheet that is compact and easily manageable.
  • The final action plans are generated by a simple macro script that takes the template and replaces the placeholders with the correct values.

Creating the action plan template like this saves time and reduces risk. It removes the necessity for manual find/replace, and eliminates the possibility of unintentionally replacing something that shouldn’t be replaced – errors that are not simple to spot when reviewing the result manually. It doesn’t eliminate the need for a review of the generated plan, but it should simplify the process.

Here, in a publicly shared Google Drive folder, is a working example for you to copy, try out and use for your own benefit. It consists of two documents, explained below.

Action plan generator – template

Action Plan Generator – Template is an example of a very simple template. Here’s a small section of it:

Action Plan Generator – Template

Notice the following dynamic pieces in the template:

  • Variables – strings like ##ORA_USER##, ##DB_HOST_S##, and ##JIRA_ISSUE##. These placeholder variables are replaced with real values from the “Action Plan Generator – Variables” document when the final action plan is generated.
  • Toggles – ##EM13C>## (the beginning tag) and ##<EM13C## (the end tag). Toggles allow removing the section of the document between the beginning and end tags: if EM13C is set to “OFF”, that content will not show up in the final plan. A conceptual shell sketch of the whole substitution follows this list.
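Conceptually, the generation step boils down to plain text substitution plus removal of the toggled-off sections. As a rough illustration only (the real implementation is a Google Apps Script macro bundled with the spreadsheet, and the values below are invented):

sed -e 's/##ORA_USER##/oracle/g' \
    -e 's/##DB_HOST_S##/dev-db01/g' \
    -e 's/##JIRA_ISSUE##/OPS-1234/g' \
    -e '/##EM13C>##/,/##<EM13C##/d' \
    template.txt > "Action Plan - DEV.txt"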

Action plan generator – variables

Action Plan Generator – Variables is a spreadsheet that defines the values for the placeholder variables and toggles. Here’s how simple it looks:

Action Plan Generator – Variables

There are a few things to note:

  • You’ll need to update the “Document Template ID” to point it to the template document you actually want to use (open the template Google Doc in a browser, and the ID is part of the URL)
  • You’ll need to update the “Target Folder ID” to point it to the folder where you want to create the generated action plans.
  • The table needs to be updated to define the variable and toggle names used in the template document. More columns can be added to allow more environments, and more variables can be added, as well.
  • Clicking on the “Generate the Document” button will initiate the macro (it will ask for permissions to run; if you’re not sure it’s safe you can examine the code by navigating to Tools -> Script editor). The macro will:
    • Look up the template
    • Prompt for the environment name for which to generate the action plan
    • Back up the previously generated action plan (copy + rename it to add timestamp to its name)
    • Copy the template and rename it by replacing ” – Template” with ” – <environment name>”
    • Replace the variables and remove the sections between OFF-toggles
    • Display a message with the URL to the document

The result

Here’s the action plan section (the same we displayed above for the Template file) after generating the plan for DEV environment (notice that the 13c EM blackout section was removed as the toggle is set to “OFF“, and the variables have been replaced with the actual values):

Action Plan Generator – DEV

You can also take a look at complete files as they were generated: Action Plan Generator – DEV, Action Plan Generator – QA and Action Plan Generator – PROD.

I’ve used this tool a number of times already, as have several other members of my team. Initially, it takes more time to produce the action plan template than to simply write down all the commands one after another, because all environments have to be considered at the same time. But it’s already much quicker for the next template, as large parts of the previous action plan can be reused. I can also confirm this really simplifies modification of the action plan, as the work happens in a single document – the template – and there’s no need to edit different documents for different environments. I believe this method really helps to reduce the mistakes DBAs make during maintenance activities on their systems.

What’s next?

Automation is still the ultimate goal. But once you start working with action plan templates, you’ll notice that they require generalizing the action plan so that it applies to all environments you maintain. That is actually halfway towards automation! You’re not investing your time in something that you’ll throw out as soon as your automation framework is in place and you’ve gone through its learning curve; you’re creating something that can act as a foundation for your playbooks, recipes or configuration files. You’ll get real automation results faster, and if you also avoid a few human errors on the way, it’s a win-win!

How to build a cost-effective serverless blogging platform on GCP – Part 1

I remember when I started blogging in 2008. It was a nice learning experience, and it was rewarding, too – both the challenge of formulating my thoughts clearly and the feedback I received, and still am receiving, on some of the oldest blog posts. I have collected some of my blog posts under my own website me-dba.com. I don’t get too many views a month, but that doesn’t mean the blog has been inactive. I’ve used the site for learning, too, and I’ve changed its backend several times, trying to find a convenient and inexpensive way of running it that I was happy with, without sacrificing availability or scalability.

This post is the first half of a two-part article where I’ll explain the current technical implementation of my blog site. The first part will explain how I implemented the “Blog as Code” principle to remove the dependencies on specific services and platforms. The second part will explain how the blog site was set up on Google Cloud Platform in a way that requires minimal cost and maintenance overhead. I believe the implementation is quite modern and interesting, and I’d be happy if these instructions helped at least one person to start his or her blogging journey.

It all started back in 2008 with wordpress.com’s free blog (it used a subdomain of wordpress.com, which I didn’t like). Then I purchased a wordpress.com subscription for $5/month to be able to use my own domain (I thought it was too expensive for just a few hundred views a month). After that, I installed WordPress on my NAS and hosted the blog from there (but it was too unreliable, and I had quite a few downtimes). The cloud era came and I migrated my WordPress installation to AWS EC2, which worked fine (but I still had to manage it by applying the updates).

Some time ago, I started learning AWS and GCP for my certifications, and I came up with a challenge to run my simple blog with the following in mind:

  • It needed to be free (or as inexpensive as possible – not counting the domain costs).
  • Accessible over HTTPS only (https is the new normal, why stick with anything less?). Let’s Encrypt SSL/TLS certificates are available for free!
  • The implementation should be scalable (not that I’d require it at the moment, but I want to build it future-proof).  :)
  • Use some of the modern DevOps things I’ve been learning too (git, serverless, continuous deployment, Docker, etc).
  • No maintenance overhead for the services that I use (no manual updates).
  • Posting a new blog post should be painless and simple (without activities like “connect there, do this, then check that, …”)

I quickly figured out that I needed to simplify the whole implementation to have more options on the desk. I started by experimenting with Jekyll, which in the end allowed me to convert the blog to static web pages and get rid of the heavy WordPress platform. I tried hosting the blog from an AWS S3 Bucket with Route 53 or a GCP Cloud Storage Bucket with HTTP Load Balancer. I looked at GCP’s Cloud (HTTP) Functions and Triggers, I tried running it from a tiny VM instance, and I even tried GitHub Pages, but there was always one inefficiency or another to work around: no SSL support, cost increase, no scaling, insufficient theme support, etc.

My final solution, described here, is a serverless deployment of a static webpage blog that’s generated from the blog’s source stored in a GCP Cloud Repository using Jekyll. GCP Cloud Build Trigger is used to generate the new static web pages automatically upon a commit to the blog’s source repository master branch, and it also deploys the new version to the GCP App Engine that actually serves the web pages. The whole solution fits under the GCP Free Tier’s Always Free limits unless the visitor numbers grow very quickly, and even then this could be a very cost-effective solution. Of course, you should review the description of the Always Free limits carefully to make sure your region/service/configuration is covered before you configure your own blog.

Here’s a quick diagram of how the different services interact to automatically deploy new blog posts once the whole solution is implemented.

The workflow for a blog post publishing

The rest of this article will go through the configuration details, hopefully in sufficient detail for you to follow and set up your own blogging platform!

Blog as Code

Even when I had a WordPress blog, I used a text editor instead of the WYSIWYG editor. I frequently ran into formatting issues with the latter and would switch to the text editor to have more control over what was going on. I discovered Markdown later and really liked the cleanliness of the text files that could be translated into HTML. I even installed some WordPress plugins to support Markdown for blogging.

Having already familiarized myself with Markdown, the next logical step to take was “Blog as Code.” The main idea was to keep everything that’s required for rendering the whole blog in a Git repository. It makes a lot of sense to me not only for the usual benefits Git provides (you can google “why should I use Git” to discover more about it), but also because if this worked, I’d be able to get rid of the database backend and the heavier application tier (WordPress), which would simplify everything a lot.

Jekyll – A Static Site Generator

There are many different options for static site generators, but Jekyll is the most popular at the moment, which is why I chose it. It’s quite simple to set up locally, but since I’m going to use a Docker container later, it makes sense to start using it now. I’ll be using the jekyll/jekyll:latest Docker image.

  1. Setting up Docker is out of scope for this article, but it needs to be done before continuing.
  2. Download the latest Jekyll image and create an empty folder for the source code.
    $ docker pull jekyll/jekyll
    $ mkdir tempblog.me-dba.com
    $ cd tempblog.me-dba.com
  3. Start a container with an interactive bash session, create the initial blog site structure by running jekyll new, then run jekyll serve to test the blog. Notice that the current location (tempblog.me-dba.com) is mounted as “/srv/jekyll” inside the container, so the files created there will actually be located outside the container, on the host. jekyll new installs all the required dependencies, Ruby gems and themes, so this might take a while.
    $ docker run --rm --publish 4000:4000 --volume="$PWD:/srv/jekyll" --privileged -it jekyll/jekyll:latest bash
    $ jekyll new /srv/jekyll
    $ jekyll serve
  4. At this point, the blog should be accessible on the Docker host IP, port 4000. In my case the IP was 192.168.33.13, so the following URL served the demo pages included in the new blog structure (in your case the IP may differ): http://192.168.33.13:4000/ – the front page of your blog.
  5. We’ll use this moment to commit the container changes to the image from another session on our Docker host machine. That way, all the downloaded items will be added as a new layer to the Jekyll Docker image, and they won’t need to be downloaded again the next time we start a container. Note: the container ID will be different in your case.
    # from another session 
    $ docker ps
    CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                               NAMES
    04418b39cec7        e220e1251707        "/usr/jekyll/bin/ent…"   23 minutes ago      Up 23 minutes       0.0.0.0:4000->4000/tcp, 35729/tcp   laughing_benz
    
    # Use the ID from the previous output
    $ docker commit -m "created by adding all Gems" 04418b39cec7 jekyll/jekyll:latest
    sha256:9667c6cbe572e78a8d3b552286b21eb0e34f8264c6c3c7c2b63850680b6cc891
  6. Next, switch back to the first session, and use Ctrl-C to terminate the jekyll serve followed by exit to exit and remove the container.

At this stage, we have generated the skeleton of the blog’s source code. The structure of the current directory should look similar to this:

$ find .
.
./.gitignore
./.sass-cache
./.sass-cache/04702c00cd9fa60073a601ea6f7a61c45d2d95af
./.sass-cache/04702c00cd9fa60073a601ea6f7a61c45d2d95af/minima.scssc
./.sass-cache/2644ddacc47d466a8fd18366436f273677dc9e09
./.sass-cache/2644ddacc47d466a8fd18366436f273677dc9e09/_base.scssc
./.sass-cache/2644ddacc47d466a8fd18366436f273677dc9e09/_layout.scssc
./.sass-cache/2644ddacc47d466a8fd18366436f273677dc9e09/_syntax-highlighting.scssc
./404.html
./_config.yml
./_posts
./_posts/2019-01-21-welcome-to-jekyll.markdown
./about.md
./index.md
./Gemfile
./Gemfile.lock
./_site
./_site/404.html
./_site/about
./_site/about/index.html
./_site/index.html
./_site/assets
./_site/assets/main.css
./_site/assets/minima-social-icons.svg
./_site/feed.xml
./_site/jekyll
./_site/jekyll/update
./_site/jekyll/update/2019
./_site/jekyll/update/2019/01
./_site/jekyll/update/2019/01/21
./_site/jekyll/update/2019/01/21/welcome-to-jekyll.html

Only the files outside the _site and .sass-cache directories are the source code of the blog. The _site directory is the generated version of the static pages, and .sass-cache is a caching mechanism that Jekyll uses. Both of these locations are excluded in .gitignore. This is a good point to edit _config.yml to customize the blog – change the theme, add plugins and adjust other configurables. Having completed the modifications, test them with jekyll serve again (note: correct whitespace indentation is critical in YAML files):

$ cat _config.yml | grep -v "^ *#"

title: My Awesome Serverless Blog on GCP
email: elsins@pythian.com
description: >- # this means to ignore newlines until "baseurl:"
  This is the example blog created by Maris Elsins in 2019 to support the article named
  "Building a Cost Efficient Serverless Blogging Platform on GCP".
baseurl: "" # the subpath of your site, e.g. /blog
url: "https://tempblog.me-dba.com"
twitter_username: MarisDBA

markdown: kramdown
theme: minima
plugins:
  - jekyll-feed

$ docker run --rm --publish 4000:4000 --volume="$PWD:/srv/jekyll" --privileged -it jekyll/jekyll:latest jekyll serve

Disqus – Adding Comments to the Blog Posts

Native support for the Disqus comment platform depends on the theme you choose for your blog. In this case, I used the default theme, minima-2.5.0, which already includes the required code to add the Disqus comment threads. Many themes already include this code, and it’s not really complicated to customize other themes to add it, either.

Before adding the comment sections to the blog posts, the new site needs to be registered in the Disqus platform. Go to https://disqus.com/admin/create/ (register a new user if you don’t have one yet), and follow these steps to register your new site:

  1. Choose a name for your site and note the unique Disqus URL, as its first part is the Disqus shortname that will be required later when updating _config.yml (in this case it’s “tempblog-me-dba-com”):

    Register the new site on Disqus

  2. If you’re presented with the choice, pick “Jekyll” as the blogging platform.
  3. Change the configuration of the site based on your preference, and make sure you set the blog URL correctly to your production URL (which doesn’t serve anything yet):

    Configure Disqus

Once the site is registered, the shortname needs to be added to the _config.yml:

disqus:
  shortname: tempblog-me-dba-com

As instructed in the theme’s README.md, the site needs to be regenerated in “production mode” by setting the JEKYLL_ENV=production before running jekyll serve or jekyll build like this:

$ docker run --rm --publish 4000:4000 --volume="$PWD:/srv/jekyll" --privileged -it jekyll/jekyll:latest bash -c "JEKYLL_ENV=production jekyll serve"

And that’s it. The comments section should already be rendered correctly, like this, which means we’ve completed setting up the initial version of the blog’s source code:

Example of a working comments section

GCP Source Repository

For the purposes of this demonstration, I’ll store the blog’s code in GCP Source Repositories. Technically, GitHub would work too, but then I would have to grant GCP Cloud Build access privileges to all my repositories to automate the deployment of the new blog posts. I have a few private GitHub repositories which I don’t want to grant this access to, so I use a GCP Source Repository instead.

Let’s create the repository and commit our code to it! This requires several steps:

  1. Create a GCP Project, or use an existing project. In my case the name of the project is tempblog-me-dba.
  2. Install Google Cloud SDK on the machine where your source code is located.
  3. Initialize the configuration of the SDK by running gcloud init, log on as your user and set the newly created project as the default.
    $ gcloud init
    ...
    $ gcloud config list
    [core]
    account = Maris.Elsins@me-dba.com
    disable_usage_reporting = True
    project = tempblog-me-dba
    ...
  4. Create the source repository from the command line:
    $ gcloud source repos create tempblog-me-dba-repo
    API [sourcerepo.googleapis.com] not enabled on project [520897212625].
     Would you like to enable and retry (this will take a few minutes)?
    (y/N)?  y
    
    Enabling service [sourcerepo.googleapis.com] on project [520897212625]...
    Waiting for async operation operations/acf.6999a26b-ba23-42d3-b1f0-b97ddced5057 to complete...
    Operation finished successfully. The following command can describe the Operation details:
     gcloud services operations describe operations/tmo-acf.6999a26b-ba23-42d3-b1f0-b97ddced5057
    API [sourcerepo.googleapis.com] not enabled on project [520897212625].
     Would you like to enable and retry (this will take a few minutes)?
    (y/N)?  y
    
    Created [tempblog-me-dba-repo].
    WARNING: You may be billed for this repository. See https://cloud.google.com/source-repositories/docs/pricing for details.
  5. Install git if you haven’t done so yet. I recommend using version 2.0.1+ as it has better support for authentication to GCP. My code was stored on a CentOS machine, so I followed https://tecadmin.net/install-git-2-0-on-centos-rhel-fedora/ to set it up.
  6. Clone the empty repository from GCP to your workstation:
    $ cd
    $ gcloud source repos clone tempblog-me-dba-repo
    Cloning into '/home/vagrant/tempblog-me-dba-repo'...
    Checking connectivity... done.
    warning: remote HEAD refers to nonexistent ref, unable to checkout.
    
    Project [tempblog-me-dba] repository [tempblog-me-dba-repo] was cloned to [/home/vagrant/tempblog-me-dba-repo].
  7. Copy all the source files from the original location into the cloned repository (it would have been more efficient to create the repository in the beginning).  :)
    $ cd ~/tempblog.me-dba.com
    $ cp -vrp ./* ../tempblog-me-dba-repo/
    $ cp -vrp .gitignore ../tempblog-me-dba-repo/
    $ cp -vrp .sass-cache ../tempblog-me-dba-repo/
  8. Commit the new files to the master and push them to the remote repository.
    $ git add -A
    $ git status
    On branch master
    
    Initial commit
    
    Changes to be committed:
      (use "git rm --cached ..." to unstage)
    
            new file:   .gitignore
            new file:   404.html
            new file:   Gemfile
            new file:   Gemfile.lock
            new file:   _config.yml
            new file:   _posts/2019-01-21-welcome-to-jekyll.markdown
            new file:   about.md
            new file:   index.md
    $ git commit -m "initial commit"
    ...
    $ git push origin master
    Counting objects: 11, done.
    Delta compression using up to 2 threads.
    Compressing objects: 100% (10/10), done.
    Writing objects: 100% (11/11), 3.90 KiB | 0 bytes/s, done.
    Total 11 (delta 0), reused 0 (delta 0)
    To https://source.developers.google.com/p/tempblog-me-dba/r/tempblog-me-dba-repo
     * [new branch]      master -> master

That’s it! You can verify the remote repository was updated by visiting https://source.cloud.google.com/repos.
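Alternatively, the same check can be done from the command line with the Cloud SDK:

$ gcloud source repos list
# the new repository (tempblog-me-dba-repo) should show up in the list
# together with its project and URL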

Summary

We’ve come to the end of the first part of the article. We’ve set up all the tools required to generate a static HTML-based blog site with the help of Jekyll, and we’ve also configured Disqus for the only dynamic content there is – the comments. This implementation doesn’t require any database behind it, so all the configuration and the blog posts themselves can easily be stored in a source repository. We’ve also started the blog locally by running jekyll serve from the Docker container we used in the process. Of course, these are only the preparation tasks. We still need to set up all the infrastructure on GCP to serve the blog and make it available to the public. That’s the story for the second part of this series.

See you soon in Part 2!

How to build a cost-effective serverless blogging platform on GCP – Part 2

Welcome back to the 2nd part! Last time, we set up the tools that allowed managing the blog as a set of configuration and Markdown-formatted files stored in a source code repository. Now it’s time to deploy the blog on GCP to allow the world to see it!

Serverless Deployment

Going serverless made a lot of sense to me, especially because my blog doesn’t get much traffic. The costs of these services are calculated only for the actual resource consumption. GCP App Engine also provides some resources under the Always Free program, and based on my calculations, my resource usage would be low enough to use the service free of charge.

Another reason for using serverless technologies is the learning opportunity. There is a common trend to decompose large, monolithic applications into smaller pieces and move as much as possible to the serverless world, which allows simpler management of the individual functions. I wanted to learn a little about how it works in practice. Unfortunately, using GCP App Engine was too simple, so my learning goal was not really fulfilled.

GCP App Engine

At this moment, we have our blog files stored locally in the _site directory. Fortunately, GCP App Engine allows deploying and serving them very easily, and since the site consists only of static files, everything is even simpler.

In the root folder of our git repository, we need to create a file named app.yaml with the following content to describe how the deployment should be done:

$ cat app.yaml
runtime: python27
api_version: 1
threadsafe: true

handlers:
- url: /
  static_files: _site/index.html
  upload: _site/index.html
  secure: always
  redirect_http_response_code: 301

- url: /(.*)
  static_files: _site/\1
  upload: _site/(.*)
  secure: always
  redirect_http_response_code: 301

The file basically defines how each URL is handled by the application:

  • when the root URL https://<domain>/ is accessed, the _site/index.html will be displayed
  • when any other URL https://<domain>/<path> is requested, it will serve the _site/<path> file
  • secure: always tells that the https protocol will always be used
  • redirect_http_response_code: 301 means that whenever an HTTP request is made, it will be redirected to the corresponding HTTPS URL using an HTTP 301 response (a quick way to verify this after deployment is shown below).
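After the deployment described below is done, the redirect behavior is easy to sanity-check with curl. The URL here is the example one from this article, so substitute your own:

$ curl -sI http://tempblog-me-dba.appspot.com/
# expect a "301 Moved Permanently" response with a Location: header
# pointing at the https:// version of the same URL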

Obviously, we want to commit the new file to the repo too:

$ git add app.yaml
$ git commit -m "adding the App Engine deployment file"
[master eeea76a] adding the App Engine deployment file
 1 file changed, 22 insertions(+)
 create mode 100644 app.yaml
$ git push
Counting objects: 3, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 451 bytes | 0 bytes/s, done.
Total 3 (delta 1), reused 0 (delta 0)
remote: Resolving deltas: 100% (1/1)
To https://source.developers.google.com/p/tempblog-me-dba/r/tempblog-me-dba-repo
   c7e7fc4..eeea76a  master -> master

And finally, the blog can be deployed. You’ll have to choose the region to deploy the blog as part of this process.

$ gcloud app deploy
You are creating an app for project [tempblog-me-dba].
WARNING: Creating an App Engine application for a project is irreversible and the region
cannot be changed. More information about regions is at
<https://cloud.google.com/appengine/docs/locations>.

Please choose the region where you want your App Engine application
located:

 [1] asia-east2    (supports standard and flexible)
 [2] asia-northeast1 (supports standard and flexible)
 [3] asia-south1   (supports standard and flexible)
 [4] australia-southeast1 (supports standard and flexible)
 [5] europe-west   (supports standard and flexible)
 [6] europe-west2  (supports standard and flexible)
 [7] europe-west3  (supports standard and flexible)
 [8] northamerica-northeast1 (supports standard and flexible)
 [9] southamerica-east1 (supports standard and flexible)
 [10] us-central    (supports standard and flexible)
 [11] us-east1      (supports standard and flexible)
 [12] us-east4      (supports standard and flexible)
 [13] us-west2      (supports standard and flexible)
 [14] cancel
Please enter your numeric choice:  10

Creating App Engine application in project [tempblog-me-dba] and region [us-central]....done.
Services to deploy:

descriptor:      [/home/vagrant/tempblog-me-dba-repo/app.yaml]
source:          [/home/vagrant/tempblog-me-dba-repo]
target project:  [tempblog-me-dba]
target service:  [default]
target version:  [20190122t031126]
target url:      [https://tempblog-me-dba.appspot.com]


Do you want to continue (Y/n)?  Y

Beginning deployment of service [default]...
╔════════════════════════════════════════════════════════════╗
╠═ Uploading 16 files to Google Cloud Storage                ═╣
╚════════════════════════════════════════════════════════════╝
File upload done.
Updating service [default]...done.
Setting traffic split for service [default]...done.
Deployed service [default] to [https://tempblog-me-dba.appspot.com]

You can stream logs from the command line by running:
  $ gcloud app logs tail -s default

To view your application in the web browser run:
  $ gcloud app browse

The deployment is completed, and the URL of the blog is presented to us: https://tempblog-me-dba.appspot.com. Try it out. (Don’t forget to use your own URL, not this one!)

Using a Custom Domain Name

The default URL was definitely something I didn’t want to keep because of the unappealing “appspot.com” part. Therefore, I obtained my own domain name (you can google something like “cheap domain registrars” to find a registrar suitable for you; it should be possible to get one for around $10/year). Once you have your own domain name, you will need to map it to the App Engine deployment by following the instructions in the Custom Domain documentation. As part of this process, you will need to access your domain registrar to add a few more records to the DNS configuration:

  • A TXT record to validate the ownership of the domain.
  • The A (IPv4) and AAAA (IPv6) records that point the domain to App Engine’s IP addresses.

The validation was easily completed, but I can imagine this could be quite a tedious process, especially if you’ve set high TTL values for your DNS records or use a less popular registrar. However, the documentation for mapping custom domains is well-written and easy to follow.
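While waiting for the DNS changes to propagate, the records can be checked from the command line with dig (the domain is the example one used in this article, and the expected values are the ones App Engine asks you to create):

$ dig +short TXT tempblog.me-dba.com
$ dig +short A tempblog.me-dba.com
$ dig +short AAAA tempblog.me-dba.com
# empty output means the records haven't been created or haven't propagated yet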

Once the domain name is validated and the domain is mapped to the app engine, the SSL certificate provisioning will start automatically. It’s not always simple to see if it succeeded in the console, so I use the CLI to check the status:

$ gcloud app ssl-certificates list --filter="DOMAIN_NAMES=tempblog.me-dba.com"
ID        DISPLAY_NAME         DOMAIN_NAMES
11030802  managed_certificate  tempblog.me-dba.com
$ gcloud app ssl-certificates describe 11030802
certificateRawData:
  publicCertificate: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
displayName: managed_certificate
domainMappingsCount: 1
domainNames:
- tempblog.me-dba.com
expireTime: '2019-04-22T11:03:19Z'
id: '11030802'
managedCertificate:
  lastRenewalTime: '2019-01-22T12:03:21.890738152Z'
  status: OK
name: apps/tempblog-me-dba/authorizedCertificates/11030802
visibleDomainMappings:
- apps/tempblog-me-dba/domainMappings/tempblog.me-dba.com

Once these steps are completed, the blog should be available on the new URL – https://tempblog.me-dba.com.
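The domain mapping itself can also be reviewed from the command line:

$ gcloud app domain-mappings list
# the output should include tempblog.me-dba.com along with the ID of the
# managed SSL certificate shown by the ssl-certificates commands above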

We could stop the configuration here, as the blog is published and everything is working. Deploying a new blog post would require the following steps:

  1. Write the new blogpost in Markdown and store the new file in _posts directory of the repository.
  2. Commit the changes and push to master (so they are not lost).
  3. Rebuild the static files by running jekyll build:
    $ docker run --rm --volume="$PWD:/srv/jekyll" --privileged -it jekyll/jekyll:latest bash -c "JEKYLL_ENV=production jekyll build"
  4. Deploy the new site to GCP App Engine.
    $ gcloud app deploy

The changes should appear online soon after completing the last step. But we’re not going to stop here: I wanted to remove the manual Steps 3 and 4 and automate the deployments so that my only task would be creating the new content and committing it to the repository. The implementation of this automation is covered in the next section.

Continuous Publishing

Now, we can set up a mechanism that watches the source repository we created and detects when new code is committed to the master branch. When that happens, a jekyll build will be triggered to recreate the static files and deploy them to the App Engine as a new version. This section involves using GCP Container Registry for storing the Jekyll Docker image and GCP Cloud Build for automating the work.

Pushing the Docker Image to GCP Container Registry

Before the image is uploaded to the GCP Container Registry, I’ll add one more tiny script that takes care of running the jekyll build. I’m doing this because I experienced some file access privilege issues during the build that were too difficult to resolve. I think they had to do with the fact that the Docker container mounts the local /workspace directory of the Cloud Build instance to /srv/jekyll to access the required files. The script simply copies all files to a local place (/u01) inside the container, works with them there, and then copies the built _site folder back. Here’s what needs to be done (this could have been done with a Dockerfile too, but I find this way a little simpler to work with):

  1. Start the container with an interactive bash session and create the required script file:
    $ docker run --rm  --volume="$PWD:/srv/jekyll" --privileged -it jekyll/jekyll:latest bash
    
    # the following set of 8 lines is a single command
    bash-4.4# echo "rm -rf /u01 /srv/jekyll/_site > /dev/null
    mkdir /u01
    chmod 777 /srv/jekyll /u01
    cp -r /srv/jekyll/* /u01/
    cd /u01/
    JEKYLL_ENV=production jekyll build
    cp -r /u01/_site /srv/jekyll/
    rm -rf /u01 > /dev/null" > /root/init_build.sh
    
    bash-4.4# chmod a+x /root/init_build.sh
  2. Commit the change to the image from another session while the container is running:
    $ docker ps -a
    CONTAINER ID        IMAGE                                  COMMAND                  CREATED             STATUS              PORTS                 NAMES
    ef8bd6153c06        gcr.io/tempblog-me-dba/jekyll:latest   "/usr/jekyll/bin/ent…"   16 seconds ago      Up 15 seconds       4000/tcp, 35729/tcp   serene_blackwell
    
    # use the ID from the previous output
    $ docker container commit ef8bd6153c06 jekyll/jekyll:latest
    sha256:579bd4e195cff00abb38b967009f7727ce1814ae9f8677ce9d903a7683fcbd6e
  3. “Exit” from the interactive bash session of the running container.

We need to push our Docker image to the GCP Container Registry. This is required because Cloud Build can’t use external container registries, and we have customized the image a little by pre-installing the required gem dependencies so the container starts more quickly.

  1. We’re going to upload the image to GCP Container registry via command line, thus, the “Google Container Registry API” needs to be enabled for the GCP Project first. Log on to the GCP console, navigate to “APIs & Services” module, locate and click on the “Container Registry API”, and finally enable it if it’s not yet enabled.
  2. Docker images that are going to be pushed to the GCP Container Registry need to follow a specific naming of [HOSTNAME]/[PROJECT-ID]/[IMAGE], so the first task is to tag the customized Jekyll image:
    $ docker image tag jekyll/jekyll:latest gcr.io/tempblog-me-dba/jekyll:latest
    $ docker image ls gcr.io/tempblog-me-dba/jekyll
    REPOSITORY                      TAG                 IMAGE ID            CREATED             SIZE
    gcr.io/tempblog-me-dba/jekyll   latest              9667c6cbe572        19 hours ago        480MB
  3. Next, Docker needs to be configured to make it aware of the GCP Container Registry and how to access it. This process is a super simple one-liner in Google Cloud SDK. You should have it already installed if you’re working on the same machine where you set up your access to the GCP Source Repository; otherwise, you’ll need to install it again:
    $ gcloud auth configure-docker
    The following settings will be added to your Docker config file
    located at [/home/vagrant/.docker/config.json]:
     {
      "credHelpers": {
        "gcr.io": "gcloud",
        "us.gcr.io": "gcloud",
        "eu.gcr.io": "gcloud",
        "asia.gcr.io": "gcloud",
        "staging-k8s.gcr.io": "gcloud",
        "marketplace.gcr.io": "gcloud"
      }
    }
    
    Do you want to continue (Y/n)?  Y
    
    Docker configuration file updated.
  4. Finally, we can push the Docker image to the container registry:
    $ docker push gcr.io/tempblog-me-dba/jekyll:latest
    The push refers to repository [gcr.io/tempblog-me-dba/jekyll]
    bbe16356ce18: Pushed
    4ec78dd675c8: Pushed
    ce753fc763b8: Layer already exists
    cb136294e186: Layer already exists
    bf8d0e7b5481: Layer already exists
    7bff100f35cb: Layer already exists
    latest: digest: sha256:bc1dd9adf3c08422b474545c82a83f1e79fce23d7feaeb9941afa0b89c093b03 size: 1580

GCP Cloud Build

We’ll use GCP Cloud Build to define the build and deploy processes. I must say I was very impressed by how little configuration is required and how simple it was to instruct the Cloud Build to do what I needed. In our case, we need Cloud Build to run the following two commands:

  1. Build the new blog files:
    $ docker run --rm --publish 4000:4000 --volume="$PWD:/srv/jekyll" --privileged -it jekyll/jekyll:latest bash -c "JEKYLL_ENV=production jekyll build"
  2. Deploy the new version to App Engine:
    $ gcloud app deploy

To achieve that, we need to create a new file cloudbuild.yaml in the root of the repository, and its contents should look like this:

$ cat cloudbuild.yaml
steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['run', '--rm', '--volume=/workspace:/srv/jekyll', '--privileged', 'gcr.io/tempblog-me-dba/jekyll:latest', '/root/init_build.sh']
  - name: "gcr.io/cloud-builders/gcloud"
    args: ["app", "deploy"]
timeout: "600s"
$ git add cloudbuild.yaml
$ git commit -m "Adding the instructions file for Cloud Build"
$ git push

Next, a Build Trigger needs to be created to initiate the build when a commit to master happens. Navigate to “Cloud Build” -> “Triggers” in GCP Cloud Console. Enable the Cloud Build API if you’re presented with the message that it’s disabled. Then create a trigger:

  1. Select “Cloud Source Repository” option.
  2. Select the name of your repository (tempblog-me-dba-repo in my case).
  3. Provide the following options:
    • Give your trigger a name.
    • Choose “branch” as Trigger Type (“Tag” is another option. In this case the trigger will watch for specific git tags that will initiate the build).
    • Choose “Cloud Build configuration file (yaml or json)” as the Build Configuration.
    • Make sure that the Build Configuration File Location is set to “/cloudbuild.yaml”.
    • Click on “Create Trigger” (a command-line alternative is sketched below).
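If you prefer the command line over the console, roughly the same trigger can be created with the Cloud SDK. The exact command group has moved between beta and GA over time, so treat this as a sketch and check gcloud builds triggers create --help on your SDK version:

$ gcloud beta builds triggers create cloud-source-repositories \
    --repo=tempblog-me-dba-repo \
    --branch-pattern="^master$" \
    --build-config=cloudbuild.yaml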

The trigger will be created. Run it manually by clicking on “Run Trigger” and selecting “Master”:

Running the trigger manually

Navigate to “History” in the left-side menu; you may see that the build runs for a while and then fails:

The app deployment step fails at this point

In my case, the first step succeeded (it should be the same for you too), but the second step failed. The log entries visible on the same page displayed the following entry:

Step #1: API [appengine.googleapis.com] not enabled on project [520897212625]. 
Step #1: Would you like to enable and retry (this will take a few minutes)? 
Step #1: (y/N)? 
Step #1: ERROR: (gcloud.app.deploy) User [520897212625@cloudbuild.gserviceaccount.com] does not have permission to access app [tempblog-me-dba] (or it may not exist): App Engine Admin API has not been used in project 520897212625 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/appengine.googleapis.com/overview?project=520897212625 then retry

This means the Cloud Build service account – 520897212625@cloudbuild.gserviceaccount.com in this case – needs permission to use the App Engine Admin API. Navigate to “IAM & Admin” -> “IAM”, find the service account in the list, and add the “App Engine Admin” role to that account:

Adding the “App Engine Admin” role to the service account

Additionally, the “App Engine Admin API” needs to be enabled by navigating to “APIs & Services”.
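The same two changes can also be made from the command line; here’s a sketch using the project name and the Cloud Build service account from the error message above:

$ # grant the "App Engine Admin" role to the Cloud Build service account
$ gcloud projects add-iam-policy-binding tempblog-me-dba \
    --member="serviceAccount:520897212625@cloudbuild.gserviceaccount.com" \
    --role="roles/appengine.appAdmin"

$ # enable the App Engine Admin API for the project
$ gcloud services enable appengine.googleapis.com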

It may take a minute or two for the new privileges to take effect. Navigate back to “Cloud Build” -> “History”, open the details of the failed run, and click on “Retry” to run the build again. This time the build should succeed.

Test the Workflow

Everything is ready. The only thing left is to test publishing a new blog post. I’ll create this simple file in the _posts directory and commit it to master (this is all that should be needed to publish new content to the blog):

$ cat _posts/2019-01-23-it_works.markdown
---
layout: post
title:  "It Works!"
date:   2019-01-23 12:19:00 +0300
comments: true
categories: serverless blog
---
Congratulations! Your serverless blog works!

$ git add -A
$ git commit -m "It Works"
$ git push

In a short while (it sometimes took around 10 minutes for the new version to become available on App Engine, due to caching, I suppose) the post should be live – published on your new blog. Congratulations!

Closing Remarks

The solution provided in this article was designed to be as inexpensive as possible, and it relies on the Always Free resources that GCP provides. Despite that, we should take care to avoid unexpected service charges in case GCP changes the limits of Always Free or your new blog suddenly goes viral! Take the time to look at “Billing” -> “Budgets & alerts” and create an alert that will notify you if the service charges reach a certain level. I’ve set up a monthly budget of $2 USD and configured an alert if 50% of the budget is consumed. I can rest assured that I will be notified if the costs go up suddenly, without having to log on to the GCP console periodically to check for myself.
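For reference, the same budget can also be created from the Cloud SDK. In older SDK versions the command lives under gcloud beta billing budgets, and the billing account ID below is a placeholder, so treat this as a sketch:

$ gcloud billing budgets create \
    --billing-account=XXXXXX-XXXXXX-XXXXXX \
    --display-name="blog-budget" \
    --budget-amount=2USD \
    --threshold-rule=percent=0.5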

We haven’t discussed troubleshooting and monitoring. This is a topic that would require an article of its own; however, it’s good to know that applications running on the App Engine are logged and monitored by default and by going to the “App Engine” menu in the GCP console, you will be presented with a good amount of usage information. Also, Stackdriver can be used to check the HTTP access logs.

I hope you’ve enjoyed this journey and have managed to build a working blogging platform on GCP by following these instructions. This is just a starting point, and there are definitely more things to learn and discover. There might be a learning curve for working with Jekyll – how to create new blog posts, how to customize the template and the page layouts, how to include images and where to place them in the repository, and so on. But don’t be overwhelmed by the volume of new things to learn. There is plenty of helpful material on the internet already to research and follow. I also understand that the level of detail I’ve provided about working with GCP may be insufficient for people who have never touched it before. Fortunately, there’s also plenty of learning material provided by Google, including documentation to read, sample projects to work on, and videos to watch. It’s clear the cloud is not going away anytime soon, and the knowledge you gain will come in handy in the future, too.

Happy blogging!
