Hi. I’m Rudy Lee.

Here are some thoughts of mine.

How to Run a Code Snippet in the Heroku Rails Console

Often in my day-to-day work, I need to run a snippet in the Rails console to fix or investigate issues. Since our application is hosted on Heroku, I can do this by running heroku run rails console --app app-name, and in a matter of seconds I am connected to the Rails console on production.

In some cases, I need to run a really long code snippet, especially if the fix needs to touch a lot of data. Usually I will copy the snippet line by line into the Rails console, because sometimes one of the lines returns data and pasting everything at once messes up the snippet.

After wasting a lot of time, I started thinking about a better way to do this. That’s when I decided to use eval and open to help me run long code snippets. The idea is pretty simple: I need a place to securely host the code snippet, use open to read it and run it using eval.

At the moment, I am using a GitHub secret gist. You can use an S3 bucket or your own web server as long as Heroku can access it. If you are using a GitHub gist, make sure you use the raw link. You can grab it by clicking the Raw button in the top right corner of your gist. The format of the link should be something like https://gist.githubusercontent.com/yourgithubaccount/randomhash/raw/randomhash/file.rb

Once you have the link, you can write a short snippet to read and execute the file.

require "open-uri" # open needs open-uri to read from a URL; on Ruby 3+ use URI.open instead

file = open("put-link-to-your-code-snippet-here")
eval(file.read)

I hope you find that useful and let me know if you have a better way to do this.



Download CSV Test in Go

In the project I am working on, I have to write a test for a function that downloads a CSV from an external website and stores it locally. I am pretty new to Go, so please let me know if I am doing it wrong.

The function is pretty simple: it uses the Go net/http package to send a GET request and the os package to write the HTTP response to a local file. See the full code below:

package main

import (
  "io"
  "log"
  "net/http"
  "os"
)

func main() {
  if err := DownloadCSV("https://www.asx.com.au/asx/research/ASXListedCompanies.csv", "asx-companies.csv"); err != nil {
    log.Fatal(err)
  }
}

// DownloadCSV fetches the CSV at url and writes the response body to filename.
func DownloadCSV(url string, filename string) error {
  out, err := os.Create(filename)
  if err != nil {
    return err
  }
  defer out.Close()

  resp, err := http.Get(url)
  if err != nil {
    return err
  }
  defer resp.Body.Close()

  _, err = io.Copy(out, resp.Body)
  return err
}

The function we want to test is DownloadCSV, which expects two arguments: url is the URL of the endpoint that hosts the CSV, and filename is the name of the local file where the HTTP response is stored.

There are three things I want to test here:

  1. http.Get shouldn’t return an error
  2. It should create a local CSV file
  3. The content of the local CSV file should match the server's response

http.Get shouldn’t return an error

To test the first one, we will use an awesome Go package called net/http/httptest. This package allows us to create an HTTP server and set the response to whatever we want.

In the code below, you can see we start by creating a test server that returns a 200 response status. After that, we pass the server's URL to the DownloadCSV function along with the required filename.

At the end, we make an assertion to ensure the function does not return an error. We use the Errorf method from Go's testing package to output a message if the assertion fails. This is important because it is what marks the test as FAIL.

package main

import (
  "net/http"
  "net/http/httptest"
  "testing"
)

func TestDownloadCSV(t *testing.T) {
  ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    w.WriteHeader(http.StatusOK)
  }))
  defer ts.Close()

  file := "test.csv"
  err := DownloadCSV(ts.URL, file)

  if err != nil {
    t.Errorf("Shouldn't have received an error, got %s", err)
  }
}

It should create a local CSV file

We are going to use the os package to check that the file exists and to delete it once the test has finished. The three functions we will use are Stat, IsNotExist and Remove. Stat and IsNotExist are used to assert the file's existence, and Remove cleans up the file after the test finishes.

package main

import (
  "net/http"
  "net/http/httptest"
  "os"
  "testing"
)

func TestDownloadCSV(t *testing.T) {
  ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    w.WriteHeader(http.StatusOK)
  }))
  defer ts.Close()

  file := "test.csv"
  err := DownloadCSV(ts.URL, file)

  if err != nil {
    t.Errorf("Shouldn't have received an error, got %s", err)
  }

  if _, err := os.Stat(file); os.IsNotExist(err) {
    t.Errorf("Should have created a CSV file")
  }

  os.Remove(file)
}

The content of the local CSV file should match the server's response

The simplest way to do this test is to configure the httptest server to return a string and check if that string exists in the test file. We can take this a bit further by using an actual CSV file.

We can use a test fixture to achieve this. We will create a new folder called testdata, put a CSV file inside it, and set the httptest server to read the file using ioutil.ReadFile and return its content to the client. We will also use ioutil.ReadFile in the assertion to compare the contents of the two files. Here is the final version of the test:

package main

import (
  "bytes"
  "io/ioutil"
  "net/http"
  "net/http/httptest"
  "os"
  "testing"
)

func TestDownloadCSV(t *testing.T) {
  testCSV, err := ioutil.ReadFile("testdata/asx-companies.csv")
  if err != nil {
    t.Fatalf("Couldn't read the fixture file, got %s", err)
  }

  // Test server that responds with the fixture content
  ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    w.Write(testCSV)
  }))
  defer ts.Close()

  file := "test.csv"
  err = DownloadCSV(ts.URL, file)

  if err != nil {
    t.Errorf("Shouldn't have received an error, got %s", err)
  }

  if _, err := os.Stat(file); os.IsNotExist(err) {
    t.Errorf("Should have created a CSV file")
  }

  downloaded, err := ioutil.ReadFile(file)
  if err != nil {
    t.Fatalf("Couldn't read the downloaded file, got %s", err)
  }

  if !bytes.Equal(testCSV, downloaded) {
    t.Errorf("CSV file should have correct content")
  }

  os.Remove(file)
}

You can check out the full code here: https://github.com/rudylee/go-playground and please leave a comment if you find any errors.


Google Spreadsheet as JSON API

A data store is an important piece of most modern applications. The implementation can range from a simple text file to a complicated database system. In this blog post, I will show you how to use Google Spreadsheet as a data store for your application.

Google Spreadsheet provides a convenient way to store, edit, share and retrieve data. This makes it appealing if you want to quickly prototype an app and don’t want to spend time building a CRUD interface to manage your data. It also allows you to output the spreadsheet data in JSON format, which means you can use Google Spreadsheet as your JSON API.

Publish the spreadsheet to the web

In order to enable this feature, you first need to publish the spreadsheet to the web. You can easily do this by going to the File menu and choosing Publish to the web. This only works if you own or have admin access to the spreadsheet. See the screenshot below.

Get the ID of the spreadsheet

The next thing you have to do is get the spreadsheet ID from the URL.

The URL of your spreadsheet should be something like this: https://docs.google.com/spreadsheets/d/17CAMo4mY7pdlk7jgVRmgD5FLVzDV3L8cUDiHaT8U_J4/edit#gid=0

The spreadsheet ID is the string of characters between /d/ and /edit, which in the example above is 17CAMo4mY7pdlk7jgVRmgD5FLVzDV3L8cUDiHaT8U_J4

Copy the ID and construct the JSON API endpoint

After retrieving the ID, you can start constructing the JSON API endpoint. The URL format is as follows:

https://spreadsheets.google.com/feeds/list/replace-this-with-your-spreadsheet-id/od6/public/values?alt=json

If we use the spreadsheet URL from the previous section ( https://docs.google.com/spreadsheets/d/17CAMo4mY7pdlk7jgVRmgD5FLVzDV3L8cUDiHaT8U_J4/edit#gid=0 ), the JSON API URL will be:

https://spreadsheets.google.com/feeds/list/17CAMo4mY7pdlk7jgVRmgD5FLVzDV3L8cUDiHaT8U_J4/od6/public/values?alt=json
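
To quickly check that the endpoint works, here is a minimal Ruby sketch that fetches the feed and prints the title of each row ( typically the value of the first column ). The spreadsheet ID is the example one from above, and the feed / entry / title structure is the usual shape of this legacy JSON feed, so treat the parsing as an assumption to verify against your own data:

require "open-uri"
require "json"

# Example spreadsheet ID from above -- replace it with your own
spreadsheet_id = "17CAMo4mY7pdlk7jgVRmgD5FLVzDV3L8cUDiHaT8U_J4"
url = "https://spreadsheets.google.com/feeds/list/#{spreadsheet_id}/od6/public/values?alt=json"

data = JSON.parse(open(url).read)

# Each spreadsheet row comes back as an entry in the feed;
# column values also appear under "gsx$<column header>" keys
data["feed"]["entry"].each do |row|
  puts row["title"]["$t"]
end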

The spreadsheet is public but not published to the web

In some cases, you might want to use a public Google spreadsheet that is not published to the web. I discovered from the official Google forum that you can use the importrange formula to retrieve the data from another spreadsheet and import it into your own.

=importrange("URL-TO-SPREADSHEET", "SHEET NAME!CELL RANGE")

Take for example this public spreadsheet that I don’t have admin access to: https://docs.google.com/spreadsheets/d/1ql32s8kcUB-Q8AEwxrCzPYJzNgaQ2CknW4J0rlnJqfE/edit#gid=1671204426

I will create another spreadsheet and use the importrange formula to import the data from that public spreadsheet into mine:

=importrange("https://docs.google.com/spreadsheets/d/1ql32s8kcUB-Q8AEwxrCzPYJzNgaQ2CknW4J0rlnJqfE","SBW Optimal Conversions!A1:I190")

It should look something like this:

As you might expect, this solution is prone to error because the formula will break if the owner of the original spreadsheet changes the sheet’s name.

It’s something I can live with, since it’s much easier to update the sheet’s name in the formula than to ask the owner to publish the spreadsheet to the web.


Trello Card Repeater

I’ve been using Trello as my primary task manager for the past couple of years. Recently, I found a very useful feature in Trello that helps me automate the creation of repetitive tasks.

This feature allows you to tell Trello to create a card on a schedule, such as daily, weekly, monthly or annually.

At the moment, I am using this feature to create a few daily tasks that I want to do first thing in the morning, such as meditation, a coding exercise and learning Chinese.

I am not a disciplined person, and I have found that this feature helps me form new habits. I set it to put the tasks at the top of my todo list, which makes it difficult for me to ignore them in the morning.

Research says it takes around 66 days for a new habit to form. I suggest you try this feature if you want to create a new habit and make it part of your daily routine.

Check out this link for more details on how to set up and use the Card Repeater: https://blog.trello.com/trello-card-repeater


Run Sidekiq Jobs Without Starting Worker Process

You can add the code snippet below to config/initializers/sidekiq.rb if you don’t want to start a separate Sidekiq worker process.

The configuration below makes sure that Sidekiq jobs are executed inline, without a worker process.

This is handy if you don’t want to open an extra terminal tab or tmux window for the worker process.

# config/initializers/sidekiq.rb
if Rails.env.development?
  require 'sidekiq/testing'
  Sidekiq::Testing.inline!
end
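
With this initializer loaded, enqueuing a job in development runs it immediately in the same process. A quick sketch ( HardWorker here is just a stand-in for one of your own worker classes ):

# In a development rails console, with the initializer above loaded
HardWorker.perform_async("some-argument")
# => the worker's perform method runs synchronously, no separate Sidekiq process required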

See the official Sidekiq wiki for more information: https://github.com/mperham/sidekiq/wiki/Testing


Regular Expression For A String Containing One Word But Not Another

Last weekend, one of our apps hosted on Heroku was reporting a lot of R14 errors.

R14 is an error thrown by Heroku when a dyno exceeds its memory quota.

I quickly jumped into Logentries ( https://logentries.com/ ) to download the log file and opened it in Sublime Text.

However, I had trouble finding the request causing the problem because our background job workers were also reporting R14 errors.

I decided to use a regex to find lines that contain R14 but don’t contain any of the background workers’ names.

This is the regex that I used to find the line:

^(?!.*(lowworker|highworker|run)).*R14.*$

The regex above matches any line that contains R14 but doesn’t contain lowworker, highworker or run.
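
To see the negative lookahead in action, here is a quick Ruby sketch ( the log lines are made up for illustration ):

regex = /^(?!.*(lowworker|highworker|run)).*R14.*$/

log_lines = [
  "heroku[web.1]: Error R14 (Memory quota exceeded)",
  "heroku[lowworker.1]: Error R14 (Memory quota exceeded)"
]

# Only the first line matches: it contains R14 and none of the excluded words
puts log_lines.grep(regex)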


Monitor Background Jobs with New Relic Query Language

Background

In this blog post, I’ll show you how to set up an alert to monitor your Sidekiq jobs using the New Relic Query Language and New Relic Alerts.

I was given the task of finding a solution to monitor our Sidekiq jobs. In the past, I used the New Relic Sidekiq Plugin ( https://newrelic.com/plugins/secondimpression/131 ) to do this.

The plugin is a Ruby app that connects to your Redis instance, retrieves all of the Sidekiq metrics, such as jobs and queues, and sends them to New Relic using the agent library.

This means you need to host the Ruby app somewhere and make sure the plugin can connect to your Redis instance.

However, I found a much better solution using NRQL that doesn’t require you to set up a new server or install any plugins.

New Relic Query Language ( NRQL )

NRQL definition from the official New Relic docs ( https://docs.newrelic.com/docs/insights/nrql-new-relic-query-language/using-nrql/introduction-nrql )

The New Relic Query Language (NRQL), similar to SQL, is a query language for making calls against the Insights event database. NRQL enables you to query data collected from your application and transform that data into dynamic charts. From there, you can interpret your data to better understand how your application is used in a variety of ways.

Using NRQL, you can run a query to get the number of background jobs that have been executed in a specific time period.

Here is the query to get the count of background jobs:

SELECT count(name) FROM Transaction WHERE transactionType='Other'

New Relic Alert and NRQL

The first thing you have to do is create a New Relic alert policy using NRQL. See the screenshots below:

Choose NRQL on the Categorize step

And put in the following query:

SELECT count(name) FROM Transaction WHERE transactionType='Other'

After that, you can set up a condition for when the alert will fire.

In the screenshot below, you can see that I set the alert to fire if no background jobs have run within 15 minutes.

I hope this tutorial gives you an idea of how to monitor your background jobs.


Setting Up Elasticsearch Watcher to Check For Cluster Status on Elastic Cloud

Last week, I was busy migrating our staging and production Elasticsearch clusters from AWS Elasticsearch to Elastic Cloud. The reason behind this migration is that we need the dynamic scripting feature in our application, and Elastic Cloud is the only managed Elasticsearch hosting that currently supports it.

In terms of pricing, Elastic Cloud is slightly more expensive than AWS Elasticsearch. I think this is because they are using AWS EC2 under the hood. You can compare the pricing of both services here https://aws.amazon.com/elasticsearch-service/pricing/ and https://www.elastic.co/cloud/as-a-service/pricing.

As of now, Elastic Cloud supports the latest version of Elasticsearch, which is 5.1.2. If you like living on the edge, I recommend giving Elastic Cloud a try.

Creating a watcher

On AWS, we can use CloudWatch to monitor our Elasticsearch cluster’s health status as well as other metrics such as memory and CPU usage. With Elastic Cloud, we have to use Elasticsearch Watcher or Alerting to monitor and trigger alerts.

Currently, there is no UI for setting up a watcher on Elastic Cloud. To create a watcher, you have to send a PUT request to your cluster. Please note that this blog post is based on Elasticsearch version 1.7.6 and Watcher version 1.0.1.

The first thing you have to do is enable the Watcher plugin in the Elastic Cloud cluster configuration. See the screenshot below:

The next thing to do is add an alert recipient email to the Elastic Cloud whitelist. To do this, go to Account > Email settings and scroll to the bottom of the page. See the screenshot below:

Shortly after that, you will receive an email asking you to confirm the whitelist request. Confirm it and you are ready to receive emails from Elastic Cloud.

Now open up your REST client app or, if you are a CLI guru, stick with curl. As I mentioned earlier, we will send a PUT request to our cluster to create a watcher.

The endpoint of the request looks something like this: http://elastic-cloud-username:elastic-cloud-password@elastic-cloud-cluster-host:9200/_watcher/watch/cluster_health_watch

You have to replace elastic-cloud-username, elastic-cloud-password and elastic-cloud-cluster-host with your own cluster details.

And here is the JSON content of the request ( please replace the host, auth username, auth password and recipient email with your own details ):

{
  "trigger" : {
    "schedule" : { "interval" : "10s" }
  },
  "input" : {
    "http" : {
      "request" : {
       "host" : "add-your-elastic-cloud-host-here",
       "port" : 9200,
       "path" : "/_cluster/health",
       "auth" : {
          "basic" : {
            "username" : "your-elastic-cloud-username",
            "password" : "your-elastic-cloud-password"
          }
        }
      }
    }
  },
  "condition" : {
    "compare" : {
      "ctx.payload.status" : { "eq" : "red" }
    }
  },
  "actions" : {
    "send_email" : {
      "email" : {
        "to" : "the-recepient-email-address",
        "subject" : "Cluster Status Warning",
        "body" : "Cluster status is RED"
      }
    }
  }
}

In a nutshell, the request above creates a watcher that is triggered every 10 seconds, takes its input from our Elasticsearch /_cluster/health endpoint, checks the cluster status ( see the condition section ) and sends an email if the condition is met.
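
If you would rather script the request than use a REST client or curl, here is a minimal Ruby sketch using Net::HTTP. The host, credentials and the cluster_health_watch.json file ( which holds the JSON body above ) are placeholders you need to replace:

require "net/http"

# Placeholders -- replace with your own cluster details
host     = "elastic-cloud-cluster-host"
username = "elastic-cloud-username"
password = "elastic-cloud-password"

uri = URI("http://#{host}:9200/_watcher/watch/cluster_health_watch")

request = Net::HTTP::Put.new(uri)
request.basic_auth(username, password)
request["Content-Type"] = "application/json"
request.body = File.read("cluster_health_watch.json") # the watcher JSON shown above

response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
puts response.code
puts response.body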

Here is the screenshot of my PUT request using Insomnia REST Client:

After sending the request, we can confirm whether the watcher was created successfully by visiting this endpoint in the browser: http://elasticsearch-cluster-host:9200/_watcher/watch/cluster_health_watch

If the watcher is created successfully, you should see a response like this:

Delete the watcher

You can send a DELETE request if you want to delete the watcher:

curl -XDELETE http://elasticsearch-cluster-host:9200/_watcher/watch/cluster_health_watch

Check if your watcher was triggered

You can check if your watcher has been triggered by sending a GET request to /.watch_history*/search?pretty with the following query:

{
  "query" : {
    "match" : { "result.condition.met" : true }
  }
}

If the query returns a hit, it means that your watcher has been triggered. This is helpful during debugging.

That’s it for now. The next thing I need to figure out is how to create alerts for CPU and memory usage, but I’ll leave that for another blog post.


Setting Up L2TP VPN to Bypass Great Firewall Of China

Last month, I traveled to China for the second time. Unlike my first trip, this time I was more prepared to bypass the Great Firewall of China.

During my first trip to China, I mainly relied on a simple SSH tunnel to get access to Gmail and other blocked services. This solution was unreliable because I couldn’t use it on my Android phone. Aside from that, I also kept having constant dropouts, which are explained in this blog post: http://blog.zorinaq.com/my-experience-with-the-great-firewall-of-china/

After extensive research and a recommendation from one of my friends, I decided to install an L2TP VPN server in Japan. I chose Japan because it’s close to China and I can use the Tokyo AWS region.

I ended up using Streisand ( https://github.com/jlund/streisand ), an Ansible playbook I found while looking for tutorials. It helps you install various software such as OpenVPN, L2TP, Tor, etc. You just need to run one shell script and it will install all of that software on your target host.

Running the playbook

Since I already have Ansible installed, I just need to clone the project and run the setup script. If you need a complete tutorial on how to get started, check the installation guide here: https://github.com/jlund/streisand#installation.

Cloning the project

git clone https://github.com/jlund/streisand.git && cd streisand

Running the setup script

./streisand

When you run the setup script, it will ask a few questions, such as where to host the server, your AWS access keys, etc. I am using AWS because I can use the free tier to run the VPN server. On AWS, the whole installation process takes around 45 minutes.

Using the VPN

When the installation finishes, the playbook creates instruction files on the server. You need to SSH into the server to view them.

ssh ubuntu@server-ip

Go to the NGINX folder to find the file:

cd /var/www/streisand/l2tp-ipsec

Read the instructions:

cat index.md | more

In the file, you will find instructions on how to connect to the L2TP server from different operating systems.


HAProxy as a Reverse Proxy for Cloudinary Images

In one of our applications, we are using Cloudinary to host and resize images on the fly. We are also using Cloudflare for our CDN and DNS management.

I was given a task to set up a CNAME subdomain in Cloudflare that forwards requests to Cloudinary. This way we still get the benefit of serving static images from a CDN while reducing our Cloudinary bandwidth usage.

My solution is to set up HAProxy as a reverse proxy responsible for fetching images from the Cloudinary server. You can see the overview diagram below:

The first thing we have to do is create an ACL in HAProxy for our Cloudinary subdomain.

In the configuration below, we are telling HAProxy to forward all requests from cloudinary-asset.rudylee.com to cloudinary-backend:

listen  http
        bind 127.0.0.1:8080
        maxconn     18000

        acl host_cloudinary hdr(host) -i cloudinary-asset.rudylee.com

        use_backend cloudinary-backend if host_cloudinary

The next step is to create a new backend:

backend cloudinary-backend
        http-request set-header Host res.cloudinary.com
        server cloudinary res.cloudinary.com:80

Restart HAProxy and you should be able to use the subdomain to serve images from Cloudinary ( e.g. http://cloudinary-asset.rudylee.com/rudylee/image/upload/12298848/icon/12379593943923.png ).

Requesting the images over SSL should also work if you have SSL termination configured in HAProxy.