Processing emails with Postfix and Rails

Posted by Dan Sosedoff on August 10, 2011

This is a short manual on how to set up Postfix and a Rails application to receive and process email messages.

Stack:

  • Debian / Ubuntu Server
  • Postfix
  • Ruby 1.9
  • Rails 3.0

Overview

You have an application where users get email notifications, and you want to allow them to reply directly to the email.
To do so, each email should have a unique (depending on the situation) reply-to address. Usually it looks something like this:

P946d272cf7da4dd6b0cb613605bced65@yourdomain.com

This means that the mailserver you use should support dynamic/virtual email addresses and forwarding.
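
As a rough sketch, such an address could be generated like this. The `reply_address` helper, the use of `SecureRandom.hex`, and the optional `M<id>` suffix for a message ID are assumptions for illustration (the suffix mirrors the address pattern used later in this post).

```ruby
require 'securerandom'

# Hypothetical helper: builds "P<hex uuid>[M<message id>]@domain".
def reply_address(project_uuid, domain, message_id = nil)
  local = "P#{project_uuid}"
  local += "M#{message_id}" if message_id
  "#{local}@#{domain}"
end

uuid = SecureRandom.hex(16) # 32 lowercase hex characters
puts reply_address(uuid, 'yourdomain.com')
puts reply_address(uuid, 'yourdomain.com', 42)
```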

Postfix Configuration

First, you’ll need to install Postfix (if not installed):

apt-get install postfix

Configuration should look like this:

# See /usr/share/postfix/main.cf.dist for a commented, more complete version

default_privs = apps

# Debian specific:  Specifying a file name will cause the first
# line of that file to be used as the name.  The Debian default
# is /etc/mailname.
#myorigin = ap

smtpd_banner = $myhostname ESMTP $mail_name (Ubuntu)
biff = no

# appending .domain is the MUA's job.
append_dot_mydomain = no

# Uncomment the next line to generate "delayed mail" warnings
#delay_warning_time = 4h

readme_directory = no

# TLS parameters
smtpd_tls_cert_file=/etc/ssl/certs/ssl-cert-snakeoil.pem
smtpd_tls_key_file=/etc/ssl/private/ssl-cert-snakeoil.key
smtpd_use_tls=no
smtpd_tls_session_cache_database = btree:${data_directory}/smtpd_scache
smtp_tls_session_cache_database = btree:${data_directory}/smtp_scache

# See /usr/share/doc/postfix/TLS_README.gz in the postfix-doc package for
# information on enabling SSL in the smtp client.

myhostname = YOUR_APP_HOSTNAME
alias_maps = hash:/etc/aliases
alias_database = hash:/etc/aliases
myorigin = /etc/mailname
mydomain = YOUR_APP_DOMAIN
mydestination = YOUR_APP_DOMAIN, localhost
mynetworks = 127.0.0.0/8 [::ffff:127.0.0.0]/104 [::1]/128
mailbox_size_limit = 0
recipient_delimiter = +
inet_interfaces = all
relay_domains = localhost
recipient_canonical_maps = regexp:/etc/postfix/recipient_canonical

The last option, recipient_canonical_maps, allows you to define dynamic email addresses and forward them to a specific system mailbox for processing.

Create a file /etc/postfix/recipient_canonical:

/^P[0-9abcdef]{1,}(M[0-9]{1,})?/ apps

This adds virtual recipient addresses and forwards their messages to the user apps.

NOTE: Regular expressions should be in POSIX format. For testing you can use regextester.com.
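
Before touching Postfix, the pattern can be sanity-checked in plain Ruby. This is only an approximation: Ruby's regex engine is not the one Postfix uses, and the sample addresses below are made up. Note that the pattern is not anchored at the end, so anything that starts with `P` followed by a hex character matches.

```ruby
PATTERN = /^P[0-9abcdef]{1,}(M[0-9]{1,})?/

samples = [
  'P946d272cf7da4dd6b0cb613605bced65@yourdomain.com',
  'P946d272cf7M15@yourdomain.com',
  'Paul@yourdomain.com', # also matches: "P" followed by the hex character "a"
  'john@yourdomain.com'
]

samples.each do |address|
  puts "#{address}: #{address =~ PATTERN ? 'match' : 'no match'}"
end
```

If that is too loose for your setup, requiring a trailing `@` after the optional `M` part would be one way to tighten it.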

Email Aliases

Once support for virtual addresses is in place, all mail will be delivered to the system user mailbox (apps). But we still need to drive all that traffic into our app. To do so, we have to set up mail piping directly into the application script.

Edit /etc/aliases:

apps: "| /home/apps/APP_NAME/current/script/email_receiver_script"

And rebuild the aliases db by running:

newaliases

Do not forget to restart postfix:

/etc/init.d/postfix restart

Now you can test the email delivery. For errors, check /var/log/mail.info.

Mail Receiver Script

Since all mail will be forwarded directly to our mail receiver script via piping, there are a few things to consider:

  • The email receiver should consume as little memory as possible.
  • The email receiver should not load the whole application (because of the item above).
  • The email receiver should only validate and preprocess incoming messages and leave the actual processing to another subsystem via a queue.

Configuration

There are a few Ruby libraries that are well suited for this case:

  • mail – Email processing, Ruby 1.9.2 compatible (unlike tmail, which is not)
  • redis – Simple key-value in-memory database.
  • resque – Redis-backed library for creating background jobs, placing those jobs on multiple queues, and processing them.

Install gems:

gem install mail redis resque

Here is an example email receiver script:

#!/usr/bin/env ruby 

require 'rubygems'
require 'mail'
require 'redis'
require 'resque' 

class EmailReply
  @queue = :email_replies 

  def initialize(content)
    mail    = Mail.read_from_string(content)
    from    = mail.from.first
    to      = mail.to.first 

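    if mail.multipart?
      # Pick the first text/plain part, if there is one
      part = mail.parts.select { |p| p.content_type =~ /text\/plain/ }.first rescue nil
      unless part.nil?
        message = part.body.decoded
      end
    else
      # Single-part message: the body belongs to the mail itself
      message = mail.body.decoded
    end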

    unless message.nil?
      Resque.enqueue(EmailReply, from, to, message)
    end
  end
end 

EmailReply.new($stdin.read)

This script receives the mail message then tries to extract the plaintext body. If the email message is valid it adds it to the queue for future processing.
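
Resque serializes job arguments as JSON, so the receiver should pass only simple values (strings here) rather than a `Mail` object. The following stdlib-only sketch imitates that round trip; the payload layout and the sample values are assumptions for illustration, not Resque's actual code.

```ruby
require 'json'

# Roughly what a queued job looks like once serialized (assumed layout).
payload = {
  'class' => 'EmailReply',
  'args'  => ['user@example.com', 'P946d272cf7M15@yourdomain.com', 'Thanks!']
}

encoded = JSON.generate(payload)
decoded = JSON.parse(encoded)

# The worker side receives the arguments back as plain strings.
from, to, body = decoded['args']
puts "from=#{from} to=#{to} body=#{body}"
```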

Mail Queue processing

After we put emails into the queue we’ll need to create a worker.

If you need to extract a reply from the body, use mail_extract (https://github.com/sosedoff/mail_extract):

gem install mail_extract

A simple Resque job worker, extracted from one of the projects (RAILS_ROOT/lib/email_reply.rb):

class InvalidReplyUUID    < StandardError ; end
class InvalidReplyUser    < StandardError ; end
class InvalidReplyProject < StandardError ; end
class InvalidReplyMessage < StandardError ; end 

class EmailReply
  @queue = :email_replies 

  def self.parse_email_uuid(str)
    if str =~ /^P[0-9abcdef]+(M[\d]+)?@/i
      parts = str.scan(/^P([0-9abcdef]+)(M([\d]+))?/i).flatten
      project_uuid = parts.first
      message_id = parts.size == 3 ? parts.last : nil 

      result = {:project_uuid => project_uuid}
      result[:message_id] = message_id unless message_id.nil?
      result
    else
      raise InvalidReplyUUID, "Invalid UUID: #{str}"
    end
  end 

  def self.perform(from, to, body)
    user = User.find_by_email(from)
    if user.nil?
      raise InvalidReplyUser, "User with email = #{from} is not a member of the app."
    end 

    info = parse_email_uuid(to) 

    project = Project.find_by_uuid(info[:project_uuid])
    if project.nil?
      raise InvalidReplyProject, "Project with UUID = #{info[:project_uuid]} was not found."
    end 

    if info.key?(:message_id)
      message = project.messages.find_by_id(info[:message_id])
      if message.nil?
        raise InvalidReplyMessage, "Message with ID = #{info[:message_id]} was not found on project '#{project.name}'"
      end
    end 

    params = {
      :project  => project,
      :body     => MailExtract.new(body).body,
      :markup   => 'plain',
      :sent_via => 'email'
    }
    params[:message] = message unless message.nil? 

    message = user.messages.new(params)
    unless message.save
      raise RuntimeError, "Unable to save message. Errors: #{message.errors.inspect}"
    end
  end
end
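
To see what `parse_email_uuid` is meant to return, here is a standalone sketch of the same parsing idea that runs without Rails. The `parse_reply_address` name and the sample addresses are made up for illustration, and it returns `nil` instead of raising.

```ruby
# Standalone variant of the address parsing above (hypothetical helper).
def parse_reply_address(str)
  match = str.match(/\AP([0-9a-f]+)(?:M(\d+))?@/i)
  return nil if match.nil?

  result = { :project_uuid => match[1] }
  result[:message_id] = match[2] unless match[2].nil?
  result
end

p parse_reply_address('P946d272cf7da4dd6@yourdomain.com')
p parse_reply_address('P946d272cf7da4dd6M15@yourdomain.com')
p parse_reply_address('john@yourdomain.com')
```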

NOTE: It’s important that both the mail receiver and the worker use the same queue.

Create a resque.rake in RAILS_ROOT/lib/tasks:

require 'resque/tasks'
task "resque:setup" => :environment

And fire it up:

rake resque:work QUEUE=email_replies

Fast and easy password generation

Posted by Dan Sosedoff on January 22, 2011

I used to generate a lot of passwords, usually with some online service found by googling “generate password online”. It worked fine until I got tired of it and decided to find something easier and faster: something that gives me the result right away in the terminal while doing server setup.

Here we go, a bash script (found online and modified for my needs):

#!/bin/bash
 
charspool=('a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j' 'k' 'l' 'm' 'n' 'o' 'p'
'q' 'r' 's' 't' 'u' 'v' 'w' 'x' 'y' 'z' '0' '1' '2' '3' '4' '5' '6' '7'
'8' '9' 'A' 'B' 'C' 'D' 'E' 'F' 'G' 'H' 'I' 'J' 'K' 'L' 'M' 'N' 'O'
'P' 'Q' 'R' 'S' 'T' 'U' 'V' 'W' 'X' 'Y' 'Z' '-' '_');
 
len=${#charspool[*]}
 
if [ $# -lt 1 ]; then
  num=20;
else
  num=$1;
fi
 
randomnumbers=$(head -c $num /dev/urandom | od -t u1 | awk '{for (i = 2; i <= NF; i++) print $i}')
echo -n "password: "
 
for c in $randomnumbers; do
  echo -n ${charspool[$((c % len))]}
done
echo

Installation

Just create a script called something like “genpassword” in your bin dir and make it executable.

sudo nano /usr/local/bin/genpassword
sudo chmod +x /usr/local/bin/genpassword

Works on any unix-like machine with bash installed.

Usage

Usage is pretty straightforward. Just type “genpassword” in the terminal and you’ll get a password.

genpassword # This will give a default length (20 chars) password
genpassword 64 # This will give you a 64 chars long password
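
For comparison, the same idea can be sketched in Ruby using the standard `securerandom` library. This is an illustrative port rather than the script above; the `genpassword` method name and the `CHARS` constant are made up.

```ruby
require 'securerandom'

# Character pool: a-z, 0-9, A-Z, "-" and "_" (64 characters in total).
CHARS = (('a'..'z').to_a + ('0'..'9').to_a + ('A'..'Z').to_a + ['-', '_']).freeze

def genpassword(length = 20)
  Array.new(length) { CHARS[SecureRandom.random_number(CHARS.size)] }.join
end

puts "password: #{genpassword}"
puts "password: #{genpassword(64)}"
```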

Debugging email output in Rails

Posted by Dan Sosedoff on November 05, 2010

Need to troubleshoot email output? No problem. Just configure your development environment with the following lines:

# RAILS_ROOT/config/environments/development.rb
config.action_mailer.delivery_method = :sendmail 
config.action_mailer.sendmail_settings = {:location => "/usr/bin/fake-sendmail.sh"}
config.action_mailer.default_url_options = { :host => "YOUR HOST" }

And here is the fake-sendmail utility I wrote (save it as “/usr/bin/fake-sendmail.sh”):

#!/usr/bin/ruby
 
dir = "/tmp/fake-sendmail"
headers = ''; $stdin.each_line { |l| headers << l ; break if l.strip.length == 0 }
body = '' ; $stdin.each_line { |l| body << l }
format = body.match(/html/i) ? "html" : "txt"
filename = Time.now.to_i
 
Dir.mkdir(dir) unless File.exist?(dir)
File.open("#{dir}/#{filename}_headers.txt", "w") { |f| f.write(headers) }
File.open("#{dir}/#{filename}.#{format}", "w") { |f| f.write(body) }

Paste: http://pastie.org/private/er61gafxb6fzowjo4chjvq

As a result, you’ll see two files created for each outgoing email: one with all the email headers and another with the body.
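
The header/body split relies on the blank line that separates the two parts of a raw message. A small self-contained sketch of that logic, using a made-up message and `StringIO` in place of `$stdin`:

```ruby
require 'stringio'

raw = "From: app@example.com\r\nSubject: Hello\r\n\r\n<html><body>Hi!</body></html>\r\n"
input = StringIO.new(raw)

# Read lines up to and including the first blank line: these are the headers.
headers = String.new
input.each_line { |l| headers << l; break if l.strip.empty? }

# Whatever is left on the stream is the body.
body = input.read
ext = body =~ /html/i ? 'html' : 'txt'

puts "headers: #{headers.lines.count} lines, body extension: #{ext}"
```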

Also, it works for Sinatra and Merb. Woot!

Maintenance page for Rails applications with nginx

Posted by Dan Sosedoff on September 22, 2010

Here is a small tip on how to set up a maintenance page for Rails applications under nginx. I usually put offline.txt into the public directory while making changes or testing.
The solution works with any type of backend server (passenger/php-fpm/upstream).

Sample vhost config:

server {
  listen 80;
  server_name HOSTNAME;

  location / {
    root /path/to/your/public/dir;
    passenger_enabled on;
    rails_env production;

    if (-f $document_root/offline.txt) {
      return 503;
    }
  }

  location /maintenance.html {
    root /path/to/your/public/dir;
  }

  error_page 503 /maintenance.html;
}

Pastie: http://pastie.org/private/pgnhvnkj4uxe4mkgnhvqw
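
Switching maintenance mode on and off then comes down to creating or removing offline.txt. A minimal Ruby sketch, assuming hypothetical helper names and that you pass in your application's public directory:

```ruby
require 'fileutils'

# Hypothetical helpers; public_dir is your application's public directory.
def maintenance_on(public_dir)
  FileUtils.touch(File.join(public_dir, 'offline.txt'))
end

def maintenance_off(public_dir)
  FileUtils.rm_f(File.join(public_dir, 'offline.txt'))
end

def maintenance?(public_dir)
  File.exist?(File.join(public_dir, 'offline.txt'))
end
```

For example, call `maintenance_on('/path/to/your/public/dir')` before making changes and `maintenance_off` with the same path afterwards.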

Custom field aggregations in Sphinx using SphinxQL

Posted by Dan Sosedoff on September 06, 2010

Sphinx is a really powerful tool for full-text database search. It is a great option as a search engine for your website’s data.
In its default mode it works as a regular TCP server and has native language bindings for PHP, Ruby, C, etc. Another outstanding feature is its MySQL protocol connection and SphinxQL, which is similar to the native MySQL query language.

Let’s say we have N documents with M attributes. Attributes can be of different types: string, integer, double, boolean. Our objective is to perform attribute aggregation based on a specified search term (user-defined, etc.). That will give us full information on the data selected by the search term alone. This is only worth doing when you really need these aggregate fields, because the next part is tricky and not really efficient.

First of all, you have to set up a separate Sphinx search daemon instance using a different configuration file (it could not run both modes at once). Another problem: you have to set up separate data sources and index files, because Sphinx puts a lock on all files currently in use.

Let’s assume we have a database of books. We need to build a form with sliders that can be used as a user-friendly search filter. All we need is a list of min and max attribute values. But there is a problem: while working with Sphinx you might find yourself trying to use it like a regular RDBMS. Unfortunately, Sphinx has a different design. Basically, Sphinx has one primary field that is present in every search result: DocumentID. It is a unique ID that represents your data ID, which makes it harder to produce aggregate data. And there is no way to get rid of that field.
The whole idea of our aggregation is to use boolean match mode with no weighting performed at all. In that case all results will have a weight of 1. That lets us group all the results by the weight field, sidestepping the DocumentID field.

Here is the sample query:

SELECT
  MIN(reviews) AS min_reviews, MAX(reviews) AS max_reviews,
  MIN(pages) AS min_pages, MAX(pages) AS max_pages,
  MIN(pub_year) AS min_date, MAX(pub_year) AS max_date,
  @weight AS w
FROM 
  INDEX_NAME
WHERE
  MATCH('SEARCH_TERM') AND pages > 30
GROUP BY w OPTION ranker = none

The result of this query will be one row with the field alias names. That’s it.
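
To make the trick concrete, here is a plain-Ruby illustration of what the query computes. It does not talk to Sphinx; the sample documents are invented, and every match is given a weight of 1 to mimic `ranker = none`.

```ruby
# Invented sample matches; with ranker = none every weight is 1.
matches = [
  { :id => 1, :weight => 1, :reviews => 12, :pages => 320, :pub_year => 1999 },
  { :id => 2, :weight => 1, :reviews => 3,  :pages => 150, :pub_year => 2008 },
  { :id => 3, :weight => 1, :reviews => 45, :pages => 512, :pub_year => 2004 }
]

# Grouping by the constant weight collapses all matches into a single group.
rows = matches.group_by { |m| m[:weight] }.map do |weight, docs|
  {
    :min_reviews => docs.map { |d| d[:reviews] }.min,
    :max_reviews => docs.map { |d| d[:reviews] }.max,
    :min_pages   => docs.map { |d| d[:pages] }.min,
    :max_pages   => docs.map { |d| d[:pages] }.max,
    :min_date    => docs.map { |d| d[:pub_year] }.min,
    :max_date    => docs.map { |d| d[:pub_year] }.max,
    :w           => weight
  }
end

p rows
```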

All statements are fully customizable. Check the full SphinxQL reference for details.

Snippet: Cleanup your Git repository

Posted by Dan Sosedoff on June 15, 2010

A snippet (found on the net) for removing files from the repository that are no longer present in your project.

$ git rm $(git ls-files -d)

For best use, add it to your bash alias file: ~/.bashrc or ~/.bash_aliases (on Ubuntu):

alias gitclean='git rm $(git ls-files -d)'

Handy HTTP requests with Curb and Ruby

Posted by Dan Sosedoff on June 13, 2010

While working on one of my projects, I tried to find a multi-purpose HTTP request class that can use different network interfaces/IP addresses and has a retry option (in case the connection is slow or the server is not responding for some reason).

Here is a small wrapper built on top of Ruby Curb, implemented as a module:

require 'curb'

module ApiRequest
  USER_AGENTS = [
    'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3',
    'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727)',
    'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.3) Gecko/20100423 Ubuntu/10.04 (lucid) Firefox/3.6.3',
    'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_3; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.70 Safari/533.4',
    'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.2) Gecko/20100323 Namoroka/3.6.2',
    'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100401 Ubuntu/9.10 (karmic) Firefox/3.5.9'
  ]
 
  CONNECTION_TIMEOUT = 10
 
  @@interfaces = []
 
  # get random user-agent string for usage
  def random_agent
    USER_AGENTS[rand(USER_AGENTS.size)]
  end

  # get random IP/network interface specified in @@interfaces
  def random_interface
    size = @@interfaces.size
    size > 0 ? @@interfaces[rand(size)] : nil
  end
 
  # perform request, assign_to - specify network interface/ip
  def perform(url, assign_to=nil)
    puts url
    interface = assign_to.nil? ? self.random_interface : assign_to
    req = Curl::Easy.new(url)
    req.timeout = CONNECTION_TIMEOUT
    req.interface = interface unless interface.nil?
    req.headers['User-Agent'] = self.random_agent
    begin
      req.perform
      if req.response_code == 200
        return req.downloaded_bytes > 0 ? req.body_str : nil
      else
        nil
      end
    rescue StandardError
      return nil
    end
  end
 
  # perform request by number of attempts
  def fetch(url, attempts=3)
    result = nil
    1.upto(attempts) do |a|
      result = self.perform(url)
      break unless result.nil?
    end
    return result
  end
end

And sample usage:

class TestRequest
  include ApiRequest
 
  def foo
     body = self.fetch('http://google.com')
  end
end

If the module variable “@@interfaces” is an array of IP addresses or network interfaces, one of them (randomly selected) will be used to perform the request. The “fetch” function has an “attempts” parameter, which is set to 3 by default. It means that the operation will be invoked up to that many times until a result is downloaded from the URL; otherwise it returns nil.
The “perform” function has an “assign_to” parameter (not used by “fetch”) that allows binding the request to a specified interface. It is useful when you have different workers, some bound to an exact interface and some using random IPs. Also, the ApiRequest module has a list of user agents, one of which is picked at random for each request.
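
The retry behaviour of “fetch” does not depend on Curb itself. As a sketch, the same loop can be written as a generic helper that retries any block until it returns a non-nil value; the `with_attempts` name is made up for illustration.

```ruby
# Hypothetical helper: call the block up to `attempts` times,
# returning the first non-nil result (or nil if every attempt fails).
def with_attempts(attempts = 3)
  attempts.times do
    result = yield
    return result unless result.nil?
  end
  nil
end

calls = 0
value = with_attempts(3) do
  calls += 1
  calls < 3 ? nil : 'response body' # succeed on the third try
end

puts "got #{value.inspect} after #{calls} calls"
```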

Pastie: http://pastie.org/private/j19j3hbebte9bjqaydslmg

Setting processor affinity for a certain task or process in Linux

Posted by Dan Sosedoff on June 06, 2010

When you are using SMP you might want to override the kernel’s process scheduling and bind a certain process to one or more specific CPUs.

What is this?

CPU affinity is nothing but a scheduler property that “bonds” a process to a given set of CPUs on the SMP system. The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs. Note that the Linux scheduler also supports natural CPU affinity:

The scheduler attempts to keep processes on the same CPU as long as practical for performance reasons. Therefore, forcing a specific CPU affinity is useful only in certain applications. For example, applications such as Oracle (ERP apps) are licensed by the number of CPUs per instance. You can bind Oracle to specific CPUs to avoid license problems. This is really useful on large servers with 4 or 8 CPUs.

Setting processor affinity for a certain task or process using taskset command

taskset is used to set or retrieve the CPU affinity of a running process given its PID or to launch a new COMMAND with a given CPU affinity. However taskset is not installed by default. You need to install schedutils (Linux scheduler utilities) package.

$ apt-get install schedutils

On recent versions of Debian / Ubuntu Linux, taskset is installed by default as part of the util-linux package.

The CPU affinity is represented as a bitmask, with the lowest order bit corresponding to the first logical CPU and the highest order bit corresponding to the last logical CPU. For example:

  • 0x00000001 is processor #0 (1st processor)
  • 0x00000003 is processors #0 and #1
  • 0x00000004 is processor #2 (3rd processor)
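
As an illustration of how such masks are built, the following Ruby sketch converts a list of CPU numbers into a bitmask and back. The helper names are made up for this example.

```ruby
# Hypothetical helpers: CPU #n corresponds to bit n of the mask.
def cpus_to_mask(cpus)
  cpus.reduce(0) { |mask, cpu| mask | (1 << cpu) }
end

def mask_to_cpus(mask)
  (0...mask.bit_length).select { |cpu| mask[cpu] == 1 }
end

puts format('0x%08x', cpus_to_mask([0]))    # 0x00000001
puts format('0x%08x', cpus_to_mask([0, 1])) # 0x00000003
puts format('0x%08x', cpus_to_mask([2]))    # 0x00000004
p mask_to_cpus(0x00000003)                  # [0, 1]
```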

To set the processor affinity of process 13545 to processor #0 (1st processor), type the following command:

$ taskset -p 0x00000001 13545

If you find a bitmask hard to use, you can specify a numerical list of processors instead by adding the -c flag:

$ taskset -cp 1 13545
$ taskset -cp 3,4 13545

where -p : Operate on an existing PID and not launch a new task (default is to launch a new task)

via http://www.cyberciti.biz/tips/setting-processor-affinity-certain-task-or-process.html

Install Git on CentOS 5.2

Posted by Dan Sosedoff on June 28, 2009

First, we need to install all dependencies:

# yum install gettext-devel expat-devel curl-devel zlib-devel openssl-devel

Next, get the git 1.6.x sources:

# wget http://kernel.org/pub/software/scm/git/git-1.6.3.3.tar.gz

Then unpack the archive, cd into the git sources folder, and install it:

# make && make install && make clean

That’s it, now you’ll have git ready to go.

Useless filesystem over MySQL

Posted by Dan Sosedoff on April 06, 2009

Here is a completely useless filesystem based on MySQL database storage: mysqlfuse, implemented with FUSE.
I didn’t find any use for it, but the filesystem does work. It is not perfect, of course, and it has not been maintained for a long time. It doesn’t report free drive space, so any file manager keeps saying ‘Error: No space left on device’, which makes it even more useless.

It’s really easy to set up.

First, we need to install developer headers for fuse:

$ apt-get install libfuse-dev

Next, get the sources (32-bit only, it does not work on 64-bit):

$ wget http://voxel.dl.sourceforge.net/sourceforge/mysqlfs/mysqlfs-0.4.0-rc1.tar.bz2

Unpack it, and compile:

$ tar -xjvf mysqlfs-0.4.0-rc1.tar.bz2
$ cd mysqlfs-0.4.0-rc1
$ ./configure && make && make install

Next, we need to set up the database:

CREATE DATABASE mysqlfs;
GRANT SELECT, INSERT, UPDATE, DELETE ON mysqlfs.* TO mysqlfs@"%" IDENTIFIED BY 'password';
FLUSH PRIVILEGES;

Then create the database schema. The SQL file is located in the root folder of the sources:

$ mysql -uroot -p mysqlfs < schema.sql

And finally, mount filesystem to some folder:

$ mysqlfs -ohost=MYSQLHOST -ouser=MYSQLUSER -opassword=MYSQLPASS -odatabase=mysqlfs MOUNT_DIR

Now it should be working. To have the parameters picked up automatically, you can create a [mysqlfs] section in your MySQL configuration file (my.cnf).

Parameters:

  -ohost=
    MySQL server host

  -ouser=
    MySQL username

  -opassword=
    MySQL password

  -odatabase=
    MySQL database name

That’s it. Anyway, FUSE makes it possible to create all sorts of weird filesystem proxies. For example, there is SQLite over FUSE, which is also quite old. Next time I’ll write about Amazon S3 over FUSE projects.