Snippet: Clean up your Git repository

Posted by Dan Sosedoff on June 15, 2010

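A snippet (found on the net) that removes from the repository the tracked files that are no longer present in your working directory:

$ git ls-files -d -z | xargs -0 -r git rm --

git ls-files -d lists tracked files that have been deleted. -z together with xargs -0 keeps file names containing spaces intact, and -r (GNU xargs) skips running git rm when nothing was deleted, since git rm with no paths just fails with a usage message.

For convenience, add it as an alias to your bash configuration, ~/.bashrc or ~/.bash_aliases (on Ubuntu):

alias gitclean='git ls-files -d -z | xargs -0 -r git rm --'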

Handy HTTP requests with Curb and Ruby

Posted by Dan Sosedoff on June 13, 2010

While working on one of my projects, I needed a multi-purpose HTTP request class that could send requests from different network interfaces/IP addresses and retry them (if the connection is slow or the server is not responding for some reason).

Here is a small wrapper built on top of the Ruby Curb library, implemented as a module:

require 'rubygems' # only needed on Ruby 1.8
require 'curb'

module ApiRequest
  USER_AGENTS = [
    'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3',
    'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727)',
    'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.3) Gecko/20100423 Ubuntu/10.04 (lucid) Firefox/3.6.3',
    'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_3; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.70 Safari/533.4',
    'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.2) Gecko/20100323 Namoroka/3.6.2',
    'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100401 Ubuntu/9.10 (karmic) Firefox/3.5.9'
  ]

  CONNECTION_TIMEOUT = 10

  # IP addresses or interface names to bind requests to;
  # leave empty to use the default interface
  @@interfaces = []

  # get a random user-agent string; rand(n) returns 0..n-1,
  # so every entry can be picked
  def random_agent
    USER_AGENTS[rand(USER_AGENTS.size)]
  end

  # get a random IP/network interface from @@interfaces, or nil if none is set
  def random_interface
    @@interfaces.empty? ? nil : @@interfaces[rand(@@interfaces.size)]
  end

  # perform a single request; assign_to binds it to a specific interface/IP
  def perform(url, assign_to=nil)
    interface = assign_to.nil? ? random_interface : assign_to
    req = Curl::Easy.new(url)
    req.timeout = CONNECTION_TIMEOUT
    req.follow_location = true # follow redirects (e.g. google.com -> www.google.com)
    req.interface = interface unless interface.nil?
    req.headers['User-Agent'] = random_agent
    begin
      req.perform
      # only a non-empty 200 response counts as success
      if req.response_code == 200 && req.downloaded_bytes > 0
        req.body_str
      else
        nil
      end
    rescue Curl::Err::CurlError
      # timeouts, refused connections, DNS failures, etc.
      nil
    end
  end

  # perform the request up to `attempts` times, returning the first body received
  def fetch(url, attempts=3)
    attempts.times do
      result = perform(url)
      return result unless result.nil?
    end
    nil
  end
end

And a sample usage:

class TestRequest
  include ApiRequest

  def foo
    fetch('http://google.com')
  end
end

body = TestRequest.new.foo

If the module variable “@@interfaces” holds an array of IP addresses or network interfaces, one of them is selected at random for each request. The “fetch” method takes an “attempts” parameter (3 by default): the request is repeated up to that many times until a body is downloaded from the URL; if every attempt fails, it returns nil.
The “perform” method has an “assign_to” parameter (not used by “fetch”) that binds the request to a specific interface. This is useful when you run several workers that are each bound to a particular interface, or a single worker that uses random IPs. The ApiRequest module also keeps a list of user agents and picks one at random for each request.
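The retry and random-selection logic described above can be tried without Curb or a network connection. This is a minimal sketch under that assumption: fetch_with_retries and pick_random are hypothetical names, and the block stands in for ApiRequest#perform, returning a body String on success or nil on failure. pick_random uses rand(list.size), which covers every index, whereas rand(size-1) would never pick the last element.

```ruby
# Network-free sketch of the retry loop in ApiRequest#fetch.
# fetch_with_retries is a hypothetical name; the block plays the role
# of perform(url) and must return a body String or nil.
def fetch_with_retries(attempts = 3)
  attempts.times do |i|
    result = yield(i + 1)            # block receives the 1-based attempt number
    return result unless result.nil? # first successful body wins
  end
  nil                                # every attempt failed
end

# Pick any element with equal probability: rand(n) returns 0..n-1.
# pick_random is a hypothetical helper mirroring random_interface.
def pick_random(list)
  list.empty? ? nil : list[rand(list.size)]
end

# Example: a fake request that only succeeds on the second attempt.
calls = 0
body = fetch_with_retries(3) do |attempt|
  calls += 1
  attempt == 2 ? '<html>ok</html>' : nil
end
puts body   # <html>ok</html>
puts calls  # 2
```

Passing the request as a block keeps the retry policy separate from the HTTP code, so the same loop could wrap perform(url, 'eth1') or any other call that reports failure as nil.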

Pastie: http://pastie.org/private/j19j3hbebte9bjqaydslmg

Setting processor affinity for a certain task or process in Linux

Posted by Dan Sosedoff on June 06, 2010

When you are using SMP, you might want to override the kernel’s process scheduling and bind a certain process to one or more specific CPUs.

What is this?

CPU affinity is a scheduler property that binds a process to a given set of CPUs on an SMP system. The Linux scheduler honors the given CPU affinity, and the process will not run on any other CPU. Note that the Linux scheduler also supports natural CPU affinity:

The scheduler attempts to keep processes on the same CPU for as long as practical, for performance reasons. Therefore, forcing a specific CPU affinity is useful only in certain applications. For example, applications such as Oracle (ERP apps) are licensed by the number of CPUs per instance, so you can bind Oracle to specific CPUs to avoid licensing problems. This is really useful on large servers with 4 or 8 CPUs.

Setting processor affinity for a certain task or process using taskset command

taskset is used to set or retrieve the CPU affinity of a running process given its PID, or to launch a new COMMAND with a given CPU affinity. On older systems taskset is not installed by default; you need to install the schedutils (Linux scheduler utilities) package:

$ apt-get install schedutils

On recent versions of Debian/Ubuntu Linux, taskset is installed by default as part of the util-linux package.

The CPU affinity is represented as a bitmask, with the lowest order bit corresponding to the first logical CPU and the highest order bit corresponding to the last logical CPU. For example:

  • 0x00000001 is processor #0 (1st processor)
  • 0x00000003 is processors #0 and #1
  • 0x00000004 is processor #2 (3rd processor)
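The bit arithmetic above can be sketched in Ruby. cpus_to_mask and mask_to_cpus are hypothetical helpers, not part of taskset; they only reproduce the convention described above, where bit n of the mask stands for logical CPU #n:

```ruby
# Build the hex affinity mask for a list of CPU numbers (bit n = CPU #n).
# cpus_to_mask is a hypothetical helper, not a taskset feature.
def cpus_to_mask(cpus)
  mask = cpus.inject(0) { |acc, cpu| acc | (1 << cpu) }
  format('0x%08x', mask)
end

# Decode a mask (Integer or "0x..." String) back into CPU numbers.
def mask_to_cpus(mask)
  value = mask.is_a?(String) ? Integer(mask) : mask
  cpus = []
  bit = 0
  while value > 0
    cpus << bit if (value & 1) == 1 # lowest bit set => this CPU is included
    value >>= 1
    bit += 1
  end
  cpus
end

puts cpus_to_mask([0])         # 0x00000001
puts cpus_to_mask([0, 1])      # 0x00000003
puts cpus_to_mask([2])         # 0x00000004
p mask_to_cpus('0x00000003')   # [0, 1]
```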

To set the processor affinity of process 13545 to processor #0 (1st processor), type the following command:

$ taskset -p 0x00000001 13545

If you find a bitmask hard to use, you can pass a numerical list of processors instead of a bitmask with the -c flag:

$ taskset -c -p 1 13545
$ taskset -c -p 3,4 13545

where -p: operate on an existing PID instead of launching a new task (the default is to launch a new task), and -c: read the affinity as a list of processor numbers, so 1 is processor #1 (2nd processor) and 3,4 are processors #3 and #4
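The -c list also accepts ranges such as 0,5,8-11. A small Ruby sketch that expands such a list and prints the equivalent bitmask, handy for checking that -c 3,4 and 0x00000018 mean the same thing; parse_cpu_list and cpu_list_to_mask are hypothetical helpers, not part of taskset:

```ruby
# Expand a taskset-style CPU list ("3,4" or "0,5,8-11") into CPU numbers.
# parse_cpu_list is a hypothetical helper.
def parse_cpu_list(list)
  cpus = list.split(',').flat_map { |part|
    if part.include?('-')
      first, last = part.split('-').map { |n| Integer(n) }
      (first..last).to_a # a range like 8-11 is inclusive
    else
      [Integer(part)]
    end
  }
  cpus.uniq.sort
end

# Convert the CPU list into the equivalent hex bitmask (bit n = CPU #n).
def cpu_list_to_mask(list)
  mask = parse_cpu_list(list).inject(0) { |acc, cpu| acc | (1 << cpu) }
  format('0x%08x', mask)
end

puts cpu_list_to_mask('1')         # 0x00000002
puts cpu_list_to_mask('3,4')       # 0x00000018
puts cpu_list_to_mask('0,5,8-11')  # 0x00000f21
```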

via http://www.cyberciti.biz/tips/setting-processor-affinity-certain-task-or-process.html

Install Git on CentOS 5.2

Posted by Dan Sosedoff on June 28, 2009

First, we need to install all dependencies:

# yum install gettext-devel expat-devel curl-devel zlib-devel openssl-devel

Next, get the git 1.6.x sources:

# wget http://kernel.org/pub/software/scm/git/git-1.6.3.3.tar.gz

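Then unpack the archive, cd into the source folder, and build and install it. Git's Makefile installs into your home directory unless prefix is set, so pass a system-wide prefix:

# tar -xzf git-1.6.3.3.tar.gz
# cd git-1.6.3.3
# make prefix=/usr/local all && make prefix=/usr/local install && make clean

That's it: git is now ready to go.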

Useless filesystem over MySQL

Posted by Dan Sosedoff on April 06, 2009

Here is a completely useless filesystem backed by MySQL storage: mysqlfs, implemented with FUSE.
I didn't find any real use for it, but it does work. It's not perfect, of course, since it hasn't been maintained for a long time. It doesn't report free disk space, so file managers keep saying ‘Error: No space left on device’, which makes it even more useless.

It's really easy to set up.

First, install the FUSE development headers:

$ apt-get install libfuse-dev

Next, get the sources (32-bit only; it does not work on 64-bit):

$ wget http://voxel.dl.sourceforge.net/sourceforge/mysqlfs/mysqlfs-0.4.0-rc1.tar.bz2

Unpack it, and compile:

$ tar -xjvf mysqlfs-0.4.0-rc1.tar.bz2
$ cd mysqlfs-0.4.0-rc1
$ ./configure && make && make install

Next, set up the database:

CREATE DATABASE mysqlfs;
GRANT SELECT, INSERT, UPDATE, DELETE ON mysqlfs.* TO mysqlfs@"%" IDENTIFIED BY 'password';
FLUSH PRIVILEGES;

Then create the database schema; the SQL file (schema.sql) is in the root folder of the sources:

$ mysql -uroot -p mysqlfs < schema.sql

Finally, mount the filesystem on a folder:

$ mysqlfs -ohost=MYSQLHOST -ouser=MYSQLUSER -opassword=MYSQLPASS -odatabase=mysqlfs MOUNT_DIR

Now it should work. To have the connection parameters picked up automatically, you can create a [mysqlfs] section in your MySQL configuration file (my.cnf).

Parameters:

  -ohost=
    MySQL server host

  -ouser=
    MySQL username

  -opassword=
    MySQL password

  -odatabase=
    MySQL database name

That's it. Anyway, FUSE makes it possible to build all sorts of weird filesystem proxies; there is also SQLite over FUSE, for example, though it is quite old as well. Next time I'll write about projects that expose Amazon S3 through FUSE.