Using Amazon product images on your website

Posted by Dan Sosedoff on June 15, 2010

Amazon has an awesome image service. You can use their product images on your site and adjust them to your needs. All you need is one image URL for your product. That string gives you access to Amazon's dynamic image scaling service, which I had to use recently.

So, let's say you have books on your website, but you don't have any good images for them. There are two ways to solve the problem: 1) download the images from somewhere and resize them yourself, or 2) use Amazon!

Here is a small overview.

Unfortunately, I didn't have time to play with the image service for different countries, but I assume it won't differ much. Let's take a look at a regular image:

http://ecx.images-amazon.com/images/I/41ygBmdaIfL._SL500_SS100_.jpg

It has different parts:
1) URL base: http://ecx.images-amazon.com/images/I/
2) Image code: 41ygBmdaIfL
3) Size format (surrounded by underscores): _SL500_SS100_
4) Format: jpg/gif/png

A few words about the size format. It can vary from square thumbnails to images with a specific maximum width or height. For example, _SX100_ produces an image that is 100 pixels wide, with the height calculated proportionally. SH100 gives the opposite result, scaling to a maximum height of 100 pixels, and SS100 gives a 100×100 pixel thumbnail. You can find other similar crop codes by exploring different pages of the Amazon store; all you need to do is look at the image sources.

Now, let's use this with Ruby:

require 'net/http'
 
module Amazon
  # parse amazon image url and get image code and extension
  def self.parse_image(url)
    # the size part between the image code and the extension is optional
    match = url.match(/^http:\/\/ecx\.images-amazon\.com\/images\/I\/([a-z0-9\-%]+)(.*)\.(jpg|jpeg|gif|png)$/i)
    return nil if match.nil?
    {:code => match[1], :extension => match[3]}
  end
 
  # make a new amazon image url based on code and size
  def self.make_image(image, size)
    # the size code is surrounded by underscores, e.g. ._SX100_.jpg
    "http://ecx.images-amazon.com/images/I/#{image[:code]}._#{size.upcase}_.#{image[:extension]}"
  end
 
  # check if actual image exists
  def self.check_image(url)
    begin
      uri = URI.parse(url)
      req = Net::HTTP::Get.new(uri.path)
      res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
      return res.code == '200' && res.content_length.to_i > 0
    rescue StandardError
      false
    end
  end
end

And usage:

url = 'http://ecx.images-amazon.com/images/I/51O65dIoZCL._SX117_.jpg'
info = Amazon.parse_image(url)
unless info.nil?
  new_url = Amazon.make_image(info, 'sx100')
  if Amazon.check_image(new_url)
    puts "Cool! Resized image: #{new_url}"
  else
    puts "Sorry, this image does not exist!"
  end
else
  puts "Can't identify image!"
end

Some notes about the process. The only reason “check_image” uses GET instead of HEAD is that when an image cannot be generated or is not found in Amazon's cache, the response to a HEAD request sometimes still looks valid. I checked this on 50k images, and sometimes a HEAD request indicates a valid response when it is not supposed to. Otherwise I would use HEAD.
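
Going back to the size codes for a moment: here is a minimal sketch of how one image code turns into several sized URLs. The helper name sized_image_url is made up for this example and is not part of the module above; the URL layout is simply the one described in this post.

```ruby
# Hypothetical helper, for illustration only.
AMAZON_IMAGE_BASE = 'http://ecx.images-amazon.com/images/I/'

# Build an image URL from an image code, a size code and an extension.
# The size code ends up surrounded by underscores, e.g. ._SX100_.jpg
def sized_image_url(code, size, extension = 'jpg')
  "#{AMAZON_IMAGE_BASE}#{code}._#{size.upcase}_.#{extension}"
end

# Print a fixed-width variant and a square thumbnail of the same image.
%w(sx100 ss100).each do |size|
  puts sized_image_url('41ygBmdaIfL', size)
end
```

Whether Amazon actually serves a given size still has to be checked with a real request, as check_image does.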

Handy HTTP requests with Curb and Ruby

Posted by Dan Sosedoff on June 13, 2010

While working on one of my projects, I tried to find a multi-purpose HTTP request class that can use different network interfaces or IP addresses and has a retry option (for when the connection is slow or the server is not responding for some reason).

Here is a small wrapper built on top of Ruby Curb, implemented as a module:

require 'rubygems'
require 'curb'
 
module ApiRequest
  USER_AGENTS = [
    'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3',
    'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727)',
    'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.3) Gecko/20100423 Ubuntu/10.04 (lucid) Firefox/3.6.3',
    'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_3; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.70 Safari/533.4',
    'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.2) Gecko/20100323 Namoroka/3.6.2',
    'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100401 Ubuntu/9.10 (karmic) Firefox/3.5.9'
  ]
 
  CONNECTION_TIMEOUT = 10
 
  @@interfaces = []
 
  # get random user-agent string for usage
  def random_agent
    USER_AGENTS[rand(USER_AGENTS.size)]
  end
 
  # get random IP/network interface specified in @@interfaces
  def random_interface
    size = @@interfaces.size
    size > 0 ? @@interfaces[rand(size)] : nil
  end
 
  # perform request, assign_to - specify network interface/ip
  def perform(url, assign_to=nil)
    puts url
    interface = assign_to.nil? ? self.random_interface : assign_to
    req = Curl::Easy.new(url)
    req.timeout = CONNECTION_TIMEOUT
    req.interface = interface unless interface.nil?
    req.headers['User-Agent'] = self.random_agent
    begin
      req.perform
      if req.response_code == 200
        return req.downloaded_bytes > 0 ? req.body_str : nil
      else
        nil
      end
    rescue StandardError
      return nil
    end
  end
 
  # perform request by number of attempts
  def fetch(url, attempts=3)
    result = nil
    1.upto(attempts) do |a|
      result = self.perform(url)
      break unless result.nil?
    end
    return result
  end
end

And sample usage:

class TestRequest
  include ApiRequest
 
  def foo
     body = self.fetch('http://google.com')
  end
end

If the module variable “@@interfaces” is an array of IP addresses or network interfaces, then one of them (randomly selected) will be used to perform the request. The function “fetch” has a parameter “attempts”, which is set to 3 by default. It means that the request will be tried up to n times until a result is downloaded from the URL. If every attempt fails, it returns nil.
The function “perform” has a parameter “assign_to” (not used by “fetch”) that binds the request to a specified interface. It is useful when you have several workers, some bound to an exact interface and others that just use random IPs. The ApiRequest module also has a list of user agents, from which it picks one at random for each request.
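
The retry idea behind “fetch” can be tried without any network at all. Below is a minimal sketch; the helper name with_attempts and the simulated flaky request are assumptions made for this example, not part of the module.

```ruby
# Hypothetical helper: call the block up to `attempts` times and return
# the first non-nil result, or nil if every attempt comes back empty.
def with_attempts(attempts = 3)
  attempts.times do |i|
    result = yield(i + 1)
    return result unless result.nil?
  end
  nil
end

# Simulated flaky request: fails twice, succeeds on the third try.
calls = 0
body = with_attempts(3) do |attempt|
  calls += 1
  attempt < 3 ? nil : 'response body'
end

puts "got #{body.inspect} after #{calls} calls"
```

In the module itself the block would simply be a call to “perform”.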

Pastie: http://pastie.org/private/j19j3hbebte9bjqaydslmg

Making HTTP requests from different network interfaces with Ruby and Curb

Posted by Dan Sosedoff on June 09, 2010

At some point you will find that you have reached the requests-per-IP limit while using some API or crawling resources. And if you're doing it via the standard Net::HTTP, you'll face the problem that you cannot bind a request to a specified network interface (or IP). Bummer? No. Even if you can't do it with the core class, you can take a look at Curb, the libcurl binding for Ruby. It has everything you need to make regular GET/POST/etc. requests. And of course, it's easy.

A simple example (the real IPs have been changed):

require 'rubygems'
require 'curb'
 
ip_addresses = [
  '1.1.1.1',
  '2.2.2.2',
  '3.3.3.3',
  '4.4.4.4',
  '5.5.5.5'
]
 
ip_addresses.each do |ip|
  req = Curl::Easy.new('http://www.ip-adress.com/')
  req.interface = ip
  req.perform
  # take the first capture group as a string (nil if the page layout changed)
  result_ip = req.body_str[/<h2>My IP address is: ([\d\.]+)<\/h2>/, 1]
  puts("for #{ip} got response: #{result_ip}")
end

Output (IPs have been changed):

for 1.1.1.1 got response: 1.1.1.1
for 2.2.2.2 got response: 2.2.2.2
for 3.3.3.3 got response: 3.3.3.3
for 4.4.4.4 got response: 4.4.4.4
for 5.5.5.5 got response: 5.5.5.5

At least it's working. I haven't done any performance tests.
Sample on pastie: http://pastie.org/private/afxlcuk1npwjov3wer5hw
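
If the goal is to stay under a per-IP limit, one simple option is to spread the URLs over the addresses in turn. This is only a sketch of that idea: the URLs are made up, the addresses are the same placeholders as above, and no request is actually performed.

```ruby
ips  = ['1.1.1.1', '2.2.2.2', '3.3.3.3']                   # placeholder addresses
urls = (1..5).map { |n| "http://example.com/page/#{n}" }   # hypothetical URLs

# Pair each URL with the next address, wrapping around at the end of the list.
plan = urls.each_with_index.map { |url, i| [url, ips[i % ips.size]] }

plan.each { |url, ip| puts "#{url} via #{ip}" }
```

Each pair could then be handed to Curl::Easy with req.interface set to the chosen address, as in the example above.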

WebDAV client in ruby

Posted by Dan Sosedoff on May 02, 2009

Here is a simple example of how to make a native WebDAV client with Ruby sockets. No additional gems or extensions are needed, just the basic classes.

class WebDAV
	attr_reader :host, :port, :protocol, :chunk_size
	@socket = nil
 
	def initialize(host,port=80,protocol='HTTP/1.1',chunk=8096)
		@host = host.to_s
		@port = port.to_i
		@protocol = protocol
		@chunk_size = chunk.to_i
	end
 
	def build_header(method, path, content_length=nil)
		header = "#{method} #{path} #{@protocol}\r\n"
		header += "Content-Length: #{content_length}\r\n" if !content_length.nil?
		header += "Host: #{@host}\r\n"
		header += "Connection: close\r\n\r\n"
		return header
	end
 
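	def request(method, path)
		return nil unless open
		header = build_header(method, path)
		if @socket.write(header) == header.length then
			# the status line looks like "HTTP/1.1 201 Created"; return the code part
			return @socket.gets.to_s.split[1]
		end
	ensure
		# always release the socket, whatever happened above
		close
	end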
 
	def delete(path)
		request('DELETE', path)
	end
 
	def head(path)
		request('HEAD', path)
	end
 
	def mkcol(path)
		request('MKCOL', path)
	end
 
	def put(path, localfile, auto_head=true)
		if !File.exist?(localfile) || !File.readable?(localfile)
			raise "File does not exist or is not readable!"
		end
 
		open
 
		datalen = File.size(localfile)
		header = build_header('PUT', path, datalen)
 
		begin
			if @socket.write(header) == header.length then
				written = 0
				File.open(localfile,'rb') do |f| 
					until f.eof? do
						written += @socket.write(f.read(@chunk_size))
					end
				end
 
				if written == datalen
					close
					if !auto_head
						return true
					else
						return head(path)
					end
				end
			end
		rescue StandardError => e
			puts e
			return false
		end
	end
 
	def open
		begin 
			@socket = TCPSocket.open(@host,@port)
			return true
		rescue StandardError => e
			puts e
			return false
		end
	end
 
	def close
		begin
			return @socket.close
		rescue 
			return false
		end
	end
end

This class supports only the basic HTTP/DAV methods (PUT, DELETE, MKCOL, HEAD) and can be extended very easily. It is designed to work with files of any size, reading them in small chunks (the default is 8096 bytes).
I sometimes use this class with nginx.
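
To see what actually goes over the wire, here is a standalone sketch of the request header the class builds. The function name dav_header is made up for this example; it only mirrors the idea of build_header and is not a replacement for it.

```ruby
# Hypothetical standalone version of the header builder, for illustration.
# Each header line ends with CRLF and an empty line terminates the header.
def dav_header(method, path, host, content_length = nil)
  lines = ["#{method} #{path} HTTP/1.1"]
  lines << "Content-Length: #{content_length}" unless content_length.nil?
  lines << "Host: #{host}"
  lines << "Connection: close"
  lines.join("\r\n") + "\r\n\r\n"
end

# inspect makes the \r\n line endings visible
puts dav_header('PUT', '/test.mp3', 'your.host.com', 1024).inspect
```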

Deps:

require 'socket'
require 'digest'

Usage:

# create connection
conn = WebDAV.new('your.host.com')
 
# upload file (without autocheck), return true/false value
result = conn.put('/test.mp3','/home/.../..../..../file.mp3', false)
 
# upload file with autocheck, returns http response code (201, 404, ... ) so you'll know what exactly happened
result = conn.put('/test2.mp3','/home/.../file.mp3')

Also, here is a wrapper class that produces MD5 and SHA1 file hashes and supports big files.

class FileHash 
	def self.md5(path)
		d = Digest::MD5.new
		File.open(path,'rb') do |f| 
			d.update(f.read(8192)) until f.eof?
		end
		return d.hexdigest
	end
 
	def self.sha1(path)
		d = Digest::SHA1.new
		File.open(path,'rb') do |f| 
			d.update(f.read(8192)) until f.eof?
		end
		return d.hexdigest
	end
end

Usage:

FileHash.md5('/path/to/file')
FileHash.sha1('/path/to/file')
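
The two methods differ only in the digest class, so the same idea can be written once. A minimal sketch, assuming a made-up function name file_digest; it checks itself against hashing the same data in memory.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical generalisation: hash a file with any Digest class,
# reading it in fixed-size chunks so big files never sit in memory.
def file_digest(path, digest_class = Digest::MD5, chunk = 8192)
  digest = digest_class.new
  File.open(path, 'rb') do |f|
    # read returns nil at end of file, which ends the loop
    while (data = f.read(chunk))
      digest.update(data)
    end
  end
  digest.hexdigest
end

# Write a temporary file larger than one chunk and compare both approaches.
Tempfile.create('sample') do |tmp|
  tmp.write('a' * 20_000)
  tmp.flush
  puts file_digest(tmp.path) == Digest::MD5.hexdigest('a' * 20_000)
  puts file_digest(tmp.path, Digest::SHA1)
end
```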

This WebDAV class does not pretend to be stable enough for a production environment, but it can be useful for some “one-time” tasks with less code.

Simple file uploader to Amazon S3 Service

Posted by Dan Sosedoff on March 22, 2009

For a long time I thought that Amazon's Simple Storage Service (S3) was a very complicated thing. But that was before I tried it. A couple of days ago I got an S3 account and started exploring the APIs and architecture. Now I see how stupid I was :) It's really easy to handle all operations with files and buckets. The pricing is comfortable too.

Welcome to cloud computing! :) I started using it with Ruby. The regular gem and docs can be found at http://amazon.rubyforge.org/

So, the first useful tool I decided to create is a simple uploader of local files to Amazon's servers.
First, we need to create a bucket and make it public:

Bucket.create('NAME_HERE',:access => :public_read)

Here's the client Ruby script:

#!/usr/bin/ruby
 
require 'rubygems'
require 'aws/s3'
 
include AWS::S3
 
$s3_bucket = "BUCKET_NAME"
$s3_key = "API_KEY"
$s3_secret = "API_SECRET"
 
def s3_store(localfile)
	if File.exist?(localfile) && File.readable?(localfile)
		puts "Uploading file [#{localfile}]. Size: #{File.size(localfile)} bytes."
		name = File.basename(localfile)
		Base.establish_connection!(:access_key_id => $s3_key, :secret_access_key => $s3_secret)
		S3Object.store(name, open(localfile), $s3_bucket, :access => :public_read)
		puts "Download link: http://s3.amazonaws.com/#{$s3_bucket}/#{name}"
	else
		puts "File does not exist or is not accessible. Please check the file and try again!"
	end
end
 
path = ARGV[0]
if !path
	puts "Please specify the file to upload."
else
	s3_store(path)
end

Download script: http://files.sosedoff.com/036cfedd/

BTW, I found a cool Firefox add-on to manage S3 objects/files. It's pretty easy to use.
Link to extension – http://www.s3fox.net

Fetching album covers from Last.Fm API

Posted by Dan Sosedoff on February 15, 2009

The previous post was about fetching cover art from Amazon Web Services; this post is about fetching covers from a popular music site, Last.fm. The API is described on the Last.fm API documentation page.

#!/usr/bin/ruby
 
require 'rubygems'
require 'net/http'
require 'cgi'
require 'xmlsimple'
 
# key from API documentation
$lastfm_key = "b25b959554ed76058ac220b7b2e0a026" 
$lastfm_host = "ws.audioscrobbler.com"
 
def fetch_cover(artist, album)
	artist = CGI.escape(artist)
	album = CGI.escape(album)
 
	path = "/2.0/?method=album.getinfo&api_key=#{$lastfm_key}&artist=#{artist}&album=#{album}"
	data = Net::HTTP.get($lastfm_host, path)
	xml = XmlSimple.xml_in(data)
	if xml['status'] == 'ok' then
		album = xml['album'][0]
 
		cover = {
			:small => album['image'][1]['content'],
			:medium => album['image'][2]['content'],
			:big => album['image'][3]['content']
		}
 
		return cover
	end
 
	return nil
end
 
puts fetch_cover('Nickelback', 'Dark Horse').inspect
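
The request path is the one part of this script that can be checked without calling the API. Here is a minimal sketch; the helper name lastfm_album_path and the placeholder key are assumptions for this example, and the parameters are the same ones used in fetch_cover.

```ruby
require 'cgi'

# Hypothetical helper mirroring the path built in fetch_cover.
# Every value is escaped, so names with spaces or slashes stay valid.
def lastfm_album_path(api_key, artist, album)
  params = {
    'method'  => 'album.getinfo',
    'api_key' => api_key,
    'artist'  => artist,
    'album'   => album
  }
  '/2.0/?' + params.map { |k, v| "#{k}=#{CGI.escape(v)}" }.join('&')
end

puts lastfm_album_path('YOUR_API_KEY', 'AC/DC', 'Back in Black')
# prints /2.0/?method=album.getinfo&api_key=YOUR_API_KEY&artist=AC%2FDC&album=Back+in+Black
```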

Download ruby script

Fetching album covers from Amazon Web Service

Posted by Dan Sosedoff on February 15, 2009

For a small project of mine I was looking for a web service to get media covers from. I found that I can use the Amazon Web Services API. The documentation for this ECommerce Service is pretty old, but it still works.
More detailed information about the API can be found here

#!/usr/bin/ruby
 
require 'rubygems'
require 'net/http'
require 'cgi'
require 'xmlsimple'
 
$amazon_key = "12DR2PGAQT303YTEWP02" # NOT MY KEY (FOUND ON INTERNET)
$amazon_host = "webservices.amazon.com"
 
def fetch_cover(artist, album)
	artist = CGI.escape(artist)
	album = CGI.escape(album)
 
	path = "/onca/xml?Service=AWSECommerceService&AWSAccessKeyId=#{$amazon_key}&Operation=ItemSearch&SearchIndex=Music&Artist=#{artist}&ResponseGroup=Images&Keywords=#{album}"
	data = Net::HTTP.get($amazon_host, path)
	xml = XmlSimple.xml_in(data)
	# 0 is truthy in Ruby, so compare the count explicitly
	if Array(xml['Items'][0]['TotalResults']).first.to_i > 0 then
		cover = {
			:small => xml['Items'][0]['Item'][0]['SmallImage'][0]['URL'],
			:medium => xml['Items'][0]['Item'][0]['MediumImage'][0]['URL'],
			:big => xml['Items'][0]['Item'][0]['LargeImage'][0]['URL']
		}
		return cover
	end
	return nil
end

So, after running this function you will get a hash with 3 different images (small, medium, big).
I use the XML-Simple gem for Ruby. It can be installed this way:

sudo gem install xml-simple

That's it. Download script