What is Scraping in Ruby?
Scraping in Ruby is a way to extract data from web pages so you can use it in your program. We usually scrape data when a website doesn't provide the information we need in a raw form, such as through an API. In that case, we pull the web page into our program and extract the data from its HTML.
For example, this web page lists the 300 best movies to watch according to Rotten Tomatoes.


The page shows a list of movies with some information about each one. Our mission here is to read this information and store it in our program. Let's dive in.
Introducing Nokogiri
"Nokogiri" is the Japanese word for a saw, a tool used in woodworking. It is also a Ruby gem for parsing HTML and XML, which makes it a popular choice for data scraping ;-). To install this gem on your machine, run the following command in your terminal:
gem install nokogiri
Also make sure to add a Gemfile to your project directory and run bundle install. Here is an example of what the Gemfile could look like:
source "https://rubygems.org"

gem 'byebug'
gem 'nokogiri'
Let’s start our code-along:
Create a new project:
On your machine, create a new project directory called movie-scraper. Inside that directory, create another directory called lib, and finally create the scraper.rb file inside lib. You can run the following commands:
mkdir movie-scraper
cd movie-scraper
mkdir lib
cd lib
touch scraper.rb
Require nokogiri and open-uri
Inside your new file add the following two lines:
require 'nokogiri'
require 'open-uri'
The open-uri library is part of Ruby's standard library, so you don't need to install it. It lets us open a URL much like a file, so we can pull the whole web page into our program and store it in a variable.
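Here is a sketch of a slightly more defensive download step, assuming Ruby 2.5 or later (where URI.open is available; since Ruby 3.0, plain open no longer accepts URLs). The hypothetical fetch_html helper sends a User-Agent header, since some sites reject requests without one (the value shown is an arbitrary example), and returns nil instead of crashing when the server answers with an HTTP error status.

```ruby
require 'open-uri'

# Hypothetical helper: download a page and return its HTML as a String.
# Returns nil if the server responds with an HTTP error (4xx/5xx).
def fetch_html(url, user_agent: "movie-scraper (Ruby #{RUBY_VERSION})")
  # With a block, URI.open yields the response IO and returns the block's value,
  # so &:read gives back the whole body as a String.
  URI.open(url, "User-Agent" => user_agent, &:read)
rescue OpenURI::HTTPError => e
  warn "Request failed: #{e.message}"
  nil
end
```

Calling fetch_html with the Rotten Tomatoes URL used later in this tutorial should return the page's HTML as a String, which can be passed straight to Nokogiri::HTML.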
Create the class
Also, we will need a new class in the same file, scraper.rb, to include all the scraping code:
require 'nokogiri'
require 'open-uri'

class MovieScraper
end
Send an HTTP request to the movies page
We need to send a request to the web page that has all the data we need. Here we will use open-uri to download the page and nokogiri to parse it:
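require 'nokogiri'
require 'open-uri'

class MovieScraper
  url = "https://editorial.rottentomatoes.com/guide/essential-movies-to-watch-now/"
  # URI.open downloads the page (plain open(url) no longer accepts URLs in Ruby 3.0+)
  html = URI.open(url)
  # Parse the raw HTML into a searchable Nokogiri document
  doc = Nokogiri::HTML(html)
end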
Now that we have all we need inside the doc variable, which holds a Nokogiri document representing the parsed HTML of the whole page, let's browse through it.
Use CSS selectors
We will need our skills in selecting elements to search the doc variable. If you need a refresher, a guide to CSS selectors will help you understand how to select an element inside a web page. On our web page, we need to select the elements that hold the movies. By inspecting the page, you can see that each movie sits inside an element matched by the selector div .row.countdown-item. Let's store these elements by themselves: doc.css returns them as a NodeSet, an array-like collection of all the matching elements, which makes it easier for us to select from them.
require 'nokogiri'
require 'open-uri'

class MovieScraper
  url = "https://editorial.rottentomatoes.com/guide/essential-movies-to-watch-now/"
  html = URI.open(url)
  doc = Nokogiri::HTML(html)
  # Every element with both the row and countdown-item classes inside a div: one per movie
  movies = doc.css("div .row.countdown-item")
end
Getting each movie name
Now that we have all the movies stored in one NodeSet, we can iterate over it and get the name of each movie by using the selector div h2 a.
require 'nokogiri'
require 'open-uri'

class MovieScraper
  url = "https://editorial.rottentomatoes.com/guide/essential-movies-to-watch-now/"
  html = URI.open(url)
  doc = Nokogiri::HTML(html)
  movies = doc.css("div .row.countdown-item")

  movies_names = []
  movies.each do |movie|
    # The movie title is the text of the link inside the <h2>
    movies_names << movie.css("div h2 a").text.strip
  end
  puts movies_names
end
As you can see, we are storing the names of all the movies in the movies_names array. Using the same techniques, you can get the rest of each movie's information and store it in your program in any format you like. A common choice is one hash per movie.
That is how you scrape data from a web page using Nokogiri. I hope this helps with your first scraping application in Ruby. If you have anything you would like to add, please comment below!