Code Along That Explains an Easy Way to Scrape Data in Ruby

Shaqqour
4 min read · Jan 31, 2021
Photo by Dane Deaner on Unsplash

What is Scraping in Ruby?

Scraping in Ruby is a way to extract data from web pages so you can use it in your program. We usually scrape when a website doesn’t offer the information we need in a raw, machine-readable form, such as an API. Instead, we pull the web page into our program and extract the data from its HTML.
For example, this web page presents the best 300 movies to watch according to Rotten Tomatoes.

Rotten Tomatoes 300 Essential Movies to Watch

You see the list of the movies and some information about each one. Our mission here is to read this information and store it in our program. Let’s dive deep into this.

Introducing Nokogiri

“Nokogiri” is the Japanese word for a saw, a tool used in woodworking. It is also the name of a Ruby gem used for parsing HTML and scraping data ;-). To install this gem on your machine, run the following command in your terminal:

gem install nokogiri

If your project uses Bundler, add the gem to the Gemfile in your project directory and run bundle install. Here is an example of what the Gemfile could look like:

source "https://rubygems.org"

gem 'byebug'
gem 'nokogiri'

Let’s start our code-along:

Create a new project:

On your machine, create a new project directory called movie-scraper, and inside it create another directory called lib. Finally, create the scraper file inside lib. You can run the following commands:

mkdir movie-scraper
cd movie-scraper
mkdir lib
cd lib
touch scraper.rb

Require nokogiri and open-uri

Inside your new file add the following two lines:

require 'nokogiri'
require 'open-uri'

The open-uri library ships with Ruby’s standard library, so you don’t need to install it. It fetches the whole web page so we can store it in a variable in our program.
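
As a minimal sketch of what open-uri does on its own, the helper below fetches a page and returns its HTML as a string. The method name fetch_html, the timeout values, and the choice of which errors to rescue are assumptions for illustration, not part of the original walkthrough. It calls URI.open because, on Ruby 3.0 and later, a bare open(url) no longer fetches URLs.

```ruby
require 'open-uri'

# Hypothetical helper: returns the page body as a String,
# or nil if the request fails.
def fetch_html(url)
  # The block form closes the connection for us and returns the body.
  URI.open(url, open_timeout: 5, read_timeout: 5, &:read)
rescue OpenURI::HTTPError, SocketError, SystemCallError, Timeout::Error => e
  # HTTP errors (404, 500, ...), DNS failures, refused connections, timeouts
  warn "Could not fetch #{url}: #{e.message}"
  nil
end
```

You could then call fetch_html with the URL of the page you want to scrape and check for nil before parsing the result.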

Create the class

We will also need a new class in the same file, scraper.rb, to hold all the scraping code:

require 'nokogiri'
require 'open-uri'

class MovieScraper
end

Send HTTP request to the movies page

We need to send a request to the web page that has all the data we need. Here we will use open-uri and the nokogiri gem to do so:

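require 'nokogiri'
require 'open-uri'

class MovieScraper
  url = "https://editorial.rottentomatoes.com/guide/essential-movies-to-watch-now/"
  # URI.open is required on Ruby 3.0+, where a bare open(url) no longer fetches URLs
  html = URI.open(url)
  # Parse the fetched HTML into a searchable document
  doc = Nokogiri::HTML(html)
end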

Now that we have all we need inside the doc variable, which holds a parsed Nokogiri document representing every element on the web page, let’s browse through it.

Use the CSS selectors

We will need to use our CSS selector skills to pick elements out of the doc variable. Here is a link that will help you understand how to select an element inside a web page. On our web page, we need to select the elements that hold the movies. By inspecting the page, you can see that each movie sits inside an element matching div .row.countdown-item. Let’s store the matches in their own variable: css returns a Nokogiri::XML::NodeSet, an array-like collection of the matching elements, which makes it easier to select from each one.

require 'nokogiri'
require 'open-uri'

class MovieScraper
  url = "https://editorial.rottentomatoes.com/guide/essential-movies-to-watch-now/"
  html = URI.open(url) # URI.open is required on Ruby 3.0+
  doc = Nokogiri::HTML(html)
  # Select every movie entry on the page
  movies = doc.css("div .row.countdown-item")
end

Getting each movie name

Now that we have all the movies stored in one collection, we can iterate over it and get the name of each movie by using the selector div h2 a.

require 'nokogiri'
require 'open-uri'

class MovieScraper
  url = "https://editorial.rottentomatoes.com/guide/essential-movies-to-watch-now/"
  html = URI.open(url) # URI.open is required on Ruby 3.0+
  doc = Nokogiri::HTML(html)
  movies = doc.css("div .row.countdown-item")

  # Collect the title text of each movie
  movies_names = []
  movies.each do |movie|
    movies_names << movie.css("div h2 a").text.strip
  end
end

You can see that we are storing all the movie names in the movies_names array. Using the same technique, you can get the rest of each movie’s information and store it in your program in any format you like. A common choice is a hash per movie.

That was how you scrape data from a webpage on the internet using Nokogiri. Hope that helped you in your first scraping application in Ruby. If you have anything you would like to add, please comment below!

Shaqqour

Full stack software engineer. Passionate about making people’s lives better and easier through programming. LinkedIn.com/in/shaqqour