Reflections: Web Scraper CLI Application with Ruby

Barry Nguyen
Published in Nerd For Tech
May 28, 2021


The assignment for this module of my full stack software engineering course was to build a command-line interface (CLI) application in Ruby that scrapes a public website.

The key objectives for this assignment were:

  1. Understand the structure of a CLI application in Ruby.
  2. Demonstrate the application of object-oriented programming in Ruby.
  3. Write a CLI application in Ruby where the data provided goes at least one level deep. A “level” is where a user can make a choice and then get detailed information about their choice.
  4. Demonstrate good use of object-oriented design patterns by using collections of objects, not hashes, to store data.
  5. Demonstrate the use of Nokogiri (a Ruby gem) to parse HTML and XML.
  6. Demonstrate the use of Open-URI (part of Ruby’s standard library), which is a wrapper for Net::HTTP, Net::HTTPS and Net::FTP.
  7. Apply the DRY (“don’t repeat yourself”) and Separation of Concerns principles.
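To make objectives 3 and 4 more concrete, here is a minimal sketch of a model class that stores each scraped listing as an object and keeps the whole collection on the class, so a user can pick an item from a numbered list and drill one level down into its details. The `Job` class, its attributes and the `create_from_collection` helper are hypothetical names chosen for illustration, not the project’s actual code.

```ruby
# Hypothetical model class: each scraped listing becomes a Job object
# instead of a hash. Attribute names are illustrative assumptions.
class Job
  attr_reader :title, :company, :location, :url

  def initialize(title:, company:, location:, url:)
    @title = title
    @company = company
    @location = location
    @url = url
    self.class.all << self # register every new Job in the class-level collection
  end

  # Class method: the collection lives in a class-level instance variable
  def self.all
    @all ||= []
  end

  # Turns an array of attribute hashes (e.g. scraper output) into Job objects
  def self.create_from_collection(attribute_hashes)
    attribute_hashes.map { |attrs| new(**attrs) }
  end

  # The "one level deep" detail view shown after a user picks a job
  def details
    "#{title} at #{company} (#{location})\n#{url}"
  end
end

Job.create_from_collection([
  { title: "Junior Ruby Developer", company: "Example Pty Ltd",
    location: "Sydney NSW", url: "https://example.com/jobs/1" },
  { title: "Graduate Software Engineer", company: "Sample Co",
    location: "Melbourne VIC", url: "https://example.com/jobs/2" }
])

Job.all.each_with_index { |job, i| puts "#{i + 1}. #{job.title}" }
puts Job.all.first.details
```

Keeping the collection in a class-level instance variable (`@all`) rather than a `@@all` class variable is a design choice in this sketch; both work here, but a class-level instance variable is not shared with subclasses.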

Summary

I decided to build a CLI application that scrapes data from Seek.com.au, a job search website that lists jobs in Australia.

I chose it because scraping a relatively established and complex website felt like a great learning challenge: it let me apply my new Ruby programming skills and revise how to use CSS selectors to extract data from HTML.

Revision of Ruby Programming

Building a CLI application that scrapes a public website was an enriching way to consolidate my Ruby programming skills and knowledge, as it covered many of the Ruby essentials, including:

  • Basic control flow: how “if” statements work
  • Variable scopes: method, instance and class
  • Object instantiation: .new and the #initialize method it calls
  • The meaning of the “self” keyword
  • Method types: class vs. instance methods
  • Method return values: knowing what methods return
  • Iterating through collections: at least using #each with a block
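Several of the essentials in the list above fit into one short example. The `Greeter` class is hypothetical and exists only to show where `self` points, how method, instance and class variables differ in scope, and what some common methods return.

```ruby
# Hypothetical class used only to illustrate the concepts listed above.
class Greeter
  CLASS_BODY_SELF = self  # in the class body, self is the class itself (Greeter)
  @@greetings_sent = 0    # class variable: shared by the class and all its instances

  def self.greetings_sent # class method: called on Greeter, where self is Greeter
    @@greetings_sent
  end

  def initialize(name)    # run automatically by Greeter.new
    @name = name          # instance variable: one per Greeter object
  end

  def greet               # instance method: called on a Greeter object
    message = "Hello, #{@name}!" # local (method) variable: gone once greet returns
    @@greetings_sent += 1
    message                      # the last evaluated expression is the return value
  end

  def whoami
    self # in an instance method, self is the object that received the call
  end
end

greeters = ["Ada", "Matz"].map { |name| Greeter.new(name) }
greeters.each { |greeter| puts greeter.greet } # prints "Hello, Ada!" then "Hello, Matz!"
puts Greeter.greetings_sent                     # prints 2

each_result = [1, 2, 3].each { |n| n * 2 } # #each returns its receiver: [1, 2, 3]
map_result  = [1, 2, 3].map  { |n| n * 2 } # #map returns a new array: [2, 4, 6]
p each_result, map_result
```

The `#each` versus `#map` comparison at the end is a common surprise: `#each` ignores the block’s values and hands back the original array.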

Challenges

The key challenges I experienced were applying CSS selectors to scrape data from a complex website, and consolidating my understanding of object-oriented programming in Ruby: in particular, the meaning of ‘self’ in different contexts, and the computer science principles of DRY (“don’t repeat yourself”) and separation of concerns. Working through the learning labs helped consolidate the new concepts, but building an application was more beneficial for tying the concepts together and seeing how they relate to each other.
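Separation of concerns can be illustrated with a CLI class that only talks to the user. This minimal sketch assumes scraping and data modelling live in other classes; `Listing`, `CLI` and the injected `input:`/`output:` parameters are hypothetical names for illustration, not the project’s code. Choosing a number from the menu prints that item’s details, which is the “one level deep” requirement, and input validation is kept in a single private helper rather than repeated.

```ruby
require "stringio"

# Hypothetical stand-in for the model layer: a Struct gives each listing
# reader methods, so the CLI works with objects rather than hashes.
Listing = Struct.new(:title, :company, :location) do
  def details
    "#{title}\nCompany:  #{company}\nLocation: #{location}"
  end
end

# Handles user interaction only. Input and output are injected so the
# class can be exercised without a real terminal.
class CLI
  def initialize(listings, input: $stdin, output: $stdout)
    @listings = listings
    @input = input
    @output = output
  end

  def run
    list_listings
    choice = prompt_for_choice
    if choice
      @output.puts @listings[choice - 1].details # one level deep: the chosen item's details
    else
      @output.puts "Invalid choice."
    end
  end

  private

  def list_listings
    @listings.each.with_index(1) { |listing, i| @output.puts "#{i}. #{listing.title}" }
  end

  # Returns a valid 1-based menu number, or nil; validation lives only here
  def prompt_for_choice
    @output.print "Enter a number for more details: "
    number = @input.gets.to_i # nil.to_i is 0, so end of input counts as invalid
    number if number.between?(1, @listings.size)
  end
end

listings = [
  Listing.new("Junior Ruby Developer", "Example Pty Ltd", "Sydney NSW"),
  Listing.new("Graduate Software Engineer", "Sample Co", "Melbourne VIC")
]
output = StringIO.new
CLI.new(listings, input: StringIO.new("2\n"), output: output).run
puts output.string
```

Defaulting `input` and `output` to `$stdin` and `$stdout` keeps normal terminal use unchanged, while `StringIO` makes the class easy to test.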

Lastly, I ran into software compatibility issues with my course setup and RSpec testing in Ruby because I was using a new MacBook Pro with the M1 processor. As a result, I switched from Visual Studio Code to AWS Cloud9, a web-browser-based IDE. My experience with Cloud9 has been exceptional, apart from the rare loss of connection that required logging in again. It has handled the programming work I have been doing perfectly well, is in some ways faster to operate than Visual Studio Code, and works from any computer with an internet connection!

Conclusion

Juggling a full-time course with a full-time job and family has been an overwhelming but rewarding learning experience! It has highlighted how critical it is to develop strict time management and “learning how to learn” skills, so that you can maintain strong personal relationships and good health while mastering your craft as a software engineer.

With my new level of confidence, knowledge and skills, I am greatly looking forward to the next module on the Ruby on Rails framework.

Tech Entrepreneur | Primary Care | 40 Under 40: Most Influential Asian-Australians 2020 | Full Stack Software Engineer in Training.