
A guide to Selenium Architecture and Basics

Selenium is one of the most popular sets of tools for automating web applications across browsers. It runs on the common operating systems (Windows, Linux/Unix and Mac) and supports all major browsers, including Chrome, Firefox, Internet Explorer, Edge and Opera. It also supports several programming languages, such as C#, Java, JavaScript, Python and Ruby. This gives you the flexibility to write the automation script in the same language as the application under test, though this is not a requirement: you can write the script in any supported language of your choice.

Selenium Components:

Selenium IDE is a browser add-on/extension used mainly for recording and replaying scripts.

Selenium RC (Remote Control) worked by injecting JavaScript into browsers. It is now deprecated and has been replaced by Selenium WebDriver.

Selenium WebDriver allows you to create test scripts in the programming language of your choice through the language bindings. In simple terms, Selenium WebDriver sends HTTP requests (a REST-style API) to the browser drivers and handles the responses.
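To make the "HTTP requests to the browser drivers" idea concrete, here is a minimal sketch of how a few binding calls could map onto W3C WebDriver endpoints. It only builds request descriptions and sends nothing. The helper names (`build_request`, `navigate_to` and so on) and the driver address are assumptions made for illustration, not part of Selenium's own API.

```python
import json

# Base URL of a locally running browser driver. The port is an assumption
# (9515 is ChromeDriver's usual default).
DRIVER_URL = "http://localhost:9515"


def build_request(method, path, payload=None):
    """Describe one WebDriver HTTP request without sending it."""
    body = None if payload is None else json.dumps(payload)
    return {"method": method, "url": DRIVER_URL + path, "body": body}


def navigate_to(session_id, url):
    # Roughly what a binding call such as driver.get(url) turns into.
    return build_request("POST", f"/session/{session_id}/url", {"url": url})


def get_title(session_id):
    # Roughly what reading driver.title turns into.
    return build_request("GET", f"/session/{session_id}/title")


def find_element(session_id, css_selector):
    # Locating an element: the strategy and the selector travel in the body.
    payload = {"using": "css selector", "value": css_selector}
    return build_request("POST", f"/session/{session_id}/element", payload)


def delete_session(session_id):
    # Roughly what driver.quit() turns into.
    return build_request("DELETE", f"/session/{session_id}")
```

Calling `navigate_to("abc123", "https://example.com")` returns a dictionary describing a POST to `/session/abc123/url`. The language bindings do this kind of translation for you, which is why the same test logic can be written in any supported language.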

Let's quickly look at the architecture of Selenium WebDriver. There are four components: the language bindings, the JSON Wire Protocol, the browser drivers and the browsers themselves.



Language Bindings provide support for the different programming languages that can be used to create Selenium scripts.

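JSON (JavaScript Object Notation) is a lightweight data format used for transferring data between client and server on the web. The JSON Wire Protocol uses it to carry commands and responses over HTTP between the language bindings and the browser drivers.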

Note: In Selenium 4, the JSON Wire Protocol has been replaced by the W3C WebDriver protocol (browser drivers and browsers are already based on the W3C protocol).
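Two visible differences between the protocols are the shape of the "new session" payload and the key under which an element reference is returned. The sketch below illustrates both. The function names are hypothetical and the payloads are cut down to a single capability.

```python
def legacy_new_session_payload(browser_name):
    # Legacy JSON Wire Protocol shape: a flat "desiredCapabilities" object.
    return {"desiredCapabilities": {"browserName": browser_name}}


def w3c_new_session_payload(browser_name):
    # W3C WebDriver shape: "capabilities" containing "alwaysMatch"
    # (and, optionally, a "firstMatch" list).
    return {"capabilities": {"alwaysMatch": {"browserName": browser_name}}}


# Key under which each protocol returns an element reference.
LEGACY_ELEMENT_KEY = "ELEMENT"
W3C_ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def element_id(response_value):
    """Extract an element id from either style of find-element response."""
    if W3C_ELEMENT_KEY in response_value:
        return response_value[W3C_ELEMENT_KEY]
    return response_value[LEGACY_ELEMENT_KEY]
```

In day-to-day use the language bindings hide these details, so test scripts rarely need to change when the underlying protocol does.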

Browser Drivers act as servers that interact with the respective browsers. Since each browser is implemented differently by its vendor, browser drivers provide a way to interact with a browser without worrying about its internals. These drivers are typically provided by the respective browser development teams and work only with their own browser, e.g. to work with Chrome you need ChromeDriver.

The typical workflow is:

- For every Selenium command that is executed, an HTTP request is created and sent to the Browser Driver

- The Browser Driver (HTTP server) sends the steps to be executed to the actual browser

- The HTTP server receives the execution status and sends it back to the automation script.
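The round trip above can be imitated with nothing but Python's standard library. In this sketch a tiny stub server plays the role of the browser driver. It does not control a real browser, and the names `StubDriverHandler` and `send_navigate` as well as the session id `stub-session` are made up for illustration.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubDriverHandler(BaseHTTPRequestHandler):
    """Stand-in for a browser driver: accepts a command, replies with a status."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        command = json.loads(self.rfile.read(length) or b"{}")
        # A real driver would now drive the browser; the stub only records the URL.
        self.server.visited.append(command.get("url"))
        reply = json.dumps({"value": None}).encode()  # W3C-style success body
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # keep the example quiet


def send_navigate(url):
    """Run one command through the request/response cycle and return the reply."""
    server = HTTPServer(("127.0.0.1", 0), StubDriverHandler)  # port 0: any free port
    server.visited = []
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Step 1: the script's command becomes an HTTP request to the driver.
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("POST", "/session/stub-session/url",
                     body=json.dumps({"url": url}).encode(),
                     headers={"Content-Type": "application/json"})
        # Steps 2 and 3: the driver acts, then reports the status back.
        response = conn.getresponse()
        result = {"status": response.status, "body": json.loads(response.read())}
        conn.close()
        return result, server.visited
    finally:
        server.shutdown()
        server.server_close()
```

Calling `send_navigate("https://example.com")` returns the driver's reply together with the list of URLs the stub recorded. With a real driver the shape of the exchange is the same, except that the driver translates each command into browser-specific actions before answering.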


Selenium Grid, in simple terms, is used to run multiple tests in parallel on different browsers and different operating systems.
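The idea of fanning one test out across browser and operating system combinations can be sketched as follows. This is only an illustration of the parallel pattern: `run_test` is a placeholder that does not contact a Grid, and the browser and platform lists are assumed values.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Assumed example values; a real Grid offers whatever its nodes register.
BROWSERS = ["chrome", "firefox", "MicrosoftEdge"]
PLATFORMS = ["windows", "linux"]


def capability_matrix():
    """Every browser/OS combination, as W3C-style capability dictionaries."""
    return [{"browserName": b, "platformName": p}
            for b, p in product(BROWSERS, PLATFORMS)]


def run_test(capabilities):
    # Placeholder for a real test: with a Grid, this is where a remote session
    # would be requested using these capabilities.
    return f"{capabilities['browserName']} on {capabilities['platformName']}: ok"


def run_all(max_workers=4):
    # Each combination runs in its own worker thread, so sessions overlap in time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_test, capability_matrix()))
```

`run_all()` returns one result per combination, in the same order as the matrix. In a real setup the body of `run_test` would open a remote WebDriver session against the Grid, and the Grid would route it to a node that matches the requested capabilities.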
