Automating web-based tasks with Selenium? Efficiency is the name of the game here, so take the reins and make technology work for you.
A coding story in three chapters (with a bonus).
The Fellowship of the Functions
A Functional Overview:
There's a whole lot going on under the hood, and you're gonna want to know what makes this thing tick.
For the newcomers out there, a word on using functions:
Sometimes you've got bits of code that you need to use over and over again, right? Well, functions make your code a whole lot easier to read and understand. I mean, think about it: instead of trying to decipher a tangled web of spaghetti code, you've got nice, neat little packages that each do one thing and do it well. And when you need to make a change, you just tweak the function in one place, and bam, problem solved. No muss, no fuss. Word of advice: embrace 'em and you'll love 'em. Trust me, your future self is gonna thank you for it.
First, let's have a look at this simplified version of the main function:
```python
import time


# Main code. The helper functions and the constants (LINK, SLEEP_TIME,
# INPUT_FILE, OUTPUT_FILE, PRINT_DETAILS) are defined elsewhere in the script.
def main():
    # Start an instance of the web browser
    driver = initialize_browser()
    try:
        driver.get(LINK)
        # Wait a bit for the page to load
        time.sleep(SLEEP_TIME)
        # Parse the text
        input_text = load_input_file(INPUT_FILE)
        sentences = split_text_into_sentences(input_text)
        chunks = list(generate_chunks(sentences))
        # Optionally print some more details
        if PRINT_DETAILS:
            print_preprocess_infos(input_text, chunks)
        # Find the input field
        input_field = get_input_textarea_element(driver)
        # Translate the text
        translation = translate_text(driver, input_field, chunks)
        # Save to output file
        write_output_file(OUTPUT_FILE, translation)
    finally:
        # Shut down the browser instance, even if a step above failed.
        # quit() ends the whole WebDriver session; close() only closes the window.
        driver.quit()
```
The Initialization of the Browser
In which the Selenium-forged steed is summoned, and the journey begins.
First up, we've got the initialize_browser() function. This is where the magic starts: it sets up a brand-new instance of the Firefox browser, all decked out with our custom options. Headless mode is on by default! No need for a window to appear; we're going full stealth mode here.
The Parsing of the Text
Where the words are divided into manageable chunks, as if by the wisdom of the regex.
Next, we've got load_input_file(). This one's pretty straightforward: it just reads the contents of a file and hands us back the text.
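A minimal sketch of load_input_file(), assuming the input is a UTF-8 text file (the encoding parameter is an addition of this sketch, not necessarily part of the original):

```python
from pathlib import Path


def load_input_file(path: str, encoding: str = "utf-8") -> str:
    """Read the whole input file and return its text."""
    return Path(path).read_text(encoding=encoding)
```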
Then there's split_text_into_sentences(). This is where the script takes that input text and breaks it down into individual sentences. Gotta make sure we're not overwhelming the translation service, you know? Bite-sized chunks are the way to go.
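The subtitle hints at regex, so here's one possible split_text_into_sentences(), assuming a deliberately naive rule: a sentence ends at ".", "!" or "?" followed by whitespace. SENTENCE_BOUNDARY is a hypothetical name, and abbreviations such as "e.g." or "Dr." will trip this rule up.

```python
import re

# Naive boundary: a run of whitespace preceded by ".", "!" or "?".
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_text_into_sentences(text: str) -> list[str]:
    """Split text into sentences, dropping empty fragments."""
    return [s.strip() for s in SENTENCE_BOUNDARY.split(text) if s.strip()]
```

The lookbehind keeps the punctuation attached to its sentence instead of consuming it as part of the separator.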
And speaking of those chunks, that's where generate_chunks() comes in. It takes those sentences and groups them into chunks, making sure each chunk is small enough to play nice with the translation service. No more hitting character limits.
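Since main() wraps generate_chunks() in list(), a generator fits naturally. This sketch greedily packs whole sentences into chunks; MAX_CHUNK_CHARS = 5000 is a placeholder rather than the real limit of any particular service, and hard-splitting a single oversized sentence is a fallback chosen for this example.

```python
from typing import Iterable, Iterator

MAX_CHUNK_CHARS = 5000  # hypothetical limit; check the actual service's cap


def generate_chunks(sentences: Iterable[str], max_chars: int = MAX_CHUNK_CHARS) -> Iterator[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    chunk = ""
    for sentence in sentences:
        candidate = f"{chunk} {sentence}" if chunk else sentence
        if len(candidate) <= max_chars:
            chunk = candidate  # the sentence still fits in the current chunk
            continue
        if chunk:
            yield chunk  # current chunk is full, emit it
        # Fallback: a single sentence longer than the limit is cut into pieces.
        while len(sentence) > max_chars:
            yield sentence[:max_chars]
            sentence = sentence[max_chars:]
        chunk = sentence
    if chunk:
        yield chunk
```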
The Gathering of the Fields
In which the input field is sought and found.
Now for the real showstopper: get_input_textarea_element(). This is the function that uses Selenium to find the right spot on the web page to do our work, namely the input field where we're gonna pour in our text. Without it, there's no circus troupe at your fingertips, ready to leap through hoops and do backflips at your every whim.
The Fetching of the Results
The final step, where the fruits of the labors are harvested and the story ends.
Finally, we've got translate_text() and write_output_file(). These are the heavy hitters. translate_text() is where the rubber meets the road, sending those sentence chunks off to be translated and bringing back the long-awaited goods. write_output_file() is the grand finale, putting the whole shebang down on paper (or, you know, in a file).
Whew, that's a lot to take in, I know. But once you've got this thing up and running, you'll see: it's gonna be smooth sailing. Just sit back and let the wind... hmm, the script do its thing.
If you skipped the project introduction, feel free to check out the first chapter, or just dive deep into the source code in the next chapter!
The code is available on GitHub.
(Cover picture: Laura, 1944).