With as much JavaScript work as we're doing these days, I'm writing more and more quick one-off utilities in JavaScript. Yesterday I had such a task: update over a hundred different ad slot codes that appeared multiple times in a text file.

The problem: do all the replacements from a CSV file, in a different file

The file in question was a JavaScript file, and the ad codes were all quoted and followed a pattern. An ad consultant had provided a CSV file with the old codes in one column and the new replacements in the other:

[Image: CSV data]

There were over a hundred of these codes to replace, and they appeared some 240 times in the JavaScript file. One immediate issue, visible in the picture above, is that some of the old strings were substrings of other old strings. If we applied the replacements in order and the file contained the 4th item, the 1st item's pattern could match the start of it, and we would end up with the first replacement plus a stray 2 at the end.
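To make the hazard concrete, here is a small sketch of what a naive replace-in-order pass would do. The slot codes and the `rows` and `source` names are made up for illustration; they are not taken from the consultant's CSV.

```javascript
// Hypothetical codes: the first old code is a prefix of the second one.
const rows = [
    { old: '/1234/Banner_A', new: '/1234/Leaderboard_A' },
    { old: '/1234/Banner_A2', new: '/1234/Leaderboard_B' },
];

const source = "defineSlot('/1234/Banner_A2');";

// Naive approach: replace every occurrence of each old code, in CSV order.
const naive = rows.reduce(
    (text, row) => text.split(row.old).join(row.new),
    source
);

// The first row matches inside the longer code, so the second row never gets
// a chance to match and the stray "2" is left behind.
console.log(naive);
```

The intended result here would have been `/1234/Leaderboard_B`, which is why the match needs to be anchored on something beyond the code itself.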

Perusing the target file suggested an easy fix:

                googletag.defineSlot('/8501595/MedRec_A_InStream_300x250', [300, 250], 'div-gpt-ad-1481759363742-2').addService(googletag.pubads());
                googletag.defineSlot('/8501595/LargeMobileBanner_B_InStream_320x100', [320, 100], 'div-gpt-ad-1481759363742-3').addService(googletag.pubads());
                googletag.pubads().enableSingleRequest();
                googletag.pubads().setTargeting('NodeTitle', ['Home']).setTargeting('ContentType', ['homepage']).setTargeting('SecondaryCategory', []).setTargeting('EventType', []).setTargeting('PrimaryCategory', []).setTargeting('NodeID', [targetingObject['nodeId']]).setTargeting('IndexPage', []).setTargeting('PageView', [pageViewCount()]);
                googletag.enableServices();
              }
            });
          }

          injectAd($('.google-ad-footer'), 'div-gpt-ad-1481759237264-5', 'google-ad desktop centered force-block-display', '');
          injectAd($('.google-ad-footer'), 'div-gpt-ad-1481759363742-3', 'google-ad mobile', 'height:100px; width:320px;');
          injectAd($('.google-ad-header'), 'div-gpt-ad-1481759363742-1', 'google-ad mobile', 'height:100px; width:320px;');
          createInterstitial('div-gpt-ad-1481759237264-0', 'div-gpt-ad-1481759363742-0');

... there are many different ways these codes are used, but they are always surrounded by single quotes.

So the problem to solve is doing over a hundred search-and-replace-all operations using a regular expression. There are lots of ways to do this, but on this day my brain was working in JavaScript, so that's where I went.

The Solution: Write a command-line script in JavaScript

Turning to my shell, first I grabbed a couple of libraries: Commander, which provides an easy way to quickly scaffold a command-line tool; and PapaParse, to parse the CSV file:

npm init # Create the project
npm install commander papaparse

In my index.js file, I required these libraries, as well as Node.js's filesystem library:

#!/usr/bin/env node

const program = require('commander');
const fs = require('fs');
const Papa = require('papaparse');

Now it used to be more of a hassle to do JavaScript I/O operations, such as parsing a file. Libraries like PapaParse don't return results -- instead, they take a callback function as an argument, and pass the result to that callback. Nest a few of these and you get "callback hell", which is one of the reasons a lot of developers dislike JavaScript. But these days, there are other ways of handling this.

First off, there are Promises. Promises flatten your code by moving the callback to an object you can more easily pass around.

ES2017 introduced asynchronous functions, which take this a step further and are available in Node.js 8 and later. They flatten your code even more, and are a great fit for our utility. We don't care about our little program having other work to do -- we want more of the Unix approach of doing one step, then feeding the result into the next. Async/await allows us to write the utility as if each step were synchronous, i.e. fully complete before the next step runs.

Under the hood, asynchronous functions make use of Promises, so for this utility, we need to wrap our PapaParse operation in a promise. Here's what that looks like:

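// Wrap PapaParse's callback-style API in a Promise so it can be awaited.
Papa.parsePromise = function(file, options) {
    return new Promise(function(complete, error){
        Papa.parse(file, {
            ...options,
            complete, // called with the parse results; resolves the promise
            error     // called on failure; rejects the promise
        });
    });
};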

... we have added a parsePromise method to the Papa object, which now returns a promise and thus can be used with Async/Await.
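The same pattern can be tried without PapaParse installed. The sketch below uses a made-up callback-style function, `parseLater`, as a stand-in for a library call; `parseLaterPromise` and `main` are likewise hypothetical names. It shows the three stages side by side: the callback API, the Promise wrapper, and the async function that awaits it.

```javascript
// Hypothetical callback-style API, standing in for a library like PapaParse.
function parseLater(text, callback) {
    setTimeout(() => callback(null, text.split(',')), 10);
}

// Promise wrapper: resolve on success, reject on error.
function parseLaterPromise(text) {
    return new Promise((resolve, reject) => {
        parseLater(text, (err, fields) => (err ? reject(err) : resolve(fields)));
    });
}

// With async/await, each step reads as if it were synchronous:
// the second parse does not start until the first has finished.
async function main() {
    const first = await parseLaterPromise('a,b');
    const second = await parseLaterPromise('c,d');
    return first.concat(second);
}

main().then((fields) => console.log(fields));
```

Note that an async function itself returns a promise, which is why the last line still uses `.then()` at the top level.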

Next, we set up our actual command:

program
    .arguments("<csvfile> <target>")
    .action(async function(csvfile, target){
        const csvStream = fs.createReadStream(csvfile);
        const results = await Papa.parsePromise(csvStream, {
            header: true
        });
        const targetFile = fs.readFileSync(target);

        // Do stuff here

        console.log(final);
    })
    .parse(process.argv);

This is essentially code we use a lot for this type of thing. "program" is the Commander object, and it has several simple methods for setting up command line arguments. It calls the function passed in .action with the arguments provided on the command line. You can also add multiple .option() methods to add various parameters, if you need them -- this is a really simple way to quickly get something sophisticated up and running.

.action is where all the action is. Here, we pass in our function, marked "async" -- this makes it so we can use "await" inside the function body. We open the CSV file as a stream, pass it into our Papa.parsePromise function, and await the result. Then we read the target file. By the time we get to the "Do stuff here" placeholder, we have the parsed CSV file in results, and the raw contents of the file we want to change in targetFile (as a buffer, which still needs to be converted to a string). Finally, we print the result from a variable called "final".

The last thing added to the program setup is the actual parsing of the command line input.

So that's the full setup. Now, the meat of this code, the whole reason we wanted to use JavaScript, can come down to a single line (split out here for readability):

        let final = results.data.reduce(
            (output, row) => output.replace(new RegExp(row.old + "'", 'g'), row.new + "'"),
            targetFile.toString()
        );

The data is in results.data. "Reduce" will iterate through all of the items in data, calling a function and accumulating a result which gets returned to "final" (which is what we're printing at the end of the .action function). The starting data (the second parameter to reduce) is the target file converted to a string. The first parameter to reduce is a newer style arrow function. The first parameter of that arrow function is the accumulator, "output". This starts out with the string of the target file, and in each iteration, gets the return value of the previous iteration. The second parameter, row, gets the value of the specific row from the spreadsheet.
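To watch the accumulator change, here is a small standalone sketch. The `data` array stands in for results.data (with header: true, PapaParse yields one object per row, keyed by column header), and the codes and the `steps` array are made up purely for illustration.

```javascript
// Hypothetical stand-in for results.data.
const data = [
    { old: 'div-old-1', new: 'div-new-1' },
    { old: 'div-old-2', new: 'div-new-2' },
];
const text = "show('div-old-1'); show('div-old-2'); show('div-old-1');";

// Record the accumulator after each row so the iterations are visible.
const steps = [];
const final = data.reduce((output, row) => {
    const next = output.replace(new RegExp(row.old + "'", 'g'), row.new + "'");
    steps.push(next);
    return next; // becomes "output" for the next row
}, text);

steps.forEach((step, i) => console.log(`after row ${i + 1}: ${step}`));
```

After the first row only the two div-old-1 codes have changed; after the second row the div-old-2 code has changed as well, and that last value is what reduce returns.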

With an arrow function, the right side of the arrow implicitly returns the value of a single JavaScript expression. (If we needed multiple statements, we would have to add braces and an explicit return.) So our single expression does a regex replacement, using the "g" flag to replace the value globally, and adding the trailing single quote to make sure we replace the exact string instead of a substring.
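One caveat: new RegExp() treats characters such as "." or "(" in the old code as regex syntax, and "$" sequences in a replacement string have special meaning. The codes in this file appear to contain only letters, digits, slashes, underscores and hyphens, so the one-liner works as is. For messier data, a more defensive variant might look like the sketch below. This is an assumption-laden alternative, not what the original script did: `escapeRegExp` and `replaceQuoted` are hypothetical helpers, the sample rows are made up, and it also anchors on the leading quote.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Replace each quoted old code with its quoted new code.
function replaceQuoted(source, rows) {
    return rows.reduce(
        (output, row) => output.replace(
            new RegExp("'" + escapeRegExp(row.old) + "'", 'g'),
            // A function replacement keeps "$" sequences in row.new literal.
            () => "'" + row.new + "'"
        ),
        source
    );
}

// Hypothetical rows, including a prefix pair and some awkward characters.
const rows = [
    { old: '/1234/Banner_A', new: '/1234/Leaderboard_A' },
    { old: '/1234/Banner_A2', new: '/1234/Leaderboard_B' },
    { old: 'a.b', new: 'c$&d' },
];
const source = "f('/1234/Banner_A2'); f('/1234/Banner_A'); f('a.b'); f('axb');";

console.log(replaceQuoted(source, rows));
```

Here 'axb' is left alone because the dot is escaped, and the prefix pair is replaced correctly because both quotes must match.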

That's pretty much it! A single line does all the work, the rest is all just file operations which is approaching boilerplate at this point. To use this, we need to make the script executable, and send the results to a file:

chmod +x index.js
./index.js replacements.csv google-ads.js > test-google-ads.js

Done!
