How To Extract LinkedIn Likers (DIY Edition)

GillesDC / salesflare.com

About a week ago, I shared an out-of-the-box way to extract Profile URLs for the people that commented on your newest viral LinkedIn post.

If you want to catch up on why that’s interesting and how to do it, head over to:

http://gillesdc.im/HowToGetLinkedInCommenters

Getting Profile URLs for LinkedIn commenters is fairly easy, because the source code of the page actually reveals those URLs. All we had to do was use a regular expression (in Regexr) to pull the URLs out of the code quickly.

Likers are a different story. Inspecting the element that contains all of your post’s likers leaves us with hashed URLs – some kind of encryption that a non-coder earthling like myself cannot get past.

There’s a workaround though.

What we can get are the names of the people who liked the post. They just show up on the page, so there’s no hiding them within the code.

Once we have the names, we’ll run automated search queries in Google for each individual name on site:linkedin.com and scrape the first result.

In about 90% of the cases this will get you the LinkedIn Profile URL. Sometimes the first result is a link to a LinkedIn Pulse article they’ve written or the link to a search query on LinkedIn for that person. The latter is usually the case when you have a person whose name is fairly common.

You could avoid picking up LinkedIn Pulse articles by writing a piece of code that skips the first result if it’s a linkedin.com/pulse/ link; it’s just that I don’t know how to do that. I know how to blacklist domains themselves, but not specific slugs of a domain like in this case.
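For readers who do code, the skipping logic is small. Below is a minimal sketch in Python, not part of the PhantomBuster script: the function name and the sample URLs are made up, and it assumes that profile pages live under linkedin.com/in/ while Pulse articles live under linkedin.com/pulse/.

```python
from urllib.parse import urlparse


def first_profile_url(result_urls):
    """Return the first search result that looks like a LinkedIn profile, or None."""
    for url in result_urls:
        parsed = urlparse(url)
        host = parsed.netloc.lower()
        # Accept linkedin.com itself and country/www subdomains.
        if host != "linkedin.com" and not host.endswith(".linkedin.com"):
            continue
        # Assumption: profiles sit under /in/; Pulse articles (/pulse/)
        # and search/directory pages do not, so they are skipped.
        if parsed.path.startswith("/in/"):
            return url
    return None


# Hypothetical search results for one name, in ranked order.
results = [
    "https://www.linkedin.com/pulse/some-article-jane-doe",
    "https://www.linkedin.com/in/janedoe",
]
print(first_profile_url(results))
```

The same check on the URL path should port to any language; the point is to filter on the path rather than on the domain.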

One way to avoid scraping links to LinkedIn search queries when names are fairly common is to include the person’s tagline in the automated Google search queries. You can also scrape these taglines from the source code, but it’s more complicated than it sounds.

We’ll first go through the steps needed to get LinkedIn Profile URLs by scraping just the names and running them through Google. For the pros, there’ll be instructions at the end on how to include taglines in search queries and improve your return rate.

Rookie Level

Process

  1. Load up all the likers, triggering infinite scroll

  2. Copy <div class="modal-content-wrapper"> from the source code

  3. Pull out the names in Regexr

  4. Copy the names into a Google Sheet with header ‘Name’

  5. Set up PhantomBuster Agent

  6. Link the Google Sheet to the PhantomBuster script I’ll be sharing with you guys

  7. Launch the Agent and download the file

Step 1: Load up the likers

Click the box with the three dots to load up a window with all the people who liked your post.

This list works with infinite scroll, meaning you have to go all the way down again and again to have all of them load up.

I would use a piece of JavaScript to trigger this automatically, but the one I have doesn’t seem to work on the pop-up-ish window that contains the LinkedIn likers.

For me it’s not that big of a deal (yet) to trigger the scroll manually as I have yet to write a LinkedIn post that gets more than 300 likes – but for people hitting 1k and more this may be frustrating.

I’m pretty sure a more experienced coder than I am (which means anyone with any level of coding experience) can come up with something that triggers the infinite scroll on LinkedIn’s liker boxes.

Step 2: Inspect the page and copy <div class="modal-content-wrapper"> from the source code

(In Google Chrome)

Right-click somewhere on the box, like on a name, and choose ‘Inspect’. 

Chrome will now pull up the page’s code.

Hover within the code until you get the element that covers the liker box.

It’s called <div class="modal-content-wrapper">

See the screenshot below. Right-click and choose Copy > Copy Element.

It doesn’t really matter which part of the HTML you’re copying – the only thing that really matters is that you’re copying code that contains the names you want, but this would be more or less the cleanest way.

Step 3: Extract the names from the code using Regexr

Go to http://regexr.com/

Paste the Element you copied into the ‘Text’ field.

Change the expression in the ‘Expression’ field to:

class="name">(.*)<

Under the ‘Text’ field, click ‘List’ and change the List code to:

$1\n

You’ll now have all the names in a nice list upon clicking the ‘List’ tab.
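If you prefer a script over Regexr, the same extraction can be sketched in Python. This is only an illustration: the sample HTML is made up, and the pattern is a slightly stricter variant of the expression above (`[^<]*` instead of `.*`, so a match always stops at the first `<`).

```python
import re

# Made-up sample of what the copied element might contain.
html = '''
<h3 class="name">Jane Doe</h3>
<p class="headline">Growth Marketer at Example Co</p>
<h3 class="name">John Smith</h3>
'''

# Capture everything between class="name"> and the next "<".
NAME_PATTERN = re.compile(r'class="name">([^<]*)<')

names = [name.strip() for name in NAME_PATTERN.findall(html)]
print("\n".join(names))
```

The printed list is one name per line, ready to paste into the first column of a sheet.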

Step 4: Copy the names into a Google Sheet

Make a Google Sheet and copy the names into the first column.

Name the header ‘Name’.

Step 5: Set up PhantomBuster agent

To get from names to LinkedIn profiles, we’re going to search the names in Google on site:linkedin.com

Automatically, of course.

PhantomBuster is a platform that lets you do stuff like this. We’re going to hit it up with a task and have it execute that task as many times as needed to get all the LinkedIn profiles.
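To make the idea concrete, here is a rough sketch of how one such search URL could be built in Python. The exact query format used by the shared script isn't shown in this post, so treat the quoting of the name and the function name as assumptions.

```python
from urllib.parse import urlencode


def google_query_url(name):
    """Build a Google search URL for one name, restricted to linkedin.com."""
    # Assumption: quoting the name asks for an exact match.
    query = f'site:linkedin.com "{name}"'
    return "https://www.google.com/search?" + urlencode({"q": query})


print(google_query_url("Jane Doe"))
```

Running this once per row in the sheet gives one search per liker, which is the task the Agent repeats.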

Head over to https://phantombuster.com and create a free account.

Select ‘Agents’ and create a new one.

You’ll be taken to a scary-looking code interface.

Stay calm, you just need to copy-paste a script into it.

Specifically, this script: http://gillesdc.im/LinkedInLikersRookie

Copy-paste and save.

Step 6: Link your Google Sheet to the script

The script will be sourcing the names from the column of the Google Sheet you just created.

Replace the link in the code with your link, and make sure the link doesn’t end with a ‘/’.

Last step before you launch:

Go into ‘Settings’ and set a number higher than 0 for ‘number of retries’.

Google tends to block an IP if too many searches are run from it in quick succession. The script will relaunch and use another IP if this happens.

A free PB account is restricted to 10 retries, meaning the Agent can relaunch a max of 10 times upon Google blocking the IP. If you have A LOT of post likers, this may not be enough to get all of them. The solution would be to take a paid PB plan or to simply remove the names for which you’ve gotten a profile from the Google Sheet and start over.

You can also try changing the search engine to DuckDuckGo.com, because it won’t block you as easily, giving you more data for each retry. You can set this up in the source code pretty easily: just look for where “google.com/search…” is mentioned and replace it with a DuckDuckGo search string. I haven’t tried this myself yet.
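Trimming the sheet by hand gets tedious with long lists. A minimal Python sketch of the "remove what you already have" step follows; the data is made up, and it assumes you have loaded your results into a mapping from name to profile URL (the column layout of the downloaded file isn't described here).

```python
def remaining_names(all_names, resolved):
    """Return the names that still have no profile URL, in original order."""
    return [name for name in all_names if not resolved.get(name)]


all_names = ["Jane Doe", "John Smith", "Ann Lee"]

# Hypothetical results so far: name -> profile URL ("" when nothing was found).
resolved = {
    "Jane Doe": "https://www.linkedin.com/in/janedoe",
    "John Smith": "",
}

print(remaining_names(all_names, resolved))
```

The leftover names go back into the Google Sheet for the next run.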

Step 7: Launch and download

Pro Level

Just using the names to get LinkedIn URLs will get you fair results, but not perfect ones. Instead of the user’s profile, the Google scrape may return Pulse articles or search queries for the user’s name in LinkedIn – mostly the case when they have a common name.

To avoid this, you can include headlines in your search query – this will make the search query more specific and thus get you more accurate results.
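As a rough illustration of what "including the headline" means for the query, here is a small Python sketch. How the Pro script actually combines name and headline isn't shown in this post, so the format below (quoted name, unquoted headline, fallback to the name alone when there is no headline) is an assumption.

```python
from urllib.parse import urlencode


def search_url(name, headline=""):
    """Build a linkedin.com-restricted Google search URL for a name and optional headline."""
    parts = ["site:linkedin.com", f'"{name}"']
    # Some likers have no headline; in that case search on the name alone.
    if headline.strip():
        parts.append(headline.strip())
    return "https://www.google.com/search?" + urlencode({"q": " ".join(parts)})


print(search_url("Jane Doe", "Growth Marketer"))
print(search_url("John Smith"))
```

Leaving the headline unquoted keeps the query forgiving when Google's indexed copy of the headline differs slightly from the one on the page.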

Process

  1. Load up all the likers, triggering infinite scroll

  2. Copy <div class="modal-content-wrapper"> from the source code

  3. Pull out the names and headlines in Regexr

  4. Paste the names and headlines into an Excel spreadsheet and perform some magic to get them into two columns

  5. Paste the names and headlines into a Google Sheet with headers ‘Name’ and ‘Headline’

  6. Set up PhantomBuster Agent using the script I’ll be sharing with you guys

  7. Link the Google Sheet to the PhantomBuster Agent 

  8. Launch the Agent and download the file

Step 1: Load up the post likers

This is the exact same step as in the rookie level. Just trigger the infinite scroll up until you’ve loaded all the post’s likers.

Step 2: Copy <div class="modal-content-wrapper"> from the source code

Again the same thing as for the easier way.

Inspect the page and get to the div containing all the post likers, then right-click that element and copy it so you can paste it into Regexr.

Step 3: Pull out the names and headlines in Regexr

This is where it gets different.

Before, we were interested in just the names; now we need to extract the headlines too.

You would think the easiest way is to first use a regular expression to extract the names and then repeat the process for the headlines.

The problem is that not all users appear to have a headline. When I tested this, I had 140 post likers. Extracting the names indeed gave me 140 results, but repeating it for the headlines gave me only 134. Since there is no automated way to match the headlines to the names afterwards, we’re going to have to extract each name together with its headline and split them afterwards in Excel.

So, we’ll be using a different regular expression than in the rookie level.

More specifically, this one:

class="name">(.*)[\n](.*)<\/

In the ‘List’ view, use this to get both name and headline:

$&\n

You should now have something like this:

Now, paste that output in a new Excel sheet.

Step 4: Paste the names and headlines into an Excel spreadsheet and perform some magic to get them into two columns

Pasting the output in an Excel sheet will get you one column with the names and headlines alternating, plus a bunch of code in between that we don’t need.

We’re going to perform some quick Excel moves to clean up the data and get everything where we want it to be.

First, we want to get all the even rows with the headlines in a new column on the same row as the names.

Use this formula in cell B1:

=IF(ISEVEN(ROW(A2)),A2,"")

And paste the formula all the way down in the B column.

This will get the even rows in the new column, and because you’re putting those even A rows in the odd B rows, they’ll now sit right alongside the name they belong to.

Like this:

Next, we’re going to get rid of the spaces in front of the headlines, using TRIM. Use this formula in cell C1 and paste it all the way down in the C column:

=TRIM(B1)

This will get rid of the whitespace.

The rest we can clean up using find and replace.

Make a new empty column right next to A.

Select the column with all the headlines without whitespace (which is now column D) and copy.

Select the header of your newly created column B and ‘Paste Values’.

Delete columns C and D.

In column A, get rid of class="name"> and </h3 using ‘Find & Replace’, replacing with nothing.

In column B, do the same for <p class="headline"> and </
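If you'd rather skip the Regexr and Excel juggling for this part, the extraction and the split into two columns can be sketched in one pass in Python. This is not the method used above, just an alternative under assumptions: the sample HTML is made up, and the pattern assumes each liker is an `<h3 class="name">` optionally followed by a `<p class="headline">`, which is what the Find & Replace strings above suggest.

```python
import csv
import io
import re

# Made-up sample; John Smith has no headline.
html = '''
<h3 class="name">Jane Doe</h3>
      <p class="headline">Growth Marketer at Example Co</p>
<h3 class="name">John Smith</h3>
<h3 class="name">Ann Lee</h3>
      <p class="headline">Founder</p>
'''

# Capture each name, plus the headline only if one directly follows that name.
PAIR_PATTERN = re.compile(
    r'class="name">([^<]*)</h3>'
    r'(?:\s*<p class="headline">([^<]*)</p>)?'
)

# A missing headline comes back as an empty string, so rows never shift.
rows = [(name.strip(), headline.strip())
        for name, headline in PAIR_PATTERN.findall(html)]

# Write the two columns as CSV text with the headers the sheet needs.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Name", "Headline"])
writer.writerows(rows)
print(buffer.getvalue())
```

Because name and headline are captured in the same match, a liker without a headline simply gets an empty second column instead of throwing the pairing off.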

Step 5: Paste the names and headlines into a Google Sheet with headers ‘Name’ and ‘Headline’

Speaks for itself.

Instead of one column, we now need two. The script you’ll be using in PhantomBuster will look for columns with the names ‘Name’ and ‘Headline’ so make sure to name them accordingly.

Step 6: Set up PhantomBuster Agent using the script I’ll be sharing with you guys

Time to scrape Google again.

For PhantomBuster to take both columns into account, set up a new Agent using this script:

http://gillesdc.im/LinkedInLikersPro

Don’t forget to save.

Go into Settings and set the number of retries to 10 to make sure the script automatically relaunches if Google blocks the IP. Don’t forget to save.

Step 7: Link the Google Sheet to the PhantomBuster Agent 

The script will be sourcing the names from the columns of the Google Sheet you just created.

Replace the link in the code with your link, and make sure the link doesn’t end with a ‘/’.

Step 8: Launch and download

Same thing as in the rookie level: launch the Agent with the Pro script and download the results (output.csv).

The hit rate will be close to 100%.

Targeting

You can auto-connect with your engagers using a tool such as LinkedInHelper or GPZ LinkedIn Tools. Or you can use Dux-Soup’s newly launched autoconnect feature in combo with its Revisit feature.

If you have no idea what I’m talking about, feel free to hit me up on Messenger or connect with me on LinkedIn.
