This webcast was originally published May 25, 2017
In this video, BB King discusses advanced techniques for leveraging Python in open-source intelligence (OSINT) gathering. He demonstrates how to extract and analyze data, providing insights into handling and parsing responses from various APIs and web services. Through practical examples, BB illustrates methods to enhance data collection and analysis for intelligence purposes using Python.
- BB King presents on using Python for open source intelligence (OSINT) to gather information, aiming to make attendees comfortable with Python for their projects.
- The webinar discusses techniques for extracting useful information from web responses using Python, and handling different data formats like HTML and JSON.
- The session also covers the use of tools like Recon-ng for automating OSINT tasks, and the importance of verifying gathered intelligence.
Transcript
John Strand
All right, everybody, so let’s go ahead and get this started. I’ll kick it off. So BB King is going to be giving this presentation, kind of a continuation of a theme. The last webcast we had was Joff talking about Python, which was, I think, horrifying for a lot of people.
Whenever he’s doing regular expressions on the fly, people are like, well, could I have a kitten? And he’s like, all you’ve got to do is change this in the regex. And now we’re going to continue that theme, and that creepy vibe.
And BB is going to be giving a presentation on how you can stalk your friends and family using Python, because that’s what we do here. We bring families together, whether certain members know it or not.
And also, we have a really, really cool conference coming up in Deadwood, South Dakota. We’re doing Wild West Hackin’ Fest. The training will be the 25th through the 26th, and the actual con is the 27th through the 28th.
It’s in Deadwood, right around the same time as Deadweird, which is an awesome, super cool Halloween party. You guys should please check it out.
Tickets are still available, although we’re selling them quickly. We do expect to sell out, as always. And we have some amazing people. We’ve got Dave Kennedy coming. We’ve got Ed Skoudis, we’ve got Mike Poor.
We’ve got a whole. We have all the people. We’ve got all.
Sierra
Egypt.
John Strand
Yeah, Egypt, Mubix, Secure Ideas. We have Kevin Johnson coming. Larry is coming, Paul is coming. Chris Gates.
BB King
I can’t even.
Sierra
I can’t even name them anymore. There are so many. But they’re all on the website.
John Strand
They are all on our website.
Sierra
So we’re pretty excited.
John Strand
Yes, yes. So please, please, please check it out.
Sierra
Yes. That is John’s head on the cowboy.
BB King
That is. Thank you for noticing.
Sierra
He did. I got fired sometimes. All right.
John Strand
All right. And I’ll hand it... Oh, we were going to talk a little bit about the Python class for SANS as well. Do we have that on another slide, BB? There we go. So SANS has a Python class.
If you guys want to learn more about Python, check out SEC573. And with that, let’s kick it over to BB. BB, please take it away.
BB King
All right, so, yeah, like John said, we had a great talk about Python last week from Joff, who teaches that class. It covered log analysis and more system-administration-type stuff, and that was great.
This is another use of Python: open source intelligence. My goal here is to get you comfortable with Python, if you’re not into it already, at least enough to get around. I’m going to try to help you get to the next step, where you can start playing with it and find your own projects to have some fun with.
I find, and I think most people are like this, that if you get one of those gigantic books, like, have you seen the Learning Python book, that pink O’Reilly book? It’s like three inches thick.
I would get to page 60 and lose interest.
John Strand
I like that picture that’s on Twitter. Most coding books are the equivalent of how to draw an owl: draw two ovals, and you draw the two ovals, and then the next step says, now fill in the rest of the owl.
Sierra
You’re done.
John Strand
There you go.
BB King
That’s it.
John Strand
That’s coding, right?
BB King
Well, the story I heard about the big Python intro book is that it’s mostly whitespace. Get it? Python joke. All right, so I’m going to help you get your Python going, even if just a little bit.
This will help you find something to work on to keep going. We’re going to talk about what OSINT is, and we’ll walk through an example. I’ll show you some Recon-ng stuff, and we’ll get moving so that you can leave with something you can take away with you.
So when I first started, I did a talk similar to this one a year ago at a local Python conference. And when I submitted it, people said, what’s OSINT? I don’t know what OSINT is. So I asked the developers I know on Twitter, and they said they didn’t know what OSINT was either, and I realized this was a case where sometimes we talk badly and don’t realize it.
We use words that don’t mean anything to the people we’re talking to. Python has one of those too, and it was the first thing that kind of threw me a little.
Python felt kind of strange when I came across the concept of list comprehension, because it’s something you use all the time in Python, but the name has nothing at all to do with what either of its words usually means.
Well, maybe "list" does, but for "comprehension" you have to dig through the dictionary to find a definition that corresponds to what Python actually does with it.
So if you’re not familiar with Python and someone says, oh, you should use a list comprehension for that, you’re not going to follow them. OSINT is kind of the same way.
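For anyone who hasn’t met the term: a list comprehension is just a compact way to build a new list from an existing iterable, optionally filtering as you go. A minimal sketch with made-up addresses:

```python
# Made-up addresses, used only for illustration.
emails = ["Alice@Example.COM", "bob@example.org", "carol@example.org"]

# The long way: an explicit loop that appends to a list.
domains_loop = []
for address in emails:
    domains_loop.append(address.split("@")[1].lower())

# The same thing as a list comprehension: [expression for item in iterable]
domains = [address.split("@")[1].lower() for address in emails]

# An optional trailing "if" clause keeps only the items that pass the test.
org_only = [address for address in emails if address.endswith(".org")]

print(domains)   # ['example.com', 'example.org', 'example.org']
print(org_only)  # ['bob@example.org', 'carol@example.org']
```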
It’s jargon, I guess. I didn’t realize that it was, but it is. So, very quickly: I think most of the folks here know what OSINT is. It’s open source intelligence.
Intelligence is just information that comes from different sources, and those sources can be open or closed. OSINT is intelligence drawn from the open ones.
So here are some examples of sources that are considered open and some that are not. If you don’t need special access, permissions, credentials, or a position with an agency to get to it, then it’s open source.
Initially, years ago, this meant things like newspaper articles in foreign countries. Now it means anything you can find on the Internet. And a lot of things you think might not be open source turn out to be, after there’s a breach of some kind.
So this is what we’re looking for: information that’s just available out there, if you know where to look and what to look for.
So here’s one example. I did this just the other day. I wondered if there was anybody at SpaceX named Brian who had a profile on LinkedIn.
So I searched for that with Bing. Bing supports the same search operators that Google does. And this is the guy I found. I pixelated out his details; you could find him the same way, but I don’t want to call attention to this guy if I don’t need to.
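The exact query BB typed isn’t shown on the slide, so the one below is a hypothetical reconstruction. This sketch only builds a Bing search URL using the `site:` operator and quoted phrases; it doesn’t send anything, and `urlencode` takes care of escaping the colon, quotes, and spaces.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def build_bing_url(query):
    """Return a Bing search URL for the query; nothing is sent over the network."""
    return "https://www.bing.com/search?" + urlencode({"q": query})

# Hypothetical query: LinkedIn pages mentioning both "SpaceX" and "Brian".
query = 'site:linkedin.com "SpaceX" "Brian"'
url = build_bing_url(query)
print(url)

# Round-trip check: decoding the query string gives back the original search.
decoded = parse_qs(urlsplit(url).query)["q"][0]
```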
So, open source intelligence: what might we do with this? On just this one page, LinkedIn gives you a "People also viewed" list over here. And some folks make their profiles public and put lots of detail out there about what they’re doing, which is awesome for pen testers who are looking for information.
How do we get more information about people? So this guy was the manager of vehicle assembly. And over here we have the VP of vehicle engineering.
So maybe that’s his boss. And we have other people here: there’s a guy named Shawn who’s in manufacturing, Stephanie’s a supplier performance specialist, and Cesar and Kristen down here, all people he works with.
So maybe if you find his phone number or his email address and you want to try to extract some information about him, this is some stuff that you can use to do that. This is open source intelligence.
If you’re interested in Python and open source intelligence, where those two things come together, you could do much worse than looking up Justin Seitz. He has a website called Automating OSINT, and he’s very active in this.
He’s all about teaching, and he gives you really well-documented code examples, things you can follow and build on. This is a recent one from him about monitoring Pastebin for terms that might be of interest to you.
So he walks you through setting up a script and an input file of terms to search for. And you can have this thing run periodically on a shared VPS.
Or you can run it locally and whenever those keywords show up in pastebin, you’ll know about it. So you don’t have to bother doing the work yourself. Once you set it up and know what you’re looking for, you make the computer do the work for you and you just process the results.
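Justin’s actual script isn’t reproduced here. As a rough, hypothetical sketch of the core idea, this reads search terms from a file (one per line) and checks a paste’s text for them. Fetching new pastes from Pastebin and scheduling the run are left out, and the function names are made up for illustration.

```python
import tempfile
from pathlib import Path

def load_terms(path):
    """Read one search term per line, skipping blank lines and '#' comments."""
    terms = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            terms.append(line.lower())
    return terms

def matching_terms(paste_text, terms):
    """Return the terms that appear in the paste (case-insensitive)."""
    text = paste_text.lower()
    return [term for term in terms if term in text]

# Demo with a throwaway terms file and a made-up paste.
with tempfile.TemporaryDirectory() as tmp:
    terms_file = Path(tmp) / "terms.txt"
    terms_file.write_text("# things we care about\nexample.com\nvpn password\n\n", encoding="utf-8")
    terms = load_terms(terms_file)

paste = "dump: admin@EXAMPLE.com / hunter2"
hits = matching_terms(paste, terms)
print(hits)
```

In a real setup you’d run this on a schedule and alert only on new hits.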
Another quick example before I get into the meat of this: @NeedADebitCard on Twitter is kind of fun to follow. Again, I pixelated these things out; they were not like that when I found them.
People, I don’t know why, like to post photographs of their credit cards, their driver’s license, their first paycheck, or their passport.
Crazy stuff that people just put out there on Twitter, maybe not realizing it’s public, maybe not realizing that those numbers are meaningful to people and could be useful for identity theft.
But even there, people play the game on both sides. In this one, he’s talking about the number: ooh, it’s a lucky card because it’s got 777 on both sides. Well, it’s the same number on both sides, so of course it does.
And if you look at this one here, it says "already canceled," and he’s trolling you. So you can’t always trust the information you find. But there’s a lot of it out there to look at, and it can be fun.
So why would you use OSINT for your next Python project? Because it’s fun, and what really makes it fun to me is that you get to see all this information that’s already out there, just waiting to be noticed.
It’s a whole different web once you start looking at it programmatically, through Python or something else. So here’s one example of something we do at work when we’re doing reconnaissance on a company.
We find individuals, we look for email addresses, we look for lists of people who work at a place and try to find out what we can about them, so that we can buddy up with them, do some social engineering, and see if we can get more information out of them.
So I started with this: we’re doing Python, so I googled the phrase "Python programming luminary," and Steve Holden was one of the top four hits.
And he’s also got a page out here on O’Reilly, and he’s published his email address. So I’m comfortable using his email address as an example, because he put it out there for us.
So you start with the email address, and you want to check whether it’s valid. There are lots of ways to do this, but there’s a service out there called MailTester. You just give it an email address and click Check Address, and it uses the domain name to look up the MX servers, then queries those MX servers to find out whether that’s a valid account. Not just a valid email address in the regular-expression sense (is the format correct?),
but: does this email address exist on this domain? There are a couple of ways you can do that. This does it all in the background for you, and it tells you whether the address is valid or invalid.
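To make the distinction concrete: a regular expression can only tell you whether a string has the shape of an email address, never whether the mailbox exists. A minimal sketch, using a deliberately loose pattern that is an assumption, not a standards-complete validator:

```python
import re

# Deliberately loose pattern: something@something.something, with no spaces.
EMAIL_FORMAT = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def looks_like_email(address):
    """True if the string merely has the *shape* of an email address."""
    return EMAIL_FORMAT.match(address) is not None

print(looks_like_email("notsteve@example.com"))  # True, yet the mailbox may not exist
print(looks_like_email("not an address"))        # False
```

Answering the existence question requires asking the domain’s mail server, which is what MailTester does on your behalf.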
So "notsteve" is invalid; it doesn’t exist on that server. So you get one of two answers: this is a good address, or it’s a bad address, right? That’s great, but some servers don’t give you that information.
So if you’ve ever interacted manually with a mail server, which is fun, there are two or three ways you can do it.
There’s a verify command, VRFY, that just asks, does this exist? If that’s disabled, the server will tell you to just try sending an email and see what happens. The other way is RCPT TO: you type RCPT TO, then an address, and hit return.
And sometimes that will tell you the address isn’t valid even when VRFY doesn’t work. Some servers just won’t tell you. Some of them know what game you’re playing, and they’ll say, hey, if you want to send an email, send it, and I’ll do the best I can.
Yahoo does that, Gmail does that, and a bunch of servers are now starting not to tell you whether it’s a valid email address. So this is where you get into the OSINT part and the learning part and the paying attention, the advanced paying attention I talked about earlier.
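A rough sketch of the two manual checks using Python’s standard `smtplib`: `verify()` sends VRFY, and `mail()` plus `rcpt()` send MAIL FROM and RCPT TO. The reply-code mapping is a simplification (250/251 accepted, 550/553 rejected, anything else such as 252 treated as unknown), and servers like the ones just mentioned may accept every address, so a 250 doesn’t prove the mailbox exists. Note that running `probe()` talks directly to the target’s mail server from your own address; the MX hostname and sender address are placeholders.

```python
import smtplib

def classify_reply(code):
    """Map an SMTP reply code to a rough verdict about the address."""
    if code in (250, 251):
        return "valid"    # accepted (though a catch-all server accepts everything)
    if code in (550, 553):
        return "invalid"  # mailbox unavailable / mailbox name not allowed
    return "unknown"      # 252, temporary failures, VRFY disabled, and so on

def probe(mx_host, address, sender="probe@example.com"):
    """Ask the server directly: VRFY first, then fall back to MAIL FROM / RCPT TO."""
    with smtplib.SMTP(mx_host, 25, timeout=10) as server:
        server.ehlo_or_helo_if_needed()
        code, _ = server.verify(address)
        if code in (250, 251):
            return "valid"
        server.mail(sender)
        code, _ = server.rcpt(address)
        return classify_reply(code)

# Example (placeholder host; this contacts the server directly):
# print(probe("mx.example.com", "notsteve@example.com"))
```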
So I imagine most of you know about F12 in your browser. It’s been there for years now, and I think all the major browsers do this. On any page, hit F12 and you open up the developer tools; this was Chrome.
I think this is roughly what it looks like. These are all tabs, different things you can look at, and when you send a request, it will show you what the request looked like.
So when I tried to validate Steve’s email address, this request was sent to this URL. It went as a POST, not a GET, and the response was a 200. And these are the parameters it sent.
It sent a language, which I didn’t type in, and an email, which I did. So the interesting thing here is that all I asked was, hey, is steve at holdenweb a valid email address?
And the website, through my browser, added some information for me. It said, oh, we also want this to be in English. This is a simple example, but this is how people get into trouble
with online services a lot of the time: the information you intend to post is not the only information that gets posted. It’s harmless with the language, but there are other cases where that metadata, that extra stuff, can be super interesting.
There’s Justin Seitz’s piece on stalking people, basically, through Facebook. For this one to work, you have to be able to see someone’s Facebook timeline.
So maybe it’s their public timeline, maybe you’re friends with them, maybe you’ve tricked them into being your friend. You look at their timeline, and buried in the HTML is not just the content they’re posting but also a timestamp of when everything was posted.
When that person posts, they’re not typing in the time; the site is adding that for them. So again, this is something you don’t intentionally put out there, but it’s out there.
So Justin uses that to build a histogram of when people post. When do they sleep? When are they at home? Are they posting at work all day? Just interesting things, funny things.
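Justin’s code isn’t shown here, and how the timestamps are embedded in the page isn’t covered, so this hypothetical sketch starts from timestamps that have already been extracted as Unix epoch seconds. It buckets them by hour of day in a chosen UTC offset (an assumption you’d set to the person’s local time zone) and prints a text histogram.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def posts_by_hour(timestamps, utc_offset_hours=0):
    """Return a 24-element list: number of posts made in each hour of the day."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    hours = Counter(datetime.fromtimestamp(ts, tz).hour for ts in timestamps)
    return [hours.get(hour, 0) for hour in range(24)]

def print_histogram(counts):
    """One row per hour, one '#' per post."""
    for hour, n in enumerate(counts):
        print(f"{hour:02d}:00 {'#' * n}")

# Made-up timestamps: midnight, two just after 1 a.m., and 1 p.m. the next day (UTC).
sample = [0, 3600, 3660, 86400 + 13 * 3600]
utc_counts = posts_by_hour(sample)
eastern_counts = posts_by_hour(sample, utc_offset_hours=-5)
print_histogram(utc_counts)
```

Getting the offset wrong shifts the whole picture, so "posting at 3 a.m." conclusions depend on knowing where the person actually is.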
But how is that useful? It can be useful if you’re stalking somebody, of course. But it can also be useful if part of your job at work is to monitor your employees’ Internet use.
And maybe they’re not supposed to be posting on Facebook all day; maybe you pay them to perform some task other than posting on Facebook. If there’s an investigation, you could check their Facebook timeline, see when they were posting, and see whether they were on the clock at that time.
Well, now you’ve got some information, more than just speculation about what they might have been doing when. So, back to MailTester: that’s what it’s doing.
It’s talking to the mail server. You’re not talking to the mail server; it is. That’s one of the assumptions in Recon-ng and a lot of open source intelligence: you’re trying to make it so the research can’t be tied back to you.
So if you interact with a system that is controlled by or available to the target, they can potentially notice that you’re researching them, that you’re looking at their stuff.
And at this early stage, you don’t want that. So MailTester is talking to the mail servers, but you are not. And I suspect it would be difficult to go to MailTester and ask them, hey, who was it that looked up steve at holdenweb at 1:16 p.m. the other day? I don’t think they would have that information.
So what would this look like in Python? We’ve figured out what URL we’re going to, what the method is (a POST, not a GET), and what parameters go with it. Python has awesome modules, and I think the greatest value in Python is that huge variety of really high-quality modules that insulate you from lots of details and make things super simple to do.
So this is all it takes to send that exact same request in Python. You import the requests module. You have to install it first, since it’s not part of the default install, but that’s literally three words, pip install requests, and wait a second.
And there it is. So this is the request you’re sending; we got it out of the browser. We look at the response status code, and it’s a 200. But here’s the response.
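The slide uses `requests`. As a hedged alternative that needs no install, here is the same POST built with the standard library’s `urllib`. The URL and the `lang`/`email` parameter names are reconstructed from what the browser’s developer tools and Recon-ng’s verbose output show in this talk, so treat them as assumptions; the service may well have changed since 2017.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint, reconstructed from the request seen in the browser.
URL = "http://mailtester.com/testmail.php"

def build_check_request(email):
    """Build (but don't send) a form-encoded POST carrying lang and email."""
    body = urlencode({"lang": "en", "email": email}).encode("ascii")
    return Request(
        URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

def check(email):
    """Send the request; what comes back is a full HTML page, not a yes/no."""
    with urlopen(build_check_request(email), timeout=15) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")

# The requests version described in the talk (after `pip install requests`):
#   resp = requests.post(URL, data={"lang": "en", "email": email})
#   print(resp.status_code, resp.text)

req = build_check_request("notsteve@example.com")
```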
So what is that response? It’s the HTML your browser was going to render, right? It’s not giving you a quick answer about whether that email address is valid or not.
The answer is buried in there somewhere, but it’s not easy to find. See how small the scroll bar is? There’s lots of stuff, so how do you pull it out?
And this one kind of takes me back. Years and years ago, before I used Python, I used Perl, and I still miss Perl. I would try to parse documents for a lot of things at work, and a lot of it was parsing HTML.
There was a Perl module called WWW::Mechanize that did a lot of this stuff for you, but it would choke on things that were not well formed.
And the fundamental purpose of a browser is to render poorly formed HTML.
Things the browser can make sense of are difficult for modules to make sense of, and difficult for you to handle manually: oh, there’s no closing tag, or everything’s on one big long line, or there’s lots of whitespace between these things. Lots of low-level stuff that the requests module can help you not worry about.
And other modules: there’s an XML parsing module I’ll get to in a minute that helps you with that too. So anyhow, regular expressions again, right? This is kind of overkill for looking through the response.
If you want to know more about regular expressions in Python, Joff’s talk from the other day is fantastic for that. He’ll show you why it’s a little bit dangerous to just have this here without a little something in front.
Anyhow, this looks through the content for the phrase "is valid," and here I’m printing out element zero, the first match. It found it, so, okay, great, it said "is valid." But maybe that string appears more than once in the response.
And if you look at the response as rendered in the browser, you’re probably not going to see that. I think I showed you before, there’s not much on the page. But it’s not uncommon at all for text that is displayed to also appear in a comment somewhere in the HTML, or as an attribute of an HTML tag.
There’s lots of places that could be. Maybe this is part of a JavaScript that’s totally unrelated to what you’re doing.
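Here’s a small, made-up response that shows the risk: the phrase appears in a comment and in a script as well as in the visible result, and the first regex match isn’t the one you care about.

```python
import re

# Made-up page: the phrase appears in a comment and a script, not only in the result.
html = """<html><body>
<!-- template note: show "is valid" in green -->
<script>var sample = "Address is valid";</script>
<table><tr><td>E-mail address is valid</td></tr></table>
</body></html>"""

matches = re.findall(r"is valid", html)
first = re.search(r"is valid", html)

print(len(matches))                           # 3 matches, only one of them the real answer
print(first.start() < html.index("<table>"))  # True: the first hit is inside the comment
```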
John Strand
Hey, Brian, just a quick note: your screen share looks kind of flaky. But I think GoToWebinar is flaking out, because your audio is going up and down.
I think it’s GoToWebinar. But for everybody who’s on, just so you know, we will share the slides as well. So even though it looks kind of garbled, you’ll get a copy of the slides to make it easier to see after the fact.
BB King
All right, I’m sorry about that. I checked my speed, and everything seems good network-wise. So yeah, slides will be available. So we could parse the document manually, as we just talked about: go through, try to build the DOM tree, and figure that out yourself.
But what if it’s not well formed? That can be difficult. If you’re new to Python, you might know about modules, but there are lots of them. So how do you know which one to start with? This can be a rabbit hole of trying different modules and different philosophies.
You’ll find with Python, as with anything, that there are religious zealots out there who will say you should never, ever do it this way, and people who say you should only ever do it this way, about the exact same thing.
So when you’re getting started, it’s hard to know what to use, what to pick up, what’s valuable. The way I solve that problem for myself is to look at what other people have done before.
And this is where reconnaissance comes in. Looking at other people’s code can save you lots of time. It can show you how to use features and techniques you didn’t know about.
Maybe there’s a tool you use all the time, but you didn’t know it had this one cool feature that’s going to save you a lot of time. That’s how I learn now; I don’t look at books too much.
I try to look at actual living code to see how people actually do stuff. So like I said, Recon-ng is a tool that’s been out there for quite some time now.
It’s sponsored by BHIS, but Tim Tomes does all the work and maintains it. It’s actually on Bitbucket, not GitHub, but recon-ng.com will send you there.
It’s open source, it’s free. There’s not even a paid version or paid modules; it’s all entirely free. Tim is really open to new development, additions, and modifications.
He’s got a development guide out there that I suggest you read before you send any pull requests, because he’s got a philosophy about how he maintains this. Think about open source projects with contributors from all over the place: a contributor often hands over their piece of code and runs away, and the maintainer has to keep it going for however long.
So, reasonably, he’s got some expectations and requirements for how those things should be built so he can maintain them over time. The biggest thing there is that he really tries to minimize dependencies.
So there’s a parsing module in Python called Beautiful Soup, which I think is a play on "tag soup," the term for a mess of badly formed tags.
Beautiful Soup does a lot of the stuff I talked about earlier, where it’s not a well-formed document: maybe there’s a missing tag, maybe there’s an extra tag. Beautiful Soup can help with those things, but he doesn’t include it.
So if your pull request pulls in this whole other module, he’s going to say, hey, wait a second, maybe there’s some other way you can do this. That’s a dependency; it’s not part of the default install.
So anyway, read the development guide before you do anything major. At its core, Recon-ng is just a database,
plus scripts that call third-party services like MailTester to populate that database. These are the categories, the tables in the database, and these are some of the modules you can use to fill them.
Here’s the schema for how things are laid out. A contact is a person; a profile is something like your LinkedIn or GitHub page; a repository is your GitHub repository; then companies; credentials, which come from public data leaks; and hosts, which are web servers or whatever other kinds of servers we can find out there.
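Recon-ng keeps its data in SQLite. The sketch below is not Recon-ng’s real schema or module API; it’s a hypothetical, heavily simplified contacts table plus a function showing the "read from a table, write results back" pattern the modules follow. The lookup is a stand-in for a third-party service, and `remove_invalid` is a made-up option.

```python
import sqlite3

# Hypothetical, heavily simplified contacts table (not Recon-ng's actual schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE contacts (first_name TEXT, last_name TEXT, email TEXT)")
db.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [("Alice", "Example", "alice@example.com"),
     ("Bob", "Example", "bob@example.com"),
     ("Carol", "Example", None)],
)

def fake_lookup(email):
    """Stand-in for a third-party check; pretends only alice's mailbox exists."""
    return email.startswith("alice@")

def run_contacts_to_contacts(conn, remove_invalid=False):
    """Read emails from contacts, check each one, write the outcome back to contacts."""
    emails = [row[0] for row in conn.execute(
        "SELECT DISTINCT email FROM contacts WHERE email IS NOT NULL ORDER BY email")]
    for email in emails:
        if remove_invalid and not fake_lookup(email):
            conn.execute("UPDATE contacts SET email = NULL WHERE email = ?", (email,))
    conn.commit()
    return emails

checked = run_contacts_to_contacts(db, remove_invalid=True)
remaining = [row[0] for row in db.execute(
    "SELECT email FROM contacts WHERE email IS NOT NULL")]
print(checked, remaining)
```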
When I first started using Recon-ng, I had the hardest time making sense of the syntax for calling things. The thing that helped me the most was seeing this separation.
This is how the modules are laid out. It’s meant to be a little bit like Metasploit. So there’s a category, a slash, then which table it reads from and which table it writes to,
and then the module name. So the bing_linkedin_cache module is part of the recon family. It pulls from the companies table and populates the contacts table. That makes sense, right?
So to use that one, you have to have a company, and when it’s done, it will hopefully have populated the contacts table a little. Here’s one that takes contacts and updates contacts.
So this MailTester module reads from your contacts table, hopefully gets some more information, and then updates the contacts table with it. And this was the trick, for me, the thing that made Recon-ng
go from "I don’t understand this" to really easy to use: you can search for any part of that path. So if you search for contacts-, that’s going to find you any module that starts with the contacts table and does something.
You could also search for -contacts, and that would get you the modules that fill the contacts table.
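A small sketch of that naming scheme, assuming paths always look like `category/source-destination/name`: split a path into its parts, and do a plain substring search roughly the way the `search` command appears to. The first two paths are the modules named in this talk; the other two are made-up placeholders.

```python
from typing import NamedTuple

class ModulePath(NamedTuple):
    category: str
    source: str       # table the module reads from
    destination: str  # table the module writes to
    name: str

def parse_module_path(path):
    """Split 'recon/companies-contacts/bing_linkedin_cache' into its parts."""
    category, tables, name = path.split("/")
    source, destination = tables.split("-")
    return ModulePath(category, source, destination, name)

def search(term, modules):
    """Plain substring match over the full module paths."""
    return [m for m in modules if term in m]

MODULES = [
    "recon/companies-contacts/bing_linkedin_cache",
    "recon/contacts-contacts/mailtester",
    "recon/domains-hosts/example_hosts",         # hypothetical placeholder
    "recon/profiles-contacts/example_profiles",  # hypothetical placeholder
]

print(parse_module_path(MODULES[0]))
print(search("contacts-", MODULES))  # modules that start from the contacts table
print(search("-contacts", MODULES))  # modules that fill the contacts table
```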
So we used MailTester manually, and there’s a module for MailTester. That makes it a great thing to start learning from, because we already know how it works on the web and how part of it works in our little Python snippet: how to make the request and get the response.
So we search for mail, and it turns out there’s only one module with mail in its name. Then show options, just like Metasploit. There’s also show info, which gives you the options plus extra stuff.
There are choices for the source. All the modules have a default source; usually it reads from one of the tables, but you can feed it a string directly. You can change the query that it runs.
You can control where it gets its data from when it runs the module. So here’s how to add this contact manually.
We add to the contacts table, and it prompts you for all these things. So I filled in his name and his email address. These are all optional; nothing is required in the Recon-ng database. And this is what you end up with.
So looking at the options, I’m just using the default source. And the default here is, I’m sorry, not to remove invalid email addresses.
So if MailTester comes back and says that’s not a valid email address, you have an option here to tell Recon-ng to get it out of the database, because it’s no good.
You run the module, and it tells you the email address is valid. So it did all that work for you: it created the request, sent the request, read the response, parsed the response, and decided whether the address was valid or not. All for you.
So that’s great, but what is it doing? If we’re trying to learn how Python might do this, what’s it doing? There’s a way to turn on debug output in Recon-ng; it’s not called debug, it’s called verbosity.
There’s a verbosity level of zero, one, or two, and I think two is the one that shows you what the request looks like. So you can see the same thing: it’s sending testmail.php to mailtester.com with Steve’s holdenweb address, and the language is English.
And that’s the response we got. You can also set up a local proxy and tell Recon-ng to use it.
And I always send everything through Burp Suite, because even what I saw just there isn’t enough for me; I want to know exactly what’s going on. So you can send everything through Burp Suite, turn off interception, and go back and look at it after it’s done to see what it’s been getting.
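Recon-ng has its own proxy setting; for your own scripts, the same idea looks like this with the standard library. This assumes Burp is listening on 127.0.0.1:8080, its usual default, so adjust it to your setup. For HTTPS you’d also need to trust Burp’s CA certificate. With `requests`, the equivalent is passing `proxies={"http": ..., "https": ...}`.

```python
from urllib.request import ProxyHandler, build_opener

BURP = "http://127.0.0.1:8080"  # assumed Burp Suite listener address

def opener_via_proxy(proxy_url=BURP):
    """Return an opener that sends HTTP and HTTPS traffic through the proxy."""
    return build_opener(ProxyHandler({"http": proxy_url, "https": proxy_url}))

opener = opener_via_proxy()
# opener.open("http://example.com/")  # the request would now show up in Burp's history
```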
To see how it’s actually doing that, you can look in the module itself. And here we are: this is the recon-ng folder that gets created when you check it out. Then there’s a folder called modules, then recon, then contacts-contacts, then mailtester.
So this is exactly what you used to load the module; it’s the exact same path. So once you’ve used a module, you know where on your file system to find it. And this is the entire script: the whole thing is 35 lines long, and a bunch of it is essentially comments.
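If you want to jump from a module name to its source file, the mapping is mechanical. A sketch, assuming a checkout location of `/opt/recon-ng` (a placeholder) and the `modules/<path>.py` layout described here:

```python
from pathlib import Path

def module_file(recon_root, module_path):
    """Map 'recon/contacts-contacts/mailtester' to <root>/modules/recon/contacts-contacts/mailtester.py."""
    return Path(recon_root, "modules", *module_path.split("/")).with_suffix(".py")

path = module_file("/opt/recon-ng", "recon/contacts-contacts/mailtester")
print(path)
# print(path.read_text())  # read the module once you know where it lives
```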
So I think this is a good example of Python coding, especially if you’re just learning how to do this. To me, if something is concise and readable, that’s a win.
That’s what I want to use as my examples to learn from. So let’s go through this module really quickly and see what’s in there. It imports two modules: one is part of Recon-ng itself, and the other is lxml.html,
and it imports a method from there. So what is that? What is lxml.html? Google it to find out. This is the page that describes it; I just googled python lxml.
And I found there’s this fromstring method, which is what we’re importing here.
John Strand
Hey BB, we just had a good question from Mike. He asked, can you change the user agent?
BB King
Yes, you can change the user agent. By default, Recon-ng identifies itself as Recon-ng, but it’s a setting, and you can set it to anything you want.
So if you’d like to look like a browser, you can look like a browser. If you’d like to use your own user agent, you can do that too. So yeah, if you want to be extra stealthy, send something from Firefox through Burp Suite, see what its user agent string is, and then set Recon-ng to use that same thing.
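For your own scripts, the same trick is just a header. By default `urllib` identifies itself as `Python-urllib/<version>`; this sketch overrides that. The browser string below is a placeholder, so copy the real one your browser sends (for example, from Burp).

```python
from urllib.request import Request

# Placeholder: replace with the exact User-Agent your own browser sends.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0"

def request_with_ua(url, user_agent=BROWSER_UA):
    """Build a GET request that presents the given User-Agent instead of Python's default."""
    return Request(url, headers={"User-Agent": user_agent})

req = request_with_ua("http://example.com/")
# With requests: requests.get(url, headers={"User-Agent": BROWSER_UA})
```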
Good question. So this fromstring method takes a string and creates some sort of data structure out of it.
And that’s good enough for now. We don’t need to spend a lot of time on the mechanics of that, because the module is doing it for us; that’s the whole point. So, looking through this: here’s fromstring.
It’s created some object, and then I’m not sure what it’s doing here; it’s removing something. Maybe that’s important, maybe it’s not. But on line 29 here, it looks like it’s getting a message list, and it’s using XPath.
Maybe you don’t know XPath, but if you read it, it’s kind of intuitive. There’s a table, and we want the last one. And there’s a table row, and we want the last one of those.
So it kind of makes sense if you just take the time to look at it and noodle it out. What’s it telling you?
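To see what an expression like that selects, here’s a sketch using the standard library’s ElementTree, which supports the same `[last()]` predicate. It’s a stand-in for the module’s `lxml.html` call, not the module’s code, and the path is an approximation of the one on the slide. Unlike `lxml.html`, ElementTree only accepts well-formed markup, which is exactly why a tolerant parser matters on real pages. The sample page is made up.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed stand-in for a saved response page.
page = """<html><body>
<table><tr><td>navigation</td></tr></table>
<table>
  <tr><td>Checking address...</td></tr>
  <tr><td> E-mail address is valid </td></tr>
</table>
</body></html>"""

root = ET.fromstring(page)
# Last table among its siblings, then that table's last row, then the row's cells.
cells = root.findall(".//table[last()]/tr[last()]/td")
message = " ".join((td.text or "").strip() for td in cells)
print(message)  # E-mail address is valid
```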
Sierra
Hey, Brian?
BB King
Yes?
Sierra
Someone wants to know, are you using Python 2.7 or 3?
BB King
This is 2.7. The whole Recon-ng project is based on the 2.7 branch. The Python 2 versus Python 3 thing is, I don’t know, almost a distraction.
It’s one of those things some people have really strong opinions about one way or the other. And there’s a good argument for using Python 3: it’s newer, and Python 2’s end of life is, I think, scheduled for 2020.
But they’re really almost different languages. You can’t always take a Python 2 script and run it in a Python 3 interpreter. They made some breaking changes on purpose, for good reasons, that mean you kind of have to choose.
2.7 is kind of cool because it backported a bunch of those changes, so you can write a script that runs in 2.7 and is more likely to run on the 3 branch.
But Recon-ng is all in 2.7, and 2.7 is what I have that works. So we’re going to see what this is doing. We’re going to save those HTML files, because then we can read them locally.
I don’t have to interact with the server, which takes a lot of variables out of the equation, and we’re going to look through them to see if we can find what the module was finding. And this is a good excuse, really just an excuse, to learn some more tools.
Everybody knows grep, but what does your grep output usually look like? Maybe this isn’t what you usually do. Here’s some stuff grep can do that may be useful in the future: you can have it print the line number, and you can have it print some context around the match, not just the match itself.
So we’re using that here. We look for the last instance of the table, and it finds it down here on line 149.
So that’s the last table row. Then we’re looking for the last table data, and that’s obviously below this one. It’s down here... this guy. And look, it’s starting to make sense.
This is the content of that row. This is the text, the string, that we saw in the UI when we first used the thing through the website.
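For poking at the saved files from Python instead of the shell, here’s a rough equivalent of `grep -n -C N`: print each match with its line number, plus N lines of context on either side. It’s a sketch, not a full grep replacement, and the sample lines are made up.

```python
def grep_context(lines, needle, context=2):
    """Return (line_number, line, is_match) tuples, like `grep -n -C context`."""
    hits = {i for i, line in enumerate(lines) if needle in line}
    shown = sorted({j
                    for i in hits
                    for j in range(max(0, i - context), min(len(lines), i + context + 1))})
    return [(j + 1, lines[j], j in hits) for j in shown]

sample = ["<html>", "<table>", "<tr>", "</tr>", "</table>", "<table>", "<tr>"]
result = grep_context(sample, "<table>", context=1)
for number, line, is_match in result:
    # grep -n marks matching lines with ':' and context lines with '-'
    print(f"{number}{':' if is_match else '-'}{line}")
```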
Now how do you do this yourself? Yes, Sierra?
Sierra
Sorry, I feel like I’m interrupting you, but another question. What do you think about these online program converters? They translate from one programming language to another.
BB King
I haven’t used one, but that sounds scary.
John Strand
That sounds horrifying to me, but it actually sounds like it might be kind of fun to play with. I don’t know if I’d do anything with production-level code.
It sounds like a neat trick.
BB King
Some of the changes are kind of deterministic. One of the changes from Python 2 to 3 is that print went from being a statement to being a function. So in Python 2 you write print and then your string, and in Python 3 you write print and then your string in parentheses.
So that’s an easy thing to do programmatically: you just insert those parentheses. But there’s other stuff, like the math, that has changed. I think it’s the division operator. I don’t remember exactly what the difference is, but I know it used to return something different.
I think it returned the floor, just the integer part of the result, not the decimal part. Anyhow, some of it’s not deterministic, so you can’t always convert it mechanically.
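The two changes mentioned above are easy to see side by side. Python 2.7 offers `__future__` imports that opt a module into the Python 3 behavior, which is one way to write a script that runs the same on both branches. A minimal sketch; the helper name `divide_both_ways` is just for illustration:

```python
# These __future__ imports switch Python 2.7 to Python 3 behavior for this
# module; under Python 3 they are accepted and change nothing.
from __future__ import division, print_function


def divide_both_ways(a, b):
    """Return (true division, floor division) for a and b."""
    # With `division` imported, "/" keeps the fractional part even for two
    # integers; without it, Python 2 would floor 7 / 2 down to 3.
    # "//" is explicit floor division in both versions.
    return a / b, a // b


# print is called as a function, which works in 2.7 (with the import) and in 3.
print("7 / 2, 7 // 2 ->", divide_both_ways(7, 2))  # prints: 7 / 2, 7 // 2 -> (3.5, 3)
```

The division case is why a converter can’t be purely mechanical: whether an old `/` meant floor division depends on whether both operands happened to be integers at run time, which isn’t always visible in the source.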
So we’ve got our local files. We’ve saved a successful response and a failure response, and we also saved that not-allowed response, because I wonder what the module is going to do with that. It’s a third option.
Either it’s good, or it’s bad, or... what? What do you think the module does with that third option? What would you do with it? Is it going to be meaningful? So now we need to load from a file, not from a string that we’ve gotten off the wire.
So we’ve got to read the docs a little bit here for the XML library, and there’s a parse function that takes a file. Awesome. We’re going to use that.
And in the interactive Python interpreter we’re going to play with these to see if we can get the same kind of result that the Recon-ng module got.
And this is all it takes, too. So you open up the Python... I can’t remember what you call this thing. The REPL, that’s what it is: read-eval-print loop.
So you import the parse function and parse the document that you loaded. This is just copied and pasted from the Recon-ng module, and then we print the message list.
It gives us that string. Got it. And then there’s this other line in the module that joins everything in the message list on spaces.
So I print that, and I get the same thing. This was an array of one item, which is why it’s the same here. If I had more than one item, I would have more than this here.
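The same steps (read a saved response, find the last cell of the last row of the last table, and join its text pieces on spaces) can be sketched with only the standard library. This is a swapped-in approach: it uses Python 3’s `html.parser` rather than whatever parser the module itself uses, the class and function names are hypothetical, and it assumes tables are not nested.

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Collect tables -> rows -> cells -> text pieces from an HTML document.

    Assumes tables are not nested; text inside a nested table is not
    attributed correctly.
    """

    def __init__(self):
        super().__init__()
        self.tables = []       # each table is a list of rows; each row a list of cells
        self._in_cell = False  # True while inside a <td> we are recording

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
            self._in_cell = False
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
            self._in_cell = False
        elif tag == "td" and self.tables and self.tables[-1]:
            self.tables[-1][-1].append([])
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self._in_cell = False

    def handle_data(self, data):
        # Keep each non-blank text piece separately, like a list of text nodes.
        if self._in_cell and data.strip():
            self.tables[-1][-1][-1].append(data.strip())


def last_cell_message(html_text):
    """Return the last cell of the last row (of the last table that has cells), space-joined."""
    parser = TableCells()
    parser.feed(html_text)
    parser.close()
    for table in reversed(parser.tables):
        rows = [row for row in table if row]  # skip rows with no <td> cells
        if rows:
            msg_list = rows[-1][-1]
            return " ".join(msg_list)
    return ""
```

With a saved response, pass the file’s contents in, for example `last_cell_message(open("success.html", encoding="utf-8").read())`, where `success.html` stands in for whichever file you saved.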
I’m not sure how that would come into play for this particular service, but there’s some logic going on here that seems to be extraneous. And then we try the not-found response.
And this gives us the same kind of response that we expected, which makes sense. And in the module, the not-allowed condition, where the mail server is not going to tell you whether the address is good or bad, just kind of gets ignored.
If the address does not exist, the module deletes it. It’s set up to delete the email in that case, but for the not-allowed condition, there’s no case to handle it in the Recon-ng module.
So why is that interesting? Because the module is doing stuff for you. It’s taking that big HTML response, pulling out what’s important, and doing something with what’s important.
But the person who wrote this module decided what was important and what to do with it. In this case it seems a reasonable thing to do: if it can’t verify whether the address is good or not, just move on to the next one.
But maybe for your use case there’s a better answer. Maybe the mail server you’re using should always return good or bad, and if it returns “I’m not telling you,” that indicates a problem somewhere else.
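One way to make that decision explicit in your own code is to give the third case its own branch instead of letting it fall through. The phrases matched below are placeholders, not MailTester’s exact wording, and the function names are hypothetical; check the strings against the responses you saved.

```python
def classify(msg):
    """Map a verification message to 'valid', 'invalid', 'unverifiable', or 'unknown'.

    The substrings are assumptions; adjust them to the real responses.
    """
    text = msg.lower()
    if "is valid" in text:
        return "valid"
    if "does not exist" in text:
        return "invalid"
    if "not allowed" in text:
        return "unverifiable"
    return "unknown"


def action_for(msg, strict=False):
    """Decide what to do with an address based on the server's message."""
    status = classify(msg)
    if status == "valid":
        return "keep"
    if status == "invalid":
        return "delete"
    # The module discussed above effectively skips this case. In strict mode
    # we treat "won't say" as a sign that something upstream is wrong.
    if strict:
        raise RuntimeError("could not verify address: {!r}".format(msg))
    return "skip"
```

The `strict` flag is one design choice among several; you could just as easily log the unverifiable addresses to a separate list for a later retry.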
So to me this is like the difference between just using a tool and knowing what it’s doing and being able to adapt it to different circumstances, understanding what it does and why it does those things, and then choosing for yourself whether you agree that those are good things to do or not.
That gets you to the next step where you can make some of your own tools, make some of your own decisions, and make improvements to things. So I covered this when we talked before.
What’s the extra join for? Why is it joining those things together on the space string? Maybe there’s something else going on. Maybe there’s a different kind of response that’s going to give us an array that’s more than one element long.
I don’t know. I haven’t found out the answer to this one. But if you’re interested, if you just can’t let it go, this might be worth looking into. Maybe MailTester allows you to submit two email addresses at once.
I don’t know. So we talked about how to contribute to Recon-ng. It is actively maintained, and updates are very much welcome.
So this is one way to use Python to pull information off the web. And this is the complicated way, the kind of painful way, where you’re pulling it out of HTML, out of a context that has a totally different purpose.
It’s meant to be rendered by a browser, not picked apart by you in your script. So if you can find APIs for the services you want, this gets so much easier.
APIs generally send and return JSON, JavaScript Object Notation, which is meant to be computer-readable. You can take that response, turn it directly into an object in Python, and then look through it.
The response is going to contain an array of things or a dictionary of things, and you can look things up by their position in the array, or by their name if it’s a dictionary. It’s so much easier and so much more reliable.
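Here’s what that looks like with Python’s built-in `json` module. The response body and its field names are made up for illustration; a real API documents its own structure.

```python
import json

# A made-up API response body; real services define their own field names.
body = """
{
  "status": 200,
  "profiles": [
    {"service": "github", "url": "https://github.com/example"},
    {"service": "twitter", "url": "https://twitter.com/example"}
  ]
}
"""

data = json.loads(body)           # JSON objects become dicts, arrays become lists
status = data["status"]           # look up by name in a dictionary
first = data["profiles"][0]       # look up by position in an array
services = [p["service"] for p in data["profiles"]]

print(status, first["url"], services)  # prints: 200 https://github.com/example ['github', 'twitter']
```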
So the next thing we can do, now that we know his address is good, is find out where else this guy is. This is where the spying stuff comes in. If this is a good email address, then maybe it’s used in other places.
Lots of services will use your email address as your username, and some of them even use your username in the URL. Twitter does that: your profile is at twitter.com/ followed by your handle. GitHub is the same way.
Lots of places use that as your identifier. So if you know someone’s identifier, you can go across all of those services and see whether that identifier exists on each one.
So you can see where else this person might be active. Again, it’s not totally reliable, because anybody can register any name, but it’s a good first step and a good place to look, and then you can manually verify whether it’s the same person on the other service.
FullContact does this for you. They have a huge list of services, and if you give them a username they will go through all those services to see if that username exists there as well.
The website does this all at once. The response fills up as it finds answers: there’s a big long list of services it’s looking at, and they turn red or green as the responses come back in.
They have an API, and you need an API key to use it. And this is what I would do to figure out how to use that API and pull out what’s interesting to me.
I would send some requests manually, and what “manually” means depends on how the service is set up. With some REST-based APIs, the thing you’re looking for is encoded in the path.
Those you can often use just through your browser. You can type out the URL, like the GitHub one, github.com/ followed by the username, and if you get a response that has content, then that username exists there.
If you get a different response, then it doesn’t. Then do the same thing with Python: send some requests just like we did before with the requests module, parse out the response, and see if you can figure out what’s in there that’s interesting to you.
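A minimal sketch of that check in Python, using the standard library’s `urllib` rather than a third-party HTTP module. It assumes, as described above, that a profile page answers 200 when the username exists and 404 when it doesn’t; plenty of sites don’t behave that way (some return 200 with a “not found” page, some block scripted requests or rate-limit you), so treat the answers as leads, not facts. The service list, user agent string, and function names are illustrative.

```python
import urllib.error
import urllib.parse
import urllib.request

# Services that put the username in the URL path, as described in the talk.
SERVICES = {
    "github": "https://github.com/{}",
    "twitter": "https://twitter.com/{}",
}


def http_status(url, timeout=10):
    """Return the HTTP status code of a GET request, or None if no response arrived."""
    req = urllib.request.Request(url, headers={"User-Agent": "osint-sketch"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # urlopen raises on 4xx/5xx; the code is what we want
    except urllib.error.URLError:
        return None      # DNS failure, refused connection, and so on


def check_username(username, services=SERVICES, status_fn=http_status):
    """Return {service: True/False/None}; None means the answer was inconclusive."""
    results = {}
    for name, template in services.items():
        url = template.format(urllib.parse.quote(username, safe=""))
        code = status_fn(url)
        if code == 200:
            results[name] = True
        elif code == 404:
            results[name] = False
        else:
            results[name] = None
    return results
```

Calling `check_username("someuser")` sends live requests; passing a different `status_fn` lets you test the logic offline or swap in another HTTP library.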
And this is where Burp comes in so handy, because the script you write is only going to show you what you’ve told it to show you. And until you know how it’s formatted in the response, you’re not sure what to ask for.
The API documentation should tell you that, and it does. But sometimes it’s easier to see it in context with a real response. So if you send all this stuff through Burp, you can see what the whole response looks like,
and you can more easily know what you want to look for in that response. For the stuff that uses JSON, there’s a Burp BApp Store extension called JSON Beautifier, which takes JSON that may not be formatted in a way that’s easy to read,
maybe it’s one big long line, and indents it the way you’d expect so that you can read it more easily. So Burp is an awesome helper for this as well.
And the other thing you can do is run the FullContact module from Recon-ng, maybe with verbosity set to 2, so you can see what it’s sending and what it’s getting back.
Every API I’ve seen, this one, the FullContact one, the GitHub one, the DigitalOcean one, all the ones I’ve looked at, has examples for how to make these calls using curl.
And this will save you some coding time. The command lines can get really long sometimes, but it takes away a lot of complexity. It focuses directly on interacting with the service, and not so much on Python or whatever programming language you’re using, or the environment you’re running from, or all that other stuff. If you can do it with curl, then you can do it somewhere else as well.
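Translating a documented curl example into Python is mostly mechanical: the URL and query string stay the same, and each `-H` header becomes an entry in a headers dictionary. A minimal sketch, assuming a documented call shaped like `curl -H "X-Api-Key: YOUR_KEY" "https://api.example.com/v1/lookup?email=..."`; the endpoint, header name, and parameter are placeholders, not FullContact’s actual API.

```python
import json
import urllib.parse
import urllib.request


def build_request(base_url, api_key, **params):
    """Mirror `curl -H "X-Api-Key: KEY" "BASE_URL?k=v..."` as a urllib Request.

    The header name is a placeholder; use whatever the API's curl example shows.
    """
    url = base_url + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"X-Api-Key": api_key})


def send(req, timeout=10):
    """Send the request and decode a JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Comparing `req.get_full_url()` against the curl command line is a quick way to confirm the translation before spending API calls. urllib normalizes header names, so the header is stored as `X-api-key`; HTTP header names are case-insensitive, so the server sees the same header.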
So using that with Mr. Holden, we found all this information about him just from FullContact. FullContact took his email address, that’s all it took, and it looked up his profile. And where was this?
This was probably on Holden Web. And then it found him on GitHub, on Twitter, on Flickr, all these different places. This one’s interesting, because the Google one here is not his username, and it’s not his email address.
So it found something that maybe we wouldn’t have been able to find quite so easily. And then down here, don’t overlook this: the confidence at the bottom is just a number.
87 isn’t really different from 82 or 91. It’s just a reminder that this guy might not be the same guy; it might just be the same name somewhere.
So take all this stuff as possibilities and then verify it, especially if you’re going to mess around with them. If it’s a friend of yours and you’re going to try to troll them on their other service, do something to make sure it’s them first.
Make sure you’re not trolling random people who won’t know who you are. It’s just a starting point.
So in Recon-ng these are all tables. Now we’ve got some profiles for Mr. Holden, filled in here with what service it was, the URL where he’s at, all that stuff. So you can use all this out of Recon-ng directly to find where he is.
It will give you a list of the URLs, and you can just click on those URLs and go view those things. So the value of Recon-ng here is that it does the work for you and collects all the results in a way that’s easy for you to follow up on.
But you have to follow up on them. Just because it shows up in the results doesn’t mean it’s reliable. So that’s kind of the overview.
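Collecting results into tables, the way Recon-ng does, is easy to borrow for your own scripts. Here’s a minimal sketch using Python’s built-in `sqlite3`, with a `verified` flag so that following up becomes a query. The schema and function names are hypothetical and are not Recon-ng’s own.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) a results database with a simple profiles table."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS profiles (
               username TEXT,
               service  TEXT,
               url      TEXT,
               verified INTEGER DEFAULT 0,
               UNIQUE (service, url)
           )"""
    )
    return db


def add_profile(db, username, service, url):
    """Record a possible profile; repeats of the same service/URL are ignored."""
    db.execute(
        "INSERT OR IGNORE INTO profiles (username, service, url) VALUES (?, ?, ?)",
        (username, service, url),
    )
    db.commit()


def unverified(db):
    """List (service, url) pairs that still need a human to check them."""
    return db.execute(
        "SELECT service, url FROM profiles WHERE verified = 0 ORDER BY service"
    ).fetchall()


def mark_verified(db, url):
    """Flag a profile as confirmed after checking it manually."""
    db.execute("UPDATE profiles SET verified = 1 WHERE url = ?", (url,))
    db.commit()
```

Pass a file path to `open_db` instead of the in-memory default if you want the results to survive between runs.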
And this is where you can have fun with some Python and see how the web works when you’re not using a browser. Just some ideas: pick some friends, maybe people you follow on Twitter or somewhere.
Look them up through the APIs that service offers and see what you can find about them. With the Twitter API, you can pull down somebody’s list of tweets, all the ones that are available. There’s a limit to how many come down.
I can’t remember what it is, but it’s the easiest way to get everything somebody has said on Twitter, because the web UI doesn’t show you everything in order.
Some replies don’t show up the same way individual tweets do. Things get manipulated by the algorithm into different orders and that kind of stuff. So the Twitter API is a great one to start with. You do need an API key, but it’s free.
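Pulling “everything available” from a timeline API usually means paging. Twitter’s v1.1 timeline endpoints, for example, returned tweets newest-first and accepted a `max_id` parameter to ask for older ones. Below is a minimal sketch of that loop with the HTTP call abstracted into a hypothetical `fetch_page(max_id, count)` function, so the paging logic can be tested without an API key. The optional pause is a courtesy toward rate limits, not a value taken from any API’s documentation.

```python
import time


def fetch_all(fetch_page, page_size=200, pause=0.0):
    """Collect every item from a newest-first API by walking backwards with max_id.

    fetch_page(max_id, count) is a hypothetical callable that returns a list of
    dicts with an integer "id", newest first, containing ids <= max_id
    (max_id=None means start from the newest). An empty list ends the loop.
    """
    items = []
    max_id = None
    while True:
        page = fetch_page(max_id, page_size)
        if not page:
            break
        items.extend(page)
        # max_id is inclusive, so ask for ids strictly older than the oldest seen.
        max_id = min(item["id"] for item in page) - 1
        if pause:
            time.sleep(pause)  # spread requests out to stay under rate limits
    return items
```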
There are rate limits on how much you can send at a given time, but they’re totally reasonable for this kind of stuff. Some practical things to do: look up your employer, or someone you might want to work for, and see what you can find out about them.
If you can establish a common interest with somebody at your employer...
John Strand
I don’t like where this is going.
BB King
This is a great way to find a job. This is a great way to find people who already work in your company who could help you with some of your goals.
So, finding common interests. I think it can be a little creepy if you don’t do it right. But just to find out that, oh, this other guy who works in this other department I’m interested in coaches peewee soccer too, and I’ve done that, so that’s something we have in common.
So maybe that’s an icebreaker, some way to start that conversation. Let’s see. And then don’t forget to go in there and look at the Recon-ng source code.
It’s very consistently clean and solid, so it’s a good place to see how that stuff works and learn how to do some of these things on your own. Maybe there’s a different use case. Like, Recon-ng doesn’t have a give-me-a-list-of-somebody’s-tweets function, because that’s not its focus.
But maybe you want that. Maybe there’s something you could write based on what’s in there. Learning from the cleanness of the code, how those things work, how the modules work, that’s a great thing to do, I think.
And then write your own version and compare it to how the Recon-ng one works. If you can write your own version first and then go back and compare, that’s, I think, the best way to do it, because then you solve the problems yourself.
And then you go back and see how somebody else solved those same problems. If you both did it the same way, that’s great in one way, and if you did it differently, that’s, I think, even better, because now you’re going to see another way of approaching the same problem, and maybe you’ll find something to improve.
And sometimes these are stupid, simple things to improve. My first contribution to Recon-ng was fixing a typo. There was a variable name that was plural, and it was failing in certain circumstances because in one place in the code, a place that didn’t always get reached, it was singular.
So literally my first pull request for Recon-ng was, hey, add an s. So nothing is too small, nothing is too dumb, if it actually makes an improvement.
And then I have some resources for you to follow up on after this. There’s Recon-ng, obviously. Justin Seitz is at Automating OSINT. He’s got some classes, and he’s got a lot of free material available.
He’s really very good, very friendly, very good at what he does, and very generous in what he shares. The two Python books, Gray Hat Python and Black Hat Python, are also from him. So if you’ve learned from those books, you already know his style.
Micah Hoffman is at OSINT Ninja. He’s a SANS instructor. I think he’s actually working on an OSINT course; that’s all I know about that.
But he also teaches the web app pen testing course. Also a very friendly guy, very willing to help. If you have some questions or want pointers, take a look at his stuff, and maybe you can pick up some things from there.
And then that last one, inteltechniques.com, is starting to... I almost said cross a line. It’s a little bit different in focus. This one is more focused on investigations, like law enforcement type stuff.
Like, we need to find where this guy is, that type of thing. It’s more focused on finding an individual and tracking that individual’s activity online. So not quite the same, but maybe useful.
And certainly lots of the techniques shown there are great. There’s a book he’s got available, and he’s got a meta search engine on there that you can use. I can’t remember how many engines it covers.
It’s a dozen, two dozen different engines searched for the same strings all at once. You put it in once on his site, you click the button, and his site does the searching for you. So you’re insulating yourself a little bit from the targets of those searches.
But just pick something to play with, make yourself do it in Python, and see where it gets you. I think it’s a whole lot of fun. It really is.
That’s where I end. I have some time for questions or whatever.
John Strand
Yay, questions.
BB King
Thank you.
John Strand
That was a good tip on looking at prospective employers. Someone’s looking for a job.
Sierra
I didn’t check out the features, but what about PGP keys? Since they’re unique, they tend to tie together different aliases.
John Strand
I think what we usually do, correct me if I’m wrong, BB, is go to a keyserver like MIT’s, pgp.mit.edu, for PGP keys. You can have multiple email addresses associated with a single key, and that can lead you to additional possible email addresses and profiles for a target.
BB King
Yes, that’s true. I believe that’s one of the modules in Recon-ng: given an email address, it will look it up on the keyservers at MIT, and also in things like whois records and DNS, all kinds of stuff that way.
Yeah, yeah.
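Keyservers generally answer lookups over HKP, and asking for the `index` operation with machine-readable output (`options=mr`) returns `pub:` lines for keys, each followed by `uid:` lines carrying that key’s user IDs in percent-escaped form. A minimal sketch that builds such a lookup URL and pulls the email addresses out of each key’s user IDs; the server name and sample listing are made up, and keyservers vary in how completely they implement this format.

```python
import re
import urllib.parse

# An address in angle brackets, as in "Name <user@example.com>".
EMAIL_RE = re.compile(r"<([^<>\s]+@[^<>\s]+)>")


def index_url(server, query):
    """Build an HKP index lookup URL that requests machine-readable output."""
    params = urllib.parse.urlencode({"op": "index", "options": "mr", "search": query})
    return "https://{}/pks/lookup?{}".format(server, params)


def emails_by_key(mr_text):
    """Map each key ID to the email addresses found in its uid lines."""
    result = {}
    key_id = None
    for line in mr_text.splitlines():
        fields = line.split(":")  # uid text is escaped, so ':' only separates fields
        if fields[0] == "pub" and len(fields) > 1:
            key_id = fields[1]
            result.setdefault(key_id, [])
        elif fields[0] == "uid" and key_id is not None and len(fields) > 1:
            uid = urllib.parse.unquote(fields[1])
            result[key_id].extend(EMAIL_RE.findall(uid))
    return result


# Made-up listing in the machine-readable index format.
SAMPLE = """info:1:1
pub:0123456789ABCDEF:1:2048:1262304000::
uid:Jane Example <jane@example.com>:1262304000::
uid:Jane (work%3A ops) <jane@work.example>:1262304000::
"""
```

Every address that comes back from one key is a lead to feed into the username and profile checks above, with the same caveat: verify before you trust it.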
John Strand
Are we hiring at BHIS? I don’t know.
Sierra
Not really.
John Strand
Not really. People sometimes fall out of the trees, and we pick them up as quickly as possible. The password count, I like the...
BB King
What was, what was.
John Strand
So, the total password count: I think we had seven people ask about the password off BB’s right shoulder right here, Summer 2017.
So yeah, we had a couple of people interested in that. We did get another good question: what made you move to Python from Perl? I’m not even going to touch that.
BB, take that one away.
BB King
Have fun. It was the proliferation of tools written in Python. I very grudgingly moved to Python.
I was one of those religious Perl people, and as soon as I saw that whitespace was significant in Python, I went, that’s not for me. But I got over it, because it’s really, really effective, and there are so many tools and so many resources out there that make it easy to learn and make it so valuable.
John Strand
Yeah, yeah.
BB King
I think it’s interesting.
John Strand
Hal Pomeranz has a great quote on it. He said the reason he likes Perl is that he can look at somebody’s code and very quickly determine whether it’s crappy Perl or good Perl.
He said the problem with Python is that, because of the whitespace and the way indentation is enforced, all the code looks good, even though it might be really bad code. It’s formatted very, very cleanly, and he says that makes it a lot harder to tell whether somebody’s a bad programmer if they use Python.
So it’s just an interesting perspective, too.
BB King
Wow, that’s the first time I’ve heard that Perl code is... oh, no, that Python’s clearer. Yeah, yeah. Perl code is...
John Strand
It just looks more consistent regardless of how good the code is.
Sierra
Cool.
BB King
So Perl did kind of the same thing Python did. Around the time I switched over, they started the Perl 6 project, and Perl 6 is a break from Perl 5 to at least the same degree that Python 3 is from Python 2.
So at that point I was learning a new language anyway, so I might as well use the one that everybody else is using. Cool?
Deal.
John Strand
All right, so we’re going to wrap this up. Thank you very much for attending, everybody, and as always, we will post the video on our blog, and we will shoot that out here, in a little while. So thanks again, and we’ll see you guys on the next webcast.
Take care.