All posts by dmitryc

Using Machine Learning To Detect Anomalies

I’m going to start blogging more about detection of protocol/app anomalies, detection of lateral movement and/or data exfiltration, and more. For many years I have been watching users and applications furrow their way across networks and I’m gonna start data-dumping that info here 🙂

But…first…I manage a web server for a friend. It occurred to me that machine learning could be useful for alerting when an attack is under way. I took the following steps:

1) Get as much data as possible for this device. For Apache, this just meant gathering all the log files.

2) Parse the data and, for each session, look at the path taken as the user or bot perused the server (Note: outside of my initial scope, but timestamps are useful here to tell a human user from a machine).

3) So, an average session will look like R1->R2->R3->RX where each “R” is a request. So R1 could be index.html, R2 could be “Contact Us”, R3 could be “contact_form.php”, etc. I started out using Markov to build a model; in the end, though, I simply took each consecutive pair of requests and recorded it as a transition…e.g. S={R1->R2,R2->R3,R3->RX}. For the next session I might have S={R1->R5,R5->R3,etc.}. At the end of all the parsing, I have a big set of every state transition observed for each R. So, given RX, there is a finite number of R states that RX has been seen to transition to.

4) For each of the R states, I now re-parse the log file and count the transitions. This gives a matrix that shows the number of observed transitions from RN to every other R state; dividing each row by its total turns the counts into fractions. So, for instance, let’s say that R1 goes to 3 possible states: R4 (27% of the time), R11 (3% of the time), and R12 (70% of the time). Then the R1 row of our matrix (columns R1 through R12) looks like [0, 0, 0, .27, 0, 0, 0, 0, 0, 0, .03, .7]

5) There were some special cases that I had to account for (any page transitioning to the main page, any page transitioning to itself, etc.). Once I accounted for these, I ran my program against the log files and created LOW, MEDIUM, and HIGH alerts. I didn’t use a true standard deviation and I ignored the LOW and MEDIUM stuff…I just wanted the hits where the number for that transition was extremely low or 0. From our example above, this would be a transition like R1->R2=0. I didn’t really expect great results and figured that I would have to do a lot more tweaking…well, this wasn’t the case. I actually got really, really good data on my first run. Example:

732 total state transitions tracked
HIGH RISK GET /componentes3.7/fckeditor/editor/fckeditor.html->GET /affiliate/affiliate53/fckeditor/editor/fckeditor.html

HIGH RISK GET /portfolio/aui/FCKeditor/editor/fckeditor.html->GET /componentes3.7/fckeditor/editor/fckeditor.html

HIGH RISK GET /wp-content/uploads/wpfouot.php->POST /wp-content/plugins/Login-wall-etgFB/login_wall.php

etc.
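Steps 3 through 5 can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the program that produced the report above: sessions are assumed to be already parsed into lists of request strings, the 0.01 cutoff for "extremely low" is an arbitrary placeholder, the main page is assumed to be GET /index.html, and the two special cases (a page going to the main page or to itself) are simply treated as always normal. The training sessions are toy data.

```python
from collections import Counter, defaultdict

MAIN_PAGE = "GET /index.html"  # assumption: what counts as the "main page"


def transitions(session):
    """Turn [R1, R2, R3] into [(R1, R2), (R2, R3)]."""
    return list(zip(session, session[1:]))


def build_model(sessions):
    """Count every observed transition, grouped by the starting request."""
    counts = defaultdict(Counter)
    for session in sessions:
        for src, dst in transitions(session):
            counts[src][dst] += 1
    return counts


def probability(counts, src, dst):
    """Fraction of the time src was followed by dst (0.0 if never seen)."""
    row = counts.get(src, Counter())
    total = sum(row.values())
    return row[dst] / total if total else 0.0


def high_risk(counts, session, threshold=0.01):
    """Return the transitions in a session that were rarely or never observed."""
    flagged = []
    for src, dst in transitions(session):
        # Special cases, treated here as always normal (an assumption).
        if dst == src or dst == MAIN_PAGE:
            continue
        if probability(counts, src, dst) < threshold:
            flagged.append((src, dst))
    return flagged


# Toy training data standing in for the parsed Apache logs.
training = [
    ["GET /index.html", "GET /contact.html", "POST /contact_form.php"],
    ["GET /index.html", "GET /about.html", "GET /contact.html"],
    ["GET /index.html", "GET /contact.html", "POST /contact_form.php"],
]
model = build_model(training)

suspect = [
    "GET /wp-content/uploads/wpfouot.php",
    "POST /wp-content/plugins/Login-wall-etgFB/login_wall.php",
]
for src, dst in high_risk(model, suspect):
    print(f"HIGH RISK {src}->{dst}")
```

Storing one Counter per starting request keeps the "matrix" sparse, so only transitions that actually occurred take up space; anything missing from a row is an implicit 0.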

So, I can use really basic machine learning to find attackers in my web logs. I then parse out the attackers’ IP addresses and can throw them into a firewall ruleset. In the future, I would like to automate this: detect when my server is under attack and send a message to my firewall, which drops in a route rule that steers all of the attackers’ traffic to my honeynet 🙂
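Pulling the IP addresses out and turning them into firewall rules could look something like the sketch below. The assumptions are mine: the log lines follow Apache's common/combined layout with the client address as the first field, a "session" is just every request from one IP, and the firewall is iptables. The sample log lines (addresses, timestamps, status codes) are made up for illustration.

```python
import ipaddress
import re

# Client address, two ignored fields, [timestamp], then "METHOD path ...".
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "([A-Z]+) (\S+)')


def parse_line(line):
    """Return (ip, 'METHOD path') for a log line, or None if it doesn't parse."""
    m = LOG_RE.match(line)
    if not m:
        return None
    ip, method, path = m.groups()
    return ip, f"{method} {path}"


def sessions_by_ip(lines):
    """Group requests by client address (a crude stand-in for real sessions)."""
    sessions = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed:
            ip, request = parsed
            sessions.setdefault(ip, []).append(request)
    return sessions


def drop_rules(ips):
    """Build one iptables DROP rule per valid, unique address."""
    rules = []
    for ip in sorted(set(ips)):
        try:
            ipaddress.ip_address(ip)  # skip hostnames and junk
        except ValueError:
            continue
        rules.append(f"iptables -A INPUT -s {ip} -j DROP")
    return rules


# Made-up sample lines in Apache common log format.
lines = [
    '203.0.113.7 - - [10/Aug/2015:10:00:00 +0000] "GET /wp-content/uploads/wpfouot.php HTTP/1.1" 404 210',
    '203.0.113.7 - - [10/Aug/2015:10:00:02 +0000] "POST /wp-content/plugins/Login-wall-etgFB/login_wall.php HTTP/1.1" 404 230',
]
sessions = sessions_by_ip(lines)
for rule in drop_rules(sessions):
    print(rule)
```

The rules are only printed here, not executed; in practice you would feed in just the IPs whose sessions produced HIGH RISK hits and review the list before applying it.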

Speaking of honeypots, you can also honeypot certain pages. For instance, I could create bogus files or directories based on what I see attackers going after (like the report from above) and drop canary tokens in there (see Canary Tools). I can embed honeypot links within HTML comments and see where bots (or humans) are taking links from commented code and trying them out. I can put links in my robots.txt file and see who goes after them…there are so many ways to do this…and, at the end of the day, I can either run these attackers off my network or into a fake network…it’s just TONS and TONS of fun 🙂
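As a rough sketch of the robots.txt idea: list a few decoy paths as Disallow entries, then flag any client that requests them anyway. The decoy paths below are hypothetical, and the input is assumed to be (ip, path) pairs already parsed from the access log.

```python
# Hypothetical decoy paths; nothing legitimate on the site links to them.
DECOY_PATHS = {"/backup-old/", "/admin-archive/login.php"}


def robots_txt(decoys):
    """Render a robots.txt that advertises the decoys as off-limits."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in sorted(decoys)]
    return "\n".join(lines) + "\n"


def is_decoy(path, decoys):
    """True if path is a decoy file or sits under a decoy directory."""
    return any(
        path == d or (d.endswith("/") and path.startswith(d))
        for d in decoys
    )


def honeypot_hits(requests, decoys):
    """Map each client IP to the decoy paths it requested."""
    hits = {}
    for ip, path in requests:
        if is_decoy(path, decoys):
            hits.setdefault(ip, []).append(path)
    return hits


# Made-up requests for illustration.
requests = [
    ("203.0.113.7", "/index.html"),
    ("198.51.100.9", "/backup-old/db.sql"),
    ("198.51.100.9", "/admin-archive/login.php"),
]
print(robots_txt(DECOY_PATHS))
print(honeypot_hits(requests, DECOY_PATHS))
```

Since no real page links to the decoys, any hit is worth a look; the only way to find them is to read robots.txt (or commented-out HTML) and go poking.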

!Dmitry
dmitry.chan@gmail.com

Oracle CSO is right

The internet (or at least Twitter) is exploding over this now-deleted post: Mary Ann Davidson blog post

Let me start by saying that she is right. Yes, she’s right. Breaking the EULA is against the law. You can’t argue about that.

You can’t argue that they should be paying a bug bounty. You may *want* them to pay a bug bounty, but that is the company’s decision. If they choose not to pay a bug bounty, that’s their prerogative.

As a consumer, you can choose to use their product (EULA and all) or not. That is something that you have control over.

As a researcher, you can choose to break the EULA or not. Arguing that someone should modify their EULA so that what you’re doing isn’t a violation is childish.

I wish Oracle had stood by their CSO and left the blog online. I understand that they don’t want additional scrutiny on their product, but the scrutiny will be there regardless (as it has been for many years now). Leaving the post online would have shown some ‘backbone’. If INFOSEC goes PC, it’s bad for us all. I’d rather someone tell me what they really think and we can go from there.

!Dmitry
dmitry.chan@gmail.com

Play some D!

Hi there. Long-time-no-blog 🙂

If you haven’t already, go read this: https://t.co/d2hwhmzzuz

Note: this blog applies to Corporate networks. If you’re a coffee shop or a college, you’re on your own 🙂

I’ve been a network defender for many years. I currently work for a software company that builds network software which helps companies gain insight into how their network is being used and/or abused. I didn’t choose to go into network defense – it chose me. In 1997 at my first “real job” out of college, I was a part of a team that tracked down some hackers that were running around owning a bunch of Solaris servers. From that day, I was hooked.

Network defenders don’t get a lot of credit. If you do your job right, no one ever talks about it. If you do your job wrong, you’ll hear about it every day for the rest of your short-lived career. An attacker can be wrong a million times and only needs to be right once. That’s an advantage. An attacker can spend 2 years in the bowels of one software app. A defender cannot. Accept this fact and move on…we can still win. The attacker has to use your network whilst evading detection. A lot of them don’t spend a lot of time figuring out how to do this right. They don’t have to be stealthy about exfiltrating data because it hasn’t mattered – the defense has been weak. How many recent infections used the darknet as a C&C?…ummm, your network monitoring solution should be SCREAMING AT YOU if someone connects out via Tor or i2p.

The network is like a body’s immune system (though not nearly as complex). The job, if you’re up to it, is to be the immune system. You can’t stop all infections from getting in. In fact, it can be argued that infections must get in to build the immune system. Firewalls and other devices can block things that we have knowledge of; however, something that we haven’t previously encountered will eventually get in (maybe via email, hacked USB drive, 0-day, whatever). Our job is to detect the foreign body, eradicate it, and update the immune system such that that strain of virus can no longer get in. So, how can you do this?

1) Know what is “normal” for each host on your network. What ports do they offer? What ports do they connect to? What do their traffic patterns look like for each port? Who do they talk to? Who talks to them? What network protocols do they speak? How long do sessions stay nailed up? If you know this sort of stuff, then an attacker exfiltrating a gig of data cannot be hidden…it’ll stick out like a clown at an IBM business meeting.

2) Method 1 will detect lateral movement, but if you employ dead space within your network, you can flag lateral movement with just a single packet. Use honeynets, host-based IDS, traffic analysis (why is engineering dept trying to talk to HR?), etc. Spray your databases with bogus data that should never be accessed. Put up fake file servers and watch for access or watermark the files and watch them if they move around the network. Be creative…make your network a hostile environment for those who would attack it. The locals know how to get around, the attacker will have to figure out how to move around the network. Make this a painful process for him/her.

3) Look for invalid use of standard ports. Have you ever seen Skype find an “out door” on a network? What about VPN, i2p, p2p, Tor, etc.? Sending outbound traffic over well-known ports is very, very common on most networks I have monitored. For each outbound port allowed through your firewall, you should flag anomalous traffic over that port. What is anomalous? If the port is 80, only valid HTTP should flow over that port. If the port is 443, only TLS/SSL should flow over that port. Find the people tunneling data or sessions out of your network and you have a short list of the folks to keep an eye on.

4) Let the users know that you are watching. If Mabel from Accounting comes in on Monday morning and uploads 2 gigs of baby pictures to Dropbox, you should go have a chat with her. Get the word out. User education is often overlooked…millions are spent on nifty software but you don’t even have a full-time employee working on user education. Sad.
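To make points 1 through 3 a bit more concrete, here is a rough Python sketch of three checks: flows that were never seen in the baseline, a single packet aimed at dead space, and a first payload that doesn't match the protocol expected on its port. Everything here is an assumption for illustration: flows are (src, dst, port) tuples you have already collected, the dead-space range is invented, and the protocol checks only peek at the first few bytes of the client's first payload, which is a heuristic rather than a real protocol parser (it will not recognize an old SSLv2-style hello, for example).

```python
import ipaddress

HTTP_METHODS = (b"GET ", b"POST ", b"HEAD ", b"PUT ", b"DELETE ",
                b"OPTIONS ", b"PATCH ", b"CONNECT ", b"TRACE ")


def looks_like_http_request(payload):
    """Heuristic: an HTTP request starts with a method name and a space."""
    return payload.startswith(HTTP_METHODS)


def looks_like_tls_handshake(payload):
    """Heuristic: TLS record header, type 0x16 (handshake), major version 3."""
    return len(payload) >= 3 and payload[0] == 0x16 and payload[1] == 0x03


# What the first client payload on each allowed outbound port should look like.
EXPECTED = {80: looks_like_http_request, 443: looks_like_tls_handshake}


def port_mismatch(dst_port, first_payload):
    """True if the port has an expected protocol and the payload isn't it."""
    check = EXPECTED.get(dst_port)
    return check is not None and not check(first_payload)


def in_dead_space(dst_ip, dead_networks):
    """True if dst_ip falls in address space where no real host lives."""
    addr = ipaddress.ip_address(dst_ip)
    return any(addr in ipaddress.ip_network(net) for net in dead_networks)


def unseen_flows(baseline, observed):
    """Return (src, dst, port) flows that never appeared in the baseline."""
    return sorted(set(observed) - set(baseline))


# Made-up examples.
DEAD = ["10.20.99.0/24"]  # hypothetical unused range
print(port_mismatch(80, b"SSH-2.0-OpenSSH_6.6\r\n"))   # SSH over port 80
print(in_dead_space("10.20.99.5", DEAD))               # one packet is enough
print(unseen_flows([("10.20.1.5", "10.20.2.8", 443)],
                   [("10.20.1.5", "10.20.2.8", 443),
                    ("10.20.1.5", "10.20.3.9", 445)]))
```

None of this replaces a real monitoring product, but it shows how little logic each check needs once you have the baseline and the flow data in hand.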

There’s a lot more that I could write, but network defense isn’t a “cookie cutter” operation. Each admin will have to be creative and come up with their own maze for the attackers to run. Good luck out there!

!Dmitry
dmitry.chan@gmail.com