malware hunting – hijacking through winsock


Intro

probably everyone has had some experience with more or less sophisticated malware.
usually removal is quite easy and straightforward. the sysinternals tools are a big help here – although in many cases even the plain old task manager is enough to locate the unwanted stuff and remove it manually through the registry and so on.

yesterday i had a much more curious case.
a customer reported issues with his pc – many apps were crashing randomly without any recognizable pattern.
nothing special so far – i dug into the pc and checked the logs. just some application crashes from nvstream (seems to be related to nvidia’s driver and some kind of media streaming capability). nothing required on this machine, so i uninstalled the driver with all of its components and reinstalled just the current video driver.

during the uninstall i noticed one curious thing: we’re using teamviewer for remote management of client machines (that’s not yet the curious part :-)).
when the machine came back up, it popped up online in my teamviewer contacts, went offline shortly afterwards and then came online again. kind of weird?

Getting curious

before i had finished cleaning up some other parts of the machine i saw another app crash.
okay – something seems to be causing trouble in this system. as it happened randomly in almost any application, a system component like a driver or the filesystem might be involved.

i went to sysinternals and got the latest version of process explorer. i turned on verify signatures and check virustotal (thanks @markrussinovich) to get a good overview of most of the things running on the system.

here comes the funny part: process explorer reported that the server is unknown or cannot be resolved
(there was a delay of about 2 or 3 seconds until this timeout occurred)

what? virustotal gone? no way.

i opened the command line and did a quick nslookup on virustotal. no record found.
using nslookup with an explicitly set server – not working.
are you kidding me?

i tried to ping virustotal. working.
i opened IE and chrome to get to virustotal. working.
are you really kidding me?

Digging more into it

okay – somebody doesn’t want to let the process explorer talk to virustotal.
simple tricks like renaming the application didn’t work – maybe some kind of hash or originator check.
(perhaps these bad guys have also implemented some kind of database lookup for blocked applications and/or protocols whatsoever)

next step? digging a little deeper. getting wireshark.

so wireshark is running with a filter on dns. i open the console, do a quick ipconfig /flushdns and ping virustotal.com – a dns request is sent.
restarting wireshark, same settings. starting process explorer (after another flushdns) -> check virustotal. no dns request sent.

conclusion at this step: nice idea – i haven’t seen such a sophisticated interception before.
but where is this thing sitting?

a quick search for rootkits didn’t show anything malicious so far.
the next scan listed the common services and – it even did a check against virustotal. requests weren’t blocked from there.
scrolling through, i saw an unnamed service that pointed to a .sys file in the user’s temp dir. the file didn’t exist anymore. but yep – that was the entry point.
much more interesting: another component had 2/57 hits – so hardly detected at all.
much much more interesting: this component was a winsock driver – and one with an invalid signature.
and seriously, who writes a winsock driver but doesn’t sign it? i wouldn’t expect anything good behind this.

the location was:
C:\windows\system32\Gambali64.dll
But – the file wasn’t there. No way to open or see the file at this location. Somebody is hiding.
Only a config file GambaliOff was there – written in chinese.
Then i tried another path to this dll:
C:\windows\sysnative\Gambali64.dll
And there was the file – of course i took a copy to check its contents later on.
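a side note on those two paths. on 64-bit windows a 32-bit process that opens something under System32 is silently redirected to SysWOW64, and Sysnative is the alias such a process can use to reach the real System32. so one possible explanation for the "missing" file (an assumption, not something verified on that machine) is that the tool used for browsing was a 32-bit one. the sketch below only models that mapping. the function name is made up, it assumes the windows directory sits directly below the drive root, and it ignores the few System32 subfolders that are exempt from redirection.

```python
import ntpath


def effective_path(path, wow64):
    """Return where a path really lands for a 32-bit process on 64-bit Windows.

    Simplified model: assumes the Windows directory is <drive>:\\windows and
    ignores the System32 subfolders that are exempt from redirection.
    """
    norm = ntpath.normpath(path)
    parts = norm.split("\\")
    lowered = [p.lower() for p in parts]
    # 64-bit processes are not redirected (and do not see Sysnative at all).
    if not wow64 or len(parts) < 3 or lowered[1] != "windows":
        return norm
    if lowered[2] == "system32":
        parts[2] = "SysWOW64"   # silently redirected
    elif lowered[2] == "sysnative":
        parts[2] = "System32"   # alias for the real 64-bit directory
    return "\\".join(parts)
```

called with wow64=True, the system32 path from above maps into SysWOW64 (where a 64-bit dll would not normally live), while the sysnative path maps to the real System32.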

long story short

hitting google with “gambali64.dll” turned up many entries with hijackthis logs in which this file was involved, and nothing that indicated the file could be trustworthy.
so i chose to delete the driver (via autoruns) and reboot the machine – everything works fine.

and here’s the funny part:
– teamviewer comes online once after a reboot (no more short disconnect, which may have been related to the winsock integration)
– windows updates had issues before – working again
– before the switch from kaspersky to ESET antivirus, kaspersky had issues contacting its update servers (while navigating to them manually was possible)

by the way:
the date of the infection was 17.04.2015 – 2 months ago. and still only two hits on virustotal. seems like some polymorphic code has been used here.

fyi:
i already dug into the contents of gambali – there’s an export for WSPStartup, which is used when integrating into winsock as an LSP (layered service provider).
perhaps i’ll look into more of the details, as it’s quite interesting to see how it works. and perhaps i’ll even write another post on this – but i can’t make a promise so far.
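to look for layered providers without extra tools, the winsock catalog can be dumped with netsh winsock show catalog. here is a minimal sketch that pulls the provider dll paths out of that dump and lists the ones that are not on a small allow-list. assumptions: the "Provider Path:" label is what an english windows prints (localized systems use other labels), and the allow-list is a non-exhaustive guess at the dlls that ship with windows, so a hit only means "look closer", not "malware".

```python
import ntpath
import subprocess

# Assumption: non-exhaustive list of provider DLLs that ship with Windows.
KNOWN_PROVIDERS = {
    "mswsock.dll", "napinsp.dll", "pnrpnsp.dll",
    "nlaapi.dll", "winrnr.dll", "wshbth.dll",
}


def unknown_providers(catalog_text):
    """Return provider DLL paths from the catalog dump that are not allow-listed."""
    suspicious = []
    for line in catalog_text.splitlines():
        # Split at the first colon only, so "C:\..." in the value stays intact.
        label, _, value = line.partition(":")
        if label.strip().lower() != "provider path":
            continue
        path = value.strip()
        name = ntpath.basename(path).lower()
        if name not in KNOWN_PROVIDERS and path not in suspicious:
            suspicious.append(path)
    return suspicious


def read_catalog():
    """Run netsh on the local machine and return its output (Windows only)."""
    result = subprocess.run(
        ["netsh", "winsock", "show", "catalog"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

on the affected machine one would call unknown_providers(read_catalog()) and then check signature and location of every path it returns.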

Posted in general

Exchange – POP / IMAP problems

Enabling POP and IMAP on an exchange server may get you into trouble if services fail or the setup is incorrect at some point. One of the most painful errors is a connectivity problem where the corresponding service (POP or IMAP) does not bind the ssl socket correctly: you can establish the tcp connection but cannot perform the ssl handshake.

Using openssl to reproduce this shows the following:

 C:\OpenSSL>openssl.exe s_client -connect mail02.dc.nuvotex.de:995
 WARNING: can't open config file: C:\OpenSSL\bin\openssl.cfg
 Loading 'screen' into random state - done
 CONNECTED(0000017C)
 16360:error:140790E5:SSL routines:ssl23_write:ssl handshake failure:.\ssl\s23_lib.c:177:
 ---
 no peer certificate available
 ---
 No client certificate CA names sent
 ---
 SSL handshake has read 0 bytes and written 361 bytes
 ---
 New, (NONE), Cipher is (NONE)
 Secure Renegotiation IS NOT supported
 Compression: NONE
 Expansion: NONE
 No ALPN negotiated
 ---

You may reassign, delete, reassign again, burn down, reset or do whatever you like with the certificate and its service assignment. If you are using a wildcard certificate, it may even look like the problem is related to the certificate.
You may also check access to the certificate’s private key or start the exchange services from the command line in debug mode. None of this leads to the solution.

Connect from localhost

So, while analysing the problem i tried – just out of curiosity – to connect from localhost (via openssl) to the pop service. And hell yeah, there’s the certificate.

The command for this request is: openssl.exe s_client -connect localhost:995

It’s very important to use localhost (or 127.0.0.1 or its ipv6 equivalent) – the fact is:
Using the hostname (or one of the IPs of the NICs) won’t work and reproduces the issue from above.
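The same comparison can be scripted. The following is a minimal sketch (the function name and the result labels are made up for illustration) that separates "tcp connect failed" from "tcp works, but the tls handshake fails", which is exactly the difference seen above. Certificate verification is switched off on purpose, because the only question here is whether the service answers the handshake at all.

```python
import socket
import ssl


def probe_tls(host, port, timeout=5.0):
    """Return ("ok", tls_version), ("tcp-failed", error) or ("tls-failed", error)."""
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return ("tcp-failed", str(exc))
    # No verification: we only test whether a handshake happens at all.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return ("ok", tls.version())
    except (ssl.SSLError, OSError) as exc:
        return ("tls-failed", str(exc))
    finally:
        raw.close()

# Example usage on the exchange server itself:
#   for host in ("localhost", "mail02.dc.nuvotex.de"):
#       print(host, probe_tls(host, 995))
```

With the problem described in this post you would expect "ok" for localhost and "tls-failed" for the hostname.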

Tell me – what’s up with this

The reason for this behaviour seems to be related to exchange health monitoring, which may disable components of the exchange server once they have failed.
So, open the EMS and do a Get-ServerComponentState

[PS] C:\Windows\system32>Get-ServerComponentState -Identity MAIL02
 Server Component State
 ------ --------- -----
 [...]
 MAIL02.dc.nuvotex.de ImapProxy Active
 MAIL02.dc.nuvotex.de OabProxy Active
 MAIL02.dc.nuvotex.de OwaProxy Active
 MAIL02.dc.nuvotex.de PopProxy Inactive
 [...]

As you can see: The PopProxy-Component has been disabled and therefore the server refuses “external” connections.
To reset the state or just reenable the component use the following command:

[PS] C:\Windows\system32>Set-ServerComponentState -Identity MAIL02 -Component PopProxy -Requester HealthAPI -State Active

And whoop-whooop, run boy run.
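If you have more than one server, it helps to filter the component list instead of reading it by eye. A small sketch, assuming the output of Get-ServerComponentState was saved as plain text in the three-column layout shown above (the function name is made up):

```python
def inactive_components(table_text):
    """Return (server, component) pairs whose state column reads 'Inactive'."""
    inactive = []
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows have exactly three columns: server, component, state.
        if len(fields) == 3 and fields[2] == "Inactive":
            inactive.append((fields[0], fields[1]))
    return inactive
```

For the output above this yields a single entry, the PopProxy component on MAIL02.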

Important notice:
The fix for the component was not my own idea. Most of the time (99%) went into figuring out that the service shows this strange behaviour, and into getting annoyed enough to test from localhost.

Source for the solution: https://social.technet.microsoft.com/Forums/exchange/en-US/5f1a2cee-19ad-43e6-b281-bb7f094d8c09/pop-works-via-localhost-but-not-from-other-networked-machines?forum=exchangesvrclients

Posted in general

windows storagepools and half failing hardware

working in the digital world is quite interesting sometimes.
a customer had an interesting issue come up on the 23rd of december. the server was pretty much stuck – almost.

the server is a hypervisor and hosts just a few virtual machines.
the operating system (the hypervisor itself) was stable and working properly (running on a simple raid 1 with two hdds).
for the virtual machine store we’ve been using a storage pool with tiered storage (2x hdd, 2x ssd). the setup and the resulting performance are quite awesome, aside from the fact that pretty much every modern storage system is greedy for disks, disks and even more disks. but this is just a side note.

in this case: accessing the virtual disk on the storage system gave a transfer rate of around 500 kB/s – which is around 0.1% of the normal speed (in normal operation we’re shoveling 300 MB/s without reaching the limits). i saw in the windows event logs that the storage subsystem tried to issue reset commands to the hba (in this case an lsi) and reported problems accessing the drives. so i focused on the possibility of a failed hba (which is quite strange, as HBAs and RAID controllers are the kind of component that doesn’t fail very often).

with the customer acting as remote hands for some simple testing we couldn’t find the solution, so i had to get on-site myself… a few hundred kilometers of travel later i swapped the hba as a first try – without any improvement. [which is not really unexpected, as hbas tend to get accused of failure while working fine all the time :-)]

digging around, i plugged the drives into the onboard s-ata ports – no improvement.
so i decided to take a harder line: pull one drive at a time and check how the system behaves.
when i pulled the first drive, the transfer rates went back to normal.
quite nice on first try :-)

so – i swapped the drive (an ssd) for an hdd, did some magic on the command line to present the hdd as an ssd (temporarily) and rebuilt the virtual disk [i like systems that are ready for the next component to fail – in this case: using the hdd [10k rpm] as an ssd is imho better than violating the redundancy].

but – what was the failure?
thinking about it, the failure is quite obvious. the drive was not really behaving the way a digital device SHOULD behave.
it was half failing. the drive reported to the operating system that all states were fine and so on. but when data on this drive was accessed, it got slow. it got REALLY slow. yet it wasn’t slow enough to fail outright and make the operating system fail the drive and do proper failure reporting.
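to make the "fail properly" idea concrete, here is a minimal sketch of a fail-fast throughput check. this is not what windows storage spaces does internally. the function name and the 10% threshold are made up for illustration, and the numbers in the usage note are the ones from this post.

```python
def check_throughput(measured_bps, baseline_bps, min_fraction=0.10):
    """Raise instead of limping along when throughput collapses.

    min_fraction is a hypothetical threshold, not a value taken from any product.
    """
    if baseline_bps <= 0:
        raise ValueError("baseline must be positive")
    fraction = measured_bps / baseline_bps
    if fraction < min_fraction:
        raise RuntimeError(
            f"throughput at {fraction:.2%} of baseline, treating device as failed"
        )
    return fraction
```

with 500 kB/s measured against a 300 MB/s baseline, check_throughput(500_000, 300_000_000) raises, because the drive delivers well under one percent of what it should.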

on the one hand it’s very interesting to see how important it is for systems to fail properly instead of trying to stay alive as long as possible.
it’s like programming. if your state is not well defined: go to hell and fail. don’t try to deliver a service if you cannot do it with all your power.
on the other hand it’s annoying, as a long day of traveling through snow and dealing with people who are afraid of driving on snowy roads is also quite frustrating :-)

Posted in server

[server] vm stuck at startup (starting 10%)

Today a quite interesting error occurred:
I tried to power up a virtual machine, but the vm got stuck at starting (10%).


The early starting phase checks resource dependencies, acquires the required handles etc.

Good that i’m not the first one who has run into this error, which is obvious once you know the resolution.

Source: http://blogs.technet.com/b/mspfe/archive/2013/03/18/hyper-v-virtual-machines-freeze-at-10-when-starting-up.aspx

Hopefully not something you will ever have to deal with, but as a series of general rules to live by:

1) Failure of a network resource that hosts Hyper-V components may prevent virtual machines from starting

2) Always remember to unmount ISO files from VMs when finished with them

In this case:
An iso was mounted to the virtual machine (or better: should have been). Unfortunately the path pointed to an outdated smb share (the filer is still online, the share has just moved).

Solution in this case:
Remove the ISO (to stop the vm without a reboot: see the other post about killing the vm – especially useful if you want to avoid rebooting your whole cluster :-)) and try again.

Posted in server

[radius] authentication rejected from windows radius server

If it happens that your radius server rejects client authentications with this message:


Security-Auditing 6273
[…]
Reason: Authentication failed due to a user credentials mismatch. Either the user name provided does not map to an existing user account or the password was incorrect.

 

You’d probably expect the typical dumb-user problem :-)
If this error occurs for all users and you didn’t really change anything: check your pki certs.

Probably the certificate of your radius server is out of date (or invalid for some other reason).
The result is that authentication cannot be performed anymore. Unfortunately you don’t see this in the server’s event logs.

Solution: Renew your certificate and logins will be accepted again.
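To avoid being surprised by this again, the remaining lifetime of a certificate can be checked ahead of time. A minimal sketch, assuming you copy the "not after" date of the radius certificate by hand in the textual form used inside certificates (the function name is made up):

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Return full days left until a certificate's notAfter date.

    not_after uses the format found in certificates,
    e.g. "Jan  5 09:34:43 2018 GMT". Negative values mean it has expired.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return int((expires - now) // 86400)
```

Anything at or below zero means the certificate has already expired. Warning a few weeks earlier leaves enough time for the renewal.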

 

 

 

Posted in server

[server] kill stuck hyper-v virtual machine in cluster

when a virtual machine hangs, you can just kill the process using the task manager.

To locate the matching process in the task manager (hosting process is vmwp.exe) you can check the command-line args or the user account.

the user account matches the ID of the virtual machine.
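if you prefer to match by command line, the lookup can be sketched like this. assumptions: the worker process carries the vm's guid in its command line, the process list was exported elsewhere as (pid, command line) pairs, and the guid of the vm is already known (for example from the hyper-v manager). both function names are made up.

```python
import re

GUID = re.compile(r"[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}")


def vm_id_from_cmdline(cmdline):
    """Return the first GUID found in a command line (upper case), or None."""
    match = GUID.search(cmdline)
    return match.group(0).upper() if match else None


def find_worker(processes, vm_id):
    """processes: iterable of (pid, command_line) pairs. Returns matching PIDs."""
    wanted = vm_id.upper()
    return [
        pid for pid, cmd in processes
        if "vmwp.exe" in cmd.lower() and vm_id_from_cmdline(cmd) == wanted
    ]
```

the returned pid is the one to end in the task manager.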


 

Fun fact of the day:

when the virtual machine runs in a windows cluster, the cluster detects the “crash” of the virtual machine and restarts it. when several (default: 2) errors occur, the next host is chosen to host the vm. so you can have a good time and follow the machine across the hosts :-)

Solution:

Disable the role in the cluster manager and kill the VM.


 

Damn robust processes.

Posted in server

[server] scale-out-fileserver memory usage

today we were wondering why our new fileservers are consuming quite a lot of memory (up to 95% of available memory).
the fileservers are windows scale-out fileservers and configured to use 75% of available memory – in our case: 24G

a failover test showed that when a virtual disk is moved to another node, the source node frees most of the memory immediately.

using rammap i saw that most of the usage came from nonpaged memory (which is normally allocated when someone wants to ensure that access to the data is really fast).


i hit google and looked for more information to make sure my assumption is correct:
http://blogs.msdn.com/b/clustering/archive/2013/07/19/10286676.aspx

You can allocate up to 20% with Windows Server 2012 and 80% with Windows Server 2012 R2 of the total physical RAM for CSV write-through cache, which will be consumed from non-paged pool memory. Note: It is recommended not to exceed allocating 64 GB

Okay, looks good, looks quite intended.
A look at perfmon confirmed the assumption further.

End of story:
Don’t get worried if caching does exactly what you would expect it to do – using uncontended resources as much as possible!
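As a quick sanity check of the numbers, here is the arithmetic as a small sketch. It reads the 24G above as the cache size, which would mean 32 GB of physical ram per node (an assumption). The limits are the ones from the quoted article, and the function name is made up.

```python
def csv_cache_plan(total_ram_gb, fraction, r2=True):
    """Return (cache size in GB, whether it stays within the 64 GB recommendation).

    Limits taken from the quoted article: up to 20% of RAM on
    Windows Server 2012, up to 80% on 2012 R2.
    """
    limit = 0.80 if r2 else 0.20
    if not 0 <= fraction <= limit:
        raise ValueError(f"fraction must be between 0 and {limit}")
    cache_gb = total_ram_gb * fraction
    return cache_gb, cache_gb <= 64
```

csv_cache_plan(32, 0.75) gives 24 GB, within both the 80% limit and the 64 GB recommendation. Add the rest of the system on top and 95% memory usage is no surprise.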

Posted in general

Microsoft.Exchange.Security.Authentication.TokenMungingException

X-OWA-Error: Microsoft.Exchange.Security.Authentication.TokenMungingException
X-OWA-Version: 15.0.913.21
X-FEServer: NTX-MAIL02
X-BEServer: NTX-MAIL01
Date: 24.09.2014 13:08:57


This lovely error shows up when you try to access OWA.
Background: initially this was just a shared mailbox with no active user attached. Now the account is needed directly:
password set -> user enabled -> OWA enabled
Outlook via the gateway works flawlessly -> in the browser you get at most as far as the user’s ECP, while OWA itself fails with this error:
                TokenMungingException
In the EMS:
Get-User -Identity "user" | FL LinkedMasterAccount

Here NT Authority\Self is set. Set the value to null and OWA works completely as well:
Set-User -Identity "user" -LinkedMasterAccount $null






Posted in general

[server] video playback on a server 2008 r2

new day new challenge. okay this was an easy one:

we have a customer who needs to load video files into a piece of software. the software runs on the server and tries to generate a preview of the files / retrieve some information.
clicking on the video popped up an error. quite obviously the error was related to the missing video support on the server.

#step 1:
install the desktop experience feature -> media player is available

fine so far. but now? playing the movies didn’t work. i tested VLC, worked. okay, seems to be related to some codec stuff as the rendering pipeline seems to be working.
i tried to play an mpeg4 (in media player) – worked.

got it?

 

#step 2:
install an mpeg2 codec. so that’s the tricky part: you need an mpeg2 codec in order for media player to play back mpeg2 video.
you can try codec packs or Stinky’s MPEG-2 Codec.

 

that’s it. media player shows the video and therefore the application (which utilizes the win api) can import the movies as well.

Posted in general

[Server] creating scheduled tasks with GPO

i needed to create a scheduled task in our domain to control energy profiles on a bunch of machines. the idea behind it is to lower energy consumption to a minimum at night and get the best possible performance during the day.

so – creating a scheduled task via gpo is quite easy (in this case the task is for the computer):
Computerconfiguration => Settings => Control panel => Scheduled Tasks

I created the task and used the “select user / group” button to select the account which will be used for execution.

Unfortunately the task didn’t appear on the desired hosts. But why? Searching the web about this topic doesn’t provide that much information – but i’ve been lucky:
http://social.technet.microsoft.com/Forums/en-US/60d638eb-818b-490a-a9aa-a07f4677dbed/create-scheduled-task-on-server-2012r2-with-gpo?forum=winserverGP

I figured out this issue in my case. I’m not sure exactly what fixed it – but I believe it’s because I was setting the runas to BUILTIN\SYSTEM and I should have just left it as default “run only when user was logged in” – as that defaults to the system account. So instead I typed in simply “SYSTEM”. I also created a totally new GPO using a 2012R2 server (instead of creating it from my usually routine which is to use my desktop – Windows 7). Not sure if that made a difference.

So – all i needed to do is: change the username from BUILTIN\SYSTEM to SYSTEM and there it is.


Posted in general