Screen Scraping, Spy++ & the GDI
Mark Everett
2004-01-12 21:11:08 UTC
Hi all,

I am writing a program which needs to obtain data from various Windows
applications.

I have had success with the first application using FindWindow and
GetWindowText as well as sending messages to listviews and listboxes
to retrieve their contents.
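
A minimal sketch of this approach, assuming Python with ctypes (the post
does not say which language was used). The helper names load_user32,
window_text and listbox_items are illustrative only. It reads a window's
caption with GetWindowText and a standard list box's items with
LB_GETCOUNT / LB_GETTEXTLEN / LB_GETTEXT. List views are not covered here:
LVM_GETITEMTEXT needs a buffer allocated inside the target process.

```python
import ctypes
import sys

# Standard list box messages (values from winuser.h).
LB_GETTEXT = 0x0189
LB_GETTEXTLEN = 0x018A
LB_GETCOUNT = 0x018B


def load_user32():
    """Load user32.dll and declare the prototypes used below (Windows only)."""
    if sys.platform != "win32":
        raise OSError("this sketch needs the Win32 API")
    from ctypes import wintypes
    user32 = ctypes.WinDLL("user32", use_last_error=True)
    user32.FindWindowW.argtypes = [wintypes.LPCWSTR, wintypes.LPCWSTR]
    user32.FindWindowW.restype = wintypes.HWND
    user32.FindWindowExW.argtypes = [wintypes.HWND, wintypes.HWND,
                                     wintypes.LPCWSTR, wintypes.LPCWSTR]
    user32.FindWindowExW.restype = wintypes.HWND
    user32.GetWindowTextLengthW.argtypes = [wintypes.HWND]
    user32.GetWindowTextW.argtypes = [wintypes.HWND, wintypes.LPWSTR,
                                      ctypes.c_int]
    user32.SendMessageW.argtypes = [wintypes.HWND, wintypes.UINT,
                                    wintypes.WPARAM, wintypes.LPARAM]
    user32.SendMessageW.restype = ctypes.c_ssize_t  # LRESULT is pointer-sized
    return user32


def window_text(hwnd):
    """Return the caption (or control text) of the window `hwnd`."""
    user32 = load_user32()
    length = user32.GetWindowTextLengthW(hwnd)
    buf = ctypes.create_unicode_buffer(length + 1)
    user32.GetWindowTextW(hwnd, buf, length + 1)
    return buf.value


def listbox_items(hwnd):
    """Return the strings held by a standard list box control."""
    user32 = load_user32()
    count = user32.SendMessageW(hwnd, LB_GETCOUNT, 0, 0)
    items = []
    for index in range(max(count, 0)):  # LB_ERR (-1) yields no items
        length = user32.SendMessageW(hwnd, LB_GETTEXTLEN, index, 0)
        if length < 0:
            continue
        buf = ctypes.create_unicode_buffer(length + 1)
        user32.SendMessageW(hwnd, LB_GETTEXT, index, ctypes.addressof(buf))
        items.append(buf.value)
    return items
```

Typical use would be to get the top-level handle with
load_user32().FindWindowW(None, "Some Title"), locate the child with
FindWindowExW(top, None, "ListBox", None), and pass that handle to
listbox_items. The title and class name here are placeholders.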

I used Spy++ to find the window class and then wrote a program to find
the element's handle and then communicate with it via window messages.
The problem is that with some of the applications all that Spy++ picks
up is the handle to the main window. A couple of children exist but
they do not appear to be anything familiar, so I am not sure how to
grab the data. One example is an item which looks like a listview, but
all I can find as children of the window is a "web window" class.

Also, what about text which is drawn onto the window? I guess this is
done using the GDI. Is it actually possible to find out the text
content of this kind of display? I take it that once rendered this
information is not available. I am guessing it is the GDI or maybe a
DirectDraw surface, because if it were a label Spy++ would be able to
find it, would it not?

If anyone has any ideas on how I can scrape these items then I would
appreciate it. The data is gathered from an internet-based source in
all these applications, but port sniffing seems to reveal that although
most use HTTP the data is encrypted, and I have no idea how to decrypt
it. I take it that's pretty much impossible unless you're a hacking
expert?

Thanks for your time
Mark
Bonj
2004-01-13 14:13:48 UTC
Spy++ can only gain information on things that have an
HWND. A windowless ActiveX control, or something like a VB
label control, doesn't have an hwnd. Instead, it just
paints on to the hdc of its container. Therefore, Spy++
can have no concept that it is a unique entity.
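
To see the same HWND tree that Spy++ works from, the child windows can be
enumerated directly. This is a minimal sketch, assuming Python with ctypes;
the function name child_windows is illustrative only. Anything painted
straight onto the container's hdc will simply not show up in the result.

```python
import ctypes
import sys


def child_windows(parent):
    """Return (hwnd, class_name) for every descendant window of `parent`."""
    if sys.platform != "win32":
        raise OSError("this sketch needs the Win32 API")
    from ctypes import wintypes
    user32 = ctypes.WinDLL("user32", use_last_error=True)

    # Callback prototype: BOOL CALLBACK EnumChildProc(HWND, LPARAM)
    enum_proc_type = ctypes.WINFUNCTYPE(wintypes.BOOL, wintypes.HWND,
                                        wintypes.LPARAM)
    user32.EnumChildWindows.argtypes = [wintypes.HWND, enum_proc_type,
                                        wintypes.LPARAM]
    user32.EnumChildWindows.restype = wintypes.BOOL
    user32.GetClassNameW.argtypes = [wintypes.HWND, wintypes.LPWSTR,
                                     ctypes.c_int]
    user32.GetClassNameW.restype = ctypes.c_int

    found = []

    def on_child(hwnd, _lparam):
        # Record the handle together with its window class name.
        buf = ctypes.create_unicode_buffer(256)
        user32.GetClassNameW(hwnd, buf, 256)
        found.append((hwnd, buf.value))
        return True  # keep enumerating

    user32.EnumChildWindows(parent, enum_proc_type(on_child), 0)
    return found
```

If this returns only a handful of unfamiliar classes for a window that
visibly shows lists and labels, the content is most likely being drawn by
the container rather than held in separate controls.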
Mark Everett
2004-01-13 19:45:01 UTC
Hi Bonj,

Thanks for the reply!

Right that's what I thought to be honest. If someone is using a label
control or DrawText directly or something similar then I take it there
is no way that I can scrape that information?

I originally thought a label would have its own hwnd but thinking
about it there isn't really any need. Are there any other techniques
possible that I can use to gather this information?

Cheers
Mark
Bonj
2004-01-16 23:00:15 UTC
Post by Mark Everett
Right that's what I thought to be honest. If someone is using a label
control or DrawText directly or something similar then I take it there
is no way that I can scrape that information?
Not unless you wrote something to a) take a snapshot of the window to an
image, and b) OCR the text off it. You'd have to be pretty sick to be
able to pull that off :-)
Post by Mark Everett
I originally thought a label would have its own hwnd but thinking
about it there isn't really any need. Are there any other techniques
possible that I can use to gather this information?
If there's absolutely NOTHING else that changes but the static text, and you
don't have control over the app by having written it yourself, then I
wouldn't have thought it possible.
Tim Robinson
2004-01-17 00:10:08 UTC
Post by Bonj
Post by Mark Everett
Right that's what I thought to be honest. If someone is using a label
control or DrawText directly or something similar then I take it there
is no way that I can scrape that information?
Not unless you wrote something to, a) take a snapshot of the window to an
image, and b) OCR the text off it - and you'd have to be pretty sick to be
able to pull that off :-)
To capture most text output to the screen reliably, you'd need to hook GDI
in some way: either use Detours to replace calls to DrawText etc. with your
own, or write a mirror display driver.
--
Tim Robinson (MVP, Windows SDK)
http://www.themobius.co.uk/