Tuesday, December 14, 2010

Pull the alt / title text of the xkcd picture from the command line

How to pull the alt text (actually the title attribute) of the current xkcd comic from the command line:

[root@queerlikerice ~]# curl xkcd.com | grep -E -oh 'title=".* alt=' | grep 'title' | grep -E -oh '".*?"'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  8375  100  8375    0     0   137k      0 --:--:-- --:--:-- --:--:--  538k
"And if you labeled your axes, I could tell you exactly how MUCH better."



Bash cli:
curl xkcd.com | grep -E -oh 'title=".* alt=' | grep 'title' | grep -E -oh '".*?"'
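The same extraction can probably be done in one sed call instead of three greps. This is only a minimal sketch: it assumes the comic's img tag still has title="..." immediately followed by alt= on a single line, as it did when this was written, and the function name xkcd_title is made up for illustration.

```shell
# Hypothetical helper: reads HTML on stdin and prints the text of a
# title="..." attribute that is immediately followed by alt=.
# Assumes the title text itself contains no literal double quote.
xkcd_title() {
    # -n plus the p flag prints only lines where the substitution matched;
    # head keeps the first such line in case several lines match.
    sed -n 's/.*title="\([^"]*\)" alt=.*/\1/p' | head -n 1
}

# Usage (needs network access). -s silences curl's progress meter and
# -L follows redirects, which is an assumption about how the site responds:
#   curl -sL xkcd.com | xkcd_title
```

Unlike the grep version, this prints the text without the surrounding quotes.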

2 comments:

oylenshpeegul said...

Nice! I just spotted this on Twitter (thanks, @climagic). You can ask curl to skip that progress meter with -s.

Today, the alt text has HTML entities in it. You could use Perl to convert those:

perl -MHTML::Entities -MLWP::Simple -E 'say decode_entities $+{t} if $_ = get "http://xkcd.com" and /title="(?[^"]+)" alt=/'

Alternatively, you could still use curl for the download:

curl -s xkcd.com | perl -MHTML::Entities -nE 'say decode_entities $+{t} if /title="(?[^"]+)" alt=/'

oylenshpeegul said...

Ack! HTML ate my regular expression. That capture named t is (?<t>...), so

perl -MHTML::Entities -MLWP::Simple -E 'say decode_entities $+{t} if $_ = get "http://xkcd.com" and /title="(?<t>[^"]+)" alt=/'

and

curl -s xkcd.com | perl -MHTML::Entities -nE 'say decode_entities $+{t} if /title="(?<t>[^"]+)" alt=/'

Sorry about that.
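
If Perl's HTML::Entities module is not installed, a few of the most common entities can be decoded with plain sed. This is a rough sketch rather than a full decoder: it handles only five entities, and the function name decode_basic_entities is hypothetical.

```shell
# Hypothetical helper: decodes a handful of common HTML entities on stdin.
# Anything else (numeric entities other than &#39;, named ones like &nbsp;)
# passes through unchanged.
decode_basic_entities() {
    # &amp; is handled last so that "&amp;lt;" becomes "&lt;" and not "<".
    sed -e 's/&quot;/"/g' \
        -e "s/&#39;/'/g" \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&amp;/\&/g'
}

# Usage: pipe the extracted title text through it, for example
#   printf '%s\n' 'Tom &amp; Jerry' | decode_basic_entities
```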