Topic: Spam scrambler
+FuckAlms !vX8K53rFBI — 13.6 years ago #18,066
I'm trying to make a function that takes URLs posted in the wrong format (as raw a href HTML) and turns them into a mailto address using the given domain, dropping any additional address information.
so
a href=http://www.gofuckashitupyourar.se/boners.php
becomes
mailto:abuse@gofuckashitupyourar.se
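If the input were always a bare URL, parse_url() could pull out the host without any regex. This is a minimal sketch, not something from this thread: abuse_mailto() is a hypothetical name, and it assumes the domain to keep is simply the last two labels of the host, which is wrong for suffixes like .co.uk.

```php
<?php
// Hypothetical helper: turn a URL into an abuse@ mailto for its domain.
// Assumption: the domain to keep is the last two dot-separated labels of the host.
function abuse_mailto(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no scheme://host part could be parsed
    }
    $labels = explode('.', rtrim($host, '.'));
    $domain = implode('.', array_slice($labels, -2));
    return 'mailto:abuse@' . $domain;
}

echo abuse_mailto('http://www.gofuckashitupyourar.se/boners.php'), "\n";
// prints mailto:abuse@gofuckashitupyourar.se
```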
for reference:
#5 prevent botspam html links
'/a href="/',
#6 Linkify URLs.
'@\b(?<!\[)(https?|ftp)://(www\.)?([A-Z0-9.-]+)(/)?([A-Z0-9/&#+%~=_|?.,!:;-]*[A-Z0-9/&#+%=~_|])?@i',
#7 Linkify text in the form of [http://example.org text].
'@\[(https?|ftps?|mailto)://([A-Z0-9/&#+%~=_|?.,!:;-]+) (.+?)\]@i',
corresponds to:
'<a href="mailto:admin@">spam</a>', #5
'<a href="$0" rel="nofollow">$1://$2$3$4$5</a>', #6
'<a href="$1://$2" title="$1://$2" rel="nofollow">$3</a>', #7
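Since the rules are kept in two parallel arrays, they can be handed to preg_replace() together: each pattern is paired with the replacement at the same index and applied, in order, to the output of the previous pair. A sketch using rules #6 and #7 as quoted above; the sample text and the variable names are assumptions.

```php
<?php
// Rules #6 and #7 as quoted in the post; index N of $markup pairs with index N of $html.
$markup = [
    // 6: linkify bare URLs; the (?<!\[) lookbehind skips a URL that directly follows "["
    '@\b(?<!\[)(https?|ftp)://(www\.)?([A-Z0-9.-]+)(/)?([A-Z0-9/&#+%~=_|?.,!:;-]*[A-Z0-9/&#+%=~_|])?@i',
    // 7: linkify [http://example.org text]
    '@\[(https?|ftps?|mailto)://([A-Z0-9/&#+%~=_|?.,!:;-]+) (.+?)\]@i',
];
$html = [
    '<a href="$0" rel="nofollow">$1://$2$3$4$5</a>',
    '<a href="$1://$2" title="$1://$2" rel="nofollow">$3</a>',
];

// With array arguments, preg_replace() runs pattern 0 over the text,
// then pattern 1 over that result, and so on.
$out = preg_replace($markup, $html, 'see http://www.example.org/page and [http://example.org text]');
echo $out, "\n";
```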
+Anonymous B — 13.6 years ago, 22 minutes later[T] [B] #220,868

Let me know if you need a hand
+Sarcasm.zip !36Q3VF.UPM — 13.6 years ago, 1 minute later, 23 minutes after the original post[T] [B] #220,869
Clever.
+Anonymous D — 13.6 years ago, 5 hours later, 6 hours after the original post[T] [B] #220,945
+Anonymous E — 13.6 years ago, 3 days later, 3 days after the original post[T] [B] #223,334
Depends on your language/engine, but here's one using two substitutions with GNU sed in ERE mode.
echo http://www.gofuckashitupyourar.se/boners.php | sed -r -n -e 's#(https?|ftp)://([^/]*).*#\2#;s/.*\.([^\.]*\.[^\.]*$)/\1/p;'
And the same thing in a slightly less robust single substitution:
echo http://www.gofuckashitupyourar.se/boners.php | sed -r -n -e 's#(https?|ftp)://.*\.([^\./]*\.[^\./]*).*#\2#p'
·FuckAlms !vX8K53rFBI (OP) — 13.6 years ago, 7 hours later, 4 days after the original post[T] [B] #223,451
@previous (E)
The language is PHP.
+Anonymous F — 13.6 years ago, 13 minutes later, 4 days after the original post[T] [B] #223,454
Ask Syntax; he claims to have invented PHP.
+Anonymous G — 13.6 years ago, 6 minutes later, 4 days after the original post[T] [B] #223,458
@previous (F)
You obviously haven't been paying attention; Syntax invented the computer whilst out free climbing the entire Grand Canyon.
·FuckAlms !vX8K53rFBI (OP) — 13.6 years ago, 5 minutes later, 4 days after the original post[T] [B] #223,464
@223,454 (F)
@previous (G)
OK, that's enough. This thread isn't about "Syntax" or his life. This thread is about foiling the infernal cuntbots I have to deal with on OTK.
·Anonymous F — 13.6 years ago, 12 minutes later, 4 days after the original post[T] [B] #223,471
@223,458 (G)
I'm afraid that it's you who hasn't been paying attention. Syntax created the Grand Canyon back when he was personally advising the Earth on how to form.
+Anonymous H — 13.6 years ago, 13 minutes later, 4 days after the original post[T] [B] #223,480
lol regex
·Anonymous G — 13.6 years ago, 34 minutes later, 4 days after the original post[T] [B] #223,502
@223,464 (FuckAlms !vX8K53rFBI)
What's OTK?
·Anonymous E — 13.6 years ago, 10 minutes later, 4 days after the original post[T] [B] #223,508
PHP version:
<?php
$url = 'http://www.gofuckashitupyourar.se/boners.php';
// '#' is the pattern delimiter; $2 captures the last two labels of the host.
$mailto = preg_replace('#(https?|ftp)://.*\.([^\./]*\.[^\./]*).*#', 'mailto:abuse@$2', $url);
echo $mailto;
echo "\n";
?>
But I don't have an interpreter on this box to test, so YMMV.
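Running that same call over a few more URLs shows where it holds up and where it doesn't; the last three inputs are made up for illustration.

```php
<?php
// E's pattern and replacement, unchanged.
$pattern = '#(https?|ftp)://.*\.([^\./]*\.[^\./]*).*#';

$urls = [
    'http://www.gofuckashitupyourar.se/boners.php', // mailto:abuse@gofuckashitupyourar.se
    'http://zero.one.two/relative/path',            // mailto:abuse@one.two
    'http://one.two/relative/path',                 // host has only one dot: no match, URL comes back unchanged
    'http://www.example.com/js/jquery.min.js',      // greedy .* backtracks to the dots in the path: mailto:abuse@min.js
];

foreach ($urls as $url) {
    echo preg_replace($pattern, 'mailto:abuse@$2', $url), "\n";
}
```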
·Anonymous G — 13.6 years ago, 1 minute later, 4 days after the original post[T] [B] #223,509
@previous (E)
What's YMMV?
·Anonymous E — 13.6 years ago, 29 minutes later, 4 days after the original post[T] [B] #223,528
@previous (G)
> What's YMMV?
It's an acronym. It expands to "Ask Google and stop wasting other people's time."
·Anonymous G — 13.6 years ago, 2 minutes later, 4 days after the original post[T] [B] #223,532
@previous (E)
What's <?php?
·FuckAlms !vX8K53rFBI (OP) — 13.6 years ago, 23 hours later, 5 days after the original post[T] [B] #224,097
@223,508 (E)
Alright, but it doesn't take into account that I can only identify an offending URL by the fact that it is posted as full HTML rather than as a bare URL, namely by the <a href= part.
Also, the parsing is done through an array, so things will need to be structured a bit differently:
$markup = array (
    # foil bots dumping raw html by breaking links
    '/a href=#(https?|ftp)://.*\.([^\./]*\.[^\./]*).*#'
);
function parse($text) {
    global $markup;
    static $html;
    $text = htmlspecialchars($text);
    $text = str_replace("\r", '', $text);
    if(!$html){
        $html = array (
            '<a href="mailto:abuse@$2", $url>spam</a>'
        );
·Anonymous E — 13.6 years ago, 22 hours later, 6 days after the original post[T] [B] #224,876
@previous (FuckAlms !vX8K53rFBI)
I don't think you understand how this works. The first character of the pattern, #, sets up the delimiter. I chose not to use the usual / because / would be found in the pattern. The "standard" fallbacks are # and |, but I tend to prefer #.
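To make the delimiter point concrete, here is the same pattern written with # and with /, plus what happens when '/a href=' is glued onto the front of the #...# version as in the snippet above. The test URL is made up.

```php
<?php
$url = 'http://www.example.com/boners.php';

// '#' as delimiter: the slashes inside the pattern can stay as they are.
$a = preg_replace('#(https?|ftp)://.*\.([^\./]*\.[^\./]*).*#', 'mailto:abuse@$2', $url);

// '/' as delimiter: every literal '/' in the pattern has to be escaped as '\/'.
$b = preg_replace('/(https?|ftp):\/\/.*\.([^\.\/]*\.[^\.\/]*).*/', 'mailto:abuse@$2', $url);

// Gluing '/a href=' onto the '#...#' pattern: PHP takes '/' as the delimiter,
// ends the pattern at the next unescaped '/', reads the rest as modifiers,
// warns "Unknown modifier" and returns null. The @ only hides that warning here.
$c = @preg_replace('/a href=#(https?|ftp)://.*\.([^\./]*\.[^\./]*).*#', 'mailto:abuse@$2', $url);

var_dump($a, $b, $c);
```

The same reasoning rules out | as a delimiter for this particular pattern, since (https?|ftp) already uses | for alternation.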
Also, using global is awful and should be avoided.
Also, declaring a static local variable inside a function is horrible and the fact that PHP lets you get away with this is no excuse. Just don't.
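The advice about global and static can be sketched as a hypothetical parse() that takes the rule arrays as arguments, so nothing lives in global scope or in a cached static. The escaping and \r stripping mirror the snippet above; using E's pattern as the only rule is an assumption for the demo.

```php
<?php
// Hypothetical version of parse(): the rule tables arrive as arguments, so there is
// no global $markup and no static $html cache to keep in sync.
function parse(string $text, array $patterns, array $replacements): ?string
{
    $text = htmlspecialchars($text);      // same escaping step as the snippet above
    $text = str_replace("\r", '', $text); // drop carriage returns
    // preg_replace() returns null if any of the patterns fails to compile.
    return preg_replace($patterns, $replacements, $text);
}

$patterns     = ['#(https?|ftp)://.*\.([^\./]*\.[^\./]*).*#'];
$replacements = ['mailto:abuse@$2'];

echo parse("see http://www.example.com/x\r\n", $patterns, $replacements);
```

If the real rules run after htmlspecialchars(), as they do here, a pattern looking for a literal <a will only ever see &lt;a, so it is worth checking where the escaping happens.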
Other than that I can't figure out what your sample code is supposed to be doing. A larger context would be useful.
Or, just stop using PHP. It's a horrible language that should not be used by anyone for any reason.
·FuckAlms !vX8K53rFBI (OP) — 13.6 years ago, 1 day later, 1 week after the original post[T] [B] #225,812
@previous (E)
Larger context? TC says it's too much context, so have a pastebin:
http://pastebin.com/s4ZCcNYV
The stuff I'm trying to add to is at line 550.
(Edited 1 minute later.)
·Anonymous E — 13.5 years ago, 2 days later, 1 week after the original post[T] [B] #228,713
@previous (FuckAlms !vX8K53rFBI)
After line 607 insert the following line:
'#(https?|ftp)://.*\.([^\./]*\.[^\./]*).*#',
After line 633, which is now 634 due to the above inserted line, insert the following line:
'mailto:abuse@$2',
Note that this regex will pick up things which are not inside anchor tags, which you may not want. If this is a problem it could be revised.
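To make that caveat concrete, here is the pattern and replacement from the two inserted lines applied to two made-up inputs, a bare URL in running text and an ordinary anchor. Because the pattern ends in .*, the match runs to the end of the line in both cases.

```php
<?php
$pattern     = '#(https?|ftp)://.*\.([^\./]*\.[^\./]*).*#';
$replacement = 'mailto:abuse@$2';

// A bare URL outside any anchor is rewritten too, and the trailing .*
// also swallows " for details".
echo preg_replace($pattern, $replacement, 'see http://www.example.com/page for details'), "\n";
// prints see mailto:abuse@example.com

// Inside an anchor, the same .* eats the closing quote, the link text and </a>.
echo preg_replace($pattern, $replacement, '<a href="http://www.example.com/page">text</a>'), "\n";
// prints <a href="mailto:abuse@example.com
```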
+Anonymous I — 13.5 years ago, 11 hours later, 1 week after the original post[T] [B] #229,214
@previous (E)
If you want it to only slurp anchors of a very specific form, make the pattern this:
'#(<a href=["\']?)(https?|ftp)://.*\.([^\./]*\.[^\./]*)[^">\']*#'
And make the replacement this:
'$1mailto:abuse@$3'
In all probability you will want it to work on any anchor, not just ones which look exactly like <a href=, so...
'#(<a.*href=["\']?)(https?|ftp)://.*\.([^\./]*\.[^\./]*)[^">\' ]*#'
Which will allow (and preserve) other attributes before and after the href attribute.
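A quick check of the any-anchor pattern above with the '$1mailto:abuse@$3' replacement, on two made-up anchors. The first behaves as described; the second shows that the greedy .* after :// can run on into the link text when that text contains another dotted host.

```php
<?php
$pattern     = '#(<a.*href=["\']?)(https?|ftp)://.*\.([^\./]*\.[^\./]*)[^">\' ]*#';
$replacement = '$1mailto:abuse@$3';

// Attributes before href are kept in $1; the path is dropped up to the closing quote.
echo preg_replace($pattern, $replacement, '<a class="x" href="http://www.example.com/boners.php">text</a>'), "\n";
// prints <a class="x" href="mailto:abuse@example.com">text</a>

// Link text that repeats the URL: .* backtracks only as far as the dots in the
// link text, so the match ends just before the final ">" and the text is lost.
echo preg_replace($pattern, $replacement, '<a href="http://www.example.com/x">http://www.example.com/x</a>'), "\n";
// prints <a href="mailto:abuse@example.com>
```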
·Anonymous G — 13.5 years ago, 5 minutes later, 1 week after the original post[T] [B] #229,215
'mailto:abuse@$2', vs '$1mailto:abuse@$3'
Makes perfect sense really. You do the math
·Anonymous I — 13.5 years ago, 9 hours later, 1 week after the original post[T] [B] #229,391
@previous (G)
Not sure if serious or just confused.
Since I added a capture group for the beginning of the anchor (and whatever non-href attributes precede href), that capture has to be put back in the substitution. And since the new capture comes first, what was formerly the 2nd capture becomes the 3rd.
The only unknown, to me, is whether PHP's parser is smart enough to realize that $1mailto is $1 followed by mailto and not an identifier by itself. Typically identifiers may not start with numbers, but PHP does a lot of things that are not typical (or wise). In addition, the $1 inside a PHP substitution stanza is sort-of magic in that it's not a variable and will expand even when interpolation is disabled, so I'm just guessing that it will behave as it should.
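For what it's worth, preg_replace() reads at most two digits after $ as the group number, so $1mailto is unambiguous. A small sketch with toy patterns of my own, covering that, the ${1} form for when a digit really does follow, and the double-quoted case:

```php
<?php
// "$1mailto:": "m" is not a digit, so this is group 1 followed by the literal "mailto:".
echo preg_replace('#(<a href=")http://#', '$1mailto:', '<a href="http://x">'), "\n";
// prints <a href="mailto:x">

// A literal digit right after a reference is the ambiguous case: "$11" is read as
// group 11, which does not exist here and so expands to an empty string...
var_dump(preg_replace('#(a)#', '$11', 'a'));
// ...while "${1}1" means "group 1, then the digit 1".
echo preg_replace('#(a)#', '${1}1', 'a'), "\n";
// prints a1

// Double quotes happen to work too: "$2" is not a valid PHP variable name,
// so PHP leaves it as literal text and preg_replace() sees "$2".
echo preg_replace('#(x)(y)#', "[$2]", 'xy'), "\n";
// prints [y]
```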
Did I mention that testing is only for uncool, ugly people?
·Anonymous G — 13.5 years ago, 15 minutes later, 1 week after the original post[T] [B] #229,401
@previous (I)
Infernal cuntbots will expand even when interpolation is disabled.
·Anonymous I — 13.5 years ago, 2 hours later, 1 week after the original post[T] [B] #229,551
Okay, so I had to actually put it in a PHP script to make it work, and it now relies on one PCRE feature, so it's not portable, but...
'#(<a.*href=["\']?)(https?|ftp)://([^/ ">\']*?)\.?([a-zA-Z0-9-]+\.[a-zA-Z0-9-]+\.?)([/ ">\'][^ ">\']*)#'
'$1mailto:abuse@$4'
Test cases:
<a href=http://one.two>http://one.two</a> [P]
<a href=http://zero.one.two>http://zero.one.two</a> [P]
<a href=http://one.two/relative/path>http://one.two/relative/path</a> [P]
<a href=http://zero.one.two/relative/path>http://zero.one.two/relative/path</a> [P]
<a href="http://one.two/relative/path">http://one.two/relative/path</a> [P]
<a href="http://zero.one.two/relative/path">http://zero.one.two/relative/path</a> [P]
<a class="something" href="http://one.two/relative/path">http://one.two/relative/path</a> [P]
<a class="something" href="http://zero.one.two/relative/path">http://zero.one.two/relative/path</a> [P]
<a class="something" href="http://one.two/relative/path" target="_blank">http://one.two/relative/path</a> [P]
<a class="something" href="http://zero.one.two/relative/path" target="_blank">http://zero.one.two/relative/path</a> [P]
<a href='http://one.two/relative/path'>http://one.two/relative/path</a> [P]
<a href='http://zero.one.two/relative/path'>http://zero.one.two/relative/path</a> [P]
<a href='http://one.two/relative/path' target="_blank">http://one.two/relative/path</a> [P]
<a href='http://zero.one.two/relative/path' target="_blank">http://zero.one.two/relative/path</a> [P]
<a href=http://one.two/relative/path target="_blank">http://one.two/relative/path</a> [P]
<a href=http://zero.one.two/relative/path target="_blank">http://zero.one.two/relative/path</a> [P]
<a class="something" href=http://one.two/relative/path>http://one.two/relative/path</a> [P]
<a class="something" href=http://zero.one.two/relative/path>http://zero.one.two/relative/path</a> [P]
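For anyone without the pastebin, here is a small harness along the same lines, running the pattern and replacement above over a few of the listed shapes. The link text "link" is a placeholder of mine.

```php
<?php
// The revised pattern and replacement from the post above (group 4 is the domain).
$pattern     = '#(<a.*href=["\']?)(https?|ftp)://([^/ ">\']*?)\.?([a-zA-Z0-9-]+\.[a-zA-Z0-9-]+\.?)([/ ">\'][^ ">\']*)#';
$replacement = '$1mailto:abuse@$4';

// A few of the test-case shapes listed above.
$cases = [
    '<a href=http://one.two/relative/path>http://one.two/relative/path</a>',
    '<a href="http://zero.one.two/relative/path">http://zero.one.two/relative/path</a>',
    '<a class="something" href=\'http://one.two/relative/path\' target="_blank">link</a>',
    '<a href=http://zero.one.two/relative/path target="_blank">link</a>',
];

foreach ($cases as $case) {
    // The lazy ([^/ ">']*?) lets "zero." be skipped so group 4 is the last two labels.
    echo preg_replace($pattern, $replacement, $case), "\n";
}
```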
·Anonymous G — 13.5 years ago, 8 minutes later, 1 week after the original post[T] [B] #229,554
@previous (I)
> Did I mention that testing is only for uncool, ugly people?
·Anonymous I — 13.5 years ago, 18 seconds later, 1 week after the original post[T] [B] #229,555
Since I had to go through the trouble of making a test case to get this fixed, here it is in all its glory
http://pastebin.com/WAd1FVZV
Oooh-aahh.
·Anonymous I — 13.5 years ago, 35 seconds later, 1 week after the original post[T] [B] #229,556
@229,554 (G)
> >Did I mention that testing is only for uncool, ugly people?
Anon has spotted the joke: I am BOTH of those.
(Edited 1 minute later.)
·Anonymous G — 13.5 years ago, 5 minutes later, 1 week after the original post[T] [B] #229,563
@previous (I)
Well, that works. But does it require a proto part?
·Anonymous I — 13.5 years ago, 24 minutes later, 1 week after the original post[T] [B] #229,598
@previous (G)
> Well, that works. But does it require a proto part?
The pastebin's version does not.
·Anonymous G — 13.5 years ago, 1 hour later, 1 week after the original post[T] [B] #229,647
@previous (I)
I feel like I've learned something, but I'm not sure what it is. Thanks!
·Anonymous I — 13.5 years ago, 3 days later, 2 weeks after the original post[T] [B] #232,340
Bug:
Unquoted, domain-only URLs that are followed directly by the closing > of the tag (no attributes or whitespace in between) fail to match properly, and all text up to the next > after the opening anchor tag gets improperly replaced.
The fix is this very-slightly-altered regex:
'#(<a.*href=["\']?)((https?|ftp)://)?([^/ ">\']*?)\.?([a-zA-Z0-9-]+\.[a-zA-Z0-9-]+\.?)([/ "\']?[^ ">\']*)#',
Which has been minimally tested and appears not to break any other cases.
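Tracing both versions on the reported input, with the replacement moved to $5 for the new pattern because ((https?|ftp)://)? adds two capture groups. The last line is a caveat worth double-checking: the separator at the start of the final group is now optional, so a three-label host can be cut short. The anchors are made-up test inputs.

```php
<?php
$old = '#(<a.*href=["\']?)(https?|ftp)://([^/ ">\']*?)\.?([a-zA-Z0-9-]+\.[a-zA-Z0-9-]+\.?)([/ ">\'][^ ">\']*)#';
$new = '#(<a.*href=["\']?)((https?|ftp)://)?([^/ ">\']*?)\.?([a-zA-Z0-9-]+\.[a-zA-Z0-9-]+\.?)([/ "\']?[^ ">\']*)#';

$bug = '<a href=http://one.two>http://one.two</a>';

// Old pattern: the last group may start with '>', so it runs on into the link text.
echo preg_replace($old, '$1mailto:abuse@$4', $bug), "\n";
// prints <a href=mailto:abuse@one.two>

// New pattern: the match now stops at '>' and the link text survives.
echo preg_replace($new, '$1mailto:abuse@$5', $bug), "\n";
// prints <a href=mailto:abuse@one.two>http://one.two</a>

// Caveat: with no separator required, the lazy prefix stays empty and the
// "domain" becomes the first two labels of a longer host.
echo preg_replace($new, '$1mailto:abuse@$5', '<a href="http://zero.one.two/relative/path">link</a>'), "\n";
// prints <a href="mailto:abuse@zero.one.">link</a>
```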
·Anonymous G — 13.5 years ago, 14 minutes later, 2 weeks after the original post[T] [B] #232,356
@previous (I)
Bugs?
> More testing needed
·Anonymous I — 13.5 years ago, 34 minutes later, 2 weeks after the original post[T] [B] #232,398
@previous (G)
> Bugs?
> >More testing needed
Release early, release often and let the users test for you. Maximum efficiency.
+Anonymous J — 13.5 years ago, 5 hours later, 2 weeks after the original post[T] [B] #232,643
@previous (I)
Because users are exceedingly proficient at breaking things.
·FuckAlms !vX8K53rFBI (OP) — 13.5 years ago, 1 day later, 2 weeks after the original post[T] [B] #233,826
Now it's just doing the "abuse@" part and nothing else for the hyperlink.
http://otakutalk.org/topic/1253
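One likely cause: ((https?|ftp)://)? in the latest pattern adds two capture groups, so if the replacement was left at '$1mailto:abuse@$4', group 4 is now the lazy host prefix (empty for a host like one.two) and the domain sits in $5. Below is a sketch that avoids counting groups at all by using named groups with preg_replace_callback(). The (?=[/ ">']) lookahead after the domain is my own variant, not something posted in this thread, and I have only traced it against the three made-up inputs shown.

```php
<?php
// Named groups are looked up by name, so adding or removing other groups
// cannot shift which capture ends up in the mailto: address.
$pattern = '#(?<open><a.*href=["\']?)(?:(?:https?|ftp)://)?[^/ ">\']*?\.?'
         . '(?<domain>[a-zA-Z0-9-]+\.[a-zA-Z0-9-]+\.?)'
         . '(?=[/ ">\'])[^ ">\']*#';   // lookahead: the domain must end at a separator

$fix = function (array $m): string {
    return $m['open'] . 'mailto:abuse@' . $m['domain'];
};

$cases = [
    '<a href=http://one.two>http://one.two</a>',
    '<a href="http://zero.one.two/relative/path">link</a>',
    '<a href=www.example.com/x target="_blank">link</a>',   // no protocol
];

foreach ($cases as $case) {
    echo preg_replace_callback($pattern, $fix, $case), "\n";
}
```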