TinyChan

Topic: Spam scrambler

+FuckAlms !vX8K53rFBI13.6 years ago #18,066

I'm trying to make a function that takes URLs that are formatted incorrectly (as a href) and turns them into a mailto address using the given domain, dropping any additional address information.
so
a href=http://www.gofuckashitupyourar.se/boners.php

becomes
mailto:abuse@gofuckashitupyourar.se


for reference:
  #5 prevent botspam html links
  '/a href="/',
  #6 Linkify URLs.
  '@\b(?<!\[)(https?|ftp)://(www\.)?([A-Z0-9.-]+)(/)?([A-Z0-9/&#+%~=_|?.,!:;-]*[A-Z0-9/&#+%=~_|])?@i',
  #7 Linkify text in the form of [http://example.org text].
  '@\[(https?|ftps?|mailto)://([A-Z0-9/&#+%~=_|?.,!:;-]+) (.+?)\]@i',

corresponds to:
    '<a href="mailto:admin@">spam</a>', #5
    '<a href="$0" rel="nofollow">$1://$2$3$4$5</a>', #6
    '<a href="$1://$2" title="$1://$2" rel="nofollow">$3</a>', #7

+Anonymous B13.6 years ago, 22 minutes later[T] [B] #220,868

Let me know if you need a hand

+Sarcasm.zip !36Q3VF.UPM13.6 years ago, 1 minute later, 23 minutes after the original post[T] [B] #220,869

Clever.

+Anonymous D13.6 years ago, 5 hours later, 6 hours after the original post[T] [B] #220,945

@220,868 (B)
Lol

+Anonymous E13.6 years ago, 3 days later, 3 days after the original post[T] [B] #223,334

Depends on your language/engine, but here's one using two substitutions with GNU sed in ERE mode.


echo http://www.gofuckashitupyourar.se/boners.php | sed -r -n -e 's#(https?|ftp)://([^/]*).*#\2#;s/.*\.([^\.]*\.[^\.]*$)/\1/p;'

And the same thing in a slightly less robust single substitution:

echo http://www.gofuckashitupyourar.se/boners.php | sed -r -n -e 's#(https?|ftp)://.*\.([^\./]*\.[^\./]*).*#\2#p'
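
For readers following along in PHP, the difference between the two sed commands can be sketched roughly as below. This is a loose port under assumptions, not E's code: the function names and the example.com URL are made up for illustration. It suggests why the single substitution is the less robust one: dots in the path can be mistaken for dots in the host.

```php
<?php
// Two-step approach: isolate the host first, then keep its last two labels.
function domain_two_step(string $url): string
{
    // Step 1: drop the scheme and everything after the host.
    $host = preg_replace('#^(https?|ftp)://([^/]*).*$#', '$2', $url);
    // Step 2: keep only the last two dot-separated labels, if there are more.
    return preg_replace('#^.*\.([^.]*\.[^.]*)$#', '$1', $host);
}

// Single-step approach: one pattern applied to the whole URL.
function domain_one_step(string $url): string
{
    return preg_replace('#(https?|ftp)://.*\.([^\./]*\.[^\./]*).*#', '$2', $url);
}

$url = 'http://www.example.com/archive.tar.gz';
echo domain_two_step($url), "\n"; // example.com
echo domain_one_step($url), "\n"; // tar.gz, because the dots in the path win
```

Both versions agree on a URL such as http://www.example.com/index.php, where the path contains only one dot.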

·FuckAlms !vX8K53rFBI (OP) — 13.6 years ago, 7 hours later, 4 days after the original post[T] [B] #223,451

@previous (E)
The language is PHP

+Anonymous F13.6 years ago, 13 minutes later, 4 days after the original post[T] [B] #223,454

Ask Syntax, he claims to have invented php

+Anonymous G13.6 years ago, 6 minutes later, 4 days after the original post[T] [B] #223,458

@previous (F)
You obviously haven't been paying attention, Syntax invented the computer whilst out free climbing the entire grand canyon.

·FuckAlms !vX8K53rFBI (OP) — 13.6 years ago, 5 minutes later, 4 days after the original post[T] [B] #223,464

@223,454 (F)
@previous (G)
ok, that's enough. this thread isn't about "Syntax" or his life. this thread is about foiling the infernal cuntbots I have to deal with on OTK.

·Anonymous F13.6 years ago, 12 minutes later, 4 days after the original post[T] [B] #223,471

@223,458 (G)
I'm afraid that it's you who hasn't been paying attention. Syntax created the grand canyon back when he was personally advising the Earth on how to form.

+Anonymous H13.6 years ago, 13 minutes later, 4 days after the original post[T] [B] #223,480

lol regex

·Anonymous G13.6 years ago, 34 minutes later, 4 days after the original post[T] [B] #223,502

@223,464 (FuckAlms !vX8K53rFBI)
What's OTK?

·Anonymous E13.6 years ago, 10 minutes later, 4 days after the original post[T] [B] #223,508

PHP version:

<?php
$url = 'http://www.gofuckashitupyourar.se/boners.php';

$mailto = preg_replace('#(https?|ftp)://.*\.([^\./]*\.[^\./]*).*#', "mailto:abuse@$2", $url);

echo $mailto;
echo "\n";
?>

But I don't have an interpreter on this box to test, so YMMV.
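
Since the snippet above was posted untested, here is one way to sidestep the pattern's edge cases. preg_replace returns its subject unchanged when nothing matches, so a bare http://example.com/ (only one dot) would come back as the original URL. The sketch below swaps in parse_url instead of a regex; the helper name abuse_mailto is hypothetical, and it assumes the domain is simply the last two labels of the host, which is wrong for suffixes such as .co.uk.

```php
<?php
// Hypothetical helper: build the abuse address from a URL's host.
// Assumption: the registrable domain is the last two labels of the host.
function abuse_mailto(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host found, nothing to rewrite
    }
    $labels = explode('.', $host);
    if (count($labels) < 2) {
        return null; // single-label hosts such as "localhost"
    }
    return 'mailto:abuse@' . implode('.', array_slice($labels, -2));
}

echo abuse_mailto('http://www.example.com/index.php'), "\n"; // mailto:abuse@example.com
```

Returning null instead of the input makes a failed extraction easy to tell apart from a successful one.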

·Anonymous G13.6 years ago, 1 minute later, 4 days after the original post[T] [B] #223,509

@previous (E)
What's YMMV?

·Anonymous E13.6 years ago, 29 minutes later, 4 days after the original post[T] [B] #223,528

@previous (G)

> What's YMMV?

It's an acronym. It expands to "Ask google and stop wasting other people's time."

·Anonymous G13.6 years ago, 2 minutes later, 4 days after the original post[T] [B] #223,532

@previous (E)
What's <?php?

·FuckAlms !vX8K53rFBI (OP) — 13.6 years ago, 23 hours later, 5 days after the original post[T] [B] #224,097

@223,508 (E)
Alright, but it doesn't take into account that I can only identify an offending url by the fact that it is input as full html rather than just the url. Namely, the
<a href=


Also, the parsing is done through an array, so things will need to be structured a bit differently:
$markup = array (
  # foil bots dumping raw html by breaking links
  '/a href=#(https?|ftp)://.*\.([^\./]*\.[^\./]*).*#');
function parse($text) {
  global $markup;
  static $html;
  $text = htmlspecialchars($text);
  $text = str_replace("\r", '', $text);
  if(!$html){
    $html = array (
    '<a href="mailto:abuse@$2", $url>spam</a>');

·Anonymous E13.6 years ago, 22 hours later, 6 days after the original post[T] [B] #224,876

@previous (FuckAlms !vX8K53rFBI)

I don't think you understand how this works. The first character of the pattern, #, sets up the delimiter. I chose not to use the usual / because / would be found in the pattern. The "standard" fallbacks are # and |, but I tend to prefer #.
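
A small sketch of the delimiter point, using a made-up example.com URL. The first two calls are equivalent; the third imitates the pattern from the previous post, where the leading / makes / the delimiter, so the pattern ends at the first unescaped slash and the rest is rejected as modifiers. The @ is only there to silence the warning PHP emits in that case.

```php
<?php
$url = 'http://www.example.com/index.php';

// With / as the delimiter, every literal slash must be escaped.
$with_slash = preg_match('/^(https?|ftp):\/\//', $url);

// With # as the delimiter, slashes need no escaping.
$with_hash = preg_match('#^(https?|ftp)://#', $url);

// Starting with / and then writing # does not switch delimiters: the pattern
// ends at the first unescaped /, and what follows is not a valid modifier.
$broken = @preg_match('/a href=#(https?|ftp)://.*#', 'a href=' . $url);

var_dump($with_slash, $with_hash, $broken); // int(1), int(1), bool(false)
```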

Also, using global is awful and should be avoided.
Also, declaring a static local variable inside a function is horrible and the fact that PHP lets you get away with this is no excuse. Just don't.
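
One way to act on that advice, as a minimal sketch rather than a drop-in for the board's code: pass the rule table in as an argument. The name parse_text and the foo/bar rule are hypothetical, and the htmlspecialchars step from the original fragment is left out to keep the example short.

```php
<?php
// Hypothetical rewrite: the rule table is passed in instead of being read
// from a global or cached in a static local.
function parse_text(string $text, array $rules): string
{
    $text = str_replace("\r", '', $text);
    // Keys are patterns, values are replacements, applied in order.
    return preg_replace(array_keys($rules), array_values($rules), $text);
}

$rules = [
    '#\bfoo\b#' => 'bar',
];
echo parse_text("foo\r\n", $rules); // prints "bar" followed by a newline
```

Keeping each pattern next to its replacement also avoids two parallel arrays drifting out of step.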

Other than that I can't figure out what your sample code is supposed to be doing. A larger context would be useful.

Or, just stop using PHP. It's a horrible language that should not be used by anyone for any reason.

·FuckAlms !vX8K53rFBI (OP) — 13.6 years ago, 1 day later, 1 week after the original post[T] [B] #225,812

@previous (E)
Larger context? TC says it's too much context, so have a pastebin:
http://pastebin.com/s4ZCcNYV
The stuff I'm trying to add to is at line 550.

(Edited 1 minute later.)


·Anonymous E13.5 years ago, 2 days later, 1 week after the original post[T] [B] #228,713

@previous (FuckAlms !vX8K53rFBI)

After line 607 insert the following line:

'#(https?|ftp)://.*\.([^\./]*\.[^\./]*).*#',

After line 633, which is now 634 due to the above inserted line, insert the following line:

'mailto:abuse@$2',

Note that this regex will pick up things which are not inside anchor tags, which you may not want. If this is a problem it could be revised.
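
To make that caveat concrete, here is a sketch that assumes the two arrays are paired by index, the way preg_replace pairs array arguments. The array names follow the earlier fragment and the sample line is made up. The trailing .* in the pattern also swallows whatever follows the URL on the same line.

```php
<?php
// Parallel arrays: the pattern at index N is paired with the replacement at index N.
$markup = [
    '#(https?|ftp)://.*\.([^\./]*\.[^\./]*).*#',
];
$html = [
    'mailto:abuse@$2',
];

// A plain URL in running text, not inside an anchor tag.
$line = 'see http://www.example.com/page for details';
echo preg_replace($markup, $html, $line), "\n";
// see mailto:abuse@example.com
```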

+Anonymous I13.5 years ago, 11 hours later, 1 week after the original post[T] [B] #229,214

@previous (E)

If you want it to only slurp anchors of a very specific form make the pattern this:

'#(<a href=["\']?)(https?|ftp)://.*\.([^\./]*\.[^\./]*)[^">\']*#'


And make the replacement this:

'$1mailto:abuse@$3'

In all probability you will want it to work on any anchor, not just ones which look exactly like <a href=, so...

'#(<a.*href=["\']?)(https?|ftp)://.*\.([^\./]*\.[^\./]*)[^">\' ]*#'

Which will allow (and preserve) other attributes before and after the href attribute.

·Anonymous G13.5 years ago, 5 minutes later, 1 week after the original post[T] [B] #229,215

'mailto:abuse@$2', vs '$1mailto:abuse@$3'

Makes perfect sense really. You do the math

·Anonymous I13.5 years ago, 9 hours later, 1 week after the original post[T] [B] #229,391

@previous (G)
Not sure if serious or just confused.

Since I added a capture pattern for the beginning of the anchor (and whatever non-href attributes preceded href) it is necessary to expand that capture in the substitution. Since I added a capture what was formerly the 2nd capture becomes the 3rd.

The only unknown, to me, is whether PHP's parser is smart enough to realize that $1mailto is $1 followed by mailto and not an identifier by itself. Typically identifiers may not start with numbers, but PHP does a lot of things that are not typical (or wise). In addition the $1 inside a PHP substitution stanza is sort-of magic in that it's not a variable and will expand even when interpolation is disabled, so I'm just guessing that it will behave as it should.
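
A quick sketch that checks the guess, with made-up patterns and an example.com address. The reference is read by the regex engine rather than by PHP, and a letter after the digit ends it. Braces matter only when another digit follows.

```php
<?php
// $1 followed by a letter: the reference ends at the digit.
echo preg_replace('#(<a href=)(\S+)#', '$1mailto:abuse@example.com', '<a href=http://x.example.com'), "\n";
// <a href=mailto:abuse@example.com

// PHP variable names cannot start with a digit, so "$1" is not interpolated
// even inside double quotes.
var_dump("$1mailto" === '$1mailto'); // bool(true)

// Braces separate the reference from a digit that follows it.
echo preg_replace('#(a)#', '${1}1', 'a'), "\n"; // a1
```

Single quotes remain the safer habit for replacement strings, since they keep PHP's own interpolation out of the picture entirely.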

Did I mention that testing is only for uncool, ugly people?

·Anonymous G13.5 years ago, 15 minutes later, 1 week after the original post[T] [B] #229,401

@previous (I)
Infernal cuntbots will expand even when interpolation is disabled.

·Anonymous I13.5 years ago, 2 hours later, 1 week after the original post[T] [B] #229,551

Okay, so I had to actually put it in a PHP script to make it work and it now relies on one PCRE feature so it's not portable, but....

'#(<a.*href=["\']?)(https?|ftp)://([^/ ">\']*?)\.?([a-zA-Z0-9-]+\.[a-zA-Z0-9-]+\.?)([/ ">\'][^ ">\']*)#'

'$1mailto:abuse@$4'

Test cases:

<a href=http://one.two>http://one.two</a> [P]
<a href=http://zero.one.two>http://zero.one.two</a> [P]

<a href=http://one.two/relative/path>http://one.two/relative/path</a> [P]
<a href=http://zero.one.two/relative/path>http://zero.one.two/relative/path</a> [P]

<a href="http://one.two/relative/path">http://one.two/relative/path</a> [P]
<a href="http://zero.one.two/relative/path">http://zero.one.two/relative/path</a> [P]

<a class="something" href="http://one.two/relative/path">http://one.two/relative/path</a> [P]
<a class="something" href="http://zero.one.two/relative/path">http://zero.one.two/relative/path</a> [P]

<a class="something" href="http://one.two/relative/path" target="_blank">http://one.two/relative/path</a> [P]
<a class="something" href="http://zero.one.two/relative/path" target="_blank">http://zero.one.two/relative/path</a> [P]

<a href='http://one.two/relative/path'>http://one.two/relative/path</a> [P]
<a href='http://zero.one.two/relative/path'>http://zero.one.two/relative/path</a> [P]

<a href='http://one.two/relative/path' target="_blank">http://one.two/relative/path</a> [P]
<a href='http://zero.one.two/relative/path' target="_blank">http://zero.one.two/relative/path</a> [P]

<a href=http://one.two/relative/path target="_blank">http://one.two/relative/path</a> [P]
<a href=http://zero.one.two/relative/path target="_blank">http://zero.one.two/relative/path</a> [P]


<a class="something" href=http://one.two/relative/path>http://one.two/relative/path</a> [P]
<a class="something" href=http://zero.one.two/relative/path>http://zero.one.two/relative/path</a> [P]

·Anonymous G13.5 years ago, 8 minutes later, 1 week after the original post[T] [B] #229,554

@previous (I)
> Did I mention that testing is only for uncool, ugly people?

·Anonymous I13.5 years ago, 18 seconds later, 1 week after the original post[T] [B] #229,555

Since I had to go through the trouble of making a test case to get this fixed, here it is in all its glory

http://pastebin.com/WAd1FVZV

Oooh-aahh.

·Anonymous I13.5 years ago, 35 seconds later, 1 week after the original post[T] [B] #229,556

@229,554 (G)

> >Did I mention that testing is only for uncool, ugly people?

Anon has spotted the joke: I am BOTH of those.

(Edited 1 minute later.)


·Anonymous G13.5 years ago, 5 minutes later, 1 week after the original post[T] [B] #229,563

@previous (I)
Well, that works. But does it require a proto part?

·Anonymous I13.5 years ago, 24 minutes later, 1 week after the original post[T] [B] #229,598

@previous (G)

> Well, that works. But does it require a proto part?

The pastebin's version does not.

·Anonymous G13.5 years ago, 1 hour later, 1 week after the original post[T] [B] #229,647

@previous (I)
I feel like I've learned something, but I'm not sure what it is. Thanks!

·Anonymous I13.5 years ago, 3 days later, 2 weeks after the original post[T] [B] #232,340

Bug:

Unquoted, domain-only URLs with no attributes or whitespace before the closing > fail to match properly, and all text up to the next > after the open anchor tag is improperly replaced.

The fix is this very-slightly-altered regex:

'#(<a.*href=["\']?)((https?|ftp)://)?([^/ ">\']*?)\.?([a-zA-Z0-9-]+\.[a-zA-Z0-9-]+\.?)([/ "\']?[^ ">\']*)#',

Which has been minimally tested and appears not to break any other cases.
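
A small harness for the revised pattern, using three of the test cases posted earlier. Two things stand out when it is run as a sketch. First, the new optional group around the scheme shifts the numbering, so the domain is now capture 5 and the replacement presumably has to become '$1mailto:abuse@$5'; with the old $4 the result is a bare "mailto:abuse@". Second, because the last group can now match without a leading delimiter, the subdomain case appears to come out as zero.one. rather than one.two, so the revision may need another pass.

```php
<?php
// The revised pattern exactly as posted above.
$pattern = '#(<a.*href=["\']?)((https?|ftp)://)?([^/ ">\']*?)\.?([a-zA-Z0-9-]+\.[a-zA-Z0-9-]+\.?)([/ "\']?[^ ">\']*)#';

$cases = [
    '<a href=http://one.two>http://one.two</a>',
    '<a href=http://one.two/relative/path>http://one.two/relative/path</a>',
    '<a href=http://zero.one.two/relative/path>http://zero.one.two/relative/path</a>',
];

foreach ($cases as $case) {
    // Capture 5 is the domain after the renumbering.
    echo preg_replace($pattern, '$1mailto:abuse@$5', $case), "\n";
}
// <a href=mailto:abuse@one.two>http://one.two</a>
// <a href=mailto:abuse@one.two>http://one.two/relative/path</a>
// <a href=mailto:abuse@zero.one.>http://zero.one.two/relative/path</a>
```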

·Anonymous G13.5 years ago, 14 minutes later, 2 weeks after the original post[T] [B] #232,356

@previous (I)
Bugs?
> More testing needed

·Anonymous I13.5 years ago, 34 minutes later, 2 weeks after the original post[T] [B] #232,398

@previous (G)

> Bugs?
> >More testing needed

Release early, release often and let the users test for you. Maximum efficiency.

+Anonymous J13.5 years ago, 5 hours later, 2 weeks after the original post[T] [B] #232,643

@previous (I)
Because users are exceedingly proficient at breaking things.

·FuckAlms !vX8K53rFBI (OP) — 13.5 years ago, 1 day later, 2 weeks after the original post[T] [B] #233,826

now it's just doing the "abuse@" part and nothing else for the hyperlink
http://otakutalk.org/topic/1253
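
The thread ends without a resolution. One possible cause, which cannot be confirmed from what was posted, is that the replacement still refers to a capture that is empty after the groups were renumbered. A different approach that avoids counting captures is sketched below: a loose pattern finds the href value, and a callback works out the domain with parse_url. The names abuse_target and scramble_anchors are hypothetical, and the last-two-labels rule is an assumption that fails for suffixes such as .co.uk.

```php
<?php
// Hypothetical helper: last two labels of the URL's host, or null.
function abuse_target(string $url): ?string
{
    // parse_url() needs a scheme to recognise the host, so add one if missing.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null;
    }
    $labels = explode('.', rtrim($host, '.'));
    if (count($labels) < 2) {
        return null;
    }
    return 'mailto:abuse@' . implode('.', array_slice($labels, -2));
}

// Rewrite the href of every anchor in $text; leave everything else alone.
function scramble_anchors(string $text): string
{
    return preg_replace_callback(
        '#(<a\b[^>]*?\bhref=["\']?)([^\s"\'>]+)#i',
        function (array $m): string {
            $mailto = abuse_target($m[2]);
            // Fall back to the untouched match when no domain can be found.
            return $mailto === null ? $m[0] : $m[1] . $mailto;
        },
        $text
    );
}

echo scramble_anchors('<a class="something" href="http://zero.one.two/relative/path" target="_blank">spam</a>'), "\n";
// <a class="something" href="mailto:abuse@one.two" target="_blank">spam</a>
```

If the board escapes posts with htmlspecialchars before applying its patterns, as the parse() fragment earlier in the thread suggests, the literal < will already be &lt; by then and the pattern would need adjusting to match.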
