[geeklog-devel] COM_makeClickableLinks

Sami Barakat furiousdog at gmail.com
Tue Jul 29 16:49:34 EDT 2008


Hey,

I have looked into this and come up with a partial solution. From my
understanding, the problem is when a URL has a &nbsp; at the end, which
gets parsed along with the URL. I ask because I think Gmail has filtered
out some of them. Anyway, the following regex

([^"]?)(((ht|f)tps?):(\/\/)|www\.)([a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+)(?<![&nbsp;])

Seems to work fairly well. Here is the test code that I am using.

// Each case ends with the result I expect (PASS) or currently get (FAIL).
$tests = array(
    "normal link http://www.url.com PASS\n",
    "link with &nbsp; and quotes \"http://www.url.com&nbsp;\" PASS\n",
    "complicated link \"www.sub.url.com/folder/index.php?id=foo&user=bar&nbsp;\" PASS\n",
    "problem link \"www.url.com/words&nbsp;\" FAIL\n",
);

echo '<pre>';
foreach ($tests as $string) {
    // htmlentities() shows the generated markup as text instead of rendering it
    echo htmlentities(COM_makeClickableLinks($string));
}
echo '</pre>';

This produces

normal link <a href="http://www.url.com">www.url.com</a> PASS
link with &nbsp; and quotes "<a href="http://www.url.com">www.url.com</a>&nbsp;" PASS
complicated link "<a href="http://sub.url.com/folder/index.php?id=foo&user=bar">sub.url.com/folder/index.php?id=foo&user=bar</a>&nbsp;" PASS
problem link "<a href="http://url.com/word">url.com/word</a>s&nbsp;" FAIL

As you can see, the first three work. The problem occurs when a URL ends
with any of the characters '&', 'n', 'b', 's', 'p' or ';'.

So www.url.com/ps would return <a href="http://url.com/">url.com/</a>ps

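This is due to the last bit of the regex, "(?<![&nbsp;])". The square
brackets make it a character class, so it rejects a match ending in any
one of those characters rather than in the literal string &nbsp;. I tried
just doing (?<!&nbsp;), but that does not work at all because the
preceding quantifier is too greedy. There is also an issue with the www.
being removed, but that's not too much of a problem at the moment.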
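
To make the character-class problem concrete, here is a minimal sketch
that is not part of the proposed fix. It compares the lookbehind above
with one possible alternative: a lazy quantifier that stops at a literal
&nbsp; or at the first character that cannot belong to a URL. The helper
firstMatch() and the variable names are hypothetical, the patterns only
handle the www. form to keep things short, and the alternative is
untested against real Geeklog content.

```php
<?php
// URL character class copied from the regex in this message.
$chars = '[a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]';

// Lookbehind with a character class: backs off any trailing & n b s p ;
$classPattern = '/www\.' . $chars . '+(?<![&nbsp;])/i';

// Assumed alternative: match lazily and stop right before a literal
// "&nbsp;" or before the first character outside the URL class.
$lazyPattern = '/www\.' . $chars . '+?(?=&nbsp;|(?!' . $chars . '))/i';

// Hypothetical helper: return the first match, or null if there is none.
function firstMatch($pattern, $subject)
{
    return preg_match($pattern, $subject, $m) ? $m[0] : null;
}

echo firstMatch($classPattern, 'www.url.com/words&nbsp;"'), "\n"; // www.url.com/word
echo firstMatch($lazyPattern, 'www.url.com/words&nbsp;"'), "\n";  // www.url.com/words
echo firstMatch($classPattern, 'www.url.com/ps'), "\n";           // www.url.com/
echo firstMatch($lazyPattern, 'www.url.com/ps'), "\n";            // www.url.com/ps
```

The lazy version would still cut a URL short if it legitimately
contained the text "&nbsp;" in the middle, so treat it as a starting
point only.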

Also, the COM_makeClickableLinks function can be simplified by removing
the str_replace statement, resulting in simply this:

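function COM_makeClickableLinks( $text )
{
    // \\1 is the character before the link; \\6 is everything after the
    // scheme or "www." prefix, so the prefix itself is not reused.
    $text = preg_replace(
        '/([^"]?)(((ht|f)tps?):(\/\/)|www\.)([a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+)(?<![&nbsp;])/is',
        '\\1<a href="http://\\6">\\6</a>', $text );
    return $text;
}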
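
Because the replacement always writes "http://" followed by group 6, the
"www." prefix is dropped, and an ftp or https link would be rewritten as
http. A possible way around that, sketched here under assumptions, is to
build the link in a callback that reuses the matched prefix in group 2.
The function name makeClickableLinksKeepPrefix is hypothetical, the
&nbsp; handling is left out to keep the focus on the prefix, and the
closure syntax needs PHP 5.3 or later.

```php
<?php
// Hypothetical variant: keep the matched scheme or "www." prefix.
function makeClickableLinksKeepPrefix($text)
{
    $pattern = '/([^"]?)(((ht|f)tps?):(\/\/)|www\.)'
             . '([a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+)/is';

    return preg_replace_callback($pattern, function ($m) {
        // $m[2] is either a scheme such as "ftp://" or the bare "www."
        $isBareWww = (strcasecmp($m[2], 'www.') === 0);
        // Only a bare "www." needs a scheme added for the href.
        $href  = ($isBareWww ? 'http://' : '') . $m[2] . $m[6];
        $label = $m[2] . $m[6];
        return $m[1] . '<a href="' . $href . '">' . $label . '</a>';
    }, $text);
}

echo makeClickableLinksKeepPrefix(
    'see www.url.com/page and ftp://files.url.com/pub'
), "\n";
```

Note that this also changes the link text, which now includes the scheme
or "www." instead of only the part after it.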


In the original regex, I was unsure why the "(\/|[+0-9a-z])" part was
included. I don't think it's necessary, so I took it out. Maybe there was
a particular case that required it which I'm overlooking.
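
One case where that trailing group does change the result is a URL
followed by sentence punctuation. The group forces the match to end on a
slash, a plus sign or an alphanumeric character, so a final "." or ","
stays outside the link. Whether that was the original intent is a guess
on my part. The sketch below compares the two patterns on a made-up
sentence; the variable names are hypothetical.

```php
<?php
$subject = 'Visit http://www.url.com/page.';

// Original pattern from the quoted message, with the trailing group.
$withTail = '/((((ht|f)tps?):(\/\/)|www\.)'
          . '[a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+(\/|[+0-9a-z]))/is';

// The same pattern with the trailing group removed.
$withoutTail = '/((((ht|f)tps?):(\/\/)|www\.)'
             . '[a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+)/is';

preg_match($withTail, $subject, $a);
preg_match($withoutTail, $subject, $b);

echo $a[0], "\n"; // http://www.url.com/page
echo $b[0], "\n"; // http://www.url.com/page.  (the full stop is swallowed)
```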

Anyhow, I will have another crack at it later on. It really is a tough
one, but this is as far as I've got so far.

Sami

2008/7/28 Michael Jervis <mjervis at gmail.com>:
> All (especially Sami!),
>
> There is a bug in the subject function. If it finds
> "http://www.url.com" we end up with &nbsp<a
> href=";http://www.url.com&nbsp">;http://www.url.com&nbsp</a>;
>
> Which isn't good.
>
> The original regexp in COM_MakeClickableLinks is:
>
> /([^"]?)((((ht|f)tps?):(\/\/)|www\.)[a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+(\/|[+0-9a-z]))/is
>
> I think the first match ([^"]?) is spurious, it matches anything other
> than  " before a link. So bhttp://www.foo.com" matches, but
> "http://www.foo.com doesn't.
>
> So that gives:
> /((((ht|f)tps?):(\/\/)|www\.)[a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+(\/|[+0-9a-z]))/is
>
> Resulting in:
>  <a href="http:///www.url.com&nbsp">http://www.url.com&nbsp</a>
>
> So, need to add an "ignore trailing &nbsp;" bit to the clause. Closest
> I can get is:
> ((((ht|f)tps?):(\/\/)|www\.)[a-z0-9%&_\-\+,;=:@~#\/.\?\[\]]+(\/|[+0-9a-z]))(?=&nbsp;)
>
> Which results in:
>  <a href="http:///www.url.com">http://www.url.com</a>
> However, unless there were quotes round the link, it won't match! So
> "http://www.foo.com" matches and is correctly processed, but
> http://www.foo.com is not matched.
>
> My head is now hurt. Any suggestions?
>
> --
> Michael Jervis
> mjervis at gmail.com