[chemfp] tab-only instead of tab-or-space

Andrew Dalke dalke at dalkescientific.com
Wed Aug 24 09:46:30 EDT 2011


Hi Rajarshi,

Thanks for your input.

On Aug 24, 2011, at 3:20 PM, Rajarshi Guha wrote:

> Wouldn't a tab only format be easier - simply because it's just one
> thing to check for?


In Python it's (slightly) easier to do:

  fields = line.split()       # any run of whitespace is allowed,
                              # and a\t\tb is two fields

than to do

  fields = line.split("\t")   # only tabs are allowed,
                              # and a\t\tb is three fields
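
To make the difference concrete, here is a small sketch. The two helper names are made up for illustration and are not part of chemfp; the only thing being shown is how str.split() behaves with and without an explicit separator.

```python
def split_any_whitespace(line):
    # No argument: split on any run of whitespace and drop
    # leading/trailing whitespace, including the newline.
    return line.split()


def split_tabs_only(line):
    # Explicit "\t": every tab is a separator, so two tabs in a row
    # produce an empty field. The newline must be removed by hand.
    return line.rstrip("\n").split("\t")


line = "a\t\tb\n"
print(split_any_whitespace(line))  # ['a', 'b']
print(split_tabs_only(line))       # ['a', '', 'b']
```

Note that the tab-only version needs the explicit rstrip("\n"), which is part of why it is (slightly) more work.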

The C code I wrote (which I realized I haven't looked at for
many months) allows any run of spaces and tabs as a separator:

  s = line+fp_field_len;
  /* The only legal thing here is a space or a tab. */
  /* There might be some other character, including a NUL */
  /* XXX Why do I allow multiple whitespace ? Check the spec! */
  ws_len = strspn(s, " \t");  // \v? \f? \r?
  if (ws_len == 0) {
    switch (s[0]) {
    case '\0': return CHEMFP_BAD_ID;
    case '\v':
    case '\f': return CHEMFP_UNSUPPORTED_WHITESPACE;
    case '\r': if (s[tmp_id_len+1] != '\n') return CHEMFP_UNSUPPORTED_WHITESPACE;
      break;
    }


It looks like I've never been sure of what I should do here!

Changing this to a single "\t" only does make the C code simpler.
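
As a rough illustration of why the tab-only rule is simpler, here is the same kind of check sketched in Python rather than C. It assumes the length of the fingerprint field is already known, as in the C snippet. The function name and the error strings are hypothetical and do not come from the chemfp source.

```python
def check_separator(line, fp_field_len):
    """Return None if exactly one tab follows the fingerprint field,
    otherwise a short error string.

    Hypothetical helper for illustration only, not chemfp code.
    """
    # Slicing (rather than indexing) gives "" instead of an
    # IndexError when the line ends right after the fingerprint.
    sep = line[fp_field_len:fp_field_len + 1]
    if sep == "\t":
        return None
    if sep == "":
        return "missing id"
    return "unsupported separator %r" % sep


print(check_separator("abcd\tid1\n", 4))  # None
print(check_separator("abcd id1\n", 4))   # unsupported separator ' '
```

With a single legal separator there is no need to measure a run of whitespace or to special-case \v, \f and \r one by one; anything other than a tab is rejected the same way.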


> I certainly support single tab as the field separator


Have you come across anyone who ran into problems confusing
tabs and spaces?

Working with awk might be an issue, since quoting "\t" after
the -F is tricky, but then I realized that the only people
who would have a problem are those with some other whitespace
in the fields, and they would already have experience with the
annoyances of dealing with that.
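
The case where the choice matters can be sketched as follows. The record below is invented for illustration (a short hex string followed by an id that contains a space), and the helper names are hypothetical.

```python
def fields_any_whitespace(line):
    # Tab-or-space rule: an id containing a space is broken apart.
    return line.split()


def fields_tab_only(line):
    # Tab-only rule: spaces inside a field are kept as data.
    return line.rstrip("\n").split("\t")


record = "0a1b\tethyl alcohol\n"
print(fields_any_whitespace(record))  # ['0a1b', 'ethyl', 'alcohol']
print(fields_tab_only(record))        # ['0a1b', 'ethyl alcohol']
```

Under the tab-or-space rule such an id cannot be represented at all, while under the tab-only rule it comes through unchanged.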




Andrew
dalke at dalkescientific.com



