[chemfp] tab-only instead of tab-or-space
Andrew Dalke
dalke at dalkescientific.com
Wed Aug 24 09:46:30 EDT 2011
Hi Rajarshi,
Thanks for your input.
On Aug 24, 2011, at 3:20 PM, Rajarshi Guha wrote:
> Wouldn't a tab only format be easier - simply because it's just one
> thing to check for?
In Python it's (slightly) easier to do:
fields = line.split() # any run of whitespace is allowed,
# and a\t\tb is two fields
than to
fields = line.split("\t") # only tabs are allowed,
# and a\t\tb is three fields
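The difference is easy to see in the interpreter. The string below is only an example:

```python
line = "a\t\tb"

# With no argument, str.split() treats any run of whitespace as one
# separator and drops empty strings, so the doubled tab gives two fields.
whitespace_fields = line.split()
print(whitespace_fields)   # ['a', 'b']

# With an explicit "\t", every tab is a separator, so the empty string
# between the two tabs becomes a field of its own.
tab_fields = line.split("\t")
print(tab_fields)          # ['a', '', 'b']
```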
The C code I wrote (which I realized I haven't looked at for
many months) allows any run of spaces and tabs as a separator:
s = line+fp_field_len;
/* The only legal thing here is a space or a tab. */
/* There might be some other character, including a NUL */
/* XXX Why do I allow multiple whitespace ? Check the spec! */
ws_len = strspn(s, " \t"); // \v? \f? \r?
if (ws_len == 0) {
    switch (s[0]) {
    case '\0': return CHEMFP_BAD_ID;
    case '\v':
    case '\f': return CHEMFP_UNSUPPORTED_WHITESPACE;
    case '\r': if (s[tmp_id_len+1] != '\n') return CHEMFP_UNSUPPORTED_WHITESPACE;
        break;
    }
}
It looks like I've never been sure what I should do here!
Changing this to allow only a single "\t" does make the C code simpler.
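To make the comparison concrete, here is a small Python sketch of the two separator rules. It assumes, as in the C fragment, that the hex fingerprint occupies the first `fp_field_len` characters of the line. The function names are hypothetical, and both report failure with `None` rather than the chemfp error codes:

```python
def id_start_tab_or_space(line: str, fp_field_len: int):
    """Current rule (hypothetical port): skip a run of spaces and tabs.

    Returns the index where the identifier starts, or None if no space
    or tab follows the fingerprint field.
    """
    s = line[fp_field_len:]
    # Same idea as strspn(s, " \t"): length of the leading run of spaces/tabs.
    ws_len = len(s) - len(s.lstrip(" \t"))
    if ws_len == 0:
        return None
    return fp_field_len + ws_len


def id_start_tab_only(line: str, fp_field_len: int):
    """Proposed rule (hypothetical): exactly one tab, so no run to measure."""
    if line[fp_field_len:fp_field_len + 1] != "\t":
        return None
    return fp_field_len + 1


# Made-up lines: a 2-character fingerprint "0f" followed by an identifier.
for line in ["0f\tabc\n", "0f \t abc\n", "0f\t\tabc\n", "0fabc\n"]:
    print(repr(line), id_start_tab_or_space(line, 2), id_start_tab_only(line, 2))
```

The tab-only version needs a single comparison instead of a scan. It also changes what "0f\t\tabc" means: the identifier now starts at the second tab, so it is empty, and the caller has to decide whether that is an error.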
> I certainly support single tab as the field separator
Have you come across anyone who ran into problems confusing
tabs and spaces?
Working with awk might be an issue, since quoting "\t" after
the -F is tricky. But then I realized that the only people
who would have a problem are those with some other whitespace
in the fields, and they would already be familiar with the
annoyances of dealing with that.
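The whitespace-in-fields case can be illustrated with a made-up record whose identifier contains a space. Both the fingerprint and the name are invented for the example. Splitting on any whitespace, which is also what awk does with its default field separator, breaks the identifier. Splitting on tabs alone keeps it intact:

```python
record = "0f3a\tethyl acetate\n"

# Splitting on any run of whitespace breaks the identifier in two.
whitespace_fields = record.split()
print(whitespace_fields)   # ['0f3a', 'ethyl', 'acetate']

# Splitting on tabs only (after removing the newline) keeps it intact.
tab_fields = record.rstrip("\n").split("\t")
print(tab_fields)          # ['0f3a', 'ethyl acetate']
```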
Andrew
dalke at dalkescientific.com