Factored radix number filter, initial commit.
michael-db committed May 19, 2022
0 parents commit 05868bb
Showing 32 changed files with 1,620 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -0,0 +1,3 @@
*~
fr
README.html
19 changes: 19 additions & 0 deletions LICENSE
@@ -0,0 +1,19 @@
Copyright (c) 2022 Michael Breen (https://mbreen.com)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
14 changes: 14 additions & 0 deletions Makefile
@@ -0,0 +1,14 @@
.PHONY: test install
test: fr
# Clear FRDIGITS (same shell as the test run): clean default test environment.
	@export FRDIGITS=; ls test/*.test | xargs -n 1 test/run-test

fr: fr.c
	@VERSION="$$(./get-version.sh)"; \
	echo "Making $@, version=\"$${VERSION}\""; \
	sed 's@// VERSION@"\\nVersion: '"$${VERSION}"'\\n"@' $^ | \
	cc -o $@ -x c - -Wall -Wextra -std=c99 -pedantic \
	-Wmissing-prototypes -Wstrict-prototypes -O2 -Wconversion

install: test
	sudo cp fr /usr/bin
59 changes: 59 additions & 0 deletions README.md
@@ -0,0 +1,59 @@
# A Factored Radix Number Filter

`fr` transforms numbers to and from quasi-decimal versions of
higher base numbers.

## What is a factored radix number?

A standard positional number system such as base 20 is defined by a single radix.
A factored radix number system FR(P,S) is defined by two radices,
a prefix radix P and a stem radix S, for example, FR(2,10).

FR(2,10) is similar to base 20:
- It uses an alphabet of 20 digits (P*S=20),
e.g., `0123456789abcdefghij`.
- A number in base 20 can be represented in the same number of digits
in FR(2,10).

However, FR(2,10) is also similar to decimal:
- Any number that looks like decimal retains its decimal value.
For example, `426` in FR(2,10) has the decimal value 426
(whereas `426` in base 20 is decimal 1646).
- Non-decimal digits are optional and used only to reduce the
number of digits.
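
The base-20 figure quoted above can be checked with the standard C
function `strtol`, which accepts `0`-`9` and `a`-`j` as base-20 digits.
This is only a sketch: the wrapper name `base20_value` is hypothetical
and is not part of `fr`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not in fr.c): value of a digit string read as
   ordinary base 20, using the digits 0-9 and a-j. */
static long base20_value(const char *text)
{
    char *end;
    long value = strtol(text, &end, 20);
    assert(*end == '\0');   /* every character was a valid base-20 digit */
    return value;
}
```

`base20_value("426")` gives 1646, whereas the same string read as
FR(2,10) keeps its decimal value 426.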

Suppose a number is to be represented using at most 2 digits.
Using decimal, 100 values are possible.
Like base 20, FR(2,10) extends this range to 400 values.
However, unlike base 20, the sequence runs
`0`,..,`99`, as for decimal, and then `0a`,..,`jj`
(leading zeros are significant).

Where a number in FR(2,10) has a non-decimal digit,
the decimal value is found as in this example:

```
 d7   FR(2,10)
 37   "stem": non-decimal digits replaced with mod 10 value
 10   "prefix", base 2: a..j -> 1, 0..9 -> 0, so d7 -> 10
  2   prefix converted to base 10
237   prefix and stem concatenated: decimal equivalent of d7
```
```
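
The steps in the example above can be sketched in C. This is a minimal
illustration, not the code used by `fr` (which works digit by digit on
numbers of up to 100 digits): the function name `fr_decode` is
hypothetical, the alphabet is passed explicitly, and the result is
assumed to fit in a `long`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not in fr.c): return the value of an FR(p,s)
   string, or -1 if a character is not one of the first p*s characters
   of the alphabet. Assumes the value fits in a long. */
static long fr_decode(const char *text, int p, int s, const char *alphabet)
{
    assert(p >= 2 && s >= 2);
    assert(strlen(alphabet) >= (size_t) (p * s));
    long prefix = 0, stem = 0, scale = 1;
    for (; *text; ++text) {
        const char *pos = strchr(alphabet, *text);
        if (!pos || pos - alphabet >= p * s)
            return -1;
        int val = (int) (pos - alphabet);
        stem = stem * s + val % s;      /* stem digit: value mod s */
        prefix = prefix * p + val / s;  /* prefix digit: value div s */
        scale *= s;                     /* s to the power of the length */
    }
    /* "Concatenate": the prefix sits above all of the stem digits. */
    return prefix * scale + stem;
}
```

With the alphabet `0123456789abcdefghij`, `fr_decode("d7", 2, 10, ...)`
returns 237 and `fr_decode("426", 2, 10, ...)` returns 426.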

For a comprehensive account, see <https://mbreen.com/fr>


## To install and run

The filter is written in C (C99) to keep things easy.
On a typical installation of Linux this should work:
```
sudo make install # create /usr/bin/fr
fr # get a usage message
```

To see the full sequence of 2-digit FR(2,10) numbers mentioned above
and their decimal equivalents (using bash):
```
paste <(seq 0 399 |FRDIGITS=DCA fr 2) <(seq 0 399) |less
```
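
The mapping that produces this sequence can also be sketched directly
in C. The function below is a hypothetical illustration (the name
`fr_encode` and its parameters are not part of `fr`); it assumes the
value and `s` raised to the power `width` both fit in a `long`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not in fr.c): write value as FR(p,s) in at most
   `width` digits. `out` must hold width + 1 chars. Returns 0 on
   success, or -1 if the value needs more than `width` digits (the
   contents of `out` are then meaningless). */
static int fr_encode(long value, int width, int p, int s,
                     const char *alphabet, char *out)
{
    assert(value >= 0 && width >= 1 && p >= 2 && s >= 2);
    assert(strlen(alphabet) >= (size_t) (p * s));
    long scale = 1;
    for (int i = 0; i < width; ++i)
        scale *= s;
    long stem = value % scale, prefix = value / scale;
    /* With no prefix the number is plain base s: no padding needed. */
    int len = width;
    if (prefix == 0) {
        len = 1;
        for (long t = stem; t >= s; t /= s)
            ++len;
    }
    out[len] = '\0';
    /* Least significant digit first: combine one stem digit (base s)
       with one prefix digit (base p) into a single FR digit. */
    for (int i = len - 1; i >= 0; --i) {
        out[i] = alphabet[stem % s + s * (prefix % p)];
        stem /= s;
        prefix /= p;
    }
    return prefix ? -1 : 0;
}
```

For FR(2,10) with a width of 2 and the alphabet `0123456789abcdefghij`,
99 encodes as `99`, 100 as `0a`, 237 as `d7` and 399 as `jj`; 400 does
not fit and the function returns -1.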
241 changes: 241 additions & 0 deletions fr.c
@@ -0,0 +1,241 @@
// Copyright (c) 2022 Michael Breen (https://mbreen.com)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const char* program_name;

// Usage message split on the program name.
static const char* const usage[] = {
"Usage: ", " n [P [S]]\n"
"Convert numbers to or from factored radix form (see\n"
"https://mbreen.com/fr).\n"
" n Desired number of digits, 0..99.\n"
" 0: maximum: output numbers in base S.\n"
" 1: minimum: output canonical FR(P,S).\n"
" P Prefix radix. Default: 2.\n"
" S Stem radix. Default: 10.\n\n"
"The environment variable FRDIGITS must be set to one of:\n"
" DMA Decimal Morphology Alphabet\n"
" DCA Decade-Congruent Alphabet\n"
" or an explicit alphabet (printable ASCII), e.g.,\n"
" 0123456789ABCDEFGHIJ\n\n"
"Examples (POSIX shell):\n"
" export FRDIGITS=DMA\n"
" # FR(5,10) datestamp:\n"
" echo 2021-12-25 | ", " 1 5\n"
" # sort FR(2,10) by intermediate conversion to decimal:\n"
" printf '0z1\\n99q\\n' | ", " 0 | sort -n | ", " 1\n"
" # Display the alphabet (ensure FRDIGITS is set):\n"
" 2>&1 ", " 0 99 |sed -n 's/^Alphabet: //p' |fold -w10\n\n"
"Notes:\n"
" * Insignificant leading zeros should be stripped from\n"
" decimal (or base S) numbers passed to the program.\n"
" * Punctuation characters not present in the alphabet\n"
" are treated as separators between numbers.\n"
" * Thus only whole numbers are supported. Fractional\n"
" numbers should be scaled to remove the '.' or ','.\n"
// VERSION
};

// Output the program name followed by msg and '\n' to stderr.
static void error_msg(const char *msg) {
    fputs(program_name, stderr);
    fputs(": ", stderr);
    fputs(msg, stderr);
    fputs("\n", stderr);
}

// Exit with Usage, optionally preceded by an error message.
static void exit_usage(const char* msg) {
    if (msg)
        error_msg(msg);
    size_t i = 0;
    for (; i < sizeof(usage)/sizeof(usage[0]) - 1; ++i) {
        fputs(usage[i], stderr);
        fputs(program_name, stderr);
    }
    fputs(usage[i], stderr);
    exit(1);
}

static const char* const alphabets[] = {
    "DMA", "0123456789" "cjzwfsbvxq" "nltmhgdrkp"
           "CJZWFSBVXQ" "NLTMHGDRKP" "uiyeaUIYEA",
    "DCA", "0123456789" "abcdefghij" "klmnopqrst"
           "ABCDEFGHIJ" "KLMNOPQRST" "vwxyzVWXYZ",
};

static const char* get_alphabet(void) {
    const char *digchars = getenv("FRDIGITS");
    if (!digchars || !digchars[0])
        exit_usage("environment variable FRDIGITS undefined");
    for (size_t i = 1; i < sizeof(alphabets)/sizeof(alphabets[0]);
         i += 2) {
        if (!strcmp(digchars, alphabets[i - 1])) {
            return alphabets[i];
        }
    }
    return digchars;
}

// Report an error in the alphabet and exit.
static void err_digits(const char* msg) {
    error_msg(msg);
    fputs("Alphabet: ", stderr);
    fputs(get_alphabet(), stderr);
    fputs("\n\n", stderr);
    exit_usage(NULL);
}

static void overflow(void) {
    error_msg("overflow: too many digits");
    exit(2);
}

#define isalpha(x) (((x)|0x20) >= 'a' && ((x)|0x20) <= 'z')
#define isdecimal(x) ((unsigned) (x) - '0' < 10)

// Convert an argument of 1 or 2 decimal digits to a number.
static int arg2num(char *arg) {
    if (!isdecimal(arg[0]))
        exit_usage("arguments must be 1 or 2 decimal digits");
    int val = arg[0] - '0';
    if (isdecimal(arg[1]) && !arg[2])
        val = val * 10 + arg[1] - '0';
    else if (arg[1])
        exit_usage("arguments must be 1 or 2 decimal digits");
    return val;
}

// Report printable/DEL/non-ASCII character as invalid and exit.
static void invalid_input_digit(int c) {
    fputs(program_name, stderr);
    fputs(": ", stderr);
    fputs("input character ", stderr);
    if (c < 0x7f) {
        fputs("'", stderr);
        fputc(c, stderr);
        fputs("' not in alphabet\n", stderr);
    } else {
        fputs("outside ASCII range\n", stderr);
    }
    exit(1);
}

// Numeric value of digit or radix. For a digit, -1 means "none".
typedef signed char digval;

#define MAX_LEN 100
typedef struct {
    digval radix;
    int len;
    digval digits[MAX_LEN]; // least significant first
} numbr;

// Make *num = *num * factor + addend.
static void mult_add(numbr *num, digval factor, digval addend) {
    int j, carry = addend;
    for (j = 0; carry || (factor && j < num->len); ++j) {
        if (j < num->len)
            carry += num->digits[j] * factor;
        else if (j >= MAX_LEN)
            overflow();
        num->digits[j] = (digval) (carry % num->radix);
        carry /= num->radix;
    }
    num->len = j;
}

// Make *num = *num / divisor and return the remainder.
static digval div_mod(numbr *num, digval divisor) {
    int remainder = 0;
    for (int j = num->len; j;) {
        remainder = remainder * num->radix + num->digits[--j];
        num->digits[j] = (digval) (remainder / divisor);
        num->len -= !num->digits[j] && j + 1 == num->len;
        remainder %= divisor;
    }
    return (digval) remainder;
}

int main(int argc, char **argv) {
    program_name = argv[0];
    for (const char *ip = program_name; *ip; ++ip)
        if (*ip == '/')
            program_name = ip + 1;
    if (argc < 2 || argc > 4)
        exit_usage(NULL);
    int width = arg2num(argv[1]);
    numbr prefix = {.len = 0, .radix = 2};
    if (argc > 2) {
        prefix.radix = (digval) arg2num(argv[2]);
    }
    numbr stem = {.len = 0, .radix = 10};
    if (argc > 3) {
        stem.radix = (digval) arg2num(argv[3]);
    }
    if (prefix.radix < 2 || stem.radix < 2)
        exit_usage("minimum radix 2 (P and S)");
    const char *digit_chars = get_alphabet();
    digval digit_vals[0x80];
    for (int j = 0; j < 0x80; ++j)
        digit_vals[j] = -1;
    for (digval v = 0; (size_t) v < strlen(digit_chars); ++v) {
        int i = digit_chars[v];
        if (i < ' ' || i > '~')
            err_digits("alphabet must be printable ASCII");
        if (digit_vals[i] != -1)
            err_digits("digit repeated in alphabet");
        digit_vals[i] = v;
    }
    if ((int) strlen(digit_chars) < prefix.radix * stem.radix)
        err_digits("alphabet must have at least P*S characters");
    int digit_count = 0;
    for (;;) {
        int c = getchar();
        digval val = c & 0x80 ? -1 : digit_vals[c];
        if (val >= 0 && val < prefix.radix * stem.radix) {
            // Accumulate number.
            mult_add(&stem, stem.radix,
                     (digval) (val % stem.radix));
            mult_add(&prefix, prefix.radix,
                     (digval) (val / stem.radix));
            if (++digit_count > MAX_LEN)
                overflow();
        } else {
            if (isalpha(c) || isdecimal(c) || c >= 0x7f)
                invalid_input_digit(c);
            // Output accumulated number, if any.
            if (digit_count) {
                // Preserve any leading '0's. Add '0's if necessary.
                while (digit_count > stem.len || !stem.len)
                    stem.digits[stem.len++] = 0;
                // Reduce stem as desired. This can overshoot,
                // making the prefix longer than the stem.
                if (width && (stem.digits[stem.len-1] || prefix.len))
                    while (stem.len > prefix.len && stem.len > width)
                        mult_add(&prefix, stem.radix,
                                 stem.digits[--stem.len]);
                // Increase stem as desired or correct an overshoot.
                while (prefix.len && (!width || prefix.len > stem.len
                                      || stem.len < width)) {
                    if (stem.len >= MAX_LEN)
                        overflow();
                    stem.digits[stem.len++] =
                        div_mod(&prefix, stem.radix);
                }
                for (int j = stem.len - 1; j >= 0; --j) {
                    int v = stem.digits[j];
                    if (j < prefix.len)
                        v += prefix.digits[j] * stem.radix;
                    putchar(digit_chars[v]);
                }
                digit_count = stem.len = prefix.len = 0;
            }
            if (c == EOF)
                return 0;
            putchar(c);
        }
    }
}
17 changes: 17 additions & 0 deletions get-version.sh
@@ -0,0 +1,17 @@
#!/usr/bin/env bash
# Copyright (c) 2022 Michael Breen (https://mbreen.com)
# Output a program source-version string derived from the git state.
# If the working directory corresponds to a version tag that has
# been pushed to origin then output "ORIGIN-URL VERSION", e.g.,
# "https://example.com/my-repo v1.2".
set -e

TAG=$( (git describe --tags --match='v[0-9]*' --dirty 2>/dev/null \
        || echo -dirty) | sed 's/.*-dirty/devel/')

SOURCE=$( (2>/dev/null git ls-remote --tags origin "$TAG" \
           || >&2 echo "$0: Failed to check origin repo," \
                       "using working dir as version source.") |
          grep -q . && git ls-remote --get-url origin || pwd)

echo "$SOURCE $TAG"
2 changes: 2 additions & 0 deletions test/.gitignore
@@ -0,0 +1,2 @@
*.out
*.err
21 changes: 21 additions & 0 deletions test/alphabet-error.test
@@ -0,0 +1,21 @@
#!/usr/bin/env bash

PATH=$(dirname "$(readlink -f "$0")")/..:$PATH

>&2 echo "alphabet contains repeated 'h'"
echo | FRDIGITS=0123456789abcdefghhj fr 1

>&2 echo "alphabet character not printable ASCII (0x1f unit separator)"
echo | FRDIGITS=$(printf "0123456789abcdefghi\x1f") fr 1

>&2 echo "alphabet character not printable ASCII (DEL)"
echo | FRDIGITS=$(printf "0123456789\x7fbcdefghij") fr 1

# This also tests the case where the illegal character is
# beyond the range of used characters.
>&2 echo "alphabet character not printable ASCII (multibyte)"
echo | FRDIGITS=$(printf "0123456789abcdefghij\u00b5") fr 1

>&2 echo "alphabet one character too short"
echo | FRDIGITS=$(printf "0123456789abcdefghi") fr 1
