Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

faster search #12

Open
samozrejmost opened this issue Feb 11, 2015 · 4 comments
Open

faster search #12

samozrejmost opened this issue Feb 11, 2015 · 4 comments

Comments

@samozrejmost
Copy link

Hello,

because I sometimes analyze memory dumps with hexcurse, I needed a faster search - so I implemented the Boyer–Moore string search algorithm to the hexSearch() function. It works fine for me, but it's a bit crude now (for example it doesn't work for edited files), but if you are interested, I can send you the full "patch".

Sample code:

// algorithm based on the wikipedia example
off_t hexSearchBM(FILE *fp, int pat[], off_t startfp, size_t patlen)
{
    if (! (pat && patlen > 0)) return -1;

    size_t      i, m = 1;
    int         j = patlen - 1;
    size_t      delta1 [ ALPHABET_LEN ];
    size_t      *delta2 = (size_t *)malloc(patlen * sizeof(size_t));

    char        *buf;
    char        *patt = (char *) malloc(patlen);

    if (posix_memalign((void **)&buf, getpagesize(), BUF_L) != 0)
        return -1;

    if (! (delta2 && patt)) return -1;

    make_delta1(delta1, pat, patlen);
    make_delta2(delta2, pat, patlen);
    for (i = 0; i < patlen; i++) patt[i] = (unsigned char) pat[i];

    size_t  n, rem_bytes = 0, bytes_to_read = BUF_L, full_length;
    off_t   pos1, pos2;

    fseeko(fp, startfp, SEEK_SET);          /* begin from loc     */

    pos1 = pos2 = startfp;
    while (true) {
        n = fread(&buf[rem_bytes], 1, bytes_to_read, fp);
        full_length = n + rem_bytes;
        if (n == 0 || full_length < patlen) break;
        pos2 = pos1 + full_length;

        i = patlen - 1;
        while (i < full_length) {
            j = patlen - 1;
            while (j >= 0 && (buf[i] == patt[j])) {
                --i;
                --j;
            }
            if (j < 0) {
                free(buf); free(patt); free(delta2);
                return (pos1 + i + 1);
            }

            m = max(delta1[(unsigned char) buf[i]], delta2[j]);
            i += m;
        }

        rem_bytes = full_length + m - i - 1;
        if (rem_bytes > full_length) rem_bytes = full_length;

        bytes_to_read = BUF_L - rem_bytes;
        memmove(buf, &buf[full_length - rem_bytes], rem_bytes);
        pos1 = pos2 - rem_bytes;
    }

    free(buf);
    free(patt);
    free(delta2);

    return -1;
}
@ghost
Copy link

ghost commented Feb 12, 2015

Very cool, definitely send along the full patch!

@samozrejmost
Copy link
Author

Sending the patch (pastebin), I diffed the src directories this way:

diff -uprN -x '*.o' -x 'Makefile*' -x '*.Po' hexcurse-master-org/src hexcurse-master/src

Changelog:

  • wacceptch() was giving SIGSEGV for very long search strings, changed ch[17] to ch[81]
  • implemented faster search (about 10 times faster)
  • now it's also possible to monitor progress of a search and interrupt it by hitting Esc
  • changed the behaviour of F5 key - it automatically searches for the next occurence of the search string - is it OK?

I tested it on my Ubuntu 10.10 desktop (32bit) and Ubuntu 14.04 server (64bit).

I hope it will be useful ;) !

@ghost
Copy link

ghost commented Feb 24, 2015

Haven't forgot about this! Plan on integrating things this week!

ghost pushed a commit that referenced this issue Jul 3, 2015
@prso
Copy link

prso commented Jul 29, 2021

For what it's worth I have integrated this patch with some improvements in the branch improve-search of my own fork.

Also I sent a pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants