BOOST 1.33.0 Regex: Trial Notes


BOOST 1.33.0 is about to be released. It rewrites regex and adds:

*Unicode support

*Support for the ATL/MFC CString

***********

I couldn't wait, so I downloaded a copy to take a look.

Downloading the source:

=========

Boost location:

cvs -d:pserver:[email protected]:/cvsroot/boost login
cvs -z9 -d:pserver:[email protected]:/cvsroot/boost co -P boost

ICU location (the Unicode support in Boost 1.33.0 regex is built on ICU, IBM's Unicode library):

http://www.ibm.com/software/globalization/icu/

Building the source:

=============

The build environment is VC7.1 with the C++ STL that ships with it. Change to BOOST_ROOT\libs\regex\build and run:

bjam -sICU_PATH=d:\icu32 -sTOOLS=vc-7_1 stage

Testing Unicode support:

================

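I took a look at the ICU DLLs: the three DLLs that Boost regex links against dynamically come to a full 10 MB in total. That put me off, so I gave up on this test.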

ATL/MFC support:

===============

In VC7.1, create a new Win32 console project and add the following code:

/*
*
* Copyright (c) 2004
* John Maddock
*
* Use, modification and distribution are subject to the
* Boost Software License, Version 1.0. (See accompanying file
* LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
*
*/

/*
*   LOCATION:    see http://www.boost.org for most recent version.
*   FILE         mfc_example.cpp
*   VERSION      see <boost/version.hpp>
*   DESCRIPTION: examples of using Boost.Regex with MFC and ATL string types.
*/

#define TEST_MFC

#ifdef TEST_MFC

#include <boost/regex/mfc.hpp>
#include <cstringt.h>
#include <atlstr.h>
#include <assert.h>
#include <tchar.h>
#include <iostream>
#include <stdexcept> // std::runtime_error

#ifdef _UNICODE
#define cout wcout
#endif

//
// Find out if *password* meets our password requirements,
// as defined by the regular expression *requirements*.
//
bool is_valid_password(const CString& password, const CString& requirements)
{
    return boost::regex_match(password, boost::make_regex(requirements));
}

//
// Extract filename part of a path from a CString and return the result
// as another CString:
//
CString get_filename(const CString& path)
{
    boost::tregex r(__T("(?:\\A|.*\\\\)([^\\\\]+)"));
    boost::tmatch what;
    if(boost::regex_match(path, what, r))
    {
        // extract $1 as a CString:
        return CString(what[1].first, what.length(1));
    }
    else
    {
        throw std::runtime_error("Invalid pathname");
    }
}

CString extract_postcode(const CString& address)
{
    // searches through address for a UK postcode and returns the result,
    // the expression used is by Phil A. on www.regexlib.com:
    boost::tregex r(__T("^(([A-Z]{1,2}[0-9]{1,2})|([A-Z]{1,2}[0-9][A-Z]))\\s?([0-9][A-Z]{2})$"));
    boost::tmatch what;
    if(boost::regex_search(address, what, r))
    {
        // extract $0 as a CString:
        return CString(what[0].first, what.length());
    }
    else
    {
        throw std::runtime_error("No postcode found");
    }
}

void enumerate_links(const CString& html)
{
    // enumerate and print all the <a> links in some HTML text,
    // the expression used is by Andew Lee on www.regexlib.com:
    boost::tregex r(__T("href=[\"\']((http:\\/\\/|\\.\\/|\\/)?\\w+(\\.\\w+)*(\\/\\w+(\\.\\w+)?)*(\\/|\\?\\w*=\\w*(&\\w*=\\w*)*)?)[\"\']"));
    boost::tregex_iterator i(boost::make_regex_iterator(html, r)), j;
    while(i != j)
    {
        std::cout << (*i)[1] << std::endl;
        ++i;
    }
}

void enumerate_links2(const CString& html)
{
    // enumerate and print all the <a> links in some HTML text,
    // the expression used is by Andew Lee on www.regexlib.com:
    boost::tregex r(__T("href=[\"\']((http:\\/\\/|\\.\\/|\\/)?\\w+(\\.\\w+)*(\\/\\w+(\\.\\w+)?)*(\\/|\\?\\w*=\\w*(&\\w*=\\w*)*)?)[\"\']"));
    boost::tregex_token_iterator i(boost::make_regex_token_iterator(html, r, 1)), j;
    while(i != j)
    {
        std::cout << *i << std::endl;
        ++i;
    }
}

//
// Take a credit card number as a string of digits,
// and reformat it as a human readable string with "-"
// separating each group of four digits:
//
const boost::tregex e(__T("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z"));
const CString human_format = __T("$1-$2-$3-$4");

CString human_readable_card_number(const CString& s)
{
    return boost::regex_replace(s, e, human_format);
}


int main()
{
    // password checks using regex_match:
    CString pwd = "abcDEF---";
    CString pwd_check = "(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}";
    bool b = is_valid_password(pwd, pwd_check);
    assert(b);
    pwd = "abcD-";
    b = is_valid_password(pwd, pwd_check);
    assert(!b);

    // filename extraction with regex_match:
    CString file = "abc.hpp";
    file = get_filename(file);
    assert(file == "abc.hpp");
    file = "c:\\a\\b\\c\\d.h";
    file = get_filename(file);
    assert(file == "d.h");

    // postcode extraction with regex_search:
    CString address = "Joe Bloke, 001 Somestreet, Somewhere,\nPL2 8AB";
    CString postcode = extract_postcode(address);
    // compare (==) rather than assign (=), against the postcode actually present in address:
    assert(postcode == "PL2 8AB");

    // html link extraction with regex_iterator:
    CString text = "<dt><a href=\"syntax_perl.html\">Perl Regular Expressions</a></dt><dt><a href=\"syntax_extended.html\">POSIX-Extended Regular Expressions</a></dt><dt><a href=\"syntax_basic.html\">POSIX-Basic Regular Expressions</a></dt>";
    enumerate_links(text);
    enumerate_links2(text);

    CString credit_card_number = "1234567887654321";
    credit_card_number = human_readable_card_number(credit_card_number);
    assert(credit_card_number == "1234-5678-8765-4321");
    return 0;
}

#else

#include <iostream>

int main()
{
    std::cout << "<NOTE>MFC support not enabled, feature unavailable</NOTE>";
    return 0;
}

#endif
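
For readers without MFC or ATL at hand, the three simplest helpers above can also be tried with plain std::string. The following is only a sketch under stated assumptions, not the Boost.Regex MFC wrapper itself: it swaps in the C++11 standard library's std::regex (which descends from the same standardization proposal) for boost::tregex, and the `_std` function names are made up for this illustration. The default grammar of std::regex has no \A or \z, so the patterns rely on regex_match or on ^ and $ for anchoring instead.

```cpp
#include <cassert>
#include <regex>
#include <stdexcept>
#include <string>

// Assumption: std::regex stands in for boost::tregex, std::string for CString.

// True if the whole password matches the requirements expression.
bool is_valid_password_std(const std::string& password, const std::string& requirements)
{
    return std::regex_match(password, std::regex(requirements));
}

// Return the part of a Windows path that follows the last backslash.
std::string get_filename_std(const std::string& path)
{
    // regex_match anchors at both ends, so no \A is needed.
    static const std::regex r("(?:.*\\\\)?([^\\\\]+)");
    std::smatch what;
    if (std::regex_match(path, what, r))
        return what[1].str();
    throw std::runtime_error("Invalid pathname");
}

// Reformat a string of card digits into groups separated by "-".
std::string human_readable_card_number_std(const std::string& s)
{
    static const std::regex e("^(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$");
    return std::regex_replace(s, e, "$1-$2-$3-$4");
}
```

Calling these with the same inputs as main() above should give the same results, which makes it easy to experiment with the patterns before wiring them to CString.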

Setting up the build environment:

=============

*Add $(BOOST_ROOT);%(ICU_PATH)\include to the include path, both placed after the VC7.1 include directories.

Setting the compile options:

============

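*Use the Unicode character set

*Use /Zc:wchar_t (note: when Boost is built with VC7.1 using the defaults, wchar_t is treated as a built-in type, so if you want Unicode rather than MBCS support, compile the project with this option)

*Use the multithreaded debug DLL runtime, /MDd (if you are not sure what this means, do not pick anything else)

*Define the macro BOOST_REGEX_DYN_LINK (by default regex is linked statically; define this macro if you want dynamic linking)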
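
Why the /Zc:wchar_t option matters can be checked directly. Without it, VC7.1 treats wchar_t as a typedef for unsigned short; with it, wchar_t is a distinct built-in type, and a project compiled one way will typically fail to link against wide-character code compiled the other way. A minimal sketch of such a check follows. The helper names (is_unsigned_short, wchar_t_is_native) are hypothetical, and the code sticks to C++98 so that it should also compile on VC7.1:

```cpp
#include <cassert>

// Primary template: an arbitrary type is not unsigned short.
template <class T>
struct is_unsigned_short { static const bool value = false; };

// Specialization chosen only when T really is unsigned short.
template <>
struct is_unsigned_short<unsigned short> { static const bool value = true; };

// True when wchar_t is its own built-in type (the /Zc:wchar_t behaviour),
// false when it is merely a typedef for unsigned short.
bool wchar_t_is_native()
{
    return !is_unsigned_short<wchar_t>::value;
}
```

With MSVC the same distinction is also visible through the predefined macro _NATIVE_WCHAR_T_DEFINED, which is set only when /Zc:wchar_t is in effect.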

Compiling and linking went through "smoothly".

The compiler command line is:

/Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "BOOST_REGEX_DYN_LINK" /D "_UNICODE" /D "UNICODE" /Gm /EHsc /RTC1 /MDd /Zc:wchar_t /Yu"stdafx.h" /Fp"Debug/capture.pch" /Fo"Debug/" /Fd"Debug/vc70.pdb" /W3 /nologo /c /Wp64 /ZI /TP

The linker command line is:

/OUT:"Debug/capture.exe" /INCREMENTAL /NOLOGO /DEBUG /PDB:"Debug/capture.pdb" /SUBSYSTEM:CONSOLE /MACHINE:X86   kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib

BOOST 1.33.0 regex changelog

===================== 

Boost 1.33.0.

*Completely rewritten expression parsing code and traits class support; now conforms to the standardization proposal.
*Added support for (?imsx-imsx) constructs.
*Added support for lookbehind expressions (?<=positive-lookbehind) and (?<!negative-lookbehind).
*Added support for conditional expressions (?(assertion)true-expression|false-expression).
*Added MFC/ATL string wrappers.
*Added Unicode support, based on ICU.
*Changed newline support to recognise \f as a line separator (all character types), and \x85 as a line separator for wide characters / Unicode only.

Boost 1.32.1.

Fixed bug in partial matches of bounded repeats of '.'.

Boost 1.31.0.

*Completely rewritten pattern matching code; it is now up to 10 times faster than before.
*Reorganized documentation.
*Deprecated all interfaces that are not part of the regular expression standardization proposal.
*Added regex_iterator and regex_token_iterator.
*Added support for Perl-style independent sub-expressions.
*Added non-member operators to the sub_match class, so that you can compare sub_match objects with strings, or add them to a string to produce a new string.
*Added experimental support for extended capture information.
*Changed the match flags so that they are a distinct type (not an integer); if you try to pass the match flags as an integer rather than match_flag_type to the regex algorithms, you will now get a compiler error.
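
Several of the 1.31.0 additions (the two iterators, the sub_match comparison operators and the typed match flags) later made their way into the C++11 standard library under the same names. The sketch below uses those std:: counterparts as a stand-in, on the assumption that they behave like the Boost originals for these simple cases. The function names and the simplified href pattern are invented for the illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Assumption: std::regex mirrors the Boost.Regex names from the changelog.

// Collect capture group 1 of every match with a regex_token_iterator.
std::vector<std::string> collect_hrefs(const std::string& html)
{
    static const std::regex r("href=\"([^\"]+)\"");
    std::vector<std::string> out;
    std::sregex_token_iterator i(html.begin(), html.end(), r, 1), j;
    for (; i != j; ++i)
        out.push_back(i->str());
    return out;
}

// Count matches with a regex_iterator. The flags variable has the
// dedicated match_flag_type rather than being a plain integer.
int count_words(const std::string& text)
{
    static const std::regex r("\\w+");
    const std::regex_constants::match_flag_type flags =
        std::regex_constants::match_default;
    int n = 0;
    for (std::sregex_iterator i(text.begin(), text.end(), r, flags), j; i != j; ++i)
        ++n;
    return n;
}

// Compare a sub_match directly with a string via the non-member operator==.
bool first_word_is(const std::string& text, const std::string& word)
{
    static const std::regex r("\\w+");
    std::smatch what;
    return std::regex_search(text, what, r) && what[0] == word;
}
```

The token iterator is the shorter choice when only one capture group is wanted; the plain iterator gives access to the full match object on each step.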
