Friday, July 11, 2014

Perl virtual tables for DBD::SQLite : ready to test

Followup to my previous article : a first draft of Perl virtual table support for DBD::SQLite is available at https://github.com/DBD-SQLite/DBD-SQLite/tree/vtab .

This is still alpha software, but it shows the idea; I still don't know when this work will be mature enough for a CPAN publication.

Two examples of virtual table modules are bundled with the distribution :
  • FileContent : implements a virtual column that exposes file contents. This is especially useful
    in conjunction with a SQLite FTS fulltext index; see the doc in Fulltext_search.pod
  • PerlData :  binds a virtual table to a Perl array within the Perl program. This can be used for simple import/export operations, for debugging purposes, for joining data from different
    sources, etc.
I'm currently thinking of one more example , which would be fun to play with : a virtual table that would proxy to another DBI connection. Then we could join data from various sources, using SQLite's features to do the joining work. Sounds quite exciting, but at this point this is just an idea.

Thanks to Salvador Fandiño who pointed me to https://metacpan.org/pod/SQLite::VirtualTable, which is similar in idea but more meant to embed a Perl interpreter inside a sqlite application, rather than the other way around; this code helped me build the DBD::SQLite version.

Of course any comments/suggestions are welcome.

Friday, July 4, 2014

project : Perl virtual tables for DBD::SQLite

During the year I have more and more management tasks and less time for programming. So for the holidays I wanted a change and decided to engage into a really "hardcore programming"  project, namely to add support for Perl virtual tables within the DBD::SQLite database driver. SQLite has this notion of "virtual tables" which look like regular tables but are implemented through callback routines. This project  implies some C programming, using Perl XS API, and the delicate part is to design some appropriate glue between SQLite's notion of "object-oriented", through extensible C structures and callbacks, and Perl's object-oriented features.

At the beginning I wasn't even sure if such a project would be feasible, but now it is slowly taking shape and I'm pretty confident that it will eventually reach something usable. The concept is quite similar to Perl's tied variables, where a published API is reused for accessing many different kinds of data; except that here the published API is SQL instead of hashes or array operators. As a result, we could have virtual tables bound to the filesystem, to the Win32::Registry, to some configuration data, or any other accessible resource. This will open a new field for lots of creative ideas.

My main motivation for doing this work is to be able to build a framework for collections of documents, using SQLite for the fulltext index, and using the filesystem for storing document content : this will be a much more powerful replacement for my very old File::Tabular::Web::Attachment::Indexed module. That module is still heavily used at Geneva courts of law, but now we have 10 years of data, and the old architecture is clearly showing its limits.

For the virtual tables project I need some test cases, so if anybody has ideas about Perl-accessible data to be published as an SQL table, I'm interested.




Monday, June 23, 2014

Perl smartmatch : what now ?

Sorry to wake up an old discussion, but ... does anybody have a clear idea of what is going to happen to smartmatch ?

Our team maintains dozens of internal applications and modules containing "given/when" and smartmatch statements. Most of this code was written between 2007 and 2012 -- remember, at that time smartmatch was an official feature, never mentioned as being "experimental", so we happily used it in many places. The reasons for using smartmatch were quite modest :
  • match a scalar against an array
  • match 2 scalars, without a warning when one of the scalars is undef
  • more readable switch statements, thanks to "given/when"
 When 5.18 came out, I was quite worried about the regression of smartmatch to  "experimental" status, but I was confident that things would be settled in 5.20, so I decided not to upgrade (we still use 5.14). Now 5.20 is out .. and nothing has changed about smartmatch, without even a clue about how this is going to evolve.

Our servers cannot easily upgrade to 5.20, because this would throw warnings all over the place. I tried to find a way to globally turn off these warnings (like set PERL5OPT=-M-warnings=experimental::smartmatch, or  PERL5OPT=-M=experimental::smartmatch), but this doesn't work because the "no warnings" pragma is lexically scoped, so global settings are not taken into account.

So my options are :
  1. don't change anything, don't upgrade, and wait for 5.22, hoping that some reasonable form of smartmatch will be reintroduced into the core
  2. revise all source files, adding a line "use experimental qw/smartmatch/;" at the beginning of each lexical scope ... but I have no guarantee that this will still work in future versions
  3. revise all source files, removing the given/when/smartmatch statements and replacing them with plain old Perl, or with features from some CPAN modules like match::smart or Smart::Match ... but it would be a pity to engage in such work if regular smartmatch comes back in a future version of Perl.
As you can see, none of these options is really satisfactory, so I would be interested in hearing if other companies are in the same situation and how they decided to handle it.

By the way, I love the new Perl way of introducing new features as "experimental", until they become stable and official ... but this only works well when the experimental status is declared from the start. The problem with smartmatch is that it had been official for several years, before being retrograted to experimental. Agreed, the full semantics of smartmatch as published in 10.0 had inconsistencies, but throwing away the whole thing is a bit too harsh -- I'm sure that many users like me would be happy with a reasonable subset of rules for matching common cases.
  
Thanks in advance, Laurent Dami
 

Saturday, May 25, 2013

surprising interaction betwen list context and range operator

I was preparing a quiz for Perl programmers, and wanted a question on the difference between arrays and lists. So the question was : what do you get in $x and $y after

my @tab = ( 1, 3, 6..9 );
my $x   = @tab;
my $y   = ( 1, 3, 6..9 );


where I expected $x to be 6 (the number of items) and $y to be 9 (the last element), as documented in perldata. But before distributing the quiz I thought it was worth a test ... and indeed it was! To my surprise, $y turns out to be an empty string. It took me some time to find out why : with

my $y = ( 1, 3, 6, 7, 8, 9 );

the result is 9 indeed. But with

my $y = ( 1, 3, 6..9 );


the interpreter does not build the range in  list context, and then keep the last element. What happens instead is that it throws away the beginning of the list (there is even a warning about that), and then evaluates

my $y = 6..9;


in scalar context; where 6..9 is no longer a range, but a flip-flop operator. Now since the left-hand side is true, that operator is supposed to be true, right ? Well, no, because 6 is a constant, and in that case, perlop tells us that the flip-flop is  "considered true if it is equal (==) to the current input line number (the $. variable)". So $y ends up being an empty string, while

my $six  = 6;
my $nine = 9;
my $y    = $six..$nine;


would yield 1E0!

I couldn't be that nasty to the interviewed programmers, so in the end that question will not be part of the quiz.

Sunday, February 17, 2013

Slices of method calls within an object

Several years ago I complained that object accessors prevent you from using some common Perl idioms; in particular, you couldn't take a slice of several attributes within an object.

Now I just discovered the Want module; this is great for playing with lvalue subroutines. With the help of this module, I was finally able to use slices of method calls within an object : see https://metacpan.org/module/Method::Slice . There are some caveats in comparison with a usual hash slice, but nevertheless the main point is there : you can extract some values :

  my @coordinates = mslice($point, qw/x y/);

or even assign to them:

  (mslice($point, qw/x y/)) = (11, 22);

This was written just for fun ... but in the end I might use it in real code, because in some situations, I find slices to be more compact and high-level than multiple assignment statements.

Sunday, December 2, 2012

How to test if something is a Perl class ?

For Data::Domain I needed a way to test if something is a Perl class. Since UNIVERSAL is the mother of all classes, it seemed to make sense to define the test as

defined($_) && !ref($_) && $_->isa('UNIVERSAL')

Other people did it through $_->can('can') or UNIVERSAL::can($_, 'can'). Both ways used to work fine, but this is no longer true since June 22, 2012 (bleadperl commit 68b4061) : now just any string matches these conditions.

At first I found this change a bit odd, but somehow it makes sense because any string will answer to the 'can' and 'isa' methods. Also, bless({}, 'Foo') and 'Foo' now return consistent answers for ->isa() and for ->can(), which was not the case before. So let's agree that this change was a good step.


But now it means that  while every object or class is a UNIVERSAL, the reverse is not true : things that are UNIVERSAL are not necessarily objects or classes. Hum ... but what "is a class", really ?

Moose type 'ClassName' defines this through Class::Load::is_class_loaded, which returns true if this is a package with a $VERSION, with an @ISA, or with at least one method. By this definition, an empty package is not a class. However, perldoc says that a class is "just a package", with no restriction.

So after some thoughts I ended up with this implementation :

defined($_) && !ref($_) && $_->can($_)

This returns true for any defined package, false otherwise, and works both before and after June 22.

Thoughts ?




Saturday, December 1, 2012

Hash key order : beware of implicit assumptions

Perl hashes are not ordered, so one is not supposed to make assumptions about the key order. I thought I did not ... but Perl 5.17.6 showed me that I was wrong !

About two weeks ago I started receiving report about test failures which were totally incomprehensible to me. Since I work on Windows, I had no bleadperl environment, so it was hard to guess what was wrong just from the test reports. Andreas König kindly opened a ticket in which he spotted that the problem was related to a recent change in bleadperl : now Perl not only makes no guarantee about the key order, it even guarantees that the key order will be different through several runs of the same program!

This convinced me of investing some time to get a bleadperl environment on my Windows machine : VMware player + a virtual Unbutu + perlbrew did the job perfectly. Now I could start working on the bug.

The point were I was making an implicit assumption was a bit nasty, so I thought it was worth writing this post to share it : the code in SQL::Abstract::More more or less went like this :

  my $ops   = join "|", map quotemeta, keys %hash;
  my $regex = qr/^($ops)?($rest_of_regex)/;

See the problem ? The regex starts with an alternation derived from the hash keys. At first glance one would think that the order of members in the alternation is not important ... except when one member is a prefix of the other, because the first member wins. For example, matching "absurd" against qr/^(a|ab|abc)?(.*)/ is not the same as qr/^(abc|ab|a)?(.*)/ : in one case $1 will contain 'a', in the other case it will contain 'ab'.

To fix the problem, the code above was rewritten to put the longest keys first, and everything is fine again.