Set your log format

ezPAARSE allows its users to specify their proxy log format by using the HTTP header Log-Format-xxx, where xxx is the model of the proxy (for example: ezproxy, apache, bibliopam).

The syntaxes mirror those used by the proxies themselves, so it is often enough to copy and paste the format declared in your proxy configuration. Beware, though: ezPAARSE does not support every setting that the proxies offer.

EZProxy syntax

  • %h: host IP, from where the request originates
  • %u: login used during authentication
  • %l: remote user name, obtained with identd (always “-”)
  • %b: bytes transferred
  • %U: requested URL (e.g. http://www.somedb.com/).
  • %m: request’s method (e.g. GET, POST).
  • %r: complete request (e.g. GET http://www.somedb.com HTTP/1.0).
  • %t: date/time of the request. The format can be specified with the Date-Format header.
  • %s: HTTP request status code

Regular expressions generated for the EZProxy fields

Each of the above parameters is converted into a regular expression:

  • %h (host) : ([a-zA-Z0-9\.\-]+(?:, ?[a-zA-Z0-9\.\-]+)*)
  • %u (login) : ([a-zA-Z0-9@\.\-_%,=]+)
  • %l (identd) : ([a-zA-Z0-9\-]+|\-)
  • %b (size) : ([0-9]+)
  • %U (url) : ([^ ]+)
  • %m (method) : ([A-Z]+)
  • %r (url) : [A-Z]+ ([^ ]+) [^ ]+
  • %t (datetime) : \[([^\]]+)\]
  • %s (status) : ([0-9]+)
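
The conversion described above can be sketched in a few lines of Python. This is only an illustration, not ezPAARSE's actual implementation: the function names format_to_regex and parse_line are hypothetical, the literal text between two parameters is assumed to be matched as-is, and the whole line is assumed to be anchored from start to end. The regular expressions are the ones listed above, and the sample line is the first EZproxy case listed under "Some concrete cases".

```python
import re

# Field letter -> (field name, regular expression), copied from the list above.
EZPROXY_FIELDS = {
    "h": ("host", r"([a-zA-Z0-9\.\-]+(?:, ?[a-zA-Z0-9\.\-]+)*)"),
    "u": ("login", r"([a-zA-Z0-9@\.\-_%,=]+)"),
    "l": ("identd", r"([a-zA-Z0-9\-]+|\-)"),
    "b": ("size", r"([0-9]+)"),
    "U": ("url", r"([^ ]+)"),
    "m": ("method", r"([A-Z]+)"),
    "r": ("url", r"[A-Z]+ ([^ ]+) [^ ]+"),
    "t": ("datetime", r"\[([^\]]+)\]"),
    "s": ("status", r"([0-9]+)"),
}


def format_to_regex(log_format):
    """Turn an EZproxy-style format string into (compiled regex, field names).

    Assumption: text between parameters (spaces, quotes, ...) is literal.
    """
    parts, names, pos = [], [], 0
    for token in re.finditer(r"%([a-zA-Z])", log_format):
        # Literal text that precedes the parameter is escaped and kept.
        parts.append(re.escape(log_format[pos:token.start()]))
        letter = token.group(1)
        if letter not in EZPROXY_FIELDS:
            raise ValueError("unsupported parameter: %" + letter)
        name, pattern = EZPROXY_FIELDS[letter]
        names.append(name)
        parts.append(pattern)
        pos = token.end()
    parts.append(re.escape(log_format[pos:]))
    return re.compile("^" + "".join(parts) + "$"), names


def parse_line(log_format, line):
    """Return a dict of captured fields, or None if the line does not match."""
    regex, names = format_to_regex(log_format)
    match = regex.match(line)
    if match is None:
        return None
    # Each listed expression has exactly one capturing group, so names and
    # groups line up one to one.
    return dict(zip(names, match.groups()))


sample = (
    '80.80.80.80 - oBzrStkEVAeUDeA [20/Nov/2011:17:45:50 +0100] '
    '"GET http://www.sciencedirect.com:80/science/journal/aip/00121606 HTTP/1.1" '
    '200 162009'
)
event = parse_line('%h %l %u %t "%r" %s %b', sample)
print(event)
```

Note that the expression listed for %t already includes the surrounding square brackets, which is why the format used here contains %t rather than [%t].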

Apache Syntax

  • %h : host IP (from where the request originates)
  • %u : login used during authentication
  • %l : remote user name, obtained with identd (always “-”)
  • %b : bytes transferred
  • %U : requested URL (e.g. http://www.somedb.com/).
  • %r : complete request (e.g. GET http://www.somedb.com HTTP/1.0).
  • %t : date/time of the request. The format can be specified with the Date-Format header.
  • %>s : HTTP request status code

Squid Syntax

  • %ts : request’s timestamp (in seconds).
  • %tu : millisecond part of the timestamp
  • %tr : server’s response time
  • %tl : date/time of the request. The format can be specified with the Date-Format header.
  • %>a : host IP, from where the request originates
  • %<a : IP address for the last connection
  • %<A : domain name from the request
  • %lp : port number from the request
  • %Ss : squid status for the request (TCP_MISS, ..).
  • %>Hs : HTTP request status code for the request
  • %<st : response size (headers included).
  • %rm : request method (e.g. GET, POST).
  • %rv : protocol version number
  • %ru : requested URL (e.g. http://www.somedb.com/).
  • %[un : login used to authenticate
  • %Sh : squid hierarchical status (DEFAULT_PARENT, ..).
  • %mt : MIME type of the content
  • %ui : remote user name, obtained with identd

Custom parameters

Custom parameters let you retrieve information from log lines that the standard EZproxy fields do not cover. Conversely, they let you ignore information that you do not want to appear in the consultation events generated by ezPAARSE.

There are three ways of expressing a parameter:

  • %{field_name}<regexp>: retrieves the field corresponding to the specified regexp and adds it to the consultation event with the given field name
  • %{field_name}: retrieves an alphanumeric string (hyphens permitted) and adds it to the consultation event with the specified field name
  • %<regexp>: ignores the part of the log line that matches the corresponding regexp.

Some examples with regular expressions

The following example would capture a datetime formatted as YYYY/MM/DD:hh:mm:ss

%{datetime}<\d{4}/\d{2}/\d{2}:\d{2}:\d{2}:\d{2}>

The following example would recognize a tab character used as a separator between two fields (%h and %u)

%h%<\t>%u

NB: you should use non-capturing parentheses (?:x) for complex regexps.
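
To make the three forms more concrete, here is a rough Python sketch of how such a format could be expanded into a single regular expression. It is not ezPAARSE's code: the name compile_format is hypothetical, the default expression for %{field_name} is only a guess based on the description "alphanumeric string (hyphens permitted)", and user-supplied regexps are assumed not to contain the ">" character. Only %h and %u are included from the standard fields, to keep the sketch short.

```python
import re

# Two standard fields, with the expressions listed earlier on this page.
STANDARD_FIELDS = {
    "h": ("host", r"([a-zA-Z0-9\.\-]+(?:, ?[a-zA-Z0-9\.\-]+)*)"),
    "u": ("login", r"([a-zA-Z0-9@\.\-_%,=]+)"),
}

# Assumption: what "alphanumeric string (hyphens permitted)" could look like.
DEFAULT_CUSTOM = r"[a-zA-Z0-9\-]+"

# Recognizes %{name}<regexp>, %{name}, %<regexp> and single-letter fields.
TOKEN = re.compile(r"%\{([^}]+)\}(?:<([^>]*)>)?|%<([^>]*)>|%([a-zA-Z])")


def compile_format(log_format):
    """Return (compiled regex, list of field names) for a format string."""
    parts, names, pos = [], [], 0
    for token in TOKEN.finditer(log_format):
        parts.append(re.escape(log_format[pos:token.start()]))
        name, named_re, ignored_re, letter = token.groups()
        if name is not None:
            # %{name}<regexp> or %{name}: captured under the given name.
            names.append(name)
            parts.append("(" + (named_re if named_re is not None else DEFAULT_CUSTOM) + ")")
        elif ignored_re is not None:
            # %<regexp>: matched but not captured.
            parts.append("(?:" + ignored_re + ")")
        else:
            if letter not in STANDARD_FIELDS:
                raise ValueError("unsupported parameter: %" + letter)
            std_name, std_re = STANDARD_FIELDS[letter]
            names.append(std_name)
            parts.append(std_re)
        pos = token.end()
    parts.append(re.escape(log_format[pos:]))
    return re.compile("^" + "".join(parts) + "$"), names


def parse_line(log_format, line):
    regex, names = compile_format(log_format)
    match = regex.match(line)
    # Names are paired with groups by position, so a capturing group inside a
    # user regexp would shift the results; hence the advice to use (?:x).
    return dict(zip(names, match.groups())) if match else None


# The two examples above: a custom datetime, and a tab used as separator.
dated = parse_line(
    r"%{datetime}<\d{4}/\d{2}/\d{2}:\d{2}:\d{2}:\d{2}> %{ezproxy-session}",
    "2012/11/30:08:15:42 pvJ0HWGo6eWhhVv",
)
tabbed = parse_line(r"%h%<\t>%u", "61.117.43.242\tjdoe")
print(dated)
print(tabbed)
```

In this sketch the captured values are matched to field names by position, which illustrates why capturing parentheses inside a custom regexp are best avoided.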

Some examples of specific fields

Field: %{X-FORWARDED-FOR}i
Sample values: 61.117.43.242
               209.85.238.58, 10.0.0.99
Format: %{X-FORWARDED-FOR}<[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}(?:, [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})*>

Field: %{ezproxy-groups}i
Sample values: Default+inpl
               Default+915
Format: %{ezproxy-groups}<[a-zA-Z0-9\+]+>

Field: %{ezproxy-session}i
Sample values: pvJ0HWGo6eWhhVv
               UX0Yi0agVZQwHNs
Format: %{ezproxy-session}
        or
        %{ezproxy-session}<[a-zA-Z0-9]+>

Example of a request

curl -X POST --proxy "" --no-buffer -H 'Log-Format-ezproxy: %h %<[-]> %u [%t] "%r" %s %b' --data-binary @test/dataset/sd.2012-11-30.300.log  http://127.0.0.1:59599 -v
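
For readers who prefer scripting the call, the same request can be approximated with Python's standard library. This is a sketch under assumptions: the helper names build_request and send are hypothetical, and the URL simply reuses the local address from the curl command above. The log format travels in the Log-Format-ezproxy header, and the log lines are sent as the request body.

```python
import urllib.request


def build_request(log_data, log_format, proxy_type="ezproxy",
                  url="http://127.0.0.1:59599"):
    """Prepare a POST request carrying log lines and their format.

    log_data   -- raw bytes of the log file
    log_format -- format string, e.g. '%h %<[-]> %u [%t] "%r" %s %b'
    proxy_type -- suffix of the Log-Format-xxx header (ezproxy, apache, ...)
    """
    request = urllib.request.Request(url, data=log_data, method="POST")
    request.add_header("Log-Format-" + proxy_type, log_format)
    return request


def send(request):
    """Send the request and yield the response line by line.

    An empty ProxyHandler bypasses any system HTTP proxy, which is roughly
    what `--proxy ""` does in the curl command.
    """
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        for raw_line in response:
            yield raw_line.decode("utf-8", errors="replace")


# Building the request does not contact the server; only send() does.
request = build_request(
    b'80.80.80.80 - jdoe [20/Nov/2011:17:45:50 +0100] "GET http://www.somedb.com/ HTTP/1.1" 200 1234\n',
    '%h %<[-]> %u [%t] "%r" %s %b',
)
```

To process a real file, read it in binary mode, pass its content as log_data, and iterate over send(request) while an ezPAARSE instance is listening on the chosen address.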

Some concrete cases

Proxy: EZproxy
Line type: 80.80.80.80 - oBzrStkEVAeUDeA [20/Nov/2011:17:45:50 +0100] "GET http://www.sciencedirect.com:80/science/journal/aip/00121606 HTTP/1.1" 200 162009
Possible format: %h %l %u %t "%r" %s %b

Proxy: EZproxy
Line type: [18/Nov/2012:00:00:34 +0100] 40.30.25.122 40.30.25.122 5mpcyan6 http://link.springer.com:80/article/10.1007/s00262-008-0620-4/fulltext.html 116636 liV9RqGobWNKrdD
Possible format: %t %h %u %U %b %{session}

Proxy: Apache
Line type: 50.50.50.50 - uid=aaa2561c,ou=people,dc=uep-tlfy,dc=fr [27/Mar/2012:06:52:44 +0200] "GET /http/www.sciencedirect.com/science/article/pii/S1875389212003823 HTTP/1.1" 200 45022 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.79 Safari/535.11"
Possible format: %h %l %u %t "%r" %>s %b %<.*>

Proxy: Squid
Line type: 1319061710.284 2102 90.90.90.90 TCP_MISS/200 309401 GET http://www.sciencedirect.com/science/article/pii/S0166218X11003477 cousteau DIRECT/198.81.200.2 text/html
Possible format: %ts.%03tu %6tr %>a %Ss/%03>Hs %<st %rm %ru %[un %Sh/%<a %mt

Proxy: Squid
Line type: istproxy.inrialpes.fr:443 123.123.123.123 - tartempion [10/Apr/2012:09:38:21 +0200] "GET http://www.sciencedirect.com/science/article/pii/S0166218X11003477 HTTP/1.1" 302 20 "-" "Mozilla/5.0 (Windows NT 6.1; rv:11.0) Gecko/20100101 Firefox/11.0"
Possible format: %<A:%lp %>a %ui %[un [%tl] "%rm %ru HTTP/%rv" %>Hs %<st %<.*>

Video Demonstration

This screencast demonstrates how to use the GUI to discover the format of your log files so that ezPAARSE can process them correctly.