Difference between revisions of "The Flex Lexical Analyzer/Exercise 8 - Message list and simple accounting"

From Wiki**3

< The Flex Lexical Analyzer
(The Problem (in Portuguese))
 
(3 intermediate revisions by the same user not shown)
Line 12: Line 12:
  
 
Exemplo de entrada:
 
Exemplo de entrada:
<text>
+
<source lang="text">
 
Results are out!             Lucke vader@empire.net                23:38  902 KB
 
Results are out!             Lucke vader@empire.net                23:38  902 KB
 
Adorable kitties             Mary jaba@hut.com                    21:21  1000 KB
 
Adorable kitties             Mary jaba@hut.com                    21:21  1000 KB
Bad Romance John Me, "Tom & Jerry"                5:08  10 KB
+
Bad Romance     John Me, "Tom & Jerry"                5:08  10 KB
Telephone John       everyone@list.net                9:31  201 KB
+
Telephone     John         everyone@list.net                9:31  201 KB
Telephone John       everyone@list.net                9:34  101 KB
+
Telephone     John         everyone@list.net                9:34  101 KB
Re: Telephone John people@work.com                  9:45  901 KB
+
Re: Telephone     John people@work.com                  9:45  901 KB
All The Pretty Little Horses Mary cute@gmail.com                  12:38  1 MB
+
All The Pretty Little Horses     Mary cute@gmail.com                  12:38  1 MB
Fwd: Adorable kitties       Mary cute@gmail.com                  01:21  1 MB
+
Fwd: Adorable kitties             Mary cute@gmail.com                  01:21  1 MB
</text>
+
</source>
  
 
Saída correspondente:
 
Saída correspondente:
<text>
+
<source lang="text">
 
Mary, 3048
 
Mary, 3048
 
9, 1203
 
9, 1203
</text>
+
</source>
  
 
== Solution ==
 
== Solution ==
Line 33: Line 33:
 
Download this code: [[media:list-accounting.l|list-accounting.l]].
 
Download this code: [[media:list-accounting.l|list-accounting.l]].
  
<text>
+
<source lang="text">
 
%option stack 8bit noyywrap
 
%option stack 8bit noyywrap
 
%{
 
%{
Line 104: Line 104:
  
 
%%
 
%%
</text>
+
</source>
  
 
The main function appears in the final section of the lexical specification. The actions performed by this function are the following: call the lexical analyzer; extract the map entries corresponding to the maximum size by sender and maximum size by hour; and, finally, present those entries in the standard output.
 
The main function appears in the final section of the lexical specification. The actions performed by this function are the following: call the lexical analyzer; extract the map entries corresponding to the maximum size by sender and maximum size by hour; and, finally, present those entries in the standard output.
<cpp>
+
<source lang="c++">
 
int main() {
 
int main() {
 
   typedef SIMT::value_type SIVT;
 
   typedef SIMT::value_type SIVT;
Line 120: Line 120:
 
   return 0;
 
   return 0;
 
}
 
}
</cpp>
+
</source>
  
 
Note that we are using features from C++11, namely, lambda functions. These are ultimately not necessary, but they make the code more compact.
 
Note that we are using features from C++11, namely, lambda functions. These are ultimately not necessary, but they make the code more compact.

Latest revision as of 13:39, 4 March 2019

The Problem (in Portuguese)

Crie um analisador lexical (especificação para a ferramenta Flex) que aceite uma lista de cabeçalhos de mensagens e que indique na saída o remetente que produz mais tráfego (e o valor do tráfego em KB) e qual a hora de maior tráfego (e o valor do tráfego em KB).

Cada uma das linhas tem os seguintes campos (separados por um ou mais caracteres de tabulação):

  1. 1º, 2º e 3ª campos (cadeias de caracteres): respectivamente, assunto, designação do remetente e designações dos destinatários (separados por vírgulas). Estes campos não podem conter tabulações. Quaisquer outros caracteres (excepto mudanças de linha) são permitidos. Se um nome estiver entre aspas, pode conter qualquer carácter (incluindo tabulações), excepto a mudança de linha (aspas no interior da cadeia de caracteres são representadas por \").
  2. 4º campo: duração (formato hora:minuto – hora: 1 ou 2 algarismos, minuto: sempre 2 algarismos). Exemplos: 2:35, 12:56, 20:22, 2:30, 0:27.
  3. o último campo indica o tamanho da mensagem e a unidade (KB ou MB = 1024 KB).

Codifique todas as funções auxiliares. Pode utilizar estruturas de dados C++ e da STL (e.g. std::string, std::vector, std::map, etc.). Por simplicidade, considere os caracteres acentuados iguais aos não acentuados.

Exemplo de entrada:

Results are out!	            Lucke		vader@empire.net                 23:38  902 KB
Adorable kitties	            Mary		jaba@hut.com                     21:21  1000 KB
Bad Romance			    John		Me, "Tom & Jerry"                 5:08  10 KB
Telephone			    John	        everyone@list.net                 9:31  201 KB
Telephone			    John	        everyone@list.net                 9:34  101 KB
Re: Telephone			    John		people@work.com                   9:45  901 KB
All The Pretty Little Horses	    Mary		cute@gmail.com                   12:38  1 MB
Fwd: Adorable kitties	            Mary		cute@gmail.com                   01:21  1 MB

Saída correspondente:

Mary, 3048
9, 1203

Solution

Download this code: list-accounting.l.

%option stack 8bit noyywrap
%{
#include <map>
#include <string>
#include <iostream>
#include <algorithm>

static std::string subject, sender, recipient;
static std::string strlit, *current;

static int hour = -1, minutes = -1;
static int size = -1, multiplier = 1;

typedef std::map<std::string, int> SIMT;
typedef std::map<int,         int> IIMT;

static SIMT sizesBySender;
static IIMT sizesByHour;

inline void yyerror(const char *msg) { std::cerr << msg << std::endl; }

%}

%x X_FROM X_TO X_HOURS X_MINUTES X_SIZE X_STRING

%%

<INITIAL>\"     strlit  = yytext; current = &subject;   yy_push_state(X_STRING);
<X_FROM>\"      strlit  = yytext; current = &sender;    yy_push_state(X_STRING);
<X_TO>\"        strlit  = yytext; current = &recipient; yy_push_state(X_STRING);

<INITIAL>\t+    BEGIN(X_FROM); sender = "";
<INITIAL>.      subject += *yytext;
<INITIAL>\n     yyerror("newline in subject");

<X_FROM>\t+     BEGIN(X_TO); recipient = "";
<X_FROM>.       sender += *yytext;
<X_FROM>\n      yyerror("newline in sender");

<X_TO>\t+       BEGIN(X_HOURS);
<X_TO>.         recipient += *yytext;
<X_TO>\n        yyerror("newline in recipient");

<X_HOURS>[[:digit:]]{1,2}       hour = strtol(yytext, NULL, 10);
<X_HOURS>":"                    if (hour == -1) yyerror("illegal time: no hour specified"); else BEGIN(X_MINUTES);
<X_HOURS>\t+                    yyerror("illegal time: no minutes specified");
<X_HOURS>.|\n                   yyerror("illegal time: bad number (hour)");
<X_MINUTES>\t+                  if (minutes == -1) yyerror("illegal time: no minutes specified"); else BEGIN(X_SIZE);
<X_MINUTES>[[:digit:]]{2}       minutes = strtol(yytext, NULL, 10);
<X_MINUTES>.|\n                 yyerror("illegal time: bad number (minutes)");

<X_SIZE>[ \t]+                  ; /* ignore */
<X_SIZE>[[:digit:]]+            size = strtol(yytext, NULL, 10);
<X_SIZE>"KB"                    multiplier = 1;
<X_SIZE>"MB"                    multiplier = 1024;
<X_SIZE>.                       yyerror("bad character for size");
<X_SIZE>\n                      {
  size *= multiplier;
  sizesBySender[sender] += size;
  sizesByHour[hour] += size;
  BEGIN(INITIAL); subject = "";
}

<X_STRING>\\\"  strlit += yytext;
<X_STRING>\"    strlit += yytext; yy_pop_state(); *current += strlit;
<X_STRING>.     strlit += yytext;
<X_STRING>\n    yyerror("newline in string");


%%

The main function appears in the final section of the lexical specification. The actions performed by this function are the following: call the lexical analyzer; extract the map entries corresponding to the maximum size by sender and maximum size by hour; and, finally, present those entries in the standard output.

int main() {
  typedef SIMT::value_type SIVT;
  typedef IIMT::value_type IIVT;
  yylex();
  SIMT::iterator sender = std::max_element(sizesBySender.begin(), sizesBySender.end(),
                                           [](SIVT &a, SIVT &b){return a.second < b.second;});
  IIMT::iterator hour   = std::max_element(sizesByHour.begin(),   sizesByHour.end(),
                                           [](IIVT &a, IIVT &b){return a.second < b.second;});
  std::cout << sender->first << ", " << sender->second << std::endl;
  std::cout << hour->first   << ", " << hour->second   << std::endl;
  return 0;
}

Note that we are using features from C++11, namely, lambda functions. These are ultimately not necessary, but they make the code more compact.

The command for supporting these features in GCC is the following:

flex list-accounting.l
g++ -o list-accounting lex.yy.c -std=c++11

To execute the program, and assuming that a list is file mylist.txt, do the following:

./list-accounting < mylist.txt