Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 22 additions & 0 deletions include/mata/nfa/algorithms.hh
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,8 @@ Nfa complement_classical(const Nfa& aut, const Mata::Util::OrdVector<Symbol>& sy
* i.e., if the final intersection of smaller complement of bigger is empty.
*/
bool is_included_naive(const Nfa& smaller, const Nfa& bigger, const Alphabet* alphabet = nullptr, Run* cex = nullptr);
//naive is not a good word, maybe textbook, or explicit, construct_first
//the alphabet parameter is useless in inclusion
Comment on lines 70 to +72
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe _basic or _simple? But I do not mind _naive. Why do you think it is not ideal? We talk about the implementations/algorithms as naive in our discussions all the time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussions are private, code is public. I would go for textbook.

Copy link
Collaborator

@Adda0 Adda0 Jul 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the algorithm, however basic, has a name, we can use the name instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the explicit. I have read a textbook with the antichain-based inclusion checking. Basically, you could append textbook to every function name in mata.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like naive because it is not really clear what it is outside our group and it is somehow pejorative.
I would like textbook, I think everybody will know, I don't think it can be really confused with antichains. Explicit is ok too, but it requires a lot of context to get the meaning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think explicit is little bit better. textbook seems a little bit weird.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like basic/simple, but textbook is also ok for me. I agree with Lukas, naive sounds pejorative, explicit does not really give me any information.

Comment on lines 70 to +72
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alphabet parameter is not useless. If you already have the alphabet, you can pass it to the function and save yourself creating the alphabet in the function again. Inclusion check can be rewritten, but for the current implementation, the alphabet parameter is an optimization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need an alphabet in inclusion?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because inclusion is computed using a complement of an automaton, which in our interface requires the alphabet.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In inlcusion, the complement can be with respect to the symbols used in the other automaton, not with respect to an external alphabet.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, the alphabet can be inherited. There might be an issue when you construct the complement wrt an alphabet that do not contain all symbols from the automaton before complement (but who knows).

Copy link
Member

@jurajsic jurajsic Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @Adda0 said (and it is clearly explained in the comment for this function), alphabet is there as an optimization if you already have it. Otherwise, you have to go trough the whole delta of both automata to create the alphabet and that could be costly. But maybe not, you are going to do complement anyway, that will probably take much more time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, the efficient antichain inclusion does not need the alphabet at all, the explicit one will do much more constly operations than collecting alphabet anyway.


/**
* Inclusion implemented by antichain algorithms.
Expand All @@ -79,6 +81,7 @@ bool is_included_naive(const Nfa& smaller, const Nfa& bigger, const Alphabet* al
* i.e., if the final intersection of smaller complement of bigger is empty.
*/
bool is_included_antichains(const Nfa& smaller, const Nfa& bigger, const Alphabet* alphabet = nullptr, Run* cex = nullptr);
//antichains -> antichain. Everywhere here.

/**
* Universality check implemented by checking emptiness of complemented automaton
Expand All @@ -88,6 +91,7 @@ bool is_included_antichains(const Nfa& smaller, const Nfa& bigger, const Alphabe
* @return True if the complemented automaton has non empty language, i.e., the original one is not universal
*/
bool is_universal_naive(const Nfa& aut, const Alphabet& alphabet, Run* cex);
//naive -> textbook/explicit

/**
* Universality checking based on subset construction with antichain.
Expand All @@ -98,6 +102,22 @@ bool is_universal_naive(const Nfa& aut, const Alphabet& alphabet, Run* cex);
*/
bool is_universal_antichains(const Nfa& aut, const Alphabet& alphabet, Run* cex);

//the alphabet parameter should be optional? The automaton can have its alphabet too. Run should be optional.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that Run should be optional.

Making alphabet optional would mean that we can (in this order), check for existence of alphabet in:

  1. function parameter,
  2. automaton member variable alphabet, and
  3. construct alphabet from transitions of the automaton (this is optional and probably not wanted - it seems strange to check for universality from symbols on transitions of the checked automaton only - therefore, we could fail in the case that neither 1) or 2) have alphabet specified .

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last time we said that automaton would always have an alphabet, so 3 would not apply

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I am not sure if it is possible to implement such an alphabet. It will need to be investigated further.

And 3. would still apply. We would check for universality for only symbols in the automaton. Therefore, it is up to us whether that seems OK or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, alphabet should be optional, if it is not there, use alphabet of the automaton. Otherwise I would throw error.

//actually, what about to use this, with more optional parameters through structures
//struct univ_params
//{
// Alphabet* alphabet = NULL,
// Run* cex = NULL,
//};
// bool is_universal_antichains(const Nfa& aut, univ_params={NULL,NULL});
// and call as
// bool is_universal_antichains(const Nfa& aut);
// bool is_universal_antichains(const Nfa& aut,{.alphabet = bla});
// bool is_universal_antichains(const Nfa& aut,{.cex = bli});
// bool is_universal_antichains(const Nfa& aut,{.alphabet = bla, .cex = bli});

Comment on lines +106 to +118
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If only we had named parameters in C++. The feature you describe here is a C++20 feature called designated initializers. The problem here is that you have to create the struct ad-hoc each time you call the function, and you pollute the interface by introducing these structs. You can have only one univ_params in the whole program (within the same namespace). If the struct was declared inside a class, this issue would be solved, but this is not the case here. It depends on whether we are OK with these limitations or not.

Note that this is basically a hack in C++ to create named parameters this way and we even had a problem with designated initializers in Noodler when we upgraded to C++20. In the end, we had to remove them. It should work fine, but for some reason, it failed to compile for us there.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failed compilation sounds like som e weird glitch in the matrix, lets pretend that it did not happen? :)

I somehow think it could be nice with the structs and I guess we would not need so many different ones.

What do the others think?

Copy link
Collaborator

@Adda0 Adda0 Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failed compilation sounds like som e weird glitch in the matrix, lets pretend that it did not happen? :)

Yeah, Let us. It will be solved when it comes to that. It should not play a role in our decision-making process, I hope.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we employ the proposed key-value structure instead of yet another data type?

Copy link
Collaborator Author

@kilohsakul kilohsakul Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, maybe a good idea? The time needed to fetch the value should be negligible, right?

Actually, I would mainly like to use the key-value for the things like state-to-state map, state-to-string map used in functions like concatenation, intersection etc. Because those things are really just some additional stuff.
That would eliminate a lot of discussion about optional parameters and how to pass them.

Here, the dispatch functions are a little different, those strings are basic parameters of the function, wouldn't it be weird to pass them within a universal key value store? I don' t have a strong opinion.

Copy link
Collaborator

@Adda0 Adda0 Aug 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am strongly against using a key value store for parameters for operations on the automaton. What would you even do for operations on two automata, each with their own key value store? Using the key value store here would be unreasonable.

Furthermore, if we use the key value store for state-to-state maps (newly StateRenaming), Mata would leak memory. Every such operation would require the caller to deallocate the heap-allocated map after the operation. The key-value store should be used only by the user of Mata as a personal storage for their own data, not data generated by the library. Otherwise, the memory leaks will arise, and costly allocations will have to be made repeatedly. Continuous freeing of allocated memory would also be necessary to prevent allocating too much memory at any time during the run of any program using Mata.

An alternative approach where using key-value store for state renaming would be applicable is to create some sort of global pool of these structures, which will get deallocated when the program ends. Something like that might become a necessity for alphabets (which play a similar role in Mata as data in key-value store memory management-wise). However, the problem of continuous freeing of allocated memory still prevails.

That would still not warrant the use of the key-value store for optional parameters for operations on automata, however. Using those would be cumbersome at best. We need to pass the parameters truly as parameters to the function, in whatever form that may be, in my opinion.

It would be great if we could design some sort of single key-value store type for optional parameters for all functions. They would specify the type of operation the following parameters are for, and the function would look for the values there in case the parameter of a pointer to this specific key-value store type is not null pointer. Something like this:

enum class OperationType {
    Minimize,
    Determinize,
    Trim,
    // ...
}

Nfa& Mata::Nfa::Nfa::minimize(OptionalParameters* optional_parameters = nullptr) {
    OptionalParametersValues* optional_paremeters_values = nullptr;
    if (optional_parameters != nullptr) {
        optional_paremeters_values = optional_parameters[OperationType::Minimize];        
    }
    // ...
    if (optional_parameters_values != nullptr) {
        if (optional_parameters_values->use_reduce) {
             this.reduce();
        }
    }
    return *this;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, we could implement singleton-like configure structure (or structure with class parameters), where one could set configurations up and not use function parameters? The idea is something like following:

struct Configuration {
   string minimization_strategy = "default";
   string determinization_strategy = "default";
   void* cex = nullptr;
   void* alphabet = nullptr;
   ...
}
bool is_universal_antichains(const Nfa& aut);
Configuration.alphabet = bla;
bool is_universal_antichains(const Nfa& aut);
Configuration.cex = bli;
bool is_universal_antichains(const Nfa& aut);
Configuration.alphabet = blabla;
Configuration.cex = blibli;
bool is_universal_antichains(const Nfa& aut);

The benefits are that (1) configuration parameters are at one place only, (2) you can set some parameters for bigger blocks of code and not keep writing lots of parameters, ...
The cons are that (1) you write more code, (2) you might need to apply config to single scope only and then reset it, which could again results to some problems.

//comment missing
//Do we really need this?
Simlib::Util::BinaryRelation compute_relation(
const Nfa& aut,
const StringMap& params = {{"relation", "simulation"}, {"direction", "forward"}});
Expand All @@ -112,6 +132,7 @@ Simlib::Util::BinaryRelation compute_relation(
* @param[out] prod_map Mapping of pairs of the original states (lhs_state, rhs_state) to new product states.
* @return NFA as a product of NFAs @p lhs and @p rhs with ε-transitions preserved.
*/
// all parameters needed/used?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We agreed some time ago that all functions performing some kind of renaming of states should return the renaming map. But it seems to me that we never use these. I think we have no use case for the renaming so far. There might be some in the future, but we do not have one at the moment. Do we want to keep the renaming maps, then?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can ask at the mejtyng what the uses can be, and remove it if there is nothing concrete.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe really just putting the one struct of the special type as the optional parameter there, with some arbitrary initial choice of optional parameters as members? That would allow us to change the insides of the struct later without having to modify the interface?

Copy link
Collaborator

@Adda0 Adda0 Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that here we are talking about output parameters. When we pass the parameters, we normally do not want to be able to change the parameter class inside the function (we want to pass it by const reference). Hence, mixing the input, output, and input/output parameters in a single class would be confusing and error-prone, I believe.

But you are right that this would solve the issue with changing interfaces.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as in the above comment, for these maps, I would probably put them to the key-value store

Copy link
Collaborator

@Adda0 Adda0 Aug 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #277 (comment) for what problems I find with this approach for output parameters, as stated further in the linked comment.

Nfa intersection_eps(const Nfa& lhs, const Nfa& rhs, bool preserve_epsilon, const std::set<Symbol>& epsilons,
std::unordered_map<std::pair<State,State>, State> *prod_map = nullptr);

Expand All @@ -127,6 +148,7 @@ Nfa intersection_eps(const Nfa& lhs, const Nfa& rhs, bool preserve_epsilon, cons
* @param[out] rhs_result_states_map Map mapping rhs states to result states.
* @return Concatenated automaton.
*/
//Are both maps needed?
Nfa concatenate_eps(const Nfa& lhs, const Nfa& rhs, const Symbol& epsilon, bool use_epsilon = false,
StateToStateMap* lhs_result_states_map = nullptr, StateToStateMap* rhs_result_states_map = nullptr);

Expand Down
3 changes: 3 additions & 0 deletions include/mata/nfa/builder.hh
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ namespace Mata::Nfa::Builder {
/**
* Create an automaton accepting only a single @p word.
*/
//should we drop the create_ from all the names?
Nfa create_single_word_nfa(const std::vector<Symbol>& word);
Comment on lines +20 to 21
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed it already during the PR introducing these functions and again during creating the NFA builder.

The argument for keeping the name is that if you add using namespace Mata::Nfa::Builder; to your file, you would call the functions only as single_word_nfa(), which might not express that this is a function building the specific NFA all that clearly.

The argument against keeping create_ is that when you call the function with the namespaces included (Nfa::Builder::single_word_nfa()), it still reads nicely and is shorter.

I am fine with either. Nevertheless, I prefer keeping the create_ prefix.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the create prefix

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me even just calling single_word_nfa(w) expresses the meaning well. Matter of taste, I like to cut off unnecessary things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the create prefix, but if it is now in Builder namespace, I am ok with dropping them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said before, I would prefer Build:: namespace and no create_, however, for me unity is more important, and if one function does not make sense without create, then I would keep the prefix for all functions.


/**
Expand Down Expand Up @@ -45,10 +46,12 @@ Nfa create_sigma_star_nfa(Alphabet* alphabet = new OnTheFlyAlphabet{});
/** Loads an automaton from Parsed object */
// TODO this function should the same thing as the one taking IntermediateAut or be deleted
Nfa construct(const Mata::Parser::ParsedSection& parsec, Alphabet* alphabet, StringToStateMap* state_map = nullptr);
//construct does not cary much info, what is better? In the comment, don't know what Parsed object is, overall I don't get what this function does.

Comment on lines 48 to 50
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It constructs NFA from . The exact name for the function would be construct_nfa_from(<something>, ...). We need to refactor these construct() functions anyway (as a part of the parser / intermediate representation refactorization PR), but while we are at it, we can rename the resulting function constructing NFA from something to something more appropriate.

But I think the construct is a great name for this type of functions. We can append more to this prefix, but I would keep this prefix. Ideally, I would call the function construct_from(<something>, ...) or construct_nfa_from(<somehing>, ...) which reads nicely at the point of calling the function. This naming helps me to understand the code immensely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think construct is OK, construct_from does not bring any information. One can check easily what ParsedSection is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally like naming conventions of pandas and other Python libraries that would use read_section or from_parsed_section.

Btw. There Is information in from 😉

Copy link
Collaborator Author

@kilohsakul kilohsakul Jul 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Construct is saying that the function is constructing something, but deosn't say what. Kind of useless name in fact. At least construct_nfa, no? But isn't that function actually doing the parsing? In that case it could be parse_nfa? (although it may be weird to parse a parsed section, but that is perhaps another problem)

Then, I don't like that we use those synonyms - make, create, construct, without any clear system. That could be another argument for just dropping them when they are not necessary.

Copy link
Collaborator

@vhavlena vhavlena Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion on that. But where is the line? Make is bad, get is good? We should create some system.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasted energy, I am for purging them. Sorry for that I am probably getting into a fundamentalist mode while writing these comments :). Instead of spending energy on creating a system, that will be artificial and so endlessly discussed, not respected, enforced, reexplained, lets concentrate on figuring out each individual name so that it is informative. For me, get create make construct says almost nothing, except that the function is creating and returning some thing, which I probably easily know anyway from the context, from how it is used, from return type.
Allahu Akbar!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to adapt system from other libraries, that would name these kinds of fucntions as from_<FORMAT> and to_<FORMAT>. So this could be Builder::from_parsed_section(), or Builder::nfa_from_parsed_section().

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to adapt system from other libraries, that would name these kinds of fucntions as from_ and to_. So this could be Builder::from_parsed_section(), or Builder::nfa_from_parsed_section().

I think this could work for Mata very well, too. We already have OrdVector::to_vector(), so we could have Nfa::Nfa::to_mata() and Nfa::Nfa::from(ParsedSection), etc.

/** Loads an automaton from Parsed object */
Nfa construct(const Mata::IntermediateAut& inter_aut, Alphabet* alphabet, StringToStateMap* state_map = nullptr);

//comment missing
template<class ParsedObject>
Nfa construct(const ParsedObject& parsed, Mata::StringToSymbolMap* symbol_map = nullptr,
StringToStateMap* state_map = nullptr) {
Expand Down
23 changes: 23 additions & 0 deletions include/mata/nfa/delta.hh
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,19 @@ namespace Mata::Nfa {
/**
* Structure represents a move which is a symbol and a set of target states of transitions.
*/
//HOW TO RENAME STUFF
//delta/coniferation (a set/sequence of cones)/cone
//choice,
//iconary,
//lexicon,
//choice/fan/fanout/cone/fork
//symbolbrach, sourcebranch
//srcbranch, charbranch
//srcfork,symfork,tgtfork
//srctree,symtree
//srctree,symtree,tgttree
//delta,srcpost,sympost
//AND THE WINNER IS: delta, srcpost, sympost. Huraaa
class Move {
public:
Symbol symbol{};
Expand Down Expand Up @@ -97,6 +110,10 @@ public:
class Delta {
private:
std::vector<Post> posts;
//this internal vector of things is done in a different way in each of the structures, post, delta, move.
//would be nice to make it uniform
//the delta way is the cleanest, no? a private data member
//lets do it that way?

Comment on lines +115 to 117
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily, as explained in FluentCpp article on private inheritance. Each approach has its advantages. We include basically all the functions in OrdVector here without overriding any of them. Using private inheritance is the simplest solution where you write the least amount of code.

Using private data members gives you the same amount of control over what is in the interface in the end as private inheritance, but you have to write more boilerplate code.

Nevertheless, working with private data members might seem easier to understand and extend in case we decide the class is not primarily a specialization of the base container, but an independent class composing of the base container and much more. As of now, the classes are more of a specialization than anything.

I do not mind either option. If more of us find private data members easier to work with, let us use them.

Copy link
Collaborator Author

@kilohsakul kilohsakul Jul 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am guessing that some of these classes will be extended in some version of mata, for instance to add some registers to automata.

Even if just inheritance is ok, it is weird to use three different ways for doing it for the three levels of delta, feels like a chaos, no?
Then, if we want to choose one, then we should probably choose the most standard and robust one, which is the member data I think. ? The added boiler plate code wold be small.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added boilerplate code is extensive: about 20 functions for each class with a class member, only calling the underlying member variable methods. But I am OK with the boilerplate in case we choose this route. I agree that it is probably the most standard approach.

Anyone else has any opinions?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Post, each line with "using super::f" is replaced by the call of the method f of the member, the same number of lines, but longer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, the using-declaration is more concise, but more importantly, it includes all overloads of the function. Therefore, for each overload of a function in the base class, you need only one using-declaration in the derived class. It is a subtle different, but it comes in handy for class constructors, assignment operators, etc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of these options offend me

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, two question:
Do we want that to be consistent across delta, post, and move (or their new names)?
If not, then fuck it, but I would be more for having it consistent, otherwise I will keep thinking about it whenever I see it.
If yes, which option? I guess members, since it feels robust and uncomplicated ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would go with consistency and members when we are talking about Mata. It is the simplest approach, most robust, easier to modify, even if more verbose, cluttering the code files. But we have a separate file for delta, and we can split it even further if it comes to that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, let's be consistent and use members.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no strong feelings either way.

public:
inline static const Post empty_post; // When posts[q] is not allocated, then delta[q] returns this.
Expand Down Expand Up @@ -125,6 +142,7 @@ public:
// But it feels fragile, before doing something like that, better think and talk to people.
Post& get_mutable_post(State q);

//Needs explanation. Maybe refactoring, replacing by several more sensible functions
void defragment(const BoolVector& is_staying, const std::vector<State>& renaming);

// Get a constant reference to the post of a state. No side effects.
Expand All @@ -134,6 +152,7 @@ public:

void clear() { posts.clear(); }

//Still fishy, what do we mean by size here? Probably should be just the size of the vector. However, below we call it num_of_states.
void increase_size(size_t n) {
assert(n >= posts.size());
posts.resize(n);
Expand All @@ -143,9 +162,12 @@ public:
* @return Number of states in the whole Delta, including both source and target states.
*/
size_t num_of_states() const { return posts.size(); }
//should it just be size() ?


void add(State state_from, Symbol symbol, State state_to);
void add(const Trans& trans) { add(trans.src, trans.symb, trans.tgt); }
//more add functions below, should be together?
void remove(State src, Symbol symb, State tgt);
void remove(const Trans& trans) { remove(trans.src, trans.symb, trans.tgt); }

Expand Down Expand Up @@ -192,6 +214,7 @@ public:
/**
* Iterator over transitions. It iterates over triples (lhs, symbol, rhs) where lhs and rhs are states.
*/
//So this should become the thing discussed, delta.transitions.begin() ..., with internal transition, * will return a const reference to it
struct const_iterator {
private:
const std::vector<Post>& post;
Expand Down
35 changes: 35 additions & 0 deletions include/mata/nfa/nfa.hh
Original file line number Diff line number Diff line change
Expand Up @@ -106,18 +106,21 @@ public:
/**
* Clear transitions but keep the automata states.
*/
//method of delta?
void clear_transitions();

/**
* Add a new (fresh) state to the automaton.
* @return The newly created state.
*/
//obsolete?
State add_state();

/**
* Add state @p state to @c delta if @p state is not in @c delta yet.
* @return The requested @p state.
*/
//obsolete?
State add_state(State state);

/**
Expand All @@ -126,6 +129,7 @@ public:
* This includes the initial and final states as well as states in the transition relation.
* @return The number of states.
*/
//rename, again, to num_of_states?
size_t size() const;

/**
Expand All @@ -145,6 +149,7 @@ public:
*
* The whole NFA is cleared, each member is set to its zero value.
*/
//used?
void clear();

/**
Expand All @@ -165,6 +170,8 @@ public:
*/
Util::OrdVector<Symbol> get_used_symbols() const;

//this is to be debordelized, the variants were here just to play with data structures
//should be a method of delta
Mata::Util::OrdVector<Symbol> get_used_symbols_vec() const;
std::set<Symbol> get_used_symbols_set() const;
Mata::Util::SparseSet<Symbol> get_used_symbols_sps() const;
Expand Down Expand Up @@ -201,6 +208,7 @@ public:
* @return Set of useful states.
* TODO: with the new get_useful_states, we can delete this probably.
*/
//keep just the new one
StateSet get_useful_states_old() const;

BoolVector get_useful_states() const;
Expand All @@ -215,6 +223,7 @@ public:
* @param[out] state_map Mapping of trimmed states to new states.
* TODO: we can probably keep just trim_reverting, much faster. But the speed difference and how it is achieved is interesting. Keeping as a demonstration for now.
*/
//to be compared on some good data
Copy link
Member

@tfiedor tfiedor Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these function should be called in some tests in tests-performance/src. Hence, in newer PR, there should be some data on their performance (I hope)

void trim_inplace(StateToStateMap* state_map = nullptr);
void trim_reverting(StateToStateMap* state_map = nullptr);
void trim(StateToStateMap* state_map = nullptr) { trim_inplace(state_map); }
Expand Down Expand Up @@ -246,19 +255,22 @@ public:
*
* The operation has constant time complexity.
*/
//remove get_ ?
size_t get_num_of_trans() const { return static_cast<size_t>(std::distance(delta.begin(), delta.end())); }

/**
* Get transitions as a sequence of @c Trans.
* @return Sequence of transitions as @c Trans.
*/
//this will become obsolete, we will have iterators in delta
TransSequence get_trans_as_sequence() const;

/**
* Get transitions from @p state_from as a sequence of @c Trans.
* @param state_from[in] Source state_from of transitions to get.
* @return Sequence of transitions as @c Trans from @p state_from.
*/
//this will become obsolete, we will have iterators in delta
TransSequence get_trans_from_as_sequence(State state_from) const;

/**
Expand All @@ -269,6 +281,7 @@ public:
* @param state_from[in] Source state for transitions to get.
* @return List of transitions leading from @p state_from.
*/
//this will become obsolete, we will have iterators in delta
const Post& get_moves_from(const State state_from) const {
assert(state_from < size());
return delta[state_from];
Expand All @@ -280,13 +293,15 @@ public:
* @return Sequence of @c Trans transitions leading to @p state_to.
* (!slow!, traverses the entire delta)
*/
//this function somehow does not fit with the planned iterator interface, but I don't know what to do
TransSequence get_transitions_to(State state_to) const;

/**
* Unify transitions to create a directed graph with at most a single transition between two states.
* @param[in] abstract_symbol Abstract symbol to use for transitions in digraph.
* @return An automaton representing a directed graph.
*/
//why the get_ everywhere?
Nfa get_one_letter_aut(Symbol abstract_symbol = 'x') const;

Comment on lines +304 to 306
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it is an idiom for getter functions. Those are functions that do not perform an operation on the automaton which would change the current state of the class and only return you something either directly stored in the class or computed.

In this case, the name could be also something like construct_one_letter_nfa(). For the other get_ functions in here, I think the get_ prefix is reasonable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, getter should be called get.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get is fine. We should be consistent.

Copy link
Collaborator Author

@kilohsakul kilohsakul Jul 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, lets be consistent, but how.

  • This one for instance creates a new automaton by modifying every transition of the original, a big computation. Should it then be rather create_ or something?
  • Will we also use get_deterministic, get_minimal, etc? Feels somehow ugly.
  • We use those get, make, create, construct without a clear system. It will be hard to agree on a system that makes sense and works in all situations and is liked by everybody. It will keep generating discussions. I think it is much easier to just remove those words whenever no confusion can happen. Here, one_letter_aut is a clear description of the thing returned by the function, it does not contain noise, nothing to wander about ... ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a possible approach, too. It goes against C++ idioms, but other languages implement and encourage @kilohsakul's suggested approach instead. For example, in Rust, you have methods such as Class::from(<something>) and Class::from_something_else(<something_else> (because you do not have function overloading in Rust for safety reasons), not Class::construct_from(<something>. Similarly, you use Class::new(<initial parameters>) and Class::new_special_type_1(<special type one default parameters>) for constructor functions. There is no get_, make_, create_, etc. It is just a different naming convention.

I am personally fine with either. Note that it goes against C++ idioms, but it is acceptable even in C++, I think. As of know, I believe it is just up to our preferences and taste. I do not believe either option brings any significant advantages that would highly overshadow the other approach.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it is a matter of taste. I still like my taste better :). My arguments are, shorter, simpler (no need to wander about which of the four words to use), less discussion, less noise.

The "getters" mentioned above, I thought they are little things that access private data members and do not compute anything much, is that right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That depends. What you are talking about is the classical meaning of a getter, as automatically generated for classes in C#, or manually in Python, for example.

The second meaning is anything that only returns something new from the data in the class, without modifying the class data. The smaller the computation that has to be done in the “getter”, the better. But for functions computing something larger, a different prefix might be more appropriate, such as compute_, make_, create_, etc.

The decision you made in another conversation however resolves this by completely removing all the prefixes. Then there is no confusion possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disaggree with just one_letter_aut. It conveys fucking nothing, it is compound noun that should be reserved for variables, structures or classes, not for actions, methods and especially not for member methods. If you want something shorter then you could use as_one_letter_aut or to_one_letter_aut. But method name should convey action not structure (with exceptions, of course; but this is not the case).

Other than that, getters should be getters and hence have get_ to convey they are getters (C++ does not support properties so private variables has to be explicitly accessed by getters and setters). Now, this is not getter, so it shouldn't have get_ prefix but something else (create, as, to, build, convert_to, ...).

Besides, I honestly think, this function could be moved somewhere else (to builders maybe?).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally agree with everything you said. If we move this method to builder as a normal function, its intent should be clear and the name should be easy to agree upon: Nfa one_letter_nfa(const Nfa& nfa) or Nfa one_letter_nfa_from(const Nfa& nfa), for example.

/**
Expand Down Expand Up @@ -338,6 +353,7 @@ public:
void print_to_mata(std::ostream &output) const;

// TODO: Relict from VATA. What to do with inclusion/ universality/ this post function? Revise all of them.
// One usage in universality, should disappear and this function should be removed.
StateSet post(const StateSet& states, const Symbol& symbol) const;

struct const_iterator {
Expand Down Expand Up @@ -374,6 +390,7 @@ public:
* @return Returns reference element of transition list with epsilon transitions or end of transition list when
* there are no epsilon transitions.
*/
//These things should be methods of delta?
Post::const_iterator get_epsilon_transitions(State state, Symbol epsilon = EPSILON) const;

/**
Expand All @@ -383,13 +400,15 @@ public:
* @return Returns reference element of transition list with epsilon transitions or end of transition list when
* there are no epsilon transitions.
*/
//These things should be methods of delta?
static Post::const_iterator get_epsilon_transitions(const Post& post, Symbol epsilon = EPSILON);

/**
* @brief Expand alphabet by symbols from this automaton to given alphabet
*
* The value of the already existing symbols will NOT be overwritten.
*/
//A method of delta as well?
void add_symbols_to(OnTheFlyAlphabet& target_alphabet) const;
}; // struct Nfa.

Expand All @@ -398,9 +417,11 @@ public:
* @param[in] nfa NFA with symbols to fill @p alphabet with.
* @param[out] alphabet Alphabet to be filled with symbols from @p nfa.
*/
//A method of delta as well?
void fill_alphabet(const Mata::Nfa::Nfa& nfa, Mata::OnTheFlyAlphabet& alphabet);

// Adapted from: https://www.fluentcpp.com/2019/01/25/variadic-number-function-parameters-type/.
//Please explain in a comment here what this is.
template<bool...> struct bool_pack{};
/// Checks for all types in the pack.
template<typename... Ts> using conjunction = std::is_same<bool_pack<true,Ts::value...>, bool_pack<Ts::value..., true>>;
Expand All @@ -413,6 +434,7 @@ template<typename... Ts> using AreAllNfas = typename conjunction<std::is_same<Ts
* @param[in] nfas NFAs to create alphabet from.
* @return Created alphabet.
*/
//What is this, what are these types? Is it sane?
template<typename... Nfas, typename = AreAllNfas<Nfas...>>
inline OnTheFlyAlphabet create_alphabet(const Nfas&... nfas) {
Mata::OnTheFlyAlphabet alphabet{};
Expand All @@ -428,6 +450,7 @@ inline OnTheFlyAlphabet create_alphabet(const Nfas&... nfas) {
* @param[in] nfas Vector of NFAs to create alphabet from.
* @return Created alphabet.
*/
//We are to refactor alphabets, right?
OnTheFlyAlphabet create_alphabet(const ConstAutRefSequence& nfas);

/**
Expand All @@ -449,6 +472,7 @@ OnTheFlyAlphabet create_alphabet(const ConstAutPtrSequence& nfas);
* @param[in] nfas Vector of pointers to NFAs to create alphabet from.
* @return Created alphabet.
*/
//Do we need all the variants? If yes, can we have one with iterators, or a template?
OnTheFlyAlphabet create_alphabet(const AutPtrSequence& nfas);

/// Do the automata have disjoint sets of states?
Expand All @@ -460,8 +484,10 @@ bool are_state_disjoint(const Nfa& lhs, const Nfa& rhs);
* @param[out] cex Counter-example path for a case the language is not empty.
* @return True if the language is empty, false otherwise.
*/
//should be a method?
bool is_lang_empty(const Nfa& aut, Run* cex = nullptr);

//could be in-place method
Nfa uni(const Nfa &lhs, const Nfa &rhs);

/**
Expand All @@ -482,6 +508,8 @@ Nfa uni(const Nfa &lhs, const Nfa &rhs);
* @param[out] prod_map Mapping of pairs of the original states (lhs_state, rhs_state) to new product states.
* @return NFA as a product of NFAs @p lhs and @p rhs with ε-transitions preserved.
*/

//preserve_epsilon optional too? Do we need both options?
Nfa intersection(const Nfa& lhs, const Nfa& rhs,
bool preserve_epsilon = false, std::unordered_map<std::pair<State, State>, State> *prod_map = nullptr);

Expand All @@ -497,6 +525,7 @@ Nfa intersection(const Nfa& lhs, const Nfa& rhs,
* @return Concatenated automaton.
*/
// TODO: check how fast is using just concatenate over epsilon and then call remove_epsilon().
// choice of optional parameters using structure?
Nfa concatenate(const Nfa& lhs, const Nfa& rhs, bool use_epsilon = false,
StateToStateMap* lhs_result_states_map = nullptr, StateToStateMap* rhs_result_states_map = nullptr);

Comment on lines +528 to 531
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean structure and using designed initializers? Refer to my comment about designated initializers elsewhere. I am not sure if it would be useful here all that much. Why would you want to use designated initializers here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are three, I might want to specify arbitrary one of the three? Though use_epsilon can probably be removed, so just two then. Still could be more readable with the struct.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is still the same: You would have to have a special parameter struct only for this single function (no other function uses lhs_result_state_renaming and rhs_state_renaming maps). Similarly for other functions with specific parameters. You will have multitude of such structs: ConcatenateParameters, IntersectionParameters, TrimParameters, etc.

What can be done now is a review of the current code and creating a draft of how many of such structs there might be and how much we can reuse them. Then we can better decide what the disadvantages/advantages of either approach might be.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, so many different structs would be ugly. I was hoping that many functions can share the same thing. This one from concatenation can probably be reused for union, maybe even intersection, other binary functions.

Btw, the naming style - what about concatenation, after the returned thing? Matter of taste of course, do you like it or not?

Copy link
Collaborator

@Adda0 Adda0 Aug 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can either be consistent and have both the function and method share the same name, or use verbs for methods (or not, if we remove the prefixes: we would call the method Nfa::Nfa::concatenated_with(const Nfa& second_nfa) in this case) and use names of the operation (as nouns) for the functions. I am fine with either option and used to both. I like verbs for functions better, but that is just a matter of taste.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the _e(d)_with(). It is elegant.

Expand All @@ -513,6 +542,8 @@ Nfa concatenate(const Nfa& lhs, const Nfa& rhs, bool use_epsilon = false,
* @param[in] sink_state The state into which new transitions are added.
* @return True if some new transition was added to the automaton.
*/
//method complete
//alphabet optional, sink stata too?
bool make_complete(Nfa& aut, const Alphabet& alphabet, State sink_state);
Comment on lines +545 to 547
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with all three suggestions.


/**
Expand All @@ -532,6 +563,7 @@ bool make_complete(Nfa& aut, const Alphabet& alphabet, State sink_state);
* @param[in] sink_state The state into which new transitions are added.
* @return True if some new transition was added to the automaton.
*/
//eventually only one variant, the version with the optional alphabet (some enum alphabet type) ?
bool make_complete(Nfa& aut, const Util::OrdVector<Symbol>& symbols, State sink_state);

Comment on lines +566 to 568
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version was created because we had a set of symbols already and did not want to have to create an alphabet from it ad-hoc. But it might be a reasonable change in this case. We have EnumAlphabet serving exactly this use case already.

/**
Expand All @@ -555,6 +587,7 @@ inline bool make_complete(Nfa& aut, const Alphabet& alphabet) { return make_comp
* - "minimize": "true"/"false" (whether to compute minimal deterministic automaton for classical algorithm);
* @return Complemented automaton.
*/
//method
Nfa complement(const Nfa& aut, const Alphabet& alphabet,
const StringMap& params = {{"algorithm", "classical"}, {"minimize", "false"}});

Comment on lines +590 to 593
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, if it also does determinization, then it is not in place, so it should be a function? Have a method that only complements the acceptance, complement_acceptance/complement_final or something, and a function complement that also determinizes and returns, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds even better to me.

Expand All @@ -573,6 +606,7 @@ Nfa complement(const Nfa& aut, const Alphabet& alphabet,
* - "minimize": "true"/"false" (whether to compute minimal deterministic automaton for classical algorithm);
* @return Complemented automaton.
*/
//Do we need this? How many algorithms do we have? If yes, maybe these syntactic sugar functions could really have their own file.
Nfa complement(const Nfa& aut, const Util::OrdVector<Symbol>& symbols,
const StringMap& params = {{"algorithm", "classical"}, {"minimize", "false"}});

Expand Down Expand Up @@ -741,6 +775,7 @@ Run encode_word(const StringToSymbolMap& symbol_map, const std::vector<std::stri
} // namespace Mata::Nfa.

namespace std {
//some comments needed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a standard boilerplate code allowing you to compute hash of a transition. We can add a comment explaining that the hash is computed as a combination of hashes from source state, symbol and target state.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes please

template <>
struct hash<Mata::Nfa::Trans> {
inline size_t operator()(const Mata::Nfa::Trans& trans) const {
Expand Down
Loading