Can we estimate the distinct paths given a group of nodes and connections

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
1
down vote

favorite












If we have a system of 30,000 nodes and 126,000 connections, where there is not necessarily a path from any given node to another, are we able to get an estimate for the number of distinct paths (or indirect connections) in the system?



To give a small example, say we have the following nodes and connections:



a <-> b



b <-> c



x <-> y



This gives us the following paths (total = 8):



a-b, a-c, b-a, b-c, c-b, c-a, x-y, y-x



Given a system of 30k nodes and 126k connections, can we estimate the total "paths". I assume that the actual number will depend highly on how interconnected all the nodes are but is there way to calculate based on an even distribution what the total number might be?



I am not a mathematician, so I'm sorry if my terminology is wildly off, but hopefully I have put across what I am looking for.







share|cite|improve this question















  • 1




    interesting problem. Are your connections always symmetric (i.e $a to b implies b to a$)?
    – gt6989b
    Jul 17 at 15:21






  • 1




    @gt6989b Yes, connections are all 2-way
    – Jacob Regan
    Jul 17 at 15:22






  • 1




    Some thoughts to ponder. If you know the number and size of connected components (in your example, $3$ and $2$) then you can compute directly. To model this, consider a random graph on $n$ vertices and independently distribute edges with probability $p$. Then, there are an expected number of $m = pn(n-1)/2$ edges, so you can work out $p$ if you know $m$ and $n$. You can look at limiting theorems in random graphs to figure out the number/expected distribution of connected components
    – gt6989b
    Jul 17 at 15:25










  • Terminology notes: paths usually refer to sequences of nodes that each have a connection, e.g. (a,b,c) would be a path. What you are asking for would usually be referred to as something like "pairs of connected nodes".
    – user145640
    Jul 17 at 15:27










  • As gt6989b noted, once you can figure out the connected components (see en.wikipedia.org/wiki/Connected_component_%28graph_theory%29) of your network, you can calculate that number exactly. Depending one what your network consists of, using a random graph may yield incorrect results. Is there a way to find out automatically and within a reasonable time (<1s), to what other nodes a given node is directly connected? If yes, it should be relatively easy to find the connected components.
    – Ingix
    Jul 18 at 13:15














up vote
1
down vote

favorite












If we have a system of 30,000 nodes and 126,000 connections, where there is not necessarily a path from any given node to another, are we able to get an estimate for the number of distinct paths (or indirect connections) in the system?



To give a small example, say we have the following nodes and connections:



a <-> b



b <-> c



x <-> y



This gives us the following paths (total = 8):



a-b, a-c, b-a, b-c, c-b, c-a, x-y, y-x



Given a system of 30k nodes and 126k connections, can we estimate the total "paths". I assume that the actual number will depend highly on how interconnected all the nodes are but is there way to calculate based on an even distribution what the total number might be?



I am not a mathematician, so I'm sorry if my terminology is wildly off, but hopefully I have put across what I am looking for.







share|cite|improve this question















  • 1




    interesting problem. Are your connections always symmetric (i.e $a to b implies b to a$)?
    – gt6989b
    Jul 17 at 15:21






  • 1




    @gt6989b Yes, connections are all 2-way
    – Jacob Regan
    Jul 17 at 15:22






  • 1




    Some thoughts to ponder. If you know the number and size of connected components (in your example, $3$ and $2$) then you can compute directly. To model this, consider a random graph on $n$ vertices and independently distribute edges with probability $p$. Then, there are an expected number of $m = pn(n-1)/2$ edges, so you can work out $p$ if you know $m$ and $n$. You can look at limiting theorems in random graphs to figure out the number/expected distribution of connected components
    – gt6989b
    Jul 17 at 15:25










  • Terminology notes: paths usually refer to sequences of nodes that each have a connection, e.g. (a,b,c) would be a path. What you are asking for would usually be referred to as something like "pairs of connected nodes".
    – user145640
    Jul 17 at 15:27










  • As gt6989b noted, once you can figure out the connected components (see en.wikipedia.org/wiki/Connected_component_%28graph_theory%29) of your network, you can calculate that number exactly. Depending one what your network consists of, using a random graph may yield incorrect results. Is there a way to find out automatically and within a reasonable time (<1s), to what other nodes a given node is directly connected? If yes, it should be relatively easy to find the connected components.
    – Ingix
    Jul 18 at 13:15












up vote
1
down vote

favorite









up vote
1
down vote

favorite











If we have a system of 30,000 nodes and 126,000 connections, where there is not necessarily a path from any given node to another, are we able to get an estimate for the number of distinct paths (or indirect connections) in the system?



To give a small example, say we have the following nodes and connections:



a <-> b



b <-> c



x <-> y



This gives us the following paths (total = 8):



a-b, a-c, b-a, b-c, c-b, c-a, x-y, y-x



Given a system of 30k nodes and 126k connections, can we estimate the total "paths". I assume that the actual number will depend highly on how interconnected all the nodes are but is there way to calculate based on an even distribution what the total number might be?



I am not a mathematician, so I'm sorry if my terminology is wildly off, but hopefully I have put across what I am looking for.







share|cite|improve this question











If we have a system of 30,000 nodes and 126,000 connections, where there is not necessarily a path from any given node to another, are we able to get an estimate for the number of distinct paths (or indirect connections) in the system?



To give a small example, say we have the following nodes and connections:



a <-> b



b <-> c



x <-> y



This gives us the following paths (total = 8):



a-b, a-c, b-a, b-c, c-b, c-a, x-y, y-x



Given a system of 30k nodes and 126k connections, can we estimate the total "paths". I assume that the actual number will depend highly on how interconnected all the nodes are but is there way to calculate based on an even distribution what the total number might be?



I am not a mathematician, so I'm sorry if my terminology is wildly off, but hopefully I have put across what I am looking for.









share|cite|improve this question










share|cite|improve this question




share|cite|improve this question









asked Jul 17 at 15:17









Jacob Regan

412




412







  • 1




    interesting problem. Are your connections always symmetric (i.e $a to b implies b to a$)?
    – gt6989b
    Jul 17 at 15:21






  • 1




    @gt6989b Yes, connections are all 2-way
    – Jacob Regan
    Jul 17 at 15:22






  • 1




    Some thoughts to ponder. If you know the number and size of connected components (in your example, $3$ and $2$) then you can compute directly. To model this, consider a random graph on $n$ vertices and independently distribute edges with probability $p$. Then, there are an expected number of $m = pn(n-1)/2$ edges, so you can work out $p$ if you know $m$ and $n$. You can look at limiting theorems in random graphs to figure out the number/expected distribution of connected components
    – gt6989b
    Jul 17 at 15:25










  • Terminology notes: paths usually refer to sequences of nodes that each have a connection, e.g. (a,b,c) would be a path. What you are asking for would usually be referred to as something like "pairs of connected nodes".
    – user145640
    Jul 17 at 15:27










  • As gt6989b noted, once you can figure out the connected components (see en.wikipedia.org/wiki/Connected_component_%28graph_theory%29) of your network, you can calculate that number exactly. Depending one what your network consists of, using a random graph may yield incorrect results. Is there a way to find out automatically and within a reasonable time (<1s), to what other nodes a given node is directly connected? If yes, it should be relatively easy to find the connected components.
    – Ingix
    Jul 18 at 13:15












  • 1




    interesting problem. Are your connections always symmetric (i.e $a to b implies b to a$)?
    – gt6989b
    Jul 17 at 15:21






  • 1




    @gt6989b Yes, connections are all 2-way
    – Jacob Regan
    Jul 17 at 15:22






  • 1




    Some thoughts to ponder. If you know the number and size of connected components (in your example, $3$ and $2$) then you can compute directly. To model this, consider a random graph on $n$ vertices and independently distribute edges with probability $p$. Then, there are an expected number of $m = pn(n-1)/2$ edges, so you can work out $p$ if you know $m$ and $n$. You can look at limiting theorems in random graphs to figure out the number/expected distribution of connected components
    – gt6989b
    Jul 17 at 15:25










  • Terminology notes: paths usually refer to sequences of nodes that each have a connection, e.g. (a,b,c) would be a path. What you are asking for would usually be referred to as something like "pairs of connected nodes".
    – user145640
    Jul 17 at 15:27










  • As gt6989b noted, once you can figure out the connected components (see en.wikipedia.org/wiki/Connected_component_%28graph_theory%29) of your network, you can calculate that number exactly. Depending one what your network consists of, using a random graph may yield incorrect results. Is there a way to find out automatically and within a reasonable time (<1s), to what other nodes a given node is directly connected? If yes, it should be relatively easy to find the connected components.
    – Ingix
    Jul 18 at 13:15







1




1




interesting problem. Are your connections always symmetric (i.e $a to b implies b to a$)?
– gt6989b
Jul 17 at 15:21




interesting problem. Are your connections always symmetric (i.e $a to b implies b to a$)?
– gt6989b
Jul 17 at 15:21




1




1




@gt6989b Yes, connections are all 2-way
– Jacob Regan
Jul 17 at 15:22




@gt6989b Yes, connections are all 2-way
– Jacob Regan
Jul 17 at 15:22




1




1




Some thoughts to ponder. If you know the number and size of connected components (in your example, $3$ and $2$) then you can compute directly. To model this, consider a random graph on $n$ vertices and independently distribute edges with probability $p$. Then, there are an expected number of $m = pn(n-1)/2$ edges, so you can work out $p$ if you know $m$ and $n$. You can look at limiting theorems in random graphs to figure out the number/expected distribution of connected components
– gt6989b
Jul 17 at 15:25




Some thoughts to ponder. If you know the number and size of connected components (in your example, $3$ and $2$) then you can compute directly. To model this, consider a random graph on $n$ vertices and independently distribute edges with probability $p$. Then, there are an expected number of $m = pn(n-1)/2$ edges, so you can work out $p$ if you know $m$ and $n$. You can look at limiting theorems in random graphs to figure out the number/expected distribution of connected components
– gt6989b
Jul 17 at 15:25












Terminology notes: paths usually refer to sequences of nodes that each have a connection, e.g. (a,b,c) would be a path. What you are asking for would usually be referred to as something like "pairs of connected nodes".
– user145640
Jul 17 at 15:27




Terminology notes: paths usually refer to sequences of nodes that each have a connection, e.g. (a,b,c) would be a path. What you are asking for would usually be referred to as something like "pairs of connected nodes".
– user145640
Jul 17 at 15:27












As gt6989b noted, once you can figure out the connected components (see en.wikipedia.org/wiki/Connected_component_%28graph_theory%29) of your network, you can calculate that number exactly. Depending one what your network consists of, using a random graph may yield incorrect results. Is there a way to find out automatically and within a reasonable time (<1s), to what other nodes a given node is directly connected? If yes, it should be relatively easy to find the connected components.
– Ingix
Jul 18 at 13:15




As gt6989b noted, once you can figure out the connected components (see en.wikipedia.org/wiki/Connected_component_%28graph_theory%29) of your network, you can calculate that number exactly. Depending one what your network consists of, using a random graph may yield incorrect results. Is there a way to find out automatically and within a reasonable time (<1s), to what other nodes a given node is directly connected? If yes, it should be relatively easy to find the connected components.
– Ingix
Jul 18 at 13:15










1 Answer
1






active

oldest

votes

















up vote
0
down vote













I'm basing this on Jacob's rough explanation on the background of the question in the comments.



I'll describe what I'd to if this was a one-time effort (or only needed to be done infrequently like once a year).



First you need to create a list of all your nodes. Then you need to apply a graph traversal algorithm (https://en.wikipedia.org/wiki/Graph_traversal). I don't think you need anything more fancy than breath-first-search oder depths-first-search and I don't think it matters in this case which one you take. The only thing that needs to be done is to use data structures that make it easy to check if a given node is contained inside a possibly large 'list' of nodes (so use hash sets or something similar).



Starting with any node, this finds all nodes that are (directly or indirectly) connected to it. The list of nodes thus found (including the starting node) form your first connected component $C_1$.



If there are nodes not in $C_1$, take one of them as starting node and do it all again, producing $C_2$. Continue until your components contain all nodes.



In your database, add a field "Component" and make it 1 for all nodes in $C_1$, 2 for all nodes in $C_2$, etc. Then when you later want to check if 2 nodes are connected, just get their Component values and compare them: If they are the same, they are connected, otherwise they are not connected.






share|cite|improve this answer





















    Your Answer




    StackExchange.ifUsing("editor", function ()
    return StackExchange.using("mathjaxEditing", function ()
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    );
    );
    , "mathjax-editing");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "69"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    convertImagesToLinks: true,
    noModals: false,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    noCode: true, onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );








     

    draft saved


    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2854604%2fcan-we-estimate-the-distinct-paths-given-a-group-of-nodes-and-connections%23new-answer', 'question_page');

    );

    Post as a guest






























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    0
    down vote













    I'm basing this on Jacob's rough explanation on the background of the question in the comments.



    I'll describe what I'd to if this was a one-time effort (or only needed to be done infrequently like once a year).



    First you need to create a list of all your nodes. Then you need to apply a graph traversal algorithm (https://en.wikipedia.org/wiki/Graph_traversal). I don't think you need anything more fancy than breath-first-search oder depths-first-search and I don't think it matters in this case which one you take. The only thing that needs to be done is to use data structures that make it easy to check if a given node is contained inside a possibly large 'list' of nodes (so use hash sets or something similar).



    Starting with any node, this finds all nodes that are (directly or indirectly) connected to it. The list of nodes thus found (including the starting node) form your first connected component $C_1$.



    If there are nodes not in $C_1$, take one of them as starting node and do it all again, producing $C_2$. Continue until your components contain all nodes.



    In your database, add a field "Component" and make it 1 for all nodes in $C_1$, 2 for all nodes in $C_2$, etc. Then when you later want to check if 2 nodes are connected, just get their Component values and compare them: If they are the same, they are connected, otherwise they are not connected.






    share|cite|improve this answer

























      up vote
      0
      down vote













      I'm basing this on Jacob's rough explanation on the background of the question in the comments.



      I'll describe what I'd to if this was a one-time effort (or only needed to be done infrequently like once a year).



      First you need to create a list of all your nodes. Then you need to apply a graph traversal algorithm (https://en.wikipedia.org/wiki/Graph_traversal). I don't think you need anything more fancy than breath-first-search oder depths-first-search and I don't think it matters in this case which one you take. The only thing that needs to be done is to use data structures that make it easy to check if a given node is contained inside a possibly large 'list' of nodes (so use hash sets or something similar).



      Starting with any node, this finds all nodes that are (directly or indirectly) connected to it. The list of nodes thus found (including the starting node) form your first connected component $C_1$.



      If there are nodes not in $C_1$, take one of them as starting node and do it all again, producing $C_2$. Continue until your components contain all nodes.



      In your database, add a field "Component" and make it 1 for all nodes in $C_1$, 2 for all nodes in $C_2$, etc. Then when you later want to check if 2 nodes are connected, just get their Component values and compare them: If they are the same, they are connected, otherwise they are not connected.






      share|cite|improve this answer























        up vote
        0
        down vote










        up vote
        0
        down vote









        I'm basing this on Jacob's rough explanation on the background of the question in the comments.



        I'll describe what I'd to if this was a one-time effort (or only needed to be done infrequently like once a year).



        First you need to create a list of all your nodes. Then you need to apply a graph traversal algorithm (https://en.wikipedia.org/wiki/Graph_traversal). I don't think you need anything more fancy than breath-first-search oder depths-first-search and I don't think it matters in this case which one you take. The only thing that needs to be done is to use data structures that make it easy to check if a given node is contained inside a possibly large 'list' of nodes (so use hash sets or something similar).



        Starting with any node, this finds all nodes that are (directly or indirectly) connected to it. The list of nodes thus found (including the starting node) form your first connected component $C_1$.



        If there are nodes not in $C_1$, take one of them as starting node and do it all again, producing $C_2$. Continue until your components contain all nodes.



        In your database, add a field "Component" and make it 1 for all nodes in $C_1$, 2 for all nodes in $C_2$, etc. Then when you later want to check if 2 nodes are connected, just get their Component values and compare them: If they are the same, they are connected, otherwise they are not connected.






        share|cite|improve this answer













        I'm basing this on Jacob's rough explanation on the background of the question in the comments.



        I'll describe what I'd to if this was a one-time effort (or only needed to be done infrequently like once a year).



        First you need to create a list of all your nodes. Then you need to apply a graph traversal algorithm (https://en.wikipedia.org/wiki/Graph_traversal). I don't think you need anything more fancy than breath-first-search oder depths-first-search and I don't think it matters in this case which one you take. The only thing that needs to be done is to use data structures that make it easy to check if a given node is contained inside a possibly large 'list' of nodes (so use hash sets or something similar).



        Starting with any node, this finds all nodes that are (directly or indirectly) connected to it. The list of nodes thus found (including the starting node) form your first connected component $C_1$.



        If there are nodes not in $C_1$, take one of them as starting node and do it all again, producing $C_2$. Continue until your components contain all nodes.



        In your database, add a field "Component" and make it 1 for all nodes in $C_1$, 2 for all nodes in $C_2$, etc. Then when you later want to check if 2 nodes are connected, just get their Component values and compare them: If they are the same, they are connected, otherwise they are not connected.







        share|cite|improve this answer













        share|cite|improve this answer



        share|cite|improve this answer











        answered Jul 18 at 14:53









        Ingix

        2,205125




        2,205125






















             

            draft saved


            draft discarded


























             


            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2854604%2fcan-we-estimate-the-distinct-paths-given-a-group-of-nodes-and-connections%23new-answer', 'question_page');

            );

            Post as a guest













































































            Comments

            Popular posts from this blog

            What is the equation of a 3D cone with generalised tilt?

            Color the edges and diagonals of a regular polygon

            Relationship between determinant of matrix and determinant of adjoint?