"Does the Collatz sequence eventually reach 1 for all positive integer initial values?"
This simple formula is embedding a whole universe.
Generate the Universe
To generate the data for the universe, I created a quick program that creates two kind of files :
graphml file so the graph can be imported by drawing/analysis tools like yEd or Gephi
NODES.csv which contains all the nodes of the graph, and RELATIONS.csv that contain all relations between nodes
The program takes two arguments : the lower (-l) and upper (-u) bounds of the seeds. Then, for each seed, compute nodes and relationships around them.
Load the graph
If it's a graph, why not playing with it through Neo4J, the graph database ?
Therefore, simply :
// Load nodesLOADCSVWITHHEADERSFROM'file:///NODES.csv'ASlineCREATE(:Number{id:line.id});MATCH(n:Number)RETURNn;// Add unicity constraint, index nodesCREATECONSTRAINTuniqueNumberIFNOTEXISTSON(n:Number)ASSERTn.idISUNIQUE;// Link nodes with edges// Increment weight between node each time// they get a new connectionLOADCSVWITHHEADERSFROM"file:///RELATIONS.csv"ASrowMATCH(i:Number),(j:Number)WHEREi.id=row.sourceANDj.id=row.targetMERGE(i)-[r:LINKED_TO]->(j)ONCREATESETr.weight=1ONMATCHSETr.weight=r.weight+1RETURNr;
The data is in.
Notice that the more nodes are connected, the heavier is the link between them, thanks to the weight attribute of the LINKED_TO relation.
We can play with it, first, check is we visually retrieve the infinite loop around 4, 2 and 1 nodes:
Also, let's take a look at the whole picture while loading the graphml in yEd :
The two tools provide different perspectives. The second one puts in evidence the presence of islands (or cities with roads), not connected to others.
The first one makes it easier to make more analytics queries to explore this universe, thanks to the dedicated GraphDataScience library :
Detect subgraphs (aka. islands or communities)
CALLgds.graph.create('Collatz','Number','LINKED_TO',{relationshipProperties:'weight'});// Count subgraphsCALLgds.wcc.stats('Collatz')YIELDcomponentCount;// Tag componentsCALLgds.wcc.mutate('Collatz',{mutateProperty:'componentId'})YIELDnodePropertiesWritten,componentCount;// Put compoentId on each node to flag them// as part of the same componentCALLgds.wcc.write('Collatz',{writeProperty:'componentId'})YIELDnodePropertiesWritten,componentCount;
Then each node has a new property called componentId that tells to which subraph each node belongs to.
How are positive nodes organized ?
// Different componentId from positive nodesMATCH(n:Number)wheretoInteger(n.id)>=0RETURNDISTINCTn.componentId,count(*);
Will return :
n.componentId count(*)
2 2228
Well, all positive numbers are part of a same huge subgraph.
This time there are many subgraphs, the one we were previously seing as islands.
Local clustering coefficient
The Local Clustering Coefficient algorithm computes the local clustering coefficient for each node in the graph. The local clustering coefficient Cn of a node n describes the likelihood that the neighbours of n are also connected.
The PageRank algorithm measures the importance of each node within the graph, based on the number incoming relationships and the importance of the corresponding source nodes.
can be used to find popular nodes within a graph. Degree centrality measures the number of incoming or outgoing (or both) relationships from a node, depending on the orientation of a relationship projection.
Notice that the "positive" island is by far the biggest one,... at least for this dataset.
Explore loops
Finally, one of the funniest thing in the Collatz dataset : the cycles, ... or loops.
When you manually compute them on a paperboard, this is a really cool experience !
This repo has been created to generate ready to use data files around the Collatz Conjecture. As I
wanted to learn more about these patterns, I needed to generate some custom files.
Finally I quickly developed a first draft, a generator that makes the job at least for my first experimentations.
👉Usage
To play with these files, you have two options :
use ready to use samples
generate your owns
📂Ready to use samples
Ready to use samples are stored in samples directory. The directorories are organized
as follow : ${lower-bound}_${upper_bound}.
So, the -1000_1000 directory contains sample where seeds are between -1000 and 1000.
🚀Generate your own samples
The code is organized as a JBang script, so all you have to do to install :