Saving a Twitter stream to MongoDB using C#

In my earlier post I built an example of saving a public Twitter stream into RavenDB. Working with RavenDB has piqued my interest in NoSQL databases, so in this post I swap out RavenDB and instead use MongoDB to save some of the tweets that appear in the Twitter public stream.

This example was built using Visual Studio Community 2015 and MongoDB 3.4.6.

Start MongoDB (on Windows this is mongod.exe), then start a Mongo interactive shell (mongo.exe on Windows). There is no need to create a new database; it will be created by the first insert. (As a database developer, this took some getting used to!)

Start Visual Studio and create a new Console Application. Using NuGet, add the TweetinviAPI and MongoDB.Driver packages and then type or copy and paste the following, ensuring that you enter your Twitter API credentials on line 21:


using System;
using Tweetinvi;
using MongoDB.Bson;
using MongoDB.Driver;

namespace SavingTwitterStreamToMongo
{
  class Program
  {
    static void Main(string[] args)
    {
      var connectionString = "mongodb://localhost:27017";

      var client = new MongoClient(connectionString);

      IMongoDatabase db = client.GetDatabase("twitterstream");

      IMongoCollection<BsonDocument> collection = db.GetCollection<BsonDocument>("tweets");

      // You need to enter your twitter credentials here
      Auth.SetUserCredentials("", "", "", "");

      var stream = Stream.CreateSampleStream();

      stream.TweetReceived += (sender, theTweet) =>
      {
        Console.WriteLine(theTweet.Tweet.FullText);

        var document = new BsonDocument
        {
          { "the_tweet", theTweet.Tweet.FullText }
        };

        collection.InsertOneAsync(document);
      };

      stream.StartStream();

    }
  }
}

This code is the same as shown in this post; the only change is that the RavenDB constructs have been replaced by those required to save the Twitter stream to MongoDB.

Lines 12 and 14 set up the connection to MongoDB.

Line 16 sets the MongoDB database that will be used, in this example it is called twitterstream. (As mentioned earlier in the post, if this database does not exist it will be created on the first insert)

Line 18 shows that the tweets are going to be saved inside a collection called tweets.

At line 29 a BSON document is created containing the tweet.

Line 34 saves this document to the database.
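
One thing worth noticing about line 34 is that InsertOneAsync returns a Task which the example does not await, so a failed insert would pass silently. The following is a minimal sketch of one way to observe such failures without blocking the event handler. SaveAsync is a hypothetical stand-in for the driver call (it is not part of the MongoDB driver), so the sketch runs without a database.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class InsertObserver
{
    // Messages from background saves that failed.
    public static readonly List<string> Errors = new List<string>();

    // Hypothetical stand-in for collection.InsertOneAsync(document):
    // it fails for empty text and succeeds otherwise.
    public static Task SaveAsync(string text)
    {
        if (string.IsNullOrEmpty(text))
            return Task.FromException(new ArgumentException("empty tweet"));
        return Task.CompletedTask;
    }

    // Starts the save without blocking the caller, but records any failure.
    public static Task SaveAndLog(string text)
    {
        return SaveAsync(text).ContinueWith(t =>
        {
            if (t.IsFaulted)
            {
                lock (Errors) Errors.Add(t.Exception.InnerException.Message);
            }
        });
    }

    static void Main()
    {
        Task.WaitAll(SaveAndLog("hello"), SaveAndLog(""));
        Console.WriteLine(Errors.Count); // prints 1: only the empty tweet failed
    }
}
```

In the real program the continuation could be attached directly to the Task returned by InsertOneAsync.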

Once this code is running, you can switch to the MongoDB interactive shell and review what is being saved.

Run the command show dbs and you will see that the twitterstream database now exists.

Switch to the twitterstream database by typing: use twitterstream

Execute the command db.tweets.findOne() and you will see a saved Tweet.

Saving a Twitter stream to RavenDB database using C#

In an earlier post  I explained how you can use C# to access a Twitter stream. In this post I will show you how to save the tweets from a Twitter stream to RavenDB.

The goal of this post is not to perform a deep dive into NoSQL databases or the Tweetinvi API. Instead it’s to get you up and running with the minimum of ceremony so you can start conducting your own experiments.

RavenDB is an open source NoSQL database for .NET. As my first experience of a NoSQL database, I have found it relatively straightforward to start experimenting with.

You can download RavenDB from here.  At the time of writing the stable release was 3.5.3 and I chose to use the installer which then proceeded to install RavenDB via the familiar wizard installation process.

Once installed you should have a folder structure similar to this:

[Screenshot: RavenDB folder structure]

If, like me, you are new to the world of NoSQL databases, it is worth working your way through the Fundamentals tutorial. I found this an excellent introduction which I highly recommend.

To start RavenDB, double-click the Start.cmd batch file in the root of the RavenDB directory. You should shortly see a new command window and a new tab in your default browser showing what databases you have (the list will be empty on first launch).

With RavenDB installed and running we can now start Visual Studio and create a new console application. I’ve called mine TrendingOnTwitterNoSQL.

Using NuGet, add the following packages:

TweetinviAPI

RavenDB.Client

Navigate to Program.cs and add the following using statements:

using System;

using Raven.Client.Document;

using Tweetinvi;

Within the Main method add the following:

Auth.SetUserCredentials("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET");

Replace CONSUMER_KEY etc. with your Twitter API credentials. If you don’t yet have them, you can obtain them by going here and following the instructions.

Now add the following two lines:

  var stream = Stream.CreateFilteredStream();
  stream.AddTrack("CanadianGP");

The first line creates a filtered Twitter stream. A Twitter stream gives you, the developer, access to live information on Twitter. There are a number of different streams available. In this post we will be using one that returns information about a trending topic. More information about Twitter streams can be found in the Twitter docs and the Tweetinvi docs.

At the time of writing, the Canadian Grand Prix was trending on Twitter, which is why the second line tracks CanadianGP.

The next step is to create a new class which will manage the  RavenDB document store.  Here is the complete code.


using System;
using Raven.Client;
using Raven.Client.Document; 

namespace TrendingOnTwitterNoSQL
{
  class DocumentStoreHolder
  {
    private static readonly Lazy<IDocumentStore> LazyStore =
      new Lazy<IDocumentStore>(() =>
      {
        var store = new DocumentStore
        {
          Url = "http://localhost:8080",
          DefaultDatabase = "CanadianGP"
        };
        return store.Initialize();
      });

    public static IDocumentStore Store => LazyStore.Value;
  }
}

In the context of RavenDB, the Document Store holds the RavenDB URL, the default database etc. More information can be found about the Document Store in the tutorial.

According to the documentation, a typical application normally needs only one document store, which is why the DocumentStoreHolder class is a singleton.
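
To see why Lazy<T> gives singleton behaviour, here is a small sketch that uses only the base class library. A plain object stands in for the RavenDB DocumentStore, and StoreHolderDemo and InitCount are names invented for this illustration.

```csharp
using System;

static class StoreHolderDemo
{
    // Counts how many times the factory delegate runs.
    public static int InitCount;

    // A plain object stands in for an expensive-to-create document store.
    private static readonly Lazy<object> LazyStore =
        new Lazy<object>(() =>
        {
            InitCount++;
            return new object();
        });

    // Every caller gets the same instance; it is created on first access.
    public static object Store => LazyStore.Value;

    static void Main()
    {
        var first = Store;
        var second = Store;
        Console.WriteLine(ReferenceEquals(first, second)); // True
        Console.WriteLine(InitCount);                      // 1
    }
}
```

The factory runs once, on first access to Store, and by default Lazy<T> makes that initialisation thread safe.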

The important thing to note in this class is the database URL and the name of the Default Database, CanadianGP. This is the name of the database that will store Tweets about the CanadianGP.

Returning to Program.cs add the following underneath stream.AddTrack to obtain a new document store:

  var documentStore = DocumentStoreHolder.Store;

The final class that needs to be created is called TwitterModel and is shown below.


namespace TrendingOnTwitterNoSQL
{
  class TwitterModel
  {
    public long Id { get; set; }
    public string Tweet { get; set; }
  }
}

This class will be used to save the Tweet information that the program is interested in: the Twitter ID and the Tweet. There is a lot of other information available, but for the sake of brevity this example is only interested in the id and the tweet.

With this class created, the final part of the code is shown below.


using (BulkInsertOperation bulkInsert = documentStore.BulkInsert())
{
  stream.MatchingTweetReceived += (sender, theTweet) =>
  {
    Console.WriteLine(theTweet.Tweet.FullText);
    var tm = new TwitterModel
    {
      Id = theTweet.Tweet.Id,
      Tweet = theTweet.Tweet.FullText
    };

    bulkInsert.Store(tm);
  };
  stream.StartStreamMatchingAllConditions();
}

As the tweets will be arriving in clusters, the RavenDB BulkInsert method is used. You can see this at line 1.

Once a matching Tweet is found, line 3, it is output to the console. Next a new TwitterModel object is created and its fields are assigned the Tweet Id and the Tweet Text. This object is then saved to the database.

The complete Program.cs should now look like:


using System;
using Raven.Client.Document;
using Tweetinvi;

namespace TrendingOnTwitterNoSQL
{
  class Program
  {
    static void Main(string[] args)
    {
      Auth.SetUserCredentials("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET");

      var stream = Stream.CreateFilteredStream();
      stream.AddTrack("CanadianGP");

      var documentStore = DocumentStoreHolder.Store;

      using (BulkInsertOperation bulkInsert = documentStore.BulkInsert())
      {
        stream.MatchingTweetReceived += (sender, theTweet) =>
        {
          Console.WriteLine(theTweet.Tweet.FullText);

          var tm = new TwitterModel
          {
            Id = theTweet.Tweet.Id,
            Tweet = theTweet.Tweet.FullText
          };

          bulkInsert.Store(tm);

        };
        stream.StartStreamMatchingAllConditions();
      }
    }
  }
}

After running this program for a short while you will have a number of Tweets saved. To view them, switch back to your browser (if you are not already on the RavenDB page, navigate to http://localhost:8080) and click on the database that you created.

Selecting the relevant database you will then see the tweets.

Summary

In this post I have detailed the steps required to save a Twitter stream for a topic of interest to RavenDB.

A complete example is available on GitHub.

Acknowledgements

The genesis of this post came from the generous answers given to my question on StackOverflow.

Boxing and Unboxing in C#

This post is an aide-memoire as I learn more about boxing and unboxing in C# and is based upon this part of the C# docs.

Boxing

Boxing is the process of converting a value type (such as int or bool) to the type object or to any interface type implemented by this value type. When the CLR boxes a value type, it wraps the value inside a System.Object and stores it on the managed heap.

Boxing is implicit.

int i = 10; 
// this line boxes i
object o = i;

Although it is possible to perform the boxing explicitly, it is not required.

int i = 10;
// explicit boxing
object o = (object)i;
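
A consequence of boxing is that the boxed object holds a copy of the value, so later changes to the original variable do not affect it. The sketch below (class and method names are my own) illustrates this, along with boxing to an interface type.

```csharp
using System;

static class BoxingCopyDemo
{
    // Boxes i, changes i, then reports what the boxed object still holds.
    public static string BoxThenChange()
    {
        int i = 10;
        object o = i;        // boxing copies the value 10 into a new heap object
        i = 20;              // changes only the local variable
        return o.ToString(); // the boxed copy is unchanged
    }

    // Boxing also happens when converting to an interface the value type implements.
    public static bool InterfaceBoxIsInt()
    {
        IComparable boxed = 10; // int implements IComparable, so this boxes
        return boxed is int;    // the box remembers its exact type
    }

    static void Main()
    {
        Console.WriteLine(BoxThenChange());     // 10
        Console.WriteLine(InterfaceBoxIsInt()); // True
    }
}
```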

Unboxing

Unboxing extracts the value type from the object.

Unboxing is explicit.

int i = 10; 
// boxes i
object o = i;
// unboxes the object to the int value type named j
int j = (int)o;
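
Unboxing only succeeds when the cast names the exact value type that was boxed. The following sketch (names are illustrative) shows that unboxing a boxed int straight to short throws an InvalidCastException, whereas unboxing to int first and then converting works.

```csharp
using System;

static class UnboxingDemo
{
    // Returns true if the object can be unboxed directly to short.
    public static bool TryUnboxToShort(object boxed, out short value)
    {
        try
        {
            value = (short)boxed; // valid only if boxed holds exactly a short
            return true;
        }
        catch (InvalidCastException)
        {
            value = 0;
            return false;
        }
    }

    static void Main()
    {
        object o = 123; // boxed int
        short s;
        Console.WriteLine(TryUnboxToShort(o, out s));          // False
        Console.WriteLine(TryUnboxToShort((short)123, out s)); // True
        Console.WriteLine((short)(int)o);                      // 123: unbox as int, then convert
    }
}
```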

Performance

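Both boxing and unboxing are computationally expensive compared with simple assignments. Boxing a value type allocates and constructs a new object on the managed heap and, to a lesser degree, the cast required for unboxing is also costly.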
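
A rough way to see the cost is to compare the non-generic ArrayList, which stores every int as a boxed object, with List<int>, which does not box. This is only a sketch: the timings it prints will vary by machine and build settings, and it is not a rigorous benchmark.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

static class BoxingCostDemo
{
    public static long SumBoxed(int count)
    {
        var list = new ArrayList();
        for (int i = 0; i < count; i++)
            list.Add(i);       // each Add boxes the int
        long sum = 0;
        foreach (object o in list)
            sum += (int)o;     // each read unboxes
        return sum;
    }

    public static long SumGeneric(int count)
    {
        var list = new List<int>();
        for (int i = 0; i < count; i++)
            list.Add(i);       // stored as int, no boxing
        long sum = 0;
        foreach (int value in list)
            sum += value;
        return sum;
    }

    static void Main()
    {
        const int count = 5000000;

        var sw = Stopwatch.StartNew();
        long boxed = SumBoxed(count);
        Console.WriteLine($"ArrayList: {sw.ElapsedMilliseconds} ms (sum {boxed})");

        sw.Restart();
        long generic = SumGeneric(count);
        Console.WriteLine($"List<int>: {sw.ElapsedMilliseconds} ms (sum {generic})");
    }
}
```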

Streaming Twitter with C#

In this article I will walk through the steps required to create a C# console application that prints a Twitter stream to the console using the Tweetinvi library.

The example was built using Visual Studio 2015 Community Edition and .NET Framework 4.6.

Step 1

Start Visual Studio and create a new console application. I’ve called mine TwitterPublicStream.

Step 2

Right-click on the project in the Solution Explorer window (in this example it is TwitterPublicStream) and select Manage NuGet Packages.

Step 3

Search for tweetinvi and once found, install it, accepting the various licences, if you are happy to do so.

Step 4

In order to use the Twitter APIs, you first need to obtain some credentials. To do this, visit the Twitter API home page and follow the instructions.

Step 5

After that four-step ceremony we are now ready to write some code.

using System;
using Tweetinvi;

namespace TwitterPublicStream
{
  class Program
  {
    static void Main(string[] args)
    {
      // add your Twitter API credentials here
      Auth.SetUserCredentials("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET");

      var stream = Stream.CreateFilteredStream();
      // change LEITOT to something that is currently trending on twitter
      stream.AddTrack("LEITOT");
      stream.MatchingTweetReceived += (sender, theTweet) =>
      {

        Console.WriteLine($"A tweet containing LEITOT has been found; the tweet is {theTweet.Tweet}");

      };
      stream.StartStreamMatchingAllConditions();

    }
  }
}

At line 2 a using statement is added for the Tweetinvi library.

At line 11 you need to add your Twitter API credentials that you obtained in Step 4.

At line 15 the AddTrack method is called. Track in the Twitter API context is a comma-separated list of phrases which will be used to determine what Tweets will be delivered on the stream. You can find out more here. Whilst testing this I suggest selecting a trending topic without the #. The one shown in the code was a football game between Leicester and Spurs.

At line 16 the MatchingTweetReceived event will output the contents of the tweet to the console.

Line 22 starts the streaming.
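
To make the track semantics mentioned for line 15 more concrete, here is a simplified local model of how the Twitter docs describe matching: commas act as logical ORs between phrases, and the terms within a phrase must all appear in the Tweet, in any order and ignoring case. TrackMatcher is a hypothetical helper written for illustration; the real matching happens on Twitter's servers and its tokenisation rules are richer than this sketch.

```csharp
using System;
using System.Linq;

static class TrackMatcher
{
    // Simplified tokenisation: split the tweet on whitespace and common punctuation.
    static readonly char[] Separators = { ' ', '\t', '\n', '.', ',', '!', '?', ':', ';', '#', '@' };

    public static bool Matches(string track, string tweetText)
    {
        var words = tweetText.ToLowerInvariant()
                             .Split(Separators, StringSplitOptions.RemoveEmptyEntries);

        return track.ToLowerInvariant()
                    .Split(',')                                  // commas: logical OR
                    .Select(phrase => phrase.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
                    .Where(terms => terms.Length > 0)
                    .Any(terms => terms.All(term => words.Contains(term))); // spaces: logical AND
    }

    static void Main()
    {
        Console.WriteLine(Matches("LEITOT", "What a goal! #LEITOT"));               // True
        Console.WriteLine(Matches("grand prix,LEITOT", "Prix day: the Grand one")); // True
        Console.WriteLine(Matches("grand prix", "Grand opening today"));            // False
    }
}
```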

Step 6

In the final step, compile and run the program. After a few seconds you should start seeing Tweets populate the console window.

Summary

In this article I have explained how to use the superb library TweetInvi to stream Tweets of interest from Twitter into a C# console application.

C# Utility to emulate the XPath 3 function path()

I recently needed to examine a number of XML files and print out the names of the elements that contained text longer than a given number of characters. In addition, I also needed to print the location of each such element within the XML document.

For example, given the following document:

<bookshop>
  <book>
    <title>Microsoft Visual C# Step by Step</title>
  </book>
</bookshop>

…if I was interested in book titles that had more than 10 characters I would want to see:

/bookshop/book/title/
Microsoft Visual C# Step by Step

Whilst it is straightforward to return the text node, finding the XPath location proved to be more challenging than I initially thought. The reason is that, whilst XPath 3.0 introduced the path() function that returns the current XPath location, none of the programming languages that I know (PL/SQL, Python and C#) implement XPath 3.0 yet.

As a result I had to build my own utility. I chose to write this in C# as this is a language I have spent the past 18 months learning and I am now looking for real world problems I can solve using it.

The utility can be found on GitHub. The “engine” of the utility is copied from this Stack Overflow answer: http://stackoverflow.com/a/241291/55640 provided by Jon Skeet.

Although far from feature complete I hope it will give someone facing a similar challenge a head start.
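
For readers who just want the general idea, the approach can be sketched with LINQ to XML as follows. This is not the code of the utility or of the Stack Overflow answer. It is a minimal illustration with names of my own choosing, and unlike the real path() function it emits no positional predicates, so repeated sibling elements share the same path.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class XPathLocator
{
    // Builds a simple path such as /bookshop/book/title/ by walking up the ancestors.
    public static string PathOf(XElement element)
    {
        var names = element.AncestorsAndSelf()
                           .Reverse()                      // root first
                           .Select(e => e.Name.LocalName);
        return "/" + string.Join("/", names) + "/";
    }

    // Returns (path, text) for every leaf element whose text is longer than minLength.
    public static List<KeyValuePair<string, string>> LongTextElements(XDocument doc, int minLength)
    {
        return doc.Descendants()
                  .Where(e => !e.HasElements && e.Value.Length > minLength)
                  .Select(e => new KeyValuePair<string, string>(PathOf(e), e.Value))
                  .ToList();
    }

    static void Main()
    {
        var doc = XDocument.Parse(
            "<bookshop><book><title>Microsoft Visual C# Step by Step</title></book></bookshop>");

        foreach (var hit in LongTextElements(doc, 10))
        {
            Console.WriteLine(hit.Key);   // /bookshop/book/title/
            Console.WriteLine(hit.Value); // Microsoft Visual C# Step by Step
        }
    }
}
```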

Let me know what you think.