Streams, encodings and BOMs
The problem
If you ever see this kind of output from your application:
“These are not Joe’s quotesâ€
When you expected to see:
“These are not Joe’s quotes”
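Then you have a text-encoding problem. The output was written as UTF-8, but whatever is reading it has decoded those bytes using a different encoding. This particular pattern (’ in place of ’) is what UTF-8 looks like when it is decoded as Windows-1252. The problem may be solvable by inserting a byte order mark (BOM) at the start of your output, which tells the recipient that the text is UTF-8. See the Wikipedia article on the byte order mark for a full explanation of how the recipient of this text could mangle the bytes.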
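The garbled quotes above can be reproduced directly. This minimal sketch encodes the expected text as UTF-8 and then decodes those bytes as Windows-1252, which is effectively what a reader that guesses the wrong encoding does. It assumes .NET 5 or later (for top-level statements and for CodePagesEncodingProvider shipping with the runtime); older targets may need the System.Text.Encoding.CodePages package.

```csharp
using System;
using System.Text;

// Assumes .NET 5+: Windows-1252 is only available after registering this provider.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

string original = "“These are not Joe’s quotes”";

// The writer encodes the text as UTF-8; GetBytes never adds a BOM.
byte[] bytes = Encoding.UTF8.GetBytes(original);

// With no BOM to go on, the reader guesses Windows-1252 and decodes byte by byte.
string garbled = Encoding.GetEncoding(1252).GetString(bytes);

// Each three-byte UTF-8 quote turns into several Windows-1252 characters,
// e.g. ’ (E2 80 99) becomes ’.
Console.WriteLine(garbled);
```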
The solution (for .NET)
One way this can happen in .NET is through the use of StreamWriter. If no encoding is specified, StreamWriter writes UTF-8 without a byte order mark. (Note that this default is not Encoding.UTF8: the Encoding.UTF8 instance does include a BOM in its preamble.) The solution is to construct your own UTF8Encoding instance with the settings you require and pass it to the StreamWriter constructor.
An example
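A quick way to see which encodings emit a BOM is to compare their preambles: GetPreamble() returns the bytes an encoding writes before any text, which for UTF-8 is the three-byte BOM. This is a minimal check, assuming .NET 5 or later for top-level statements; the MemoryStream is there only so a StreamWriter can be created without touching the file system.

```csharp
using System;
using System.IO;
using System.Text;

// A StreamWriter created without an explicit encoding, to inspect its default.
using var defaultWriter = new StreamWriter(new MemoryStream());

// Preamble length in bytes: 3 means a UTF-8 BOM (EF BB BF) is written, 0 means none.
int defaultLength = defaultWriter.Encoding.GetPreamble().Length;     // StreamWriter default
int utf8Length = Encoding.UTF8.GetPreamble().Length;                 // Encoding.UTF8
int withBomLength = new UTF8Encoding(true).GetPreamble().Length;     // explicit BOM
int withoutBomLength = new UTF8Encoding(false).GetPreamble().Length; // explicit no BOM

Console.WriteLine($"StreamWriter default: {defaultLength}");
Console.WriteLine($"Encoding.UTF8:        {utf8Length}");
Console.WriteLine($"UTF8Encoding(true):   {withBomLength}");
Console.WriteLine($"UTF8Encoding(false):  {withoutBomLength}");
```

This prints 0, 3, 3 and 0: StreamWriter's default omits the BOM, while Encoding.UTF8 and new UTF8Encoding(true) both include it.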
Take the following program:

```csharp
using System.IO;

class TestEncoding
{
    static void Main()
    {
        // No encoding given: StreamWriter writes UTF-8 without a BOM.
        using (var writer = new StreamWriter("menu.csv"))
            writer.WriteLine("Entrée,Joe’s Jalapeño Tacos");
    }
}
```

If you run this and open the resulting file (menu.csv) in Excel, the accented letters and the curly apostrophe come out garbled, just like the output at the top of this post. That's not what we wanted at all. Try changing the constructor of the StreamWriter:
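```csharp
using System.IO;
using System.Text;

class TestEncoding
{
    static void Main()
    {
        // true = emit the UTF-8 byte order mark (EF BB BF) at the start of the file.
        var encodingWithBOM = new UTF8Encoding(true);

        // false = overwrite menu.csv rather than append to it.
        using (var writer = new StreamWriter("menu.csv", false, encodingWithBOM))
            writer.WriteLine("Entrée,Joe’s Jalapeño Tacos");
    }
}
```

Now Excel displays the menu as written, with Entrée, Joe’s and Jalapeño intact. Much better!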
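To confirm the fix at the byte level rather than relying on Excel, you can inspect the files themselves. This sketch writes the example line twice, once with StreamWriter's default encoding and once with new UTF8Encoding(true), and checks whether each file starts with the UTF-8 BOM. It assumes .NET 5 or later (top-level statements); the temp-folder file names and the StartsWithUtf8Bom helper are illustrative, not part of the original programs.

```csharp
using System;
using System.IO;
using System.Text;

// Illustrative locations: the temp folder rather than the working directory.
string noBomPath = Path.Combine(Path.GetTempPath(), "menu-nobom.csv");
string bomPath = Path.Combine(Path.GetTempPath(), "menu.csv");

// First program: StreamWriter's default encoding (UTF-8, no BOM).
using (var writer = new StreamWriter(noBomPath))
    writer.WriteLine("Entrée,Joe’s Jalapeño Tacos");

// Second program: an explicit UTF8Encoding that emits the BOM.
using (var writer = new StreamWriter(bomPath, false, new UTF8Encoding(true)))
    writer.WriteLine("Entrée,Joe’s Jalapeño Tacos");

// Hypothetical helper: does the byte array start with the UTF-8 BOM (EF BB BF)?
static bool StartsWithUtf8Bom(byte[] b) =>
    b.Length >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF;

byte[] noBomBytes = File.ReadAllBytes(noBomPath);
byte[] bomBytes = File.ReadAllBytes(bomPath);

Console.WriteLine($"Default writer:     {BitConverter.ToString(noBomBytes, 0, 3)}  BOM: {StartsWithUtf8Bom(noBomBytes)}");
Console.WriteLine($"UTF8Encoding(true): {BitConverter.ToString(bomBytes, 0, 3)}  BOM: {StartsWithUtf8Bom(bomBytes)}");

// The BOM adds exactly three bytes; the text itself is encoded identically.
Console.WriteLine($"Size difference: {bomBytes.Length - noBomBytes.Length} bytes");
```

The default file starts directly with the bytes for "Ent", while the fixed file starts with EF-BB-BF, which is the signal Excel needs to read the rest as UTF-8.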